
I see... We could definitely add an argument to control it. I'll update here once there is an RC
AstonishingSeaturtle47 yes it does. But I have to ask: how come you have submodules where one has credentials for the master repo but not for the sub ones? Also, it sounds like a good solution would be for the trains-agent to try to pull the sub-modules and, if it cannot, just print a warning and continue. What do you think?
FYI all the git pulls are cached even in docker mode so there is no "tax" to pay for pulling the sub-modules (only the first time of course)
Trains is fully open-source; that said, properly publishing and maintaining the web client is still on our to-do list (I mean, there is totally readable JavaScript code packaged in the trains-server and the dockers). It keeps getting pushed back because there are generally fewer contributions on the front-end with these kinds of projects. That said, if you guys are willing to help, it will greatly help in pushing it forward... LivelyLion31 what do you think, would you guys like to help with the fronte...
Hi @<1523708920831414272:profile|SuperficialDolphin93>
The error seems like nvml fails to initialize inside the container; you can test it with nvidia-smi and check if that works
Regarding the CUDA version, ClearML serving inherits from the Triton container; could you try to build a new one with the latest Triton container (I think 25)? The docker compose is in the clearml-serving git repo. wdyt?
What are user properties?
Think of them as parameters you can add post execution, that you can also add to the Task table (i.e. customize columns)
how can I add parameters
task.set_user_properties([{"name": "backbone", "description": "network type", "value": "great"}])
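For context, a minimal sketch of how this fits together (the project/task names here are just placeholders):
# attach user properties to a task so they can be shown as custom columns in the experiment table
from clearml import Task

task = Task.init(project_name="examples", task_name="user properties demo")
task.set_user_properties(
    [{"name": "backbone", "description": "network type", "value": "great"}]
)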
Doesn't solve the issue if an HPO run is going to take a few days
The HPO Task has a table of the top performing experiments, so when you go to the "Plot" tab you get a summary of all the runs, with the Task ID of the top performing one.
No need to run through the details of the entire experiments, just look at the summary on the HPO Task.
You can already sort and filter experiments based on any hyperparameter or metric that the experiment reports; there is no need for any custom query language. Also, any filtered/sorted table can be shared exactly as it is, so you can create leaderboards and share specific filters. You can also use the search bar to filter based on experiment name / comment. Tags will be added soon as well
Example of custom columns is here (the screen grab is a bit old, now there is als...
Hi @<1795626098352984064:profile|SoggyElk61>
Were you able to pass the ClearMLVisBackend
line in your code?
This needs to be added before your actual code
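If it helps, this is roughly what I mean, as a rough sketch assuming an MMEngine-style config (the exact visualizer type and names may differ in your setup):
# register the ClearML visualization backend in the training config,
# so it is picked up before any training code runs
vis_backends = [dict(type='ClearMLVisBackend')]
visualizer = dict(type='Visualizer', vis_backends=vis_backends, name='visualizer')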
"sub nodes" inside pipeline, in my opinion, makes them much more useful, in sense that all the steps are visible.
Yeah I really like this idea... continuing this thread, would it also make sense to have a Task object per "sub-node" and run the sub-nodes as subprocesses of the parent Node? I'm thinking this sounds like a combination of both local pipeline execution and remote pipeline execution.
wdyt?
Yes, the clearml import must be outside of everything (so it can link with hydra); when you do it this way, by the time you import clearml, hydra is already done
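Something along these lines, as a minimal sketch (the config path/name and project/task names are just placeholders):
# import clearml at module level, before hydra's entry point runs,
# so it can hook into hydra
from clearml import Task
import hydra

@hydra.main(config_path="conf", config_name="config", version_base=None)
def main(cfg):
    task = Task.init(project_name="examples", task_name="hydra example")
    print(cfg)

if __name__ == "__main__":
    main()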
do you suggest to delete those first?
it might make it easier on the server (I think there is a bug there: when it deletes many tasks it tries to parallelize the delete process but fails to properly sync; anyhow, this is fixed and will be pushed with the next clearml-server version)
PleasantOwl46 any chance there are subprojects under the requested project?
FloppyDeer99 what am I seeing in the screenshot ?
But why is the url in ES different from the one in the web UI?
They are not really different, but sometimes "url quoting" is an issue (this is the process a browser uses to convert a string url like a/b into a%2fb).
I remember there was an issue involving double quoting (this is when you have: a/b -> a%2fb -> a%252fb); notice the last one replaces "%" with "%25", as in your example...
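To illustrate the double quoting, just a quick Python example (not from your logs):
# single vs. double url-quoting of a path separator
from urllib.parse import quote

path = "a/b"
once = quote(path, safe="")    # 'a%2Fb'
twice = quote(once, safe="")   # 'a%252Fb' - the '%' itself becomes '%25'
print(once, twice)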
Let me know i...
Hi TrickySheep9
Hmm I think you are correct, execute_remotely will not work inside a jupyter notebook because it will not be able to close it.
I was just revising workflows that might be similar, wdyt?
https://clearml.slack.com/archives/CTK20V944/p1620506210463400?thread_ts=1614234125.066600&cid=CTK20V944
…every user in the server has the same credentials, and they don't need to know them.. makes sense?
Makes sense, single credentials for everyone, without the need to distribute
Is that correct?
Hmm so the concept of "company" wide configuration is supported in the enterprise version.
I'm trying to think of a "hack" to just pass these env/conf ...
How are you spinning the agent machines?
Hi ReassuredTiger98
I do not want to create extra queues for this since this will not be able to properly distribute tasks.
Queues are the way to abstract different resources into "compute capabilities". They create a simple interface for users on the one hand and allow you to control the compute on the other. Agents can listen to multiple queues with priority. This means an RTX agent can pull from an RTX queue, and if it is empty, it will pull from the "default" queue. Would that work for ...
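Concretely, something like this (the queue names here are just examples):
# the agent pulls from "rtx" first, and falls back to "default" when "rtx" is empty
clearml-agent daemon --queue rtx default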
Hi LovelyHamster1
Could you think of a toy code that reproduces this issue ?
Hi ScaryLeopard77
Could that be solved with this PR?
https://github.com/allegroai/clearml/pull/548
Just a bit of background: execute_remotely will kill the current process (after the Task is synced) and enqueue the Task that was created for remote execution. What seems to fail is actually killing the current process. You can just pass exit_process=False
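i.e. something like this (project/task/queue names are placeholders):
# keep the current (notebook) process alive after the Task is enqueued
from clearml import Task

task = Task.init(project_name="examples", task_name="remote run")
task.execute_remotely(queue_name="default", exit_process=False)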
Any idea why the Pipeline Controller is Running despite the task passing?
What do you mean by "the task passing"?
The system denies my deletion request since it deems the venv-builds dir as in use
Sorry, yes you have to take down the agent when you delete the cache
I just called exit(0)
in a notebook and it closed it (the kernel), no exception
From code ? or the CLI ?
In both cases the dataset needs to upload the parent version somewhere; Azure blob is supported.