however, this will also turn off metrics
For the sake of future readers, let me clarify this one: turning it off with auto_connect_frameworks={'pytorch': False}
only affects the auto-logging of torch.save/load
(side note: the reason is that PyTorch does not have built-in metric reporting, i.e. it is usually done manually, and these days most probably with TensorBoard; for example, Lightning / Ignite use TensorBoard as the default metric reporting).
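A minimal sketch of what that looks like (project/task names are just placeholders):
from clearml import Task
# Disabling 'pytorch' only skips automatic model registration on torch.save()/torch.load();
# TensorBoard metrics would still be captured because that binding stays enabled.
task = Task.init(
    project_name='examples',
    task_name='pytorch no model logging',
    auto_connect_frameworks={'pytorch': False},
)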
is removed from the experiment list?
You mean archived?
Ok, just my ignorance then?
LOL, no, it is just that with a single discrete parameter the strategy makes less sense
the other repos I have are constantly worked on and changing too
Not only will it be cloned automatically, the git diff of the sub-modules is stored as well
the task is being Aborted rather than being in Draft. Am I missing something?
Yes, the reason is so that you do not miss anything that might have been reported on it.
And usually execute_remotely will get the execution queue as a parameter (i.e. immediately launching the Task)
You can now (starting with v1.0) enqueue an aborted Task, so it should not make a difference; you can also reset the Task and edit it in the UI
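For reference, a minimal sketch of how execute_remotely is usually called (queue name is just an example):
from clearml import Task
task = Task.init(project_name='examples', task_name='remote run')
# Passing queue_name enqueues the Task immediately; by default the local
# process exits and the agent picks up the execution from here.
task.execute_remotely(queue_name='default')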
But these changes haven't necessarily been merged into main. The correct behavior would be to use the forked repo.
So I would expect the agent to pull from your fork, is that correct? Is that what you want to happen?
Sorry if it's something trivial. I recently started working with ClearML.
No worries, this actually has more to do with how you work with Dask
The Task ID is the unique ID of any Task in the system (task.id will return the UID string)
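For example (names are placeholders):
from clearml import Task
task = Task.init(project_name='examples', task_name='id demo')
print(task.id)  # the unique Task ID string, usable to query or clone this Task later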
Can you post a toy Dash code snippet here? I'll explain how to make it compatible with clearml
My internet traffic looks weird. I think this is because TensorBoard logs too much data on each batch and ClearML sends it to the server. How can I fix it? My training speed decreased by 5-6 times.
BTW: ComfortableShark77 the network traffic is sent in a background process, it should not affect the processing time, no?
So if you set it, then all nodes will be provisioned with the same execution script.
This is okay in a way, since the actual "agent ID" is by default set based on the machine hostname, which I assume is unique?
I was able to successfully enqueue the task, but only the entrypoint script is visible to it and nothing else.
So you passed a repository link but it did not show on the Task?
What exactly is missing, and how was the Task created?
Could be nice to write some automation
Could you try this one: frameworks = {'tensorboard': True, 'pytorch': False}
This would log the TB output (in the background), but with no model registration (i.e. no serialization)
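A sketch of the full call, assuming the dict is passed to Task.init:
from clearml import Task
# TensorBoard reports are still captured automatically, while PyTorch model
# serialization (torch.save/torch.load) is not registered as input/output models.
task = Task.init(
    project_name='examples',
    task_name='tb only logging',
    auto_connect_frameworks={'tensorboard': True, 'pytorch': False},
)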
And if you could also update the docs with all the env vars possible to set up, it would be awesome!
Yes, I'll pass it on, that is a good point
Thanks! Yes, this could be great !
Could you please open a GitHub issue, so we remember to update the feature?
What's the OS / Python version?
What's the matplotlib version? And the Python version?
trains was not able to pick the right wheel when I updated the torch req from 1.3.1 to 1.7.0: it downloaded the wheel for CUDA version 101 (cu101).
Could you send a log? It should have worked
Hi JitteryCoyote63
Just making sure, the package itself is installed as part of the "Installed packages", and it also installs a command line utility?
BTW if the plots are too complicated to convert to interactive plotly graphs, they will be rendered to images and the server will show them. This is usually the case with seaborn plots
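If you want to force a specific figure to be uploaded as a rendered image rather than relying on the automatic conversion, something along these lines should work (title/series names are just placeholders):
import matplotlib.pyplot as plt
import seaborn as sns
from clearml import Task
task = Task.init(project_name='examples', task_name='seaborn plot')
sns.histplot([1, 2, 2, 3, 3, 3])  # any seaborn plot is a matplotlib figure underneath
# report_image=True uploads the figure as an image instead of attempting a plotly conversion
task.get_logger().report_matplotlib_figure(
    title='distribution', series='histogram',
    figure=plt.gcf(), iteration=0, report_image=True,
)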
PompousParrot44 I see what you mean, yes, multiple context switching might cause a bit of a decline in performance, not sure how much though... The alternative of course is to set CPU affinity... Anyhow, if you do get there we can try to come up with something that makes sense, but in the end there is no magic there
I was unable to reproduce, but I added a few safety checks. I'll make sure they are available on the master in a few minutes, could you maybe rerun after?
SillyPuppy19 yes you are correct, actually I can promise you the callback will be called from a different thread (basically the monitoring thread), so it's on the user to make sure the callback can handle it.
How about we move this discussion to GitHub?
PompousBeetle71 is this an ArgParser argument or a connected dictionary?
It will automatically switch to docker mode
Hmmm, I'm not sure that you can disable it. But I think you are correct it should be possible. We will add it as another argument to Task.init. That said, FriendlyKoala70 what's the use case for disabling the code detection? You don't have to use it later, but it is always nice to know :)
Hi, I changed it to 1.13.0, but it still threw the same error.
This is odd. Just so we can make the agent better, any chance you can send the Task log?
task = Task.current_task()
Will get me the task object. (right?)
PanickyMoth78 yes, always, from anywhere, this is a singleton object
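For example, from anywhere in the running process (function name is just illustrative):
from clearml import Task
def log_something(value):
    # Task.current_task() returns the already-initialized Task (a process-wide singleton)
    task = Task.current_task()
    task.get_logger().report_scalar(title='debug', series='value', value=value, iteration=0)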
You will have to build your own Docker image based on that Dockerfile, and then update the docker-compose
I think that just backing up /opt/clearml and moving it should be just fine
But how do you specify the data, hyperparameters, and input/output models to use when the agent runs the experiment?
They are autodetected if you are using Argparse / Hydra / python-fire / etc.
The first time you are running the code (either locally or with an agent), it will add the hyper parameter section for you.
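For example, both flavors end up in the hyper parameter section (names here are just illustrative):
from argparse import ArgumentParser
from clearml import Task
task = Task.init(project_name='examples', task_name='hyperparam demo')
# ArgParser arguments are picked up automatically once parse_args() is called
parser = ArgumentParser()
parser.add_argument('--batch_size', type=int, default=32)
args = parser.parse_args()
# An explicitly connected dictionary; values edited in the UI are fed back
# into this dict when an agent runs the Task
params = {'lr': 0.001, 'epochs': 10}
params = task.connect(params)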
That said you can also provide it as part of the clearml-task
command with --args
(btw: clearml-task --help
will list all the options, https://clear.ml/docs/...
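A rough example of such a call (repo URL, script path and argument names are placeholders):
clearml-task --project examples --name remote-train \
  --repo https://github.com/user/repo.git --script train.py \
  --queue default \
  --args batch_size=64 epochs=10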