It broke holding shift to select multiple experiments, btw
I just moved one experiment to another project; after moving it I am taken to the new project, where the layout is then reset
CostlyOstrich36, actually this only happens for a single agent. The weird thing is that I have a machine with two GPUs, and I spawn two agents, one per GPU. Both have the same version. For one, I can see all the logs, but not for the other
sorry, the clearml-session. The error is the one I shared at the beginning of this thread
I still don't see why you would change the type of the cloned Task, I'm assuming the original Task had the correct type, no?
Because it is easier for me to create the training task out of the controller task by cloning it (so that the parameters are prefilled and I can set the parent task id)
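Roughly, what I have in mind is something like this (a minimal sketch; the task id, names and queue are just placeholders):
```python
from clearml import Task

# the existing controller task (placeholder id)
controller = Task.get_task(task_id="<controller_task_id>")

# clone it so hyperparameters/configuration are prefilled,
# and record the controller as the parent task
training_task = Task.clone(
    source_task=controller,
    name="training task cloned from controller",
    parent=controller.id,
)

# optionally send the clone to an agent queue
Task.enqueue(training_task, queue_name="default")
```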
Maybe there is a setting in Docker to move the space used to a different location? I can simply increase the storage of the first disk, no problem with that
The task with id a445e40b53c5417da1a6489aad616fee is not aborted and is still running
SuccessfulKoala55 Am I doing/saying something wrong regarding the problem of flushing every 5 secs? (See my previous message)
Never mind, the nvidia-smi command fails in that instance, the problem lies somewhere else
Going from redis version 6.2 to 6.2.11 fixed it, but I have new issues now
then print(Task.get_project_object().default_output_destination) still shows the old value
Disclaimer: I didn't check that this reproduces the bug, but these are all the components that should reproduce it: a for loop creating figures and clearml logging them (sketch below)
Ok, in that case it probably doesn't work, because if the default value is 10 secs, it doesn't match what I get in the logs of the experiment: every second tqdm adds a new line
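Something along these lines (again, not verified to actually reproduce it; the project/task names and the plotted data are placeholders):
```python
import matplotlib.pyplot as plt
from clearml import Task

task = Task.init(project_name="debug", task_name="figure-loop repro")
logger = task.get_logger()

for i in range(10):
    # create a new figure on every iteration
    fig, ax = plt.subplots()
    ax.plot(range(100), [x * i for x in range(100)])

    # report it explicitly; automatic capture via plt.show() should be similar
    logger.report_matplotlib_figure(
        title="loop figures", series=f"iteration {i}", iteration=i, figure=fig
    )
    plt.close(fig)
```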
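If it helps, this is how I'd pin the flush period explicitly to test it (assuming Logger.set_flush_period takes the period in seconds):
```python
from clearml import Task

task = Task.init(project_name="debug", task_name="flush-period test")
logger = task.get_logger()

# force the documented 10 second default instead of relying on the config
logger.set_flush_period(10.0)
```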
Thanks for sharing the issue UnevenDolphin73, I'll comment on it!
I've set dynamic: "strict" in the template of the logs index and I was able to keep the same mapping after doing the reindex
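Roughly what I did, as a sketch with the elasticsearch Python client (the template and index names here are placeholders, not the exact ClearML ones):
```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# fetch the existing logs index template and make the mapping strict
template = es.indices.get_template(name="events_log")["events_log"]
template["mappings"]["dynamic"] = "strict"
es.indices.put_template(name="events_log", body=template)

# reindex the old index into a new one that picks up the updated template
es.reindex(
    body={
        "source": {"index": "events-log-old"},
        "dest": {"index": "events-log-new"},
    },
    wait_for_completion=False,
)
```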
SuccessfulKoala55 Thanks to that I was able to identify the most expensive experiments. How can I count the number of documents for a specific series? I.e. I suspect that the loss, which is logged every iteration, is responsible for most of the documents logged, and I want to make sure of that
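Something like this is what I had in mind, e.g. (the index pattern and the "task"/"variant" field names are my guess at the events mapping, not verified):
```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# count scalar event documents for one task and one series (e.g. "loss")
resp = es.count(
    index="events-training_stats_scalar-*",
    body={
        "query": {
            "bool": {
                "must": [
                    {"term": {"task": "<task_id>"}},
                    {"term": {"variant": "loss"}},
                ]
            }
        }
    },
)
print(resp["count"])
```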
Hi CostlyOstrich36 , I am not using Hydra, only OmegaConf, so you mean just calling OmegaConf.load should be enough?
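I.e. something like this is what I'd try (the config path and the connect call are just how I'd guess it should be wired up):
```python
from omegaconf import OmegaConf
from clearml import Task

task = Task.init(project_name="debug", task_name="omegaconf test")

# load the config directly, without Hydra
cfg = OmegaConf.load("config.yaml")

# log it to ClearML as a plain dict
task.connect_configuration(OmegaConf.to_container(cfg, resolve=True))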
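```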
I could delete the files manually with sudo rm (sudo is required, otherwise I get Permission Denied)
I have the same problem, but not only with subprojects: for all the projects I get this blank overview tab, as shown in the screenshot. It only worked for one project, which I created one or two weeks ago under 0.17
to pass secrets to each experiment
you mean to run it on the CI machine?
yes
That should not happen, no? Maybe there is a bug that needs fixing in clearml-agent?
It's just to test that the logic executed in if not Task.running_locally() is correct
I'd like to move to a setup where I don't need these tricks
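I.e. a minimal version of the pattern I'm testing (the branch bodies are placeholders):
```python
from clearml import Task

task = Task.init(project_name="debug", task_name="remote-logic test")

if not Task.running_locally():
    # executed only when the task runs on an agent / remotely
    print("running remotely, applying remote-only setup")
else:
    print("running locally")
```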
AgitatedDove14 I think it's on me to take the pytorch distributed example in the clearml repo and try to reproduce the bug, then pass it over to you
mmmh probably yes, I can't say for sure (because I don't remember precisely when I upgraded to 0.17) but it looks like that