self.clearml_task.get_initial_iteration() also gives me the correct number
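For reference, a minimal sketch of the kind of setup where that call returns the resumed-from iteration (assuming the task is continued via continue_last_task; project/task names are placeholders):
```python
from clearml import Task

# Resume the previous run of this task; ClearML restores the iteration
# offset so reporting continues where the earlier run stopped.
task = Task.init(
    project_name="examples",
    task_name="resume-run",
    continue_last_task=True,
)
print(task.get_initial_iteration())  # the continued-from iteration
```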
I don't have a registry to push my image to. I think I can get around it, actually. Will it work if I just build the image locally once, then start the agent? Docker would recognise that image locally and just use it, right? I won't need to update that image often anyway
same as the first one described
Yes, I set:
auth {
  cookies {
    httponly: true
    secure: true
    domain: ".clearml.xyz.com"
    max_age: 99999999999
  }
}
It always worked for me this way
Ok, so what worked for me in the end was:
config = task.connect_configuration(read_yaml(conf_path))
cfg = OmegaConf.create(config._to_dict())
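A minimal sketch of a read_yaml helper for this (an assumption on my part; it presumably just wraps OmegaConf.load):
```python
from omegaconf import OmegaConf

def read_yaml(conf_path):
    # Parse the YAML file into an OmegaConf DictConfig, which
    # task.connect_configuration can then record.
    return OmegaConf.load(conf_path)
```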
GrumpyPenguin23 yes, it is the latest
AgitatedDove14 , what I was looking for was: parent_task = Task.get_task(task.parent)
It broke holding Shift to select multiple experiments, btw
I just moved one experiment to another project; after moving it, I am taken to the new project, where the layout is then reset
CostlyOstrich36 , actually this only happens for a single agent. The weird thing is that I have a machine with two GPUs, and I spawn two agents, one per GPU. Both have the same version. For one, I can see all the logs, but not for the other
sorry, the clearml-session. The error is the one I shared at the beginning of this thread
I still don't see why you would change the type of the cloned Task, I'm assuming the original Task had the correct type, no?
Because it is easier for me to create a training task out of the controller task by cloning it (so that parameters are prefilled and I can set the parent task id)
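Roughly like this (a sketch; the task name is illustrative):
```python
from clearml import Task

controller = Task.current_task()

# Clone the controller so the training task starts with the same
# (prefilled) parameters, and point its parent at the controller.
training_task = Task.clone(
    source_task=controller,
    name="training",
    parent=controller.id,
)
```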
Maybe there is a setting in Docker to move the space used to a different location? I can simply increase the storage of the first disk, no problem with that
The task with id a445e40b53c5417da1a6489aad616fee is not aborted and is still running
SuccessfulKoala55 Am I doing/saying something wrong regarding the problem of flushing every 5 secs? (See my previous message)
Never mind, the nvidia-smi command fails in that instance; the problem lies somewhere else
Updating Redis from version 6.2 to 6.2.11 fixed it, but I have new issues now
then print(Task.get_project_object().default_output_destination) still shows the old value
Disclaimer: I didn't check that this will reproduce the bug, but these are all the components that should reproduce it: a for loop creating figures and clearml logging them
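Something along these lines (again, not verified to reproduce it; project/task/title/series names are placeholders):
```python
import matplotlib.pyplot as plt
from clearml import Task

task = Task.init(project_name="debug", task_name="figure-loop")
logger = task.get_logger()

for i in range(20):
    # Create a fresh figure each iteration and report it to ClearML.
    fig, ax = plt.subplots()
    ax.plot(range(i + 1))
    logger.report_matplotlib_figure(
        title="repro", series="loop", iteration=i, figure=fig
    )
    plt.close(fig)
```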
Ok, in that case it probably doesn't work, because if the default value is 10 secs, it doesn't match what I get in the logs of the experiment: every second tqdm adds a new line
Thanks for sharing the issue UnevenDolphin73 , I'll comment on it!
I've set dynamic: "strict" in the template of the logs index and I was able to keep the same mapping after doing the reindex
SuccessfulKoala55 Thanks to that I was able to identify the most expensive experiments. How can I count the number of documents for a specific series? I.e., I suspect that the loss, which is logged every iteration, is responsible for most of the documents, and I want to make sure of that
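Something like this is what I'm after, I guess (a sketch only; the index pattern and the task/metric/variant field names are my assumptions about the default ClearML server ES layout, using the 7.x Python client):
```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Count scalar events for a single task and a single metric/series pair.
resp = es.count(
    index="events-training_stats_scalar-*",
    body={
        "query": {
            "bool": {
                "must": [
                    {"term": {"task": "<task_id>"}},
                    {"term": {"metric": "loss"}},   # metric title
                    {"term": {"variant": "loss"}},  # series name
                ]
            }
        }
    },
)
print(resp["count"])
```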
Hi CostlyOstrich36 , I am not using Hydra, only OmegaConf, so you mean just calling OmegaConf.load should be enough?
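i.e. just something like:
```python
from omegaconf import OmegaConf

cfg = OmegaConf.load(conf_path)  # conf_path points at the YAML file
```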
I could delete the files manually with sudo rm (sudo is required, otherwise I get "Permission Denied")
to pass secrets to each experiment