Many of these experiments appear with status "running" on clearml even though they have finished running.
Could it be their process just terminated (i.e., was not properly shut down)?
How are you running these multiple experiments?
BTW: if the server does not see any change in a Task for a while (I think the default is 2 hours), it will automatically mark the Task as aborted
Each of those runs finished and produced 10 plots, but in clearml only one, a few, or none got uploaded
AttractiveCockroach17 could it be Hydra actually kills these processes?
(I'm trying to figure out if we can fix something with the hydra integration so that it marks them as aborted)
I'm running them with
python my_script.py -m my_parameter=value_1,value_2,value_3 (using hydra multirun)
It doesn't happen with all the tasks of the multirun, as you can see in the photo
AttractiveCockroach17 can I assume you are working with the hydra local launcher?
Okay, so I can't figure out why it would "kill" the new experiments, I mean it should run them, but is there any "smart stopping" that causes it to kill the process before it ends?
BTW: can this be reproduced with the clearml hydra example ?
Indeed, I'm looking at their corresponding multirun output folders and the logs terminate early without any error, and the only plots saved are those in clearml. So as you say, it seems hydra kills these
Hmm let me check in the code, maybe we can somehow hook into it
I don't think it will be reproducible with the hydra example. It was just that I launched around 50 jobs and some of them maybe failed because of the parameters (strangely, with no error).
But it's okay for now I guess; I'll debug whether those experiments that failed would also fail if run independently