Unanswered
Hi! I Have Some Clearml Agents On Gcp And Sometimes The Instance Seems To Reboot Making The Experiment Fail And All The Progress Is Lost. What Is The Best Way To Resume An Experiment?
Hey CostlyOstrich36 sorry to ping you! Let's say I enqueue multiple experiments on a couple of agents and one of them fails. Is it possible to restart the experiment from the UI using the latest checkpoint? What if the experiment gets assigned to the other agent? I am not sure how the continue_last_task
flag would help in this case.
175 Views
0
Answers
2 years ago
one year ago