Hi, I have a long-running experiment on an AWS instance that got killed after ~4 days with the following reason:
Hi JitteryCoyote63 ,
So the clearml-server asked the clearml-agent to stop the task because it hadn't received anything from it for a long time?
Seems so - there's a "non-responsive tasks" watchdog on the server in charge of doing exactly that. I assume you're using a self-hosted server?
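To illustrate what a watchdog like that does conceptually, here is a minimal sketch. It is not ClearML's actual implementation: the `Task` class, the `stop_non_responsive` function, the status strings, and the two-hour threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Task:
    # Hypothetical task record, not the ClearML data model.
    task_id: str
    status: str            # "in_progress" or "stopped" in this sketch
    last_update: datetime  # last time the server heard from the task


def stop_non_responsive(tasks: List[Task], now: datetime,
                        threshold: timedelta) -> List[str]:
    """Mark running tasks that have been silent longer than `threshold` as stopped.

    Returns the ids of the tasks that were stopped.
    """
    stopped = []
    for task in tasks:
        silent_for = now - task.last_update
        if task.status == "in_progress" and silent_for > threshold:
            task.status = "stopped"
            stopped.append(task.task_id)
    return stopped


if __name__ == "__main__":
    now = datetime(2024, 1, 5, 12, 0, 0)
    tasks = [
        Task("fresh", "in_progress", now - timedelta(minutes=10)),
        Task("silent", "in_progress", now - timedelta(hours=3)),
        Task("done", "stopped", now - timedelta(days=1)),
    ]
    # Assumed threshold of two hours, purely for illustration.
    print(stop_non_responsive(tasks, now, timedelta(hours=2)))
```

The point is that the decision is based only on the time since the last report reached the server, so a task that is still computing but has stopped reporting looks the same as a dead one.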