I have a problem that might not be directly ClearML related, but maybe someone here has an idea:
I run a clearml-server on a machine with 128GB RAM, 32 cores and 2 GPUs.
On the same machine I run 2 clearml-agents, each with access to 1 GPU, 12 cores, an 48G
128GB RAM, 32 cores and 2 GPUs.
WOW 😮 I'm so jealous
However, after a while my container will exit, but also the clearml-server stops responding correctly. WebUI will not show updates and only a few experiments are shown at all. After restarting the apiserver, the clearml-server works correctly again.
Do you get any errors on how/why the container exits? Which container is it?
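If it helps to answer that, here is a minimal sketch for reading the exit state of a stopped container. It assumes the containers run under Docker and that the `docker` CLI is on the PATH. The helper names `summarize_state` and `inspect_container` and the container name `my-container` are made up for illustration; use whatever name `docker ps -a` shows for the container that exited.

```python
import json
import subprocess


def summarize_state(state: dict) -> str:
    """Turn the `State` object reported by `docker inspect` into a one-line summary."""
    if state.get("Running"):
        return "still running"
    parts = [f"exit code {state.get('ExitCode')}"]
    # OOMKilled is set by Docker when the kernel killed the container for exceeding memory.
    if state.get("OOMKilled"):
        parts.append("killed by the kernel OOM killer")
    if state.get("Error"):
        parts.append(f"error: {state['Error']}")
    return ", ".join(parts)


def inspect_container(name: str) -> str:
    """Ask the Docker CLI for the container state (hypothetical helper, needs docker installed)."""
    out = subprocess.run(
        ["docker", "inspect", "--format", "{{json .State}}", name],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return summarize_state(json.loads(out))


# Example call (placeholder name, not taken from the thread):
# print(inspect_container("my-container"))
```

An exit code of 137 together with `OOMKilled` being true would point at memory pressure on the shared machine rather than at ClearML itself, but that is only one possibility; `docker logs <name>` for the same container should show the last messages before it stopped.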
2 years ago
one year ago