Unanswered
Hi all, we have clearml-server running on a Kube pod, and a GPU server running the clearml-agent, which we use to queue jobs.
For some reason, our Kube pod restarted (we're looking into why), but in the process of this happening all jobs on the worke
For reference, the clearml-agent runs under its own user profile on Ubuntu 24.04 (so that it doesn't run as root, as per previous discussions).
9 days ago
8 days ago