Hi, what could be the reason that a task running on an agent just stopped updating? The status is still "Running", but it doesn't seem like it is. The agent is running in a Docker container on a GPU. It completed 92 epochs and started 93. Run started at 18:37 Feb 27, last

it was the only task @SuccessfulKoala55
did you encounter something like this?
just a recap: the task status was "Running", but it seems to be stuck. nvidia-smi showed the GPU still has memory allocated, which rules out the case where the web server lost its connection to the agent and the agent had actually finished. If someone had used the GPU outside ClearML, I would expect some sort of CUDA crash in the agent's run
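In case it helps anyone debugging the same thing, here is a rough sketch of the nvidia-smi check described above, listing which processes currently hold GPU memory. It assumes `nvidia-smi` is on the PATH and uses its `--query-compute-apps` CSV output; the helper names `parse_compute_apps` and `gpu_processes` are made up for illustration. Note that when run inside a Docker container, processes from other containers or the host may not be visible, so it is probably best run on the host.

```python
import subprocess

# Ask nvidia-smi for one CSV row per compute process: pid, name, memory in MiB.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse nvidia-smi CSV rows into (pid, process_name, used_mib) tuples."""
    rows = []
    for line in csv_text.splitlines():
        if line.count(",") < 2:
            continue  # skip blank or unexpected lines
        # pid is the first field, memory the last; the name sits in between.
        pid, rest = line.split(",", 1)
        name, mem = rest.rsplit(",", 1)
        try:
            rows.append((int(pid.strip()), name.strip(), int(mem.strip())))
        except ValueError:
            continue  # e.g. memory reported as a non-numeric value
    return rows


def gpu_processes():
    """Run nvidia-smi and return the parsed list of GPU compute processes."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_compute_apps(result.stdout)


if __name__ == "__main__":
    for pid, name, mem in gpu_processes():
        print(f"pid={pid} name={name} used={mem} MiB")
```

If the training process's PID shows up here with memory still allocated while the task has stopped reporting, that would point to the process being alive but hung rather than finished.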

Posted 12 months ago
