CurvedOwl19

0 Questions, 4 Answers

Active since 24 November 2023

Last activity one year ago

Reputation

Answers 4

0 Is There A Way To Configure A Clearml-Agent So That It Shutdown The Server After It Has Been Idle For A Certain Time Period? We Are Using Gpu Resources From A Provider That Autoscaling Doesn'T Support (Such As Sagemaker Training Jobs).

Glad to see it works (thanks for sharing @<1632913939241111552:profile|HighRaccoon77> ).
I have a question on Dynamic GPU allocation , disregarding any autoscaling considerations:
Let’s say we spin up a clearML agent on an 8 GPU instance (via a launcher script as @<1632913939241111552:profile|HighRaccoon77> is doing), with --dynamic-gpus enabled, catering to 2 gpu queue and a 4 gpu queue. The agent pulls in a new t...

one year ago

If not, would the right workaround be to launch let’s say 3 different agents from the same launcher script, 2 of them with access to 2 GPU’s (agent1 - gpus 0,1, agent2-2,3), and the other with access to 4 GPU’s (agent3 - gpus 4,5,6,7)? Assuming I want to have more 2 GPU jobs running than 4 GPU jobs.

one year ago

@<1632913939241111552:profile|HighRaccoon77> were you able to make the instance stop after a job launched by the agent was complete?

one year ago

finally, is there any way of limiting the host memory that each task can use?

one year ago