Like, let's say I want "a 15GB GPU or better" and there are 4 queues, two of which fit the description... is there any way to set it up so that ClearML will just queue it on whichever one's available?
How would you know which queues fit that description? And if you do know that, what do you know about those queues?
Generally speaking, this kind of resource granularity is what k8s provides, but it comes with lots of caveats: you need to know in advance what you need in terms of resources, you can request resources that do not exist, and you can oversubscribe resources (i.e., starve processes).
The easiest way would be to rename a queue to "1xgpu 16gb", then make sure only machines with GPUs of 16GB or more listen to it.
Note that an agent can listen to multiple queues.
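To make the "only machines with 16GB or more listen to it" rule concrete, here is a minimal Python sketch of deciding which queues each machine's agent subscribes to. The tier names "1xGPU_32gb" and "1xGPU_16gb" and the helpers `queues_to_listen` and `agent_command` are hypothetical; the printed command line only reuses the `--queue` option of `clearml-agent daemon`.

```python
# Hypothetical naming convention: one queue per minimum-GPU-memory tier,
# listed from largest to smallest so bigger GPUs check bigger-GPU work first.
QUEUE_TIERS = [(32, "1xGPU_32gb"), (16, "1xGPU_16gb")]


def queues_to_listen(gpu_mem_gb):
    """Return the queues an agent with `gpu_mem_gb` of GPU memory should serve.

    A machine qualifies for every tier whose minimum it meets, so a 32GB
    card also listens to the 16GB queue.
    """
    return [name for min_gb, name in QUEUE_TIERS if gpu_mem_gb >= min_gb]


def agent_command(gpu_mem_gb):
    """Build an illustrative clearml-agent command line, or None if too small."""
    queues = queues_to_listen(gpu_mem_gb)
    if not queues:
        return None
    return "clearml-agent daemon --queue " + " ".join(f'"{q}"' for q in queues)


for mem in (12, 16, 32):
    print(mem, "GB ->", agent_command(mem))
```

With this convention, a job that needs "a 15GB GPU or better" goes to the 16GB queue, and whichever qualifying agent is idle picks it up.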
OK, so if I've got, like, 2x16GB GPUs and 2x32GB GPUs, I could allocate all the 16GB GPUs to one queue, and all the 32GB ones to another?
Then when I queue up a job on the 1x16gb queue, it would run on one of the two GPUs?
You could do:
clearml-agent daemon --queue "2xGPU_32gb" --gpus 0,1
This will always use both GPUs for every Task it pulls.
Or you could do:
clearml-agent daemon --queue "1xGPU_16gb" --gpus 0
clearml-agent daemon --queue "1xGPU_16gb" --gpus 1
This runs two agents, one per GPU (so each Task it runs gets one 16GB GPU).
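A toy simulation of the two-agents-one-queue setup, in plain Python (not ClearML code): each agent is pinned to one GPU and pulls the next Task from the shared queue as soon as it becomes idle. The function `simulate` and the task durations are made up for illustration.

```python
import heapq
from collections import deque


def simulate(task_durations, gpu_ids):
    """Toy model: one agent per GPU, all listening to the same queue.

    Whichever agent becomes idle first pulls the next Task (FIFO).
    Returns a list of (task_index, gpu_id, start_time) tuples.
    """
    pending = deque(enumerate(task_durations))  # (task_index, duration)
    # Min-heap of (time the agent becomes idle, gpu_id); all start idle at t=0.
    idle_at = [(0, gpu) for gpu in gpu_ids]
    heapq.heapify(idle_at)
    schedule = []
    while pending:
        t, gpu = heapq.heappop(idle_at)  # earliest-idle agent pulls next Task
        task_idx, duration = pending.popleft()
        schedule.append((task_idx, gpu, t))
        heapq.heappush(idle_at, (t + duration, gpu))
    return schedule


# Two agents (GPU 0 and GPU 1), four Tasks with made-up durations.
print(simulate([5, 3, 2, 4], gpu_ids=[0, 1]))
# [(0, 0, 0), (1, 1, 0), (2, 1, 3), (3, 0, 5)]
```

Task 2 lands on GPU 1 because that agent frees up first (at t=3), which is the "runs on whichever GPU is available" behavior asked about.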
Or:
clearml-agent daemon --queue "2xGPU_32gb" "1xGPU_16gb" --gpus 0,1
This will first pull Tasks from the "2xGPU_32gb" queue, and only if that queue is empty will it pull Tasks from "1xGPU_16gb". Notice that in both cases each Task will use both GPUs.
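The pull order described above boils down to "check the queues in the order given, take from the first non-empty one". A minimal Python sketch of that rule with made-up task names; `pull_next` is a hypothetical helper illustrating the described behavior, not the agent's actual implementation.

```python
from collections import deque


def pull_next(queues, order):
    """Return (queue_name, task) from the first non-empty queue in `order`.

    `order` mirrors the queue order passed to --queue; None means all are empty.
    """
    for name in order:
        if queues[name]:
            return name, queues[name].popleft()
    return None


queues = {
    "2xGPU_32gb": deque(["big_a"]),
    "1xGPU_16gb": deque(["small_a", "small_b"]),
}
order = ["2xGPU_32gb", "1xGPU_16gb"]

while (item := pull_next(queues, order)) is not None:
    print(item)
# ('2xGPU_32gb', 'big_a')
# ('1xGPU_16gb', 'small_a')
# ('1xGPU_16gb', 'small_b')
```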
The paid tier includes dynamic-gpus support, which lets the last example actually allocate 1 or 2 GPUs depending on the queue the Task was pulled from.
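Conceptually, dynamic GPU allocation means the GPU count comes from the queue rather than being fixed per agent. A hedged Python toy model, assuming a hypothetical mapping of queue name to GPU count (2 for "2xGPU_32gb", 1 for "1xGPU_16gb"); the helper `allocate` is illustrative only and does not show the real agent configuration syntax.

```python
def allocate(free_gpus, gpus_per_queue, queue_name):
    """Toy model of dynamic GPU allocation.

    The GPU count is looked up from the queue the Task came from. On success
    the GPU ids are removed from `free_gpus` and returned; if too few GPUs are
    free, nothing is allocated and None is returned.
    """
    needed = gpus_per_queue[queue_name]
    if len(free_gpus) < needed:
        return None
    allocated = free_gpus[:needed]
    del free_gpus[:needed]
    return allocated


# Hypothetical mapping: queue name -> number of GPUs a Task from it receives.
GPUS_PER_QUEUE = {"2xGPU_32gb": 2, "1xGPU_16gb": 1}

free = [0, 1]
print(allocate(free, GPUS_PER_QUEUE, "1xGPU_16gb"))  # [0]
print(allocate(free, GPUS_PER_QUEUE, "2xGPU_32gb"))  # None (only GPU 1 is free)
print(allocate(free, GPUS_PER_QUEUE, "1xGPU_16gb"))  # [1]
```

A real scheduler would also return GPUs to the pool when a Task finishes; that step is left out of this sketch.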
Did that answer the question, or am I missing something?
We do have the paid tier, I believe. Anywhere we can go and read up some more on this stuff, btw?
Good question 🙂
https://clear.ml/docs/latest/docs/clearml_agent#dynamic-gpu-allocation
The most up-to-date help will always be available here as well 🙂
clearml-agent daemon --help