I might be wrong, but it seems like ClearML does not monitor GPU pressure when deploying a task to a worker, but rather relies only on its configured queues.
This is fairly accurate. The way the agent works is that you allocate a resource for the agent (specifically a GPU), then set the queues (plural) it listens to (by default priority ordered). Each agent then independently pulls jobs and runs them on its allocated GPU.
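For example, a minimal sketch (the GPU index and queue names here are placeholders, not defaults) of an agent pinned to one GPU and listening to two queues in priority order:

```shell
# Start an agent daemon bound to GPU 0.
# Queue order implies priority: jobs in "high_priority" are
# pulled before jobs in "low_priority" (placeholder queue names).
clearml-agent daemon --gpus 0 --queue high_priority low_priority --detached
```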
If I understand you correctly, you want multiple agents on the same GPUs?
There is no limit on resources, so you can have multiple agents "sharing" the same resource, but you have to make sure you are not launching two Tasks that run simultaneously.
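Concretely, that could look like the sketch below (placeholder queue names; note that nothing here prevents both agents from pulling a job at the same moment, so you must ensure the two queues are not fed simultaneously):

```shell
# Two agents "sharing" GPU 0, each listening to its own queue.
# There is no built-in lock on the GPU, so it is on you to make
# sure only one of these queues has pending Tasks at a time.
clearml-agent daemon --gpus 0 --queue team_a_queue --detached
clearml-agent daemon --gpus 0 --queue team_b_queue --detached
```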
Is it possible to configure the queues so that when the quad-GPU queue is being used for a task, the other queues wait because the resource is busy (and the same for the dual-GPU queue)?
Actually this is fully supported; the sad news is that it is only available in the paid tier 😞. This is usually an "enterprise" feature, for customers with DGX machines etc.
That said, you can always move Tasks between queues and manually stop them, which means that unless you have a huge load you can always switch manually, if that makes sense.