My understanding may be bad. Say I have a single EC2 instance. Is that instance only able to handle one task at a time?
Or can I start multiple instances of the clearml-agent process on it and then have one task per agent?
And if that's the case, can we have multiple agents on the EC2 instance listening to the same queue, e.g. default? Or would this only work if they were listening to different queues?
@<1523701070390366208:profile|CostlyOstrich36> Any idea please? We could use our 8xA100 as 8 workers, for 8 single-gpu jobs running faster than on a single 1xV100 each.
Yes, it's pretty lame that a clearml-agent can only process one task at a time if it's not listening to a services queue 🤔
I see. Is it possible for two agents to be utilizing the same GPU? (like if the machine has a terrific GPU, but only one of them?)
@<1541954607595393024:profile|BattyCrocodile47>
Is that instance only able to handle one task at a time?
You could have multiple agents on the same machine, each one with its own dedicated GPU, but you will not be able to change the allocation (e.g. "now I want 2 GPUs on one agent") without restarting the agents on the instance. In either case, this is for a "bare-metal" machine, and in the AWS autoscaler case, this goes under "dynamic" GPUs (see above).
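For the bare-metal case described above (one agent per GPU, all pulling from the same queue), here is a minimal sketch that only builds the launch commands. The helper name agent_commands is hypothetical, and the queue name default and the GPU count of 8 are taken from the examples in this thread. It assumes clearml-agent is installed and configured on the machine, and it prints the commands rather than running them.

```python
import shlex


def agent_commands(num_gpus, queue="default"):
    """Build one `clearml-agent daemon` command per GPU index.

    Hypothetical helper: each agent is pinned to a single GPU via --gpus
    and listens on the same queue, so each one can pick up its own task.
    """
    return [
        ["clearml-agent", "daemon",
         "--queue", queue,      # every agent listens to the same queue
         "--gpus", str(gpu),    # dedicate exactly one GPU to this agent
         "--detached"]          # run the agent in the background
        for gpu in range(num_gpus)
    ]


if __name__ == "__main__":
    # Print the commands for an 8-GPU machine; run them manually or via
    # subprocess once the output looks right for your setup.
    for cmd in agent_commands(8):
        print(shlex.join(cmd))
```

As noted above, the GPU-to-agent mapping is fixed at launch, so changing it (say, giving one agent two GPUs) means stopping the agents and starting them again with different --gpus values.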
We could use our 8xA100 as 8 workers, for 8 single-gpu jobs running faster than on a single 1xV100 each.
@<1546665634195050496:profile|SolidGoose91> I think that in order to have the flexibility there you need the "dynamic" GPU allocation that is only part of the "enterprise" offering 😞
That said, why not allocate a single A100 machine?
Is it possible for two agents to be utilizing the same GPU?
It is, as long as they do not limit one another memory-wise.
(If you are using k8s and clearml enterprise, then it supports GPU slicing and dynamic memory allocation)