I guess I just have to make sure that total memory usage of all parallel processes are not higher than my gpu's memory.
Yep, unfortunately I'm not aware of any way to do that automatically 🙂
Hi CooperativeFly2
is it possible to create multiple train-agent per gpu
Yes you can, that said memory cannot be actually shared between GPU processes (GPU time is obviously shared) so you have to be careful with the Tasks actually being executed in parallel.
For instance:TRAINS_WORKER_NAME=host_a trains-agent daemon --gpus 0 --queue default TRAINS_WORKER_NAME=host_b trains-agent daemon --gpus 0 --queue default
Thanks I've tried this out and this seems to work. I guess I just have to make sure that total memory usage of all parallel processes are not higher than my gpu's memory.