No, I have one queue with one server in that queue. This server has 2 GPUs. Using my training code I can choose whether to use 2 GPUs or just one for the training...
Can you attach the console log? What GPUs are you using? I assume nvidia-smi runs without issue?
I see. Can you provide a simple stand alone code snippet that reproduces this behaviour for you?
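For reference, a minimal standalone repro could look roughly like this: a toy DataParallel loop that explicitly reports a scalar each iteration. The project/task names and the model here are placeholders, so adjust it to match your actual training code:

```python
# Hypothetical minimal repro: toy 2-GPU (DataParallel) training loop
# that reports one scalar per iteration to ClearML.
import torch
import torch.nn as nn
from clearml import Task

task = Task.init(project_name="debug", task_name="multi_gpu_scalars_repro")
logger = task.get_logger()

device = torch.device("cuda")
model = nn.Linear(128, 1)
if torch.cuda.device_count() > 1:
    # Wrap the model so both GPUs are used, mirroring the 2-GPU setup
    model = nn.DataParallel(model)
model.to(device)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

for iteration in range(100):
    x = torch.randn(64, 128, device=device)
    y = torch.randn(64, 1, device=device)
    loss = loss_fn(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Explicit scalar report that should show up in the Scalars tab
    logger.report_scalar(title="train", series="loss", value=loss.item(), iteration=iteration)
```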
Clearml-agent on worker: 1.9.2
Clearml on my computer: 1.16.4
2 GPUs - NVIDIA GeForce RTX 3080
I refer only to the training statistics in the Scalars tab. I can see the monitoring of the GPUs, CPU, memory...
Also, I'll say again that with only one GPU in the training everything works great.
Yes, no issue with nvidia-smi; it recognizes the 2 GPUs and successfully uses them for the training. The only problem is the metrics/scalars in the UI when I use 2 GPUs.
You have two queues, one for 1xGPU and the other for 2xGPU, and two workers running on the GPU machine, each listening to the relevant queue. Is that the setup?
Even when I don't add the --gpus flag, it doesn't work.
Just to make sure we're on the same page: are you referring to the machine statistics, or do ALL scalars not show up?
Hi @<1774969995759980544:profile|SmoggyGoose12> , I think that selecting GPUs works only in docker mode.
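If you want to try docker mode, the daemon command would look roughly like this (the queue name and image are placeholders):

```
clearml-agent daemon --queue my_queue --gpus 0,1 --docker nvidia/cuda:11.8.0-runtime-ubuntu22.04
```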
I use the --gpus 0,1 flag in clearml-agent.
Also, I don't use docker mode; I use virtual env mode.
What versions of clearml-agent and clearml are you using? Is it a self-hosted server?
Also, what GPUs are you running on that machine?
What is the command you used to run the agent?
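For reference, a typical venv-mode invocation looks something like this (the queue name is a placeholder):

```
clearml-agent daemon --queue my_queue --gpus 0,1
```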