Answered

Hi, I've run into a problem and would appreciate some help. I installed ClearML locally. When I run a new task on a remote server and, in the Python training code, set it to train on only one GPU, everything works fine and I see all the scalars automatically in the ClearML web UI. But as soon as I change the training to run on 2 GPUs on my server, the training works, yet no scalars appear in the web UI.
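To illustrate, the 1-vs-2 GPU switch in my training code is along these lines (a simplified sketch, not my actual code; the DataParallel toggle and the dummy model are just placeholders):

```python
# Simplified sketch of the 1-vs-2 GPU switch (placeholder model, not the real training code)
import torch
import torch.nn as nn

use_two_gpus = True  # set to False for the single-GPU run that works fine
model = nn.Linear(10, 1)
device_ids = [0, 1] if use_two_gpus else [0]
model = nn.DataParallel(model, device_ids=device_ids).cuda()
```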

  
  
Posted one day ago

Answers 13


No, I have one queue, with one server in that queue. That server has 2 GPUs. Using my training code, I can choose whether the training uses 2 GPUs or just one...

  
  
Posted one day ago

Can you attach the console log? What GPUs are you using? I assume nvidia-smi runs without issue?

  
  
Posted one day ago

I see. Can you provide a simple, stand-alone code snippet that reproduces this behaviour for you?
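Something along these lines would do (just a sketch; the DataParallel choice, the dummy model, and the project/task names are placeholders, since the thread doesn't say how the two GPUs are used):

```python
# Minimal stand-alone sketch: train a dummy model on all visible GPUs via
# DataParallel and report a scalar every iteration. Names and the
# DataParallel choice are placeholders, not taken from the actual setup.
import torch
import torch.nn as nn
from clearml import Task

task = Task.init(project_name="debug", task_name="2xGPU scalar repro")
logger = task.get_logger()

model = nn.Linear(10, 1)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # spread each batch across both GPUs
model = model.cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for step in range(100):
    x = torch.randn(64, 10).cuda()
    y = torch.randn(64, 1).cuda()
    loss = nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    logger.report_scalar("train", "loss", value=loss.item(), iteration=step)
```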

  
  
Posted one day ago

Clearml-agent on the worker: 1.9.2
Clearml on my computer: 1.16.4

2 GPUs - NVIDIA GeForce RTX 3080

I'm referring only to the training statistics in the Scalars tab. I can see the monitoring of the GPUs, CPU, memory...

Also, I'll say again that with only one GPU in the training, everything works great.

  
  
Posted one day ago

Yes, no issue with nvidia-smi; it recognizes the 2 GPUs and successfully uses them for the training. The only problem is the metrics/scalars in the UI when I use 2 GPUs.

  
  
Posted one day ago

You have two queues, one for 1xGPU and the other for 2xGPU, and two workers running on the GPU machine, each listening to its respective queue. Is that the setup?
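I.e., something started roughly like this (the queue names are placeholders):

```bash
# Hypothetical two-worker setup on the same machine; queue names are placeholders
clearml-agent daemon --queue single_gpu --gpus 0 --detached
clearml-agent daemon --queue dual_gpu --gpus 0,1 --detached
```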

  
  
Posted one day ago

Even when I don't add the --gpus flag, it doesn't work.

  
  
Posted one day ago

Just to make sure we're on the same page: are you referring to the machine statistics, or do ALL scalars not show up?
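As a quick sanity check, you could also report one scalar explicitly from the training script and see whether it arrives, roughly like this (the title/series names are arbitrary):

```python
# Quick sanity check: report one scalar explicitly and see if it reaches the web UI
from clearml import Task

task = Task.current_task()  # the task created by Task.init() in the training script
if task is not None:
    task.get_logger().report_scalar(title="debug", series="ping", value=1.0, iteration=0)
```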

  
  
Posted one day ago

Hi @<1774969995759980544:profile|SmoggyGoose12>, I think that selecting GPUs works only in docker mode.
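That is, running the agent with the --docker flag, roughly like this (the CUDA image is only an example):

```bash
# Docker-mode agent sketch; the image name is just an example
clearml-agent daemon --queue default --gpus 0,1 --docker nvidia/cuda:11.8.0-runtime-ubuntu22.04
```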

  
  
Posted one day ago

I use the --gpus 0,1 flag with clearml-agent.
Also, I don't use docker mode; I use virtual env mode.

  
  
Posted one day ago

What versions of clearml-agent & clearml are you using? Is it a self-hosted server?
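(If it helps, both versions can be checked with something like the command below, run on each machine.)

```bash
# Show the installed versions of both packages
pip show clearml clearml-agent
```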

  
  
Posted one day ago

Also, what GPUs are you running on that machine?

  
  
Posted one day ago

What is the command you used to run the agent?

  
  
Posted one day ago