Examples: query, "exact match", wildcard*, wild?ard, wild*rd
Fuzzy search: cake~ (finds cakes, bake)
Term boost: "red velvet"^4, chocolate^2
Field grouping: tags:(+work -"fun-stuff")
Escaping: Escape characters +-&|!(){}[]^"~*?:\ with \, e.g. \+
Range search: properties.timestamp:[1587729413488 TO *] (inclusive), properties.title:{A TO Z}(excluding A and Z)
Combinations: chocolate AND vanilla, chocolate OR vanilla, (chocolate OR vanilla) NOT "vanilla pudding"
Field search: properties.title:"The Title" AND text
Answered
Hi, I'M Trying To Use Clearml On Pytorch-Lightning With Multiple Gpus, But It Seems As If The Server Does Not Monitor The Experiment. I Can See No Progress In The Console (Steps Counter Stays On 0) Nor Any Tensorboard Loggings. On A Single Gpu Everything

Hi,
I'm trying to use clearml on pytorch-lightning with multiple gpus, but it seems as if the server does not monitor the experiment. I can see no progress in the console (steps counter stays on 0) nor any tensorboard loggings. on a single GPU everything works fine.

ubuntu 20.04
dgx (A100X8)
python 3.8.12
torch 1.10.0
pytorch-lightning 1.6.0
cleaml 1.3.0
clearml-agent 1.1.2

  
  
Posted one year ago
Votes Newest

Answers 21


Also, how many GPUs are you trying to run off?

  
  
Posted one year ago

But no console or auto capturing of scalars?

  
  
Posted one year ago

FancyTurkey50 , could you open a github issue for this so we could follow it? I'm quite curious

  
  
Posted one year ago

and it doesnt work for 2 gpus either

  
  
Posted one year ago

with one gpu it works fine

  
  
Posted one year ago

thanks!

  
  
Posted one year ago

Also, what if you try using only one GPU with pytorch-lightning? Still nothing is reported - i.e. console/scalars?

  
  
Posted one year ago

Cool, thanks for the info! I'll try to play with it as well 🙂

  
  
Posted one year ago

will try 2

  
  
Posted one year ago

Hello. Sorry for bringing up the thread. I am facing the same issue on clearml-agent version 1.4.1 and clearml version 1.8.0 . Can you please point me to a github issue FancyTurkey50 or any resolution CostlyOstrich36 ?

  
  
Posted 6 months ago

Regular pytorch - you mean single GPU (I'm not familiar with torch distributed)?
Also just to give it a try, can you test with only 2 GPU's for example?

  
  
Posted one year ago

Hi Natan,
agent command: clearml-agent daemon --gpu all
I'm using 8 gpus. the model runs on all of them, but the logging isn't working

  
  
Posted one year ago

Hi FancyTurkey50 , how did you run the agent command?

  
  
Posted one year ago

And everything works fine with regular pytorch

  
  
Posted one year ago

with regular pytorch it worked when running on all 8 gpus

  
  
Posted one year ago

I see. Just to simplify the issue - When using pytorch - were you getting machine usage reports (CPU/GPU usage)?
Also, I'm guessing various scalars aren't being reported. I'm guessing those were previously captured automatically by Clearml?

  
  
Posted one year ago

we used to use pytorch and it worked just fine, but now we moved to pytorch-lightning (kind of extension on pytorch that gives keras-ish functionality)

  
  
Posted one year ago

we used the pytorch with multi-gpu (ddp)

  
  
Posted one year ago

The issue has been resolved. Details in the same github issue https://github.com/allegroai/clearml/issues/635#issuecomment-1324870817
CostlyOstrich36 FancyTurkey50 in case this was still unresolved at your end.

  
  
Posted 6 months ago

I'm still getting the machine usage reports

  
  
Posted one year ago

I have posted an update on a relevant issue - https://github.com/allegroai/clearml/issues/635

  
  
Posted 6 months ago
138 Views
21 Answers
one year ago
4 months ago
Tags