Also, how many GPUs are you trying to run on?
But no console output or automatic capturing of scalars?
FancyTurkey50, could you open a GitHub issue for this so we can follow it? I'm quite curious.
and it doesn't work with 2 GPUs either
with one GPU it works fine
Also, what if you try using only one GPU with pytorch-lightning? Is still nothing reported, i.e. console output/scalars?
Cool, thanks for the info! I'll try to play with it as well 🙂
will try 2
Hello. Sorry for reviving this thread. I am facing the same issue with clearml-agent version 1.4.1 and clearml version 1.8.0. FancyTurkey50, could you point me to the GitHub issue? CostlyOstrich36, is there any resolution?
By regular pytorch, do you mean single GPU? (I'm not familiar with torch distributed.)
Also, just to give it a try, can you test with only 2 GPUs, for example?
clearml-agent daemon --gpu all
I'm using 8 GPUs. The model runs on all of them, but the logging isn't working.
Hi FancyTurkey50, how did you run the agent command?
And everything works fine with regular pytorch
with regular pytorch it worked when running on all 8 GPUs
I see. Just to narrow down the issue: when using pytorch, were you getting machine usage reports (CPU/GPU usage)?
Also, I'm guessing various scalars aren't being reported, and that those were previously captured automatically by ClearML?
we used to use pytorch and it worked just fine, but now we've moved to pytorch-lightning (a kind of extension on top of pytorch that gives Keras-like functionality)
we used pytorch with multi-GPU (DDP)
The issue has been resolved. Details in the same github issue https://github.com/allegroai/clearml/issues/635#issuecomment-1324870817
CostlyOstrich36 FancyTurkey50, in case this was still unresolved on your end.
I'm still getting the machine usage reports
I have posted an update on a relevant issue - https://github.com/allegroai/clearml/issues/635