We used to use PyTorch and it worked just fine, but we have now moved to pytorch-lightning (a kind of extension on top of PyTorch that gives Keras-like functionality).
Hello. Sorry for reviving the thread. I am facing the same issue with clearml-agent version 1.4.1 and clearml version 1.8.0. FancyTurkey50, can you please point me to a GitHub issue? Or CostlyOstrich36, to any resolution?
The issue has been resolved. Details are in the same GitHub issue: https://github.com/allegroai/clearml/issues/635#issuecomment-1324870817
CostlyOstrich36 FancyTurkey50 in case this was still unresolved at your end.
I'm still getting the machine usage reports
Regular PyTorch - you mean single GPU (I'm not familiar with torch distributed)?
Also, just to give it a try, can you test with only 2 GPUs, for example?
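One way to try the 2-GPU test without touching the training code is to limit which devices the process can see. A minimal sketch, assuming the standard `CUDA_VISIBLE_DEVICES` environment variable; the helper name `limit_visible_gpus` is made up for illustration, and it has to run before anything initializes CUDA:

```python
import os


def limit_visible_gpus(count: int) -> str:
    """Expose only the first `count` GPUs to this process (hypothetical helper)."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # CUDA reads this variable when it initializes, so set it before
    # torch (or pytorch-lightning) touches the GPU.
    visible = ",".join(str(i) for i in range(count))
    os.environ["CUDA_VISIBLE_DEVICES"] = visible
    return visible


if __name__ == "__main__":
    # Keep devices 0 and 1 only, then start training as usual.
    print(limit_visible_gpus(2))
```

Calling it at the very top of the training script, before importing torch, is the safest placement. The same call with `1` covers the single-GPU check.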
Cool, thanks for the info! I'll try to play with it as well 🙂
With regular PyTorch it worked when running on all 8 GPUs.
Also, what if you try using only one GPU with pytorch-lightning? Still nothing is reported - i.e. console/scalars?
Hi FancyTurkey50, how did you run the agent command?
FancyTurkey50, could you open a GitHub issue for this so we can follow it? I'm quite curious.
And everything works fine with regular PyTorch.
Hi Natan,
agent command: clearml-agent daemon --gpu all
I'm using 8 GPUs. The model runs on all of them, but the logging isn't working.
I see. Just to simplify the issue: when using PyTorch, were you getting machine usage reports (CPU/GPU usage)?
Also, I'm guessing various scalars aren't being reported, and that those were previously captured automatically by ClearML?
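If automatic capture stays silent, one way to check whether reporting works at all is to send a scalar explicitly from the main process. A rough sketch, not a fix for the underlying issue: it assumes the launcher exports a `RANK` environment variable, and that `logger` is a ClearML logger such as the one returned by `Task.current_task().get_logger()`. The helper names `is_rank_zero` and `report_if_rank_zero` are hypothetical.

```python
import os


def is_rank_zero() -> bool:
    """Return True in the main process (assumes the launcher exports RANK)."""
    # A missing RANK is treated as a single-process run, i.e. rank 0.
    return int(os.environ.get("RANK", "0")) == 0


def report_if_rank_zero(logger, title, series, value, iteration) -> bool:
    """Report one scalar through a ClearML-style logger, main process only."""
    if not is_rank_zero():
        return False
    # report_scalar(title, series, value, iteration) is the ClearML Logger call.
    logger.report_scalar(title=title, series=series, value=value, iteration=iteration)
    return True
```

If a value reported this way shows up in the UI while the automatic scalars do not, the problem is more likely in the auto-capture path than in the connection to the server.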
I have posted an update on a relevant issue - https://github.com/allegroai/clearml/issues/635
But no console or auto capturing of scalars?
Also, how many GPUs are you trying to run on?