Unanswered
Hi, I Have A Small Issue About Gpu Monitoring. I Run My Training Inside A Singularity Container And I Set The Cuda_Visible_Devices Variable. However, I Get The Following Message:
Hi BoredGoat1
from this warning: " TRAINS Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring
" It seems trains failed to load the nvidia .so dll that does the GPU monitoring:
This is based on pynvml, and I think it is trying to access "libnvidia-ml.so.1"
Basically saying, if you can run nvidima-smi from inside the container, it should work.
125 Views
0
Answers
4 years ago
one year ago