Eureka! I found the issue. In the code we must add this condition: if self._active_gpus and i not in self._active_gpus: continue, to make sure we do not enter the for loop afterwards. I propose to add this condition here: https://github.com/allegroai/trains/blob/e7864e6ba866a518ff07ab86da7c4703091fa94a/trains/utilities/resource_monitor.py#L302
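A minimal sketch of the proposed guard, assuming a simplified version of the loop in resource_monitor.py (the class and method below are illustrative stand-ins, not the actual trains code; only the condition itself is from the message above):

```python
class ResourceMonitorSketch:
    """Hypothetical simplification of trains' resource monitor GPU loop."""

    def __init__(self, active_gpus=None):
        # Optional whitelist of GPU indices to report; None means "all GPUs".
        self._active_gpus = active_gpus

    def reported_gpu_indices(self, gpus):
        reported = []
        for i, g in enumerate(gpus):
            # Proposed fix: skip GPUs not in the active set, so their
            # (possibly None) process lists are never touched further down.
            if self._active_gpus and i not in self._active_gpus:
                continue
            reported.append(i)
        return reported
```

With this guard, a display-only GPU excluded from the active set never reaches the code that inspects its process list.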
The script works. I tested to check where in the code the issue comes from: in the function _get_gpu_stats(self), g.processes is empty or None. Moreover, in _last_process_pool I only have cpu and no gpu. I think the issue is that one of the GPUs returns None instead of an empty array, so the for loop crashes and no GPU is logged.
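A small illustration of the failure mode described above, assuming GPU stats arrive as dicts with a "processes" field (the function name and data shape are hypothetical; the point is that a None process list, as returned for a display-only GPU, must be treated like an empty one or the whole stats pass fails):

```python
def gpu_process_counts(gpus):
    """Return per-GPU process counts, treating None as 'no process data'."""
    counts = []
    for g in gpus:
        procs = g.get("processes")
        if procs is None:
            # A display-only GPU may report None instead of [];
            # without this check, len(procs) raises TypeError and
            # no GPU metrics get logged at all.
            procs = []
        counts.append(len(procs))
    return counts

# gpu_process_counts([{"processes": []}, {"processes": None}]) -> [0, 0]
```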
Hi AgitatedDove14, I can run nvidia-smi inside the container. However, I get this warning message
In the for loop here, processes is empty or None in my case. None is for my display GPU.
Yes, that solved the issue. I will open the PR today.
It is already in the variable: echo $LD_LIBRARY_PATH prints /usr/local/nvidia/lib:/usr/local/nvidia/lib64:/.singularity.d/libs
I have the lib in the container (/.singularity.d/libs/). FYI, my driver version is 418.67.
My second graphics card is only used for display.
Yes that is possible. I will try something to be sure
For my main GPU (the one used for training) it is an empty array, and for my other GPU it is None.
I tested, and I no longer get the warning messages.
TimelyPenguin76, that didn't fix the issue.