Yes, that's solved the issue. I will do the PR today
TimelyPenguin76, that didn't fix the issue.
I found the issue. In the code we must add this condition: `if self._active_gpus and i not in self._active_gpus: continue` so that we do not go on with the rest of the loop body for inactive GPUs. I propose to add this condition here: https://github.com/allegroai/trains/blob/e7864e6ba866a518ff07ab86da7c4703091fa94a/trains/utilities/resource_monitor.py#L302
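A minimal sketch of where the proposed guard would sit (this is an illustration, not the actual `resource_monitor.py` code; the class and field names other than `_active_gpus` are hypothetical). The idea is to skip GPUs that are not in the active set before touching their stats, and to treat a `None` process list as empty:

```python
class ResourceMonitorSketch:
    """Hypothetical stand-in for the trains ResourceMonitor loop."""

    def __init__(self, active_gpus=None):
        # active_gpus: list of GPU indices to report on (None/empty = all)
        self._active_gpus = active_gpus

    def _get_gpu_stats(self, gpus):
        stats = []
        for i, g in enumerate(gpus):
            # proposed condition: skip GPUs excluded from monitoring
            if self._active_gpus and i not in self._active_gpus:
                continue
            # guard against g.processes being None (e.g. a display-only GPU)
            processes = g.processes or []
            stats.append({"gpu": i, "num_processes": len(processes)})
        return stats
```

With this guard, a GPU whose `processes` attribute is `None` no longer aborts the loop, and inactive GPUs are skipped entirely.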
The script works. I tested to check where in the code the issue comes from: in the function `_get_gpu_stats(self)`, `g.processes` is empty or None. Moreover, in `_last_process_pool` I only have CPU entries and no GPU. I think the issue is that one of the GPUs returns None instead of an empty array; the for loop crashes, and so no GPU is logged.
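A tiny illustration of the failure mode described above (hypothetical helper names, not trains code): if one GPU's process list is `None` instead of an empty list, calling `len()` on it raises `TypeError`, so the whole stats loop aborts and no GPU gets logged. Coercing `None` to an empty list avoids the crash:

```python
def count_processes(process_lists):
    """Naive version: crashes if any entry is None."""
    total = 0
    for procs in process_lists:
        total += len(procs)  # raises TypeError when procs is None
    return total


def count_processes_safe(process_lists):
    """Defensive variant: treats None as an empty list."""
    return sum(len(procs or []) for procs in process_lists)
```

This mirrors the situation reported here: the training GPU yields an empty list, the display GPU yields `None`, and one bad entry takes down the whole loop.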
I tested and I have no more warning messages
I have the lib in the container (`/.singularity.d/libs/`). FYI, my driver version is 418.67
Yes, that is possible. I will try something to be sure
My second graphic card is only for display.
For my main GPU (the one used for training) `processes` is an empty array, and for my other GPU it is None
It is already in the variable: `echo $LD_LIBRARY_PATH` gives `/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/.singularity.d/libs`
In the for loop here, `processes` is empty or None in my case; None is for my display GPU
Hi AgitatedDove14, I can run nvidia-smi inside the container. However, I get this warning message