Eureka! Yes, that's solved the issue. I will do the PR today.
For my main GPU (the one used for training) it is an empty array, and for my other GPU it is empty.
It is already in the variable: echo $LD_LIBRARY_PATH gives /usr/local/nvidia/lib:/usr/local/nvidia/lib64:/.singularity.d/libs
I tested and I have no more warning messages
I have the lib in the container (/.singularity.d/libs/)
FYI, my driver version is 418.67
Hi AgitatedDove14, I can run nvidia-smi inside the container. However, I have this warning message
TimelyPenguin76, that didn't fix the issue.
The script works. I tested to check where in the code the issue comes from: in the function _get_gpu_stats(self), g.processes is empty or None. Moreover, in _last_process_pool I only have cpu and no gpu. I think the issue is that one of the GPUs returns None instead of an empty array, so the for loop crashes and no GPU is logged.
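To make the failure mode concrete, here is a minimal, self-contained sketch of what I think happens; the dict-based gpus list and the loop shape are my own simplification for illustration, not the actual code in resource_monitor.py:

```python
# Hypothetical reproduction: gpustat returns one entry per GPU, and on my
# machine the display-only GPU reports processes=None instead of a list.
gpus = [
    {"index": 0, "processes": []},    # training GPU: empty list, safe to iterate
    {"index": 1, "processes": None},  # display GPU: None
]

try:
    for g in gpus:
        for p in g["processes"]:      # raises TypeError on the second GPU
            print(p)
except TypeError as e:
    # The exception aborts the whole stats collection, so no GPU is logged at all.
    print("stats collection aborted:", e)
```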
My second graphics card is only for display.
Yes that is possible. I will try something to be sure
In the for loop here, processes is empty or None in my case; None is for my display GPU.
I found the issue. In the code we must add this condition: if self._active_gpus and i not in self._active_gpus: continue so that we do not run the rest of the for loop body for inactive GPUs. I propose adding this condition here: https://github.com/allegroai/trains/blob/e7864e6ba866a518ff07ab86da7c4703091fa94a/trains/utilities/resource_monitor.py#L302
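A minimal sketch of where the guard would sit; the class and attribute names below only mirror ResourceMonitor and are stand-ins I wrote for illustration, not the library code:

```python
class FakeGpu:
    """Stand-in for a gpustat GPU entry; the display GPU reports processes=None."""
    def __init__(self, processes):
        self.processes = processes


class MiniMonitor:
    def __init__(self, active_gpus):
        # Assumed to mean the same as self._active_gpus in ResourceMonitor:
        # indices of the GPUs we actually want to report on.
        self._active_gpus = active_gpus

    def gpu_processes(self, gpus):
        per_gpu = {}
        for i, g in enumerate(gpus):
            # Proposed condition: skip non-active GPUs before touching g.processes,
            # so a display-only card with processes=None can no longer crash the loop.
            if self._active_gpus and i not in self._active_gpus:
                continue
            per_gpu[i] = list(g.processes)
        return per_gpu


gpus = [FakeGpu(processes=[]), FakeGpu(processes=None)]  # GPU 0 trains, GPU 1 is display-only
print(MiniMonitor(active_gpus=[0]).gpu_processes(gpus))   # -> {0: []}, no TypeError
```

Note the guard only helps when active GPUs are configured; with an empty active-GPU list every card is still iterated, same as before.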