Eureka! I found the issue. In the code we must add this condition: if self._active_gpus and i not in self._active_gpus: continue, to make sure we do not enter the for loop afterwards. I propose to add this condition here: https://github.com/allegroai/trains/blob/e7864e6ba866a518ff07ab86da7c4703091fa94a/trains/utilities/resource_monitor.py#L302
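A minimal sketch of the proposed guard, assuming a simplified version of the loop in resource_monitor.py (the class and method below are illustrative stand-ins, not the actual trains code; only the condition itself is from the message above):

```python
class ResourceMonitorSketch:
    """Hypothetical simplification of trains' resource monitor GPU loop."""

    def __init__(self, active_gpus=None):
        # Optional whitelist of GPU indices to report; None means "all GPUs".
        self._active_gpus = active_gpus

    def reported_gpu_indices(self, gpus):
        reported = []
        for i, g in enumerate(gpus):
            # Proposed fix: skip GPUs not in the active set, so their
            # (possibly None) process lists are never touched further down.
            if self._active_gpus and i not in self._active_gpus:
                continue
            reported.append(i)
        return reported
```

With this guard, a display-only GPU excluded from the active set never reaches the code that inspects its process list.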
The script works. I tested to check where in the code the issue comes from: in the function _get_gpu_stats(self), g.processes is empty or None. Moreover, in _last_process_pool I only have cpu and no gpu. I think the issue is that one of the GPUs returns None instead of an empty array, so the for loop crashes and no GPU is logged.
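A small illustration of the failure mode described above, assuming GPU stats arrive as dicts with a "processes" field (the function name and data shape are hypothetical; the point is that a None process list, as returned for a display-only GPU, must be treated like an empty one or the whole stats pass fails):

```python
def gpu_process_counts(gpus):
    """Return per-GPU process counts, treating None as 'no process data'."""
    counts = []
    for g in gpus:
        procs = g.get("processes")
        if procs is None:
            # A display-only GPU may report None instead of [];
            # without this check, len(procs) raises TypeError and
            # no GPU metrics get logged at all.
            procs = []
        counts.append(len(procs))
    return counts

# gpu_process_counts([{"processes": []}, {"processes": None}]) -> [0, 0]
```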
Hi AgitatedDove14, I can run nvidia-smi inside the container. However, I get this warning message
In the for loop here, processes is empty or None in my case. None is for my display GPU.
Yes, that solved the issue. I will open the PR today.
It is already in the variable: echo $LD_LIBRARY_PATH prints /usr/local/nvidia/lib:/usr/local/nvidia/lib64:/.singularity.d/libs
I have the lib in the container (/.singularity.d/libs/). FYI, my driver version is 418.67.
My second graphics card is only used for display.
Yes that is possible. I will try something to be sure
For my main GPU (the one used for training) it is an empty array, and for my other GPU it is None.
I tested, and I no longer get the warning messages.
TimelyPenguin76, that didn't fix the issue.