Examples: query, "exact match", wildcard*, wild?ard, wild*rd
Fuzzy search: cake~ (finds cakes, bake)
Term boost: "red velvet"^4, chocolate^2
Field grouping: tags:(+work -"fun-stuff")
Escaping: Escape characters +-&|!(){}[]^"~*?:\ with \, e.g. \+
Range search: properties.timestamp:[1587729413488 TO *] (inclusive), properties.title:{A TO Z}(excluding A and Z)
Combinations: chocolate AND vanilla, chocolate OR vanilla, (chocolate OR vanilla) NOT "vanilla pudding"
Field search: properties.title:"The Title" AND text
Profile picture
BoredGoat1
Moderator
1 Question, 17 Answers
  Active since 10 January 2023
  Last activity one year ago

Reputation

0

Badges 1

17 × Eureka!
0 Votes
30 Answers
833 Views
0 Votes 30 Answers 833 Views
Hi, I have a small issue about GPU monitoring. I run my training inside a Singularity container and I set the CUDA_VISIBLE_DEVICES variable. However, I get t...
4 years ago
0 Hi, I Have A Small Issue About Gpu Monitoring. I Run My Training Inside A Singularity Container And I Set The Cuda_Visible_Devices Variable. However, I Get The Following Message:

It is already in the variable :
echo $LD_LIBRARY_PATH /usr/local/nvidia/lib:/usr/local/nvidia/lib64:/.singularity.d/libs

4 years ago
0 Hi, I Have A Small Issue About Gpu Monitoring. I Run My Training Inside A Singularity Container And I Set The Cuda_Visible_Devices Variable. However, I Get The Following Message:

I find the issue. In the code we must to add this condition
if self._active_gpus and i not in self._active_gpus: continueto be sure to not go in the for loop after. I propose to add this condition here: https://github.com/allegroai/trains/blob/e7864e6ba866a518ff07ab86da7c4703091fa94a/trains/utilities/resource_monitor.py#L302

4 years ago
0 Hi, I Have A Small Issue About Gpu Monitoring. I Run My Training Inside A Singularity Container And I Set The Cuda_Visible_Devices Variable. However, I Get The Following Message:

The script works. I tested to check where in the cpde the issue comes from and in the function: _get_gpu_stats(self) , g.processes is empty or None. Moreover, in _last_process_pool I only have cpu and no gpu. I think the issue is because one of the gpu return None instead of empty array. The for loop crash and so no GPU is logged

4 years ago