Answered

"5451af93e0bf68a4ab09f654b222ccae": {
    "1b790a3da2e8d6cd939cf271694fe81b": { "metric": ":monitor:gpu", "variant": "gpu_0_utilization", "value": 0.0, "min_value": 0.0, "max_value": 3.542 },
    "409d4e6ad9b69b3224fceeac6e265ddc": { "metric": ":monitor:gpu", "variant": "gpu_0_mem_used_gb", "value": 0.0, "min_value": 0.0, "max_value": 0.0 },
    "74646afee0e0ab18d3cbd08ce1ff6aa3": { "metric": ":monitor:gpu", "variant": "gpu_0_mem_usage", "value": 0.002, "min_value": 0.002, "max_value": 54.739 },
    "abdb01e1de566d2165e902fe0839465e": { "metric": ":monitor:gpu", "variant": "gpu_0_mem_free_gb", "value": 47.461, "min_value": 21.482, "max_value": 47.461 },

Do we know if gpu_0_mem_usage and gpu_0_mem_used_gb both show current GPU usage?
How can I tell from this how much GPU is reserved for the task while the task is in progress?

Should (gpu_0_mem_used_gb)/(gpu_0_mem_used_gb+gpu_0_mem_free_gb) give the GPU memory % usage?

  				
Posted 3 years ago by DrabDolphin54

Answers 7

Hi DrabDolphin54

Do we know if gpu_0_mem_usage and gpu_0_mem_used_gb both show current GPU usage?

The first is the percentage used (memory % used at any given moment) and the second is the memory used in GiB. Both refer to the video memory.

How can I tell from this how much GPU is reserved for the task while the task is in progress?

What do you mean by "how much is reserved"? Are you running with an agent?
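
On the ratio asked about in the question, here is a minimal sketch of the arithmetic. It assumes that total video memory equals used plus free, which the thread does not confirm. The function name `gpu_mem_percent` is made up for illustration, and the "used" figure is derived from the min and max of gpu_0_mem_free_gb in the question rather than taken from a reported value.

```python
def gpu_mem_percent(used_gb: float, free_gb: float) -> float:
    """Return used / (used + free) as a percentage, or 0.0 if both are zero.

    Assumption: total memory = used + free (not confirmed in the thread).
    """
    total_gb = used_gb + free_gb
    if total_gb <= 0:
        return 0.0
    return 100.0 * used_gb / total_gb


if __name__ == "__main__":
    # Illustrative numbers from the question: free memory ranged
    # from 47.461 GiB (max) down to 21.482 GiB (min).
    used_gb = 47.461 - 21.482  # derived, not a reported metric
    print(round(gpu_mem_percent(used_gb, 21.482), 2))
```

With those numbers the result is about 54.74, which is close to the max_value of 54.739 reported for gpu_0_mem_usage, so the two metrics look consistent with this formula.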

  				
Posted 3 years ago by AgitatedDove14

Can I get gpu usage over time frame via API also?

task.get_reported_scalars
But this will get you all the scalars; I think the next version of the server supports asking for a specific one as well.
How are you implementing the alert monitoring?
Is it a stateless process starting every X min, or is it a stateful process running and monitoring?
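
A rough sketch of pulling one GPU series out of the result of task.get_reported_scalars. It assumes the call returns a nested dict of the form {title: {series: {"x": [...], "y": [...]}}}, and that the title and series names match the "metric" and "variant" values in the question. The helper names `extract_series` and `fetch_gpu_series` are made up for illustration.

```python
from typing import Dict, List, Tuple

# Assumed shape of the returned scalars: {title: {series: {"x": [...], "y": [...]}}}
Scalars = Dict[str, Dict[str, Dict[str, List[float]]]]


def extract_series(
    scalars: Scalars,
    title: str = ":monitor:gpu",
    series: str = "gpu_0_utilization",
) -> List[Tuple[float, float]]:
    """Return (x, y) pairs for one series, or an empty list if it is missing."""
    data = scalars.get(title, {}).get(series, {})
    return list(zip(data.get("x", []), data.get("y", [])))


def fetch_gpu_series(task_id: str, series: str = "gpu_0_utilization"):
    """Fetch all reported scalars for a task and keep a single GPU series."""
    from clearml import Task  # imported lazily; requires the clearml package

    task = Task.get_task(task_id=task_id)
    return extract_series(task.get_reported_scalars(), series=series)
```

Since all scalars are fetched and filtered on the client side, this matches the "get everything" behaviour described above rather than a server-side query for one metric.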

  				
Posted 3 years ago by AgitatedDove14

We are running the workers on bare metal and clearml-server on Kubernetes. I was trying to find out what the min and max values are for the metrics above.

What do you mean by "how much is reserved"? Are you running with an agent?

  				
Posted 3 years ago by DrabDolphin54

Thanks for the reply. If gpu_0_mem_usage is the % of GPU memory in use, what is gpu_0_utilization?

Is gpu_0_utilization also in % then?

  				
Posted 3 years ago by DrabDolphin54

Yeah, exactly. The Scalar tab has those, but I need to add tracking so that an alert fires if GPU utilization/GPU memory is not in use while an experiment is in progress. Can I also get GPU usage over a time frame via the API?

  				
Posted 3 years ago by DrabDolphin54

Is gpu_0_utilization also in % then?

Correct 🙂

I was trying to find out what the min and max values are for the metrics above.

Oh, that makes sense. Note that you can get the values over time, so you can track the usage over the experiment's lifetime (you can of course see it in the Scalar tab of the experiment).
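
For the alert use case raised in this thread (GPU not in use while the experiment is in progress), a minimal sketch of the check over the values-over-time series. The threshold, the window size, and the function names are all assumptions for illustration, as is the status string "in_progress" for a running task.

```python
from typing import Sequence


def gpu_looks_idle(values: Sequence[float], threshold: float = 1.0, window: int = 5) -> bool:
    """True if the last `window` samples are all at or below `threshold` (%).

    Returns False when fewer than `window` samples exist, to avoid
    alerting on a task that has only just started reporting.
    """
    recent = list(values)[-window:]
    return len(recent) == window and all(v <= threshold for v in recent)


def should_alert(status: str, utilization: Sequence[float],
                 threshold: float = 1.0, window: int = 5) -> bool:
    """Alert only when the task is running and the GPU looks idle."""
    # Assumption: a running task reports the status string "in_progress".
    return status == "in_progress" and gpu_looks_idle(utilization, threshold, window)
```

A stateless process started every X minutes could call `should_alert` with the latest gpu_0_utilization samples and the task's current status, since the check keeps no state between runs.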

  				
Posted 3 years ago by AgitatedDove14

AgitatedDove14

  				
Posted 3 years ago by DrabDolphin54
