Hello All, I Installed Self-Hosted Server And Queue(Cosumes 1 Gpu) On Kubernetes. I Have An Issue Regarding Gpu Monitoring. I Checked The Process Is Using Gpu In The Pod, But Gpu Usage Is Not Being Displayed On Workers & Queues Dashboard, Whereas Cpu Usag

Answered

Hello all,
I installed self-hosted server and queue(cosumes 1 gpu) on kubernetes.
I have an issue regarding gpu monitoring.
I checked the process is using gpu in the pod, but gpu usage is not being displayed on WORKERS & QUEUES Dashboard, whereas CPU usage is. what is wrong?

  				
Posted 
	2 years ago

					More
				  		
  Report
		
					TartLeopard58
				
					0
					 × 1

Votes Newest

Answers 40

Hi again 😊 @<1523701087100473344:profile|SuccessfulKoala55> sure!

  				
Posted 
	2 years ago

					More
				  		
  Report
		
					TartLeopard58
				
					0
					 × 1

is that the working scenario?

  				
Posted 
	2 years ago

					More
				  		
  Report
		
					SuccessfulKoala55
				
					0
					 × 1

here is the agent, task log file~!

  				
Posted 
	2 years ago

					More
				  		
  Report
		
					TartLeopard58
				
					0
					 × 1

This was so we can see the agent log

  				
Posted 
	2 years ago

					More
				  		
  Report
		
					SuccessfulKoala55
				
					0
					 × 1

It looks like the log was truncated - can you please try to get all the log from running the agent in the cloud machine?

  				
Posted 
	2 years ago

					More
				  		
  Report
		
					SuccessfulKoala55
				
					0
					 × 1

on-premises using an agent?

  				
Posted 
	2 years ago

					More
				  		
  Report
		
					SuccessfulKoala55
				
					0
					 × 1

I set CLEARML_AGENT_UPDATE_VERSION=1.5.3rc2 ` in agentk8sglue.basePodTemplate.env as i mentioned

  				
Posted 
	2 years ago

					More
				  		
  Report
		
					TartLeopard58
				
					0
					 × 1

it is working on on-premise machine(i can see gpu usage on WORKERS & QUEUES Dashboard). but it is not working on cloud pod

  				
Posted 
	2 years ago

					More
				  		
  Report
		
					TartLeopard58
				
					0
					 × 1

but it's running, isn't it?

  				
Posted 
	2 years ago

					More
				  		
  Report
		
					SuccessfulKoala55
				
					0
					 × 1

@<1523701087100473344:profile|SuccessfulKoala55> what is task log? you mean the pod log provisioned by clearml-agent? do you want me to show them?

  				
Posted 
	2 years ago

					More
				  		
  Report
		
					TartLeopard58
				
					0
					 × 1

Show more results

Write your answer

176K Views

40 Answers

2 years ago