Answered
Started using the integrated GCP autoscaler to avoid some problems we had. For some reason the instances don't have a GPU although specifically defined in the UI. How come? (Not using any Docker container for the agents)

Started using the integrated GCP autoscaler to avoid some problems we had.
For some reason the instances don't have a GPU, although one is specifically defined in the UI.
How come?
(Not using any docker container for the agents)
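
A quick way to confirm what the agent's VM actually exposes is to run a small check from inside the task itself; this is just a sketch, assuming PyTorch is installed on the instance and that `nvidia-smi` is present only when an NVIDIA driver is installed:

```python
# gpu_sanity_check.py - minimal sketch: log whether the agent's VM exposes a GPU.
# Assumes PyTorch is installed; nvidia-smi is only expected when a driver is present.
import shutil
import subprocess

import torch

print("torch.cuda.is_available():", torch.cuda.is_available())
print("torch.cuda.device_count():", torch.cuda.device_count())

if shutil.which("nvidia-smi"):
    # Print the driver's view of the hardware into the experiment console log.
    print(subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout)
else:
    print("nvidia-smi not found - no NVIDIA driver (and likely no GPU) on this VM.")
```

If this reports no devices while the autoscaler UI says a GPU was requested, the problem is on the VM provisioning side rather than inside the task.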

  
  
Posted 2 years ago

Answers 8


Hi EmbarrassedSpider34 , what do you get in the log of the experiment you're trying to run? Or do you look at it at the level of the GCP console?

  
  
Posted 2 years ago

Thanks! I've raised this with the autoscaler devs and it might be a bug; you're the second person to report it. He's checking and we'll get back to you!

  
  
Posted 2 years ago

My task runs just fine, but with no GPU.
(When it demands a GPU, it crashes.)
Looking at the VM details in the GCP UI, it seems no GPU was defined for the VM.
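
One way to confirm this outside the UI is to describe the instance and inspect its attached accelerators; a minimal sketch using the `google-cloud-compute` client, where the project, zone, and instance names are placeholders:

```python
# check_accelerators.py - minimal sketch: list the accelerators attached to a GCP VM.
# Requires `pip install google-cloud-compute` and application-default credentials.
from google.cloud import compute_v1

PROJECT = "my-project"         # placeholder
ZONE = "europe-west1-b"        # placeholder
INSTANCE = "clearml-agent-vm"  # placeholder: the VM the autoscaler spun up

client = compute_v1.InstancesClient()
instance = client.get(project=PROJECT, zone=ZONE, instance=INSTANCE)

if instance.guest_accelerators:
    for acc in instance.guest_accelerators:
        print(f"{acc.accelerator_count} x {acc.accelerator_type}")
else:
    print("No guest accelerators attached - the VM was created without a GPU.")
```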

  
  
Posted 2 years ago

Do you use a default docker?

  
  
Posted 2 years ago

It's a private image (based on this image):
` ======================================
Welcome to the Google Deep Learning VM

Version: pytorch-gpu.1-11.m91
Based on: Debian GNU/Linux 10 (buster) (GNU/Linux 4.19.0-21-cloud-amd64 x86_64) `
I am leaving the docker line empty, so I assume there's no docker spun up for my agent.

  
  
Posted 2 years ago

Hi EmbarrassedSpider34 , would you mind showing us a screenshot of your machine configuration? Can you check for any output logs that ClearML might have given you? Depending on the region, there may have been no GPUs available, so could you also check whether you can manually spin up a GPU VM?
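
For that manual check, something like the following could be used to try spinning up a one-off GPU VM from the same private image via the gcloud CLI; every name below (project, zone, image, accelerator type) is a placeholder to adjust to whatever the autoscaler is configured with:

```python
# manual_gpu_vm.py - minimal sketch: try to create a GPU VM manually with gcloud,
# to rule out quota/availability issues in the zone. All names are placeholders.
import subprocess

cmd = [
    "gcloud", "compute", "instances", "create", "gpu-smoke-test",
    "--project", "my-project",                  # placeholder
    "--zone", "europe-west1-b",                 # placeholder
    "--machine-type", "n1-standard-8",
    "--accelerator", "type=nvidia-tesla-t4,count=1",
    "--maintenance-policy", "TERMINATE",        # required for GPU instances
    "--image", "my-custom-dlvm-image",          # placeholder: the private image
    "--image-project", "my-project",            # placeholder
]
result = subprocess.run(cmd, capture_output=True, text=True)
print(result.stdout or result.stderr)
```

If this fails with a quota or availability error, that would point at the zone rather than at the autoscaler configuration.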

  
  
Posted 2 years ago

I don't think it's related to the region.
I do have the log of the autoscaler.
We also have an autoscaler that was implemented from scratch before ClearML had the autoscaler application.
I wouldn't want to share the autoscaler log with this channel.

  
  
Posted 2 years ago

Also, can you share which machine image you're using?

  
  
Posted 2 years ago