Answered

Started using the integrated GCP autoscaler to avoid some problems we had. For some reason the instances don't have a GPU although one is specifically defined in the UI. How come? (Not using any Docker container for the agents)

Started using the integrated GCP autoscaler to avoid some problems we had.
For some reason the instances don't have a GPU, although one is specifically defined in the UI.
How come?
(Not using any Docker container for the agents.)

  				
Posted 2 years ago by EmbarrassedSpider34


Answers 8

Thanks! I've raised this with the autoscaler devs and it may be a bug; you are the second person to report it. One of them is checking and we'll get back to you!

  				
Posted 2 years ago by ExasperatedCrab78

I don't think it's related to the region.
I do have the log of the autoscaler.
We also have an autoscaler that was implemented from scratch before ClearML had the autoscaler application.
I would rather not share the autoscaler log in this channel.

  				
Posted 2 years ago by EmbarrassedSpider34

Hi EmbarrassedSpider34, would you mind showing us a screenshot of your machine configuration? Can you check for any output logs that ClearML might have given you? Depending on the region, there may have been no GPUs available, so could you also check whether you can manually spin up a GPU VM?
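One way to try the manual GPU VM check suggested above is with the `gcloud` CLI. The following is a minimal sketch, not the autoscaler's own logic: the VM name, zone, machine type, and GPU type are hypothetical placeholders to be replaced with the values from the autoscaler configuration. It only builds and prints the command so it can be reviewed before being run.

```python
import shlex


def build_gpu_vm_command(name, zone, machine_type="n1-standard-4",
                         gpu_type="nvidia-tesla-t4", gpu_count=1):
    """Return the gcloud argv for creating a throwaway GPU VM.

    All default values are illustrative assumptions, not taken from the thread.
    """
    return [
        "gcloud", "compute", "instances", "create", name,
        f"--zone={zone}",
        f"--machine-type={machine_type}",
        # Request the accelerator explicitly; without this flag the VM has no GPU.
        f"--accelerator=type={gpu_type},count={gpu_count}",
        # VMs with attached GPUs cannot live-migrate during host maintenance.
        "--maintenance-policy=TERMINATE",
    ]


if __name__ == "__main__":
    # Hypothetical name and zone, for illustration only.
    cmd = build_gpu_vm_command("gpu-smoke-test", "us-central1-a")
    print(shlex.join(cmd))
```

If the printed command fails with a quota or availability error when run by hand, the problem is likely on the GCP side rather than in the autoscaler.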

  				
Posted 2 years ago by ExasperatedCrab78

Also, can you share which machine image you're using?

  				
Posted 2 years ago by CostlyOstrich36

It's a private image (based off of this image).
` ======================================
Welcome to the Google Deep Learning VM

Version: pytorch-gpu.1-11.m91
Based on: Debian GNU/Linux 10 (buster) (GNU/Linux 4.19.0-21-cloud-amd64 x86_64\n) `
I am leaving the docker line empty, so I assume no Docker container is spun up for my agent.

  				
Posted 2 years ago by EmbarrassedSpider34

Do you use a default Docker image?

  				
Posted 2 years ago by CostlyOstrich36

My task runs just fine, but without a GPU.
(When it requires a GPU, it crashes.)
Looking at the VM details in the GCP UI, it seems no GPU was defined for the VM.
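The same check can be done outside the UI. A minimal sketch, assuming the instance description has been exported with `gcloud compute instances describe NAME --zone ZONE --format=json`: Compute Engine lists attached GPUs under the `guestAccelerators` field, so an empty or missing field means the VM was created without one. The sample document below is hand-written for illustration and is not output from the VM in this thread.

```python
import json


def attached_gpus(instance):
    """Return (accelerator type, count) pairs from an instance description dict."""
    gpus = []
    for acc in instance.get("guestAccelerators", []):
        # acceleratorType is a resource path; keep only its last segment.
        gpu_type = acc.get("acceleratorType", "").rsplit("/", 1)[-1]
        gpus.append((gpu_type, int(acc.get("acceleratorCount", 0))))
    return gpus


# Illustrative sample only; project, zone, and names are made up.
sample = json.loads("""
{
  "name": "example-worker",
  "guestAccelerators": [
    {
      "acceleratorType": "projects/p/zones/z/acceleratorTypes/nvidia-tesla-t4",
      "acceleratorCount": 1
    }
  ]
}
""")

if __name__ == "__main__":
    print(attached_gpus(sample))                  # [('nvidia-tesla-t4', 1)]
    print(attached_gpus({"name": "cpu-only"}))    # []
```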

  				
Posted 2 years ago by EmbarrassedSpider34

Hi EmbarrassedSpider34, what do you get in the log of the experiment you're trying to run? Or are you checking this at the level of the GCP console?
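To get a clear answer into the experiment log itself, a small check can be run at the start of the task. This is only a sketch: it assumes the NVIDIA driver tools are installed on the image (as they usually are on GPU deep learning images) and relies on `nvidia-smi -L`, which lists the GPUs the driver can see. The function name is a hypothetical helper, not part of ClearML.

```python
import shutil
import subprocess


def gpu_visible():
    """Return True if nvidia-smi is present and reports at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        # Driver tools are missing, so no GPU can be confirmed from here.
        return False
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    # Each detected device is printed on a line starting with "GPU <index>:".
    return result.returncode == 0 and "GPU " in result.stdout


if __name__ == "__main__":
    print("GPU visible to this VM:", gpu_visible())
```

A `False` result here, combined with no accelerator shown in the GCP console, points to the VM being created without a GPU rather than to a driver problem inside the image.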

  				
Posted 2 years ago by CostlyOstrich36


