Hello Everyone, I Am Having Issues With The GCP Autoscaler. This Is In The Output Logs:

Answered

Hello everyone, I am having issues with the GCP Autoscaler.

This is in the output logs:

2023-11-17 11:18:19,156 - clearml.Auto-Scaler - ERROR - Found invalid resource configurations:
  - gcp4cpu: GCP does not support a non-preemptible E2 instance with a GPU

My compute resources look like the attached image.

As I understand it, I have ticked 'Run in CPU mode', so GCP shouldn't be trying to create an instance with a GPU. I'm new to GCP, so I might be missing something obvious here!

Also, I would debug this issue myself, but I can't find the GCP Autoscaler code anywhere. Is there an equivalent to aws_autoscaler.py that I'm missing?

Any help would be greatly appreciated!

Cheers,
James

  				
Posted one year ago by AmusedCat74

Answers 12

Here it is:

  				
Posted one year ago by AmusedCat74

This is something you can do in the GCP console, so one would imagine it can also be done using their Python library.

I think the limitation is that you can only pass a relative subnet path in the GCP Autoscaler console. Then, by the looks of the error message, the ClearML Autoscaler constructs the full path under the hood /project/<project_id>/subnet/<subnet_id> .

I'd like the option to specify the full path myself in the Autoscaler which would then allow me to use a shared subnet.
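To illustrate the request, here is a minimal sketch of how a subnet field could accept either a bare subnet name or a full resource path. It assumes the Compute Engine style path `projects/<project>/regions/<region>/subnetworks/<name>`; the function name `resolve_subnet` and its parameters are hypothetical, and this is not how the ClearML Autoscaler is known to build the path.

```python
def resolve_subnet(subnet: str, project_id: str, region: str) -> str:
    """Return a full subnet resource path.

    Hypothetical helper: if the user already supplied a full path
    (for example a shared VPC subnet owned by another project), keep it
    unchanged. Otherwise, expand the bare name using the project the
    VMs are launched in.
    """
    if subnet.startswith("projects/"):
        # Full path given: respect it so a host project's subnet can be used.
        return subnet
    # Bare name given: assume the subnet lives in the launching project.
    return f"projects/{project_id}/regions/{region}/subnetworks/{subnet}"


# Example values below are placeholders, not real projects.
print(resolve_subnet("my-subnet", "service-project", "us-central1"))
print(resolve_subnet(
    "projects/host-project/regions/us-central1/subnetworks/shared-subnet",
    "service-project",
    "us-central1",
))
```

With this approach, existing configurations that use a bare subnet name would keep working, while a full path would opt in to a subnet from another project.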

  				
Posted one year ago by AmusedCat74

Thanks Jake. Do you know how I set the GPU count to 0?

  				
Posted one year ago by AmusedCat74

AmusedCat74, the error seems to indicate that you've selected a GPU count larger than 0 for that specific resource.

  				
Posted one year ago by SuccessfulKoala55

SuccessfulKoala55, just following up as I figured out what was happening here, and it could be useful for the future.

The prefilled value for Number of GPUs in the GCP Autoscaler is 1.

When one ticks Run in CPU mode (no gpus), it hides the GPU Type and Number of GPUs fields. However, the values that were in these fields are still submitted in the API request (I'm guessing here) when the Autoscaler is launched.

Hence, to get past this, you need to explicitly set Number of GPUs to 0 before ticking Run in CPU mode (no gpus). This does not seem like the correct behaviour and is likely a bug.
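The suspected behaviour can be sketched as follows. This is an illustration only, not ClearML's actual code: the `ResourceConfig` fields, `normalize`, and `validate` are hypothetical names, the machine and GPU type values are example inputs, and the E2 check is a simplified stand-in for the rule reported in the log.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class ResourceConfig:
    """Hypothetical shape of one autoscaler resource as submitted by the form."""
    name: str
    machine_type: str
    cpu_only: bool
    gpu_type: Optional[str] = None
    gpu_count: int = 0


def normalize(cfg: ResourceConfig) -> ResourceConfig:
    """Clear the hidden GPU fields when CPU mode is ticked."""
    if cfg.cpu_only:
        return replace(cfg, gpu_type=None, gpu_count=0)
    return cfg


def validate(cfg: ResourceConfig) -> List[str]:
    """Simplified check: reject a GPU request on an E2 machine type."""
    errors = []
    if cfg.gpu_count > 0 and cfg.machine_type.startswith("e2-"):
        errors.append(f"{cfg.name}: E2 instances cannot have a GPU attached")
    return errors


# The form hides the GPU fields but still submits the prefilled count of 1.
submitted = ResourceConfig(
    name="gcp4cpu",
    machine_type="e2-standard-4",
    cpu_only=True,
    gpu_type="nvidia-tesla-t4",
    gpu_count=1,
)

print(validate(submitted))             # validation fails on the stale GPU count
print(validate(normalize(submitted)))  # passes once the hidden fields are cleared
```

Setting Number of GPUs to 0 by hand before ticking the box has the same effect as the `normalize` step here.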

  				
Posted one year ago by AmusedCat74

Hi AmusedCat74, can you please provide the full log of the autoscaler?

  				
Posted one year ago by CostlyOstrich36

Hey AmusedCat74, I may be wrong, but I think you can't attach a GPU to an E2 instance. It should be at least an N1, no?

  				
Posted one year ago by EnthusiasticShrimp49

That makes sense, I'll add that as a future addition

  				
Posted one year ago by SuccessfulKoala55

Hi AmusedCat74, sorry I missed this. This looks like an obvious bug; I'll try to fix it for the next release.

  				
Posted one year ago by SuccessfulKoala55

I'm not sure this is supported in the Google machine spec

  				
Posted one year ago by SuccessfulKoala55

👍 Thanks for getting back to me.

Another issue I found was that I could only use VPC subnets from the Google project I am launching the VMs in.

I cannot use shared VPC subnets from another project. This would be a useful feature to implement, as GCP recommends segmenting the cloud estate so that the VPC and VMs are in different projects.

  				
Posted one year ago by AmusedCat74

EnthusiasticShrimp49, how do I specify not to attach a GPU? I thought ticking 'Run in CPU Mode' would be sufficient. Is there something else I'm missing?

  				
Posted one year ago by AmusedCat74
