
Hi,

I am trying to set up the GCP autoscaler. I created a GCP service account and granted it the following roles: Compute Admin, Service Account User, and Logs Writer. I then added its credentials under the GCP credentials section in ClearML.

VM instances are created successfully; however, they remain idle, do not pick up tasks from the default queue (which I specified in the configuration), and then shut down shortly afterward.

For the Docker base image, I am using gcr.io/deeplearning-platform-release/base-cu121:latest.
For the machine image, I am using projects/ml-images/global/images/family/common-cu121-debian-11.

The monitored queue is 'default', and I’ve set the Service Account Email to the correct email address of the service account with the mentioned roles. I have not defined an initialization script.

My additional ClearML configuration is: agent.extra_docker_arguments: ["--ipc=host", "--gpus", "all"]
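For reference, the agent appends agent.extra_docker_arguments to the docker run command it uses for each task. One quick way to check whether the base image plus these arguments actually exposes the GPUs on a VM is to run the same image by hand. Below is a minimal Python sketch that builds such a command. It assumes Docker and the NVIDIA container toolkit are installed on the VM, and gpu_smoke_test_cmd / run_smoke_test are hypothetical helpers, not part of ClearML.

```python
import shlex
import subprocess
from typing import List

# Values taken from the post above.
BASE_IMAGE = "gcr.io/deeplearning-platform-release/base-cu121:latest"
EXTRA_DOCKER_ARGS = ["--ipc=host", "--gpus", "all"]


def gpu_smoke_test_cmd(image: str, extra_args: List[str]) -> List[str]:
    """Hypothetical helper: a one-off `docker run` that prints nvidia-smi inside the image."""
    return ["docker", "run", "--rm", *extra_args, image, "nvidia-smi"]


def run_smoke_test(cmd: List[str]) -> None:
    """Run the command on the VM; raises CalledProcessError if the container fails."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = gpu_smoke_test_cmd(BASE_IMAGE, EXTRA_DOCKER_ARGS)
    # Print a shell-ready version that can be pasted into an SSH session on the VM.
    print(shlex.join(cmd))
```

If nvidia-smi lists the GPUs, the image and arguments are probably fine and the problem is more likely on the agent/queue side.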

Here is what happens when I launch the app:
- The app instance is created.

- N instances are successfully created and launched on GCP.

- Several tasks are visible in the default queue.

However, after initialization the VMs stay idle and shut down almost right away, while the tasks keep waiting in the queue. The log reports 0 active cloud instances.

Could you please advise on what I might be doing wrong? If you need any additional information about my configuration, I’d be happy to provide it.

Thanks in advance!

  
  
Posted 4 months ago

Answers 3


Hi @CornyLobster42, can you add logs from the VMs themselves? They should be saved in the Autoscaler.
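If a VM shuts down before its logs are collected, another place to look while the instance is still running is its serial console output. A minimal Python sketch, assuming the gcloud CLI is installed and authenticated; the instance, zone, and project names are placeholders.

```python
import subprocess
from typing import List, Optional


def serial_output_cmd(instance: str, zone: str, project: Optional[str] = None) -> List[str]:
    """Build the gcloud command that prints a VM's serial console output."""
    cmd = [
        "gcloud", "compute", "instances", "get-serial-port-output",
        instance, f"--zone={zone}",
    ]
    if project:
        cmd.append(f"--project={project}")
    return cmd


def fetch_serial_output(instance: str, zone: str, project: Optional[str] = None) -> str:
    """Run the command (needs an authenticated gcloud CLI) and return its stdout."""
    result = subprocess.run(
        serial_output_cmd(instance, zone, project),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Placeholder names: substitute a VM created by the autoscaler.
    print(" ".join(serial_output_cmd("my-autoscaler-vm", "us-central1-a", "my-project")))
```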

  
  
Posted 4 months ago

The image I sent was the output I got from my VMs. I eventually got the AutoScaler working, and it now schedules the tasks successfully. However, I now have another issue: how can I use an image stored in our GCP Artifact Registry?
I added the Artifact Registry Reader role to the service account used by the AutoScaler, and in the initialization script I included Docker authentication (gcloud auth configure-docker) as well as a docker pull command for the image from the Artifact Registry.
However, I still keep getting the following error message in the console of the failed jobs:

docker: Error response from daemon: pull access denied for <image_name>
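One thing that might be worth checking: when run without a hostname, gcloud auth configure-docker registers the gcloud credential helper only for the default Container Registry hosts (gcr.io and its regional variants). Artifact Registry images live on LOCATION-docker.pkg.dev hosts, which have to be passed explicitly, e.g. gcloud auth configure-docker us-central1-docker.pkg.dev. The helper is also written to the Docker config of whichever user ran the command. The Python sketch below is a diagnostic under those assumptions, not a ClearML feature: it reads ~/.docker/config.json on the VM and reports whether the image's registry host has a gcloud helper. The image path is a placeholder.

```python
import json
from pathlib import Path


def registry_host(image: str) -> str:
    """Return the registry hostname of a Docker image reference.

    Docker treats the first path component as a registry host only when it
    contains '.' or ':' or equals 'localhost'; otherwise the image is on Docker Hub.
    """
    first, sep, _ = image.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def has_gcloud_helper(image: str, docker_config: dict) -> bool:
    """True if the Docker config routes this image's registry through gcloud."""
    helpers = docker_config.get("credHelpers", {})
    return helpers.get(registry_host(image)) == "gcloud"


def load_docker_config(path: Path = Path.home() / ".docker" / "config.json") -> dict:
    """Load the current user's Docker client config, or {} if it does not exist."""
    return json.loads(path.read_text()) if path.exists() else {}


if __name__ == "__main__":
    # Placeholder image path: substitute your own Artifact Registry image.
    image = "us-central1-docker.pkg.dev/my-project/my-repo/my-image:latest"
    host = registry_host(image)
    if has_gcloud_helper(image, load_docker_config()):
        print(f"gcloud credential helper is configured for {host}")
    else:
        print(f"No helper for {host}; try: gcloud auth configure-docker {host} --quiet")
```

If the helper is present and the pull still fails, the image path itself or the registry's IAM permissions would be the next things to verify.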

  
  
Posted 4 months ago

[Image not included]

  
  
Posted 4 months ago