
GCP AutoScaler limits not working correctly?

Hi there,

I have encountered some unexpected behaviour with the GCP Autoscaler.

The autoscaler does not appear to be respecting the limits I configured (a maximum of 12 instances spun up at one time). Please see the attached screenshot.

The spin-up of the instances was triggered by me adding 21 tasks to the queue gcp-cpu-e2-highmem-4-ondemand at the same time. I have attached the relevant logging from this morning in a file.
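For context, the expectation here can be sketched as a simple calculation (the function name is mine, purely illustrative — this is not ClearML code):

```python
def expected_instances(queued_tasks: int, num_instances_limit: int) -> int:
    """Worst-case instance count an autoscaler should reach:
    one instance per queued task, capped at the configured limit."""
    return min(queued_tasks, num_instances_limit)

# 21 queued tasks against a limit of 12 should never exceed 12 instances.
print(expected_instances(21, 12))  # → 12
```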

Has anyone experienced anything similar in the past? Is there any way I can prevent this on my side, or is this a bug in the ClearML Autoscaler?

Cheers,
James
[screenshot attached]

  
  
Posted 11 months ago

7 Answers


Apologies for the delay.

I have obfuscated the private information with XXX. Let me know if you think any of it is relevant.

{
  "gcp_project_id": "XXX",
  "gcp_zone": "XXX",
  "subnetwork": "XXX",
  "gcp_credentials": "{\n  \"type\": \"service_account\",\n  \"project_id\": \"XXX\",\n  \"private_key_id\": \"XXX\",\n  \"private_key\": \"XXX\",\n  \"client_id\": \"XXX\",\n  \"auth_uri\": \"XXX\",\n  \"token_uri\": \"XXX\",\n  \"auth_provider_x509_cert_url\": \"XXX\",\n  \"client_x509_cert_url\": \"XXX\",\n  \"universe_domain\": \"XXX\"\n}",
  "git_user": "XXX",
  "git_pass": "XXX",
  "default_docker_image": "XXX",
  "instance_queue_list": [
    {
      "resource_name": "gcp-cpu-e2-highmem-4-ondemand",
      "machine_type": "e2-highmem-4",
      "cpu_only": true,
      "gpu_type": "nvidia-tesla-a100",
      "gpu_count": 0,
      "preemptible": false,
      "regular_instance_rollback": false,
      "regular_instance_rollback_timeout": 10,
      "spot_instance_blackout_period": 0,
      "num_instances": 12,
      "queue_name": "gcp-cpu-e2-highmem-4-ondemand",
      "source_image": "projects/deeplearning-platform-release/global/images/common-cpu-v20231105-ubuntu-2004-py310",
      "disk_size_gb": 100,
      "service_account_email": "default"
    },
    {
      "resource_name": "gcp-cpu-e2-medium-ondemand",
      "machine_type": "e2-medium",
      "cpu_only": true,
      "gpu_type": null,
      "gpu_count": 0,
      "preemptible": false,
      "regular_instance_rollback": false,
      "regular_instance_rollback_timeout": 10,
      "spot_instance_blackout_period": 0,
      "num_instances": 10,
      "queue_name": "gcp-cpu-e2-medium-ondemand",
      "source_image": "projects/deeplearning-platform-release/global/images/common-cpu-v20231105-ubuntu-2004-py310",
      "disk_size_gb": 50,
      "service_account_email": "default"
    }
  ],
  "name": "CPU Autoscaler",
  "max_idle_time_min": 60,
  "workers_prefix": "dynamic_gcp_cpu",
  "polling_interval_time_min": "1",
  "alert_on_multiple_workers_per_task": true,
  "exclude_bashrc": false,
  "custom_script": "XXX",
  "extra_clearml_conf": "agent.extra_docker_arguments: [\"--ipc=host\", ]\n\nsdk.development.log_os_environments: [\"AWS_\"]\n\nagent.apply_environment: true\n\nenvironment {\n    XXX\n    XXX\n}\n\n\nsdk {\n    aws {\n        s3 {\n            credentials: [\n                {\n                    bucket: \"XXX\"\n                    key: \"XXX\"\n                    secret: \"XXX\"\n                }\n            ]\n        }\n        boto3 {\n            pool_connections: 512\n            max_multipart_concurrency: 16\n        }\n    }\n \n    development {\n        worker {\n            report_event_flush_threshold: 1000\n        }\n    }\n}\n\nagent {\n    default_docker: {\n        arguments: [\"--shm-size\", \"12G\", \"-p\", \"5000:5000\"]\n    }\n}"
}
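As a quick sanity check on the exported configuration: each entry in instance_queue_list caps its own queue via num_instances, so the two resources here allow at most 12 + 10 = 22 instances in total. A minimal sketch of that check (using only the relevant fields from the export above):

```python
import json

# Minimal excerpt of the exported autoscaler configuration;
# only the fields relevant to the instance limits are kept.
config = json.loads("""
{
  "instance_queue_list": [
    {"queue_name": "gcp-cpu-e2-highmem-4-ondemand", "num_instances": 12},
    {"queue_name": "gcp-cpu-e2-medium-ondemand", "num_instances": 10}
  ]
}
""")

# Per-queue caps, plus the overall ceiling across all resources.
limits = {r["queue_name"]: r["num_instances"] for r in config["instance_queue_list"]}
for queue, cap in limits.items():
    print(f"{queue}: max {cap} instances")
print("total ceiling across resources:", sum(limits.values()))
```

If more instances than the per-queue cap were observed for a single queue, that points at the cap not being honoured rather than at a misconfiguration across resources.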
  
  
Posted 11 months ago

@<1529271085315395584:profile|AmusedCat74> ?

  
  
Posted 11 months ago

I see you have two resources defined there - can you simply click on the triple-dot icon on the autoscaler instance and choose "Export Configuration", then share it here? (Please be sure to remove any credentials from the generated file.)

  
  
Posted 11 months ago

Let me know if you need additional information.
[screenshot attached]

  
  
Posted 11 months ago

@<1529271085315395584:profile|AmusedCat74> can you share the autoscaler configuration?

  
  
Posted 11 months ago

Cheers 👍

  
  
Posted 11 months ago

Hi @<1529271085315395584:profile|AmusedCat74> , thanks for reporting this, I'll ask the ClearML team to look into this

  
  
Posted 11 months ago