
GCP AutoScaler limits not working correctly?

Hi there,

I have encountered some unexpected behaviour with the GCP Autoscaler.

The Autoscaler does not appear to be respecting the limits I configured (a maximum of 12 instances spun up at one time). Please see the attached screenshot.

The spin-up was triggered by my adding 21 tasks to the queue gcp-cpu-e2-highmem-4-ondemand at the same time. I have attached the relevant logging from this morning in a file.
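For clarity, this is roughly what I would expect the scaling decision to look like (a minimal Python sketch under my assumptions, not ClearML's actual implementation; the function name is illustrative):

```python
def instances_to_spawn(queued_tasks: int, running_instances: int, num_instances: int) -> int:
    """How many new instances the autoscaler should start for one resource.

    num_instances is the per-resource cap (12 in my config), so running
    plus newly spawned instances should never exceed it.
    """
    free_capacity = max(num_instances - running_instances, 0)
    return min(queued_tasks, free_capacity)

# 21 tasks queued, 0 instances running, cap of 12 -> only 12 should start
print(instances_to_spawn(21, 0, 12))  # 12
```

In my case that would mean 12 instances at most, with the remaining 9 tasks waiting in the queue.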

Has anyone experienced anything similar? Is there any way I can prevent this on my side, or is this a bug in the ClearML Autoscaler?

Cheers,
James
[screenshot]

  
  
Posted one year ago

Answers 7


Cheers 👍

  
  
Posted one year ago

I see you have two resources defined there - can you click the triple-dot icon on the autoscaler instance, choose "Export Configuration", then share it here? (Please make sure to remove any credentials from the generated file.)

  
  
Posted one year ago

Let me know if you need additional information.
[screenshot]

  
  
Posted one year ago

Apologies for the delay.

I have obfuscated the private information with XXX. Let me know if you think any of it is relevant.

{
  "gcp_project_id": "XXX",
  "gcp_zone": "XXX",
  "subnetwork": "XXX",
  "gcp_credentials": "{\n  \"type\": \"service_account\",\n  \"project_id\": \"XXX\",\n  \"private_key_id\": \"XXX\",\n  \"private_key\": \"XXX\",\n  \"client_id\": \"XXX\",\n  \"auth_uri\": \"XXX\",\n  \"token_uri\": \"XXX\",\n  \"auth_provider_x509_cert_url\": \"XXX\",\n  \"client_x509_cert_url\": \"XXX\",\n  \"universe_domain\": \"XXX\"\n}",
  "git_user": "XXX",
  "git_pass": "XXX",
  "default_docker_image": "XXX",
  "instance_queue_list": [
    {
      "resource_name": "gcp-cpu-e2-highmem-4-ondemand",
      "machine_type": "e2-highmem-4",
      "cpu_only": true,
      "gpu_type": "nvidia-tesla-a100",
      "gpu_count": 0,
      "preemptible": false,
      "regular_instance_rollback": false,
      "regular_instance_rollback_timeout": 10,
      "spot_instance_blackout_period": 0,
      "num_instances": 12,
      "queue_name": "gcp-cpu-e2-highmem-4-ondemand",
      "source_image": "projects/deeplearning-platform-release/global/images/common-cpu-v20231105-ubuntu-2004-py310",
      "disk_size_gb": 100,
      "service_account_email": "default"
    },
    {
      "resource_name": "gcp-cpu-e2-medium-ondemand",
      "machine_type": "e2-medium",
      "cpu_only": true,
      "gpu_type": null,
      "gpu_count": 0,
      "preemptible": false,
      "regular_instance_rollback": false,
      "regular_instance_rollback_timeout": 10,
      "spot_instance_blackout_period": 0,
      "num_instances": 10,
      "queue_name": "gcp-cpu-e2-medium-ondemand",
      "source_image": "projects/deeplearning-platform-release/global/images/common-cpu-v20231105-ubuntu-2004-py310",
      "disk_size_gb": 50,
      "service_account_email": "default"
    }
  ],
  "name": "CPU Autoscaler",
  "max_idle_time_min": 60,
  "workers_prefix": "dynamic_gcp_cpu",
  "polling_interval_time_min": "1",
  "alert_on_multiple_workers_per_task": true,
  "exclude_bashrc": false,
  "custom_script": "XXX",
  "extra_clearml_conf": "agent.extra_docker_arguments: [\"--ipc=host\", ]\n\nsdk.development.log_os_environments: [\"AWS_\"]\n\nagent.apply_environment: true\n\nenvironment {\n    XXX\n    XXX\n}\n\n\nsdk {\n    aws {\n        s3 {\n            credentials: [\n                {\n                    bucket: \"XXX\"\n                    key: \"XXX\"\n                    secret: \"XXX\"\n                }\n            ]\n        }\n        boto3 {\n            pool_connections: 512\n            max_multipart_concurrency: 16\n        }\n    }\n \n    development {\n        worker {\n            report_event_flush_threshold: 1000\n        }\n    }\n}\n\nagent {\n    default_docker: {\n        arguments: [\"--shm-size\", \"12G\", \"-p\", \"5000:5000\"]\n    }\n}"
}
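As a sanity check, the per-resource caps can be read straight back out of the export (a short sketch; only an excerpt of the config is inlined here, key names taken from the export):

```python
import json

# Minimal excerpt of the exported autoscaler config (full export redacted)
config_json = """
{
  "instance_queue_list": [
    {"resource_name": "gcp-cpu-e2-highmem-4-ondemand", "num_instances": 12},
    {"resource_name": "gcp-cpu-e2-medium-ondemand", "num_instances": 10}
  ]
}
"""

config = json.loads(config_json)
for resource in config["instance_queue_list"]:
    print(f'{resource["resource_name"]}: cap {resource["num_instances"]}')
```

So the cap on the queue in question should be 12, with the second resource capped at 10.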
  
  
Posted one year ago

@<1529271085315395584:profile|AmusedCat74> ?

  
  
Posted one year ago

@<1529271085315395584:profile|AmusedCat74> can you share the autoscaler configuration?

  
  
Posted one year ago

Hi @<1529271085315395584:profile|AmusedCat74> , thanks for reporting this. I'll ask the ClearML team to look into it.

  
  
Posted one year ago