Apologies for the delay.
I have obfuscated the private information with XXX. Let me know if you think any of it is relevant.
{"gcp_project_id":"XXX","gcp_zone":"XXX","subnetwork":"XXX","gcp_credentials":"{\n \"type\": \"service_account\",\n \"project_id\": \"XXX\",\n \"private_key_id\": \"XXX\",\n \"private_key\": \"XXX\",\n \"client_id\": \"XXX\",\n \"auth_uri\": \"XXX\",\n \"token_uri\": \"XXX\",\n \"auth_provider_x509_cert_url\": \"XXX\",\n \"client_x509_cert_url\": \"XXX\",\n \"universe_domain\": \"XXX\"\n}","git_user":"XXX","git_pass":"XXX","default_docker_image":"XXX","instance_queue_list":[{"resource_name":"gcp-cpu-e2-highmem-4-ondemand","machine_type":"e2-highmem-4","cpu_only":true,"gpu_type":"nvidia-tesla-a100","gpu_count":0,"preemptible":false,"regular_instance_rollback":false,"regular_instance_rollback_timeout":10,"spot_instance_blackout_period":0,"num_instances":12,"queue_name":"gcp-cpu-e2-highmem-4-ondemand","source_image":"projects/deeplearning-platform-release/global/images/common-cpu-v20231105-ubuntu-2004-py310","disk_size_gb":100,"service_account_email":"default"},{"resource_name":"gcp-cpu-e2-medium-ondemand","machine_type":"e2-medium","cpu_only":true,"gpu_type":null,"gpu_count":0,"preemptible":false,"regular_instance_rollback":false,"regular_instance_rollback_timeout":10,"spot_instance_blackout_period":0,"num_instances":10,"queue_name":"gcp-cpu-e2-medium-ondemand","source_image":"projects/deeplearning-platform-release/global/images/common-cpu-v20231105-ubuntu-2004-py310","disk_size_gb":50,"service_account_email":"default"}],"name":"CPU Autoscaler","max_idle_time_min":60,"workers_prefix":"dynamic_gcp_cpu","polling_interval_time_min":"1","alert_on_multiple_workers_per_task":true,"exclude_bashrc":false,"custom_script":"XXX","extra_clearml_conf":"agent.extra_docker_arguments: [\"--ipc=host\", ]\n\nsdk.development.log_os_environments: [\"AWS_\"]\n\nagent.apply_environment: true\n\nenvironment {\n XXX\n XXX\n}\n\n\nsdk {\n aws {\n s3 {\n credentials: [\n {\n bucket: \"XXX\"\n key: \"XXX\"\n secret: \"XXX\"\n }\n ]\n }\n boto3 {\n pool_connections: 512\n 
max_multipart_concurrency: 16\n }\n }\n \n development {\n worker {\n report_event_flush_threshold: 1000\n }\n }\n}\n\nagent {\n default_docker: {\n arguments: [\"--shm-size\", \"12G\", \"-p\", \"5000:5000\"]\n }\n}"}
@<1529271085315395584:profile|AmusedCat74> ?
I see you have two resources defined there - can you simply click the triple-dot icon on the autoscaler instance and choose "Export Configuration", then share it here? (Please make sure to remove any credentials from the generated file.)
Let me know if you need additional information.
@<1529271085315395584:profile|AmusedCat74> can you share the autoscaler configuration?
Hi @<1529271085315395584:profile|AmusedCat74> , thanks for reporting this, I'll ask the ClearML team to look into this