Hello, I have a ClearML autoscaler setup. Previously, when a new task came up, an already running worker (if there was one) would take it, apply the new commit, and run the task. Now I get an error, so I can't run a task on an already running worker. It has to
configurations:
  extra_clearml_conf: 'sdk.aws.s3.region="us-west-2"
    agent.extra_docker_arguments=["--shm-size=90g"]
    agent.extra_docker_shell_script=["git config --global credential.helper cache --timeout=604800",]'
  extra_trains_conf: ''
  extra_vm_bash_script: ''
  queues:
    gcp-v100:
    - - gcp-v100
      - 4
    gcp-l4:
    - - gcp-l4
      - 4
    gcp-cpu:
    - - gcp-cpu
      - 4
  resource_configurations:
    gcp-v100:
      disk_size: 300
      instance_type: n1-highmem-8
      source_image: projects/ml-images/global/images/c0-deeplearning-common-gpu-v20231105-debian-11-py310
      accelerator_type: nvidia-tesla-v100
    gcp-l4:
      disk_size: 300
      instance_type: g2-standard-12
      source_image: projects/ml-images/global/images/c0-deeplearning-common-gpu-v20231105-debian-11-py310
      accelerator_type: nvidia-l4
    gcp-cpu:
      disk_size: 4000
      instance_type: c2-standard-4
      source_image: projects/ml-images/global/images/c0-deeplearning-common-gpu-v20231105-debian-11-py310
      cpu_only: True
hyper_params:
  gcp_project: xxxxxxxxxxxxxxxxxx
  region: 'us-central1'
  zone: 'us-central1-a'
  cloud_credentials_key: xxxxxxxxxxx
  cloud_credentials_region: xxxxxxxxx
  cloud_credentials_secret: xxxxxxxxxxxxxxxxxxxxxx
  use_credentials_chain: false
  cloud_provider: ''
  default_docker_image: xxxxxxxxxxxxxxxxx
  git_pass: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  git_user: xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  max_idle_time_min: 30
  max_spin_up_time_min: 30
  polling_interval_time_min: 0.5
  workers_prefix: 'gcp'
  iam_arn: xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
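In case it helps anyone reproducing this, here is a minimal sketch of a sanity check on the queue section of the config above. It assumes each queue entry is a `[resource_name, max_instances]` pair whose name must match a key under `resource_configurations`. The helper `check_queues` is hypothetical (it is not part of ClearML), and the dict is a trimmed, hand-written copy of what a YAML loader would produce from the config. It only rules out a mapping mistake; it does not explain the error itself.

```python
def check_queues(config):
    """Return a list of problems found in the queue-to-resource mapping."""
    conf = config.get("configurations", {})
    resources = conf.get("resource_configurations", {})
    problems = []
    for queue, entries in conf.get("queues", {}).items():
        # Assumption: every entry is a [resource_name, max_instances] pair.
        for resource, max_instances in entries:
            if resource not in resources:
                problems.append(f"queue {queue!r}: unknown resource {resource!r}")
            if not isinstance(max_instances, int) or max_instances < 1:
                problems.append(
                    f"queue {queue!r}: bad instance limit {max_instances!r}"
                )
    return problems


# Trimmed copy of the config above, as the dict a YAML loader would produce.
config = {
    "configurations": {
        "queues": {
            "gcp-v100": [["gcp-v100", 4]],
            "gcp-l4": [["gcp-l4", 4]],
            "gcp-cpu": [["gcp-cpu", 4]],
        },
        "resource_configurations": {
            "gcp-v100": {"instance_type": "n1-highmem-8"},
            "gcp-l4": {"instance_type": "g2-standard-12"},
            "gcp-cpu": {"instance_type": "c2-standard-4"},
        },
    }
}

print(check_queues(config))
```

An empty list means every queue points at a defined resource with a positive instance limit.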