Switching to resources that use GPU-enabled VMs failed with the same error as above.
Looking at the spawned VMs, the autoscaler created them without a GPU, even though I checked that my settings (machine type n1-standard-1, GPU type nvidia-tesla-t4, and the image https://console.cloud.google.com/compute/imagesDetail/projects/ml-images/global/images/c0-deeplearning-common-cu113-v20220701-debian-10?project=ml-tooling-test-external for the VM) can be used to create VM instances, and my GCP autoscaler configuration looks correct:
[
  {"resource_name": "gpu_default3", "machine_type": "n1-standard-1", "cpu_only": false, "gpu_type": "nvidia-tesla-t4", "gpu_count": 1, "preemptible": false, "num_instances": 5, "queue_name": "default", "source_image": "projects/ml-images/global/images/c0-deeplearning-common-cu113-v20220701-debian-10", "disk_size_gb": 100},
  {"resource_name": "gpu_services3", "machine_type": "n1-standard-1", "cpu_only": false, "gpu_type": "nvidia-tesla-t4", "gpu_count": 1, "preemptible": false, "num_instances": 1, "queue_name": "services", "source_image": "projects/ml-images/global/images/c0-deeplearning-common-cu113-v20220701-debian-10", "disk_size_gb": 100}
]
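
For anyone who wants to rule out a typo in the configuration itself, here is a minimal sanity-check sketch in Python. It only parses the JSON above and confirms that each resource actually asks for a GPU. The helper name `gpu_problems` is hypothetical (it is not part of the autoscaler or of any GCP API), and the check says nothing about how the autoscaler interprets these fields when it creates the VMs.

```python
import json

# The autoscaler resource configuration, copied from the post above.
CONFIG_JSON = """
[
  {"resource_name": "gpu_default3", "machine_type": "n1-standard-1",
   "cpu_only": false, "gpu_type": "nvidia-tesla-t4", "gpu_count": 1,
   "preemptible": false, "num_instances": 5, "queue_name": "default",
   "source_image": "projects/ml-images/global/images/c0-deeplearning-common-cu113-v20220701-debian-10",
   "disk_size_gb": 100},
  {"resource_name": "gpu_services3", "machine_type": "n1-standard-1",
   "cpu_only": false, "gpu_type": "nvidia-tesla-t4", "gpu_count": 1,
   "preemptible": false, "num_instances": 1, "queue_name": "services",
   "source_image": "projects/ml-images/global/images/c0-deeplearning-common-cu113-v20220701-debian-10",
   "disk_size_gb": 100}
]
"""


def gpu_problems(resources):
    """Return a list of human-readable problems with the GPU-related fields.

    Hypothetical helper for illustration only: it checks the fields that
    appear in the configuration above, nothing more.
    """
    problems = []
    for res in resources:
        name = res.get("resource_name", "<unnamed>")
        # A GPU resource must explicitly opt out of CPU-only mode.
        if res.get("cpu_only") is not False:
            problems.append(f"{name}: cpu_only is not false")
        # The accelerator type must be a non-empty string.
        if not res.get("gpu_type"):
            problems.append(f"{name}: gpu_type is missing or empty")
        # At least one accelerator must be requested.
        count = res.get("gpu_count")
        if not isinstance(count, int) or count < 1:
            problems.append(f"{name}: gpu_count must be an integer >= 1")
    return problems


resources = json.loads(CONFIG_JSON)
issues = gpu_problems(resources)
print(issues if issues else "All resources request a GPU.")
```

If this reports no problems, the configuration text is at least self-consistent, which points the investigation toward how the VMs are being created rather than toward the settings.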