Here is the log from the GCP VM for extra context
Well, this Machine Image (projects/debian-cloud/global/images/debian-10-buster-v20210721) was the default value set by the WebUI (not something we specified). According to the docs it should be an optional field? However, when I clear the field and try to re-launch, I get this error:
googleapiclient.errors.HttpError: <HttpError 400 when requesting
returned "Invalid value for field 'resource.disks[0].initializeParams.sourceImage': ''. The URL is malformed.". Details: "[{'message': "Invalid value for field 'resource.disks[0].initializeParams.sourceImage': ''. The URL is malformed.", 'domain': 'global', 'reason': 'invalid'}]">
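The 400 above indicates that an empty `sourceImage` string is rejected as a malformed URL. Here is a minimal sketch of a check that could run before a launch request is sent, assuming the image is given in the partial-URL form seen in this thread (`projects/<project>/global/images/<image>`). The function name `is_valid_source_image` and the pattern are hypothetical, not part of ClearML or the Google API client, and the pattern is deliberately loose:

```python
import re

# Hypothetical pre-flight check; not part of ClearML or googleapiclient.
# Assumes the partial-URL form seen in this thread, optionally prefixed by the
# Compute Engine API host, and optionally pointing at an image family.
_IMAGE_RE = re.compile(
    r"^(https://www\.googleapis\.com/compute/v1/)?"
    r"projects/[^/\s]+/global/images/(family/)?[^/\s]+$"
)


def is_valid_source_image(source_image: str) -> bool:
    """Return True if the string looks like a Compute Engine image reference."""
    return bool(_IMAGE_RE.match(source_image.strip()))
```

With this check, an empty field is caught locally, while the WebUI default `projects/debian-cloud/global/images/debian-10-buster-v20210721` passes.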
Is it possible the image you used doesn't have docker? Did you find any errors in the log?
@<1523701070390366208:profile|CostlyOstrich36> , any thoughts on what might be happening with the autoscaler here?
Can you try a machine image with CUDA (if you need it) and Docker pre-installed?
Thanks @<1523701070390366208:profile|CostlyOstrich36> , I was ultimately able to get it to run but had to build my own image, which included:
- The right version of python
- Docker
- CUDA + NVIDIA drivers + kernel headers for the NVIDIA install
- Virtual env
- The right GPU machine types, in a region with availability
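For anyone building a similar image, the software items in the list above can be sanity-checked on the VM before pointing the autoscaler at it. This is a rough sketch: the executable names in `REQUIRED_TOOLS` and the helper `missing_tools` are assumptions for illustration, so adjust them to your image. It only checks that the commands are on `PATH`; it does not verify the Python version, that the NVIDIA driver is loaded, or that kernel headers are installed.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Hypothetical checklist; the executable names are assumptions.
REQUIRED_TOOLS = ("python3", "docker", "nvidia-smi")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from image:", ", ".join(missing))
    else:
        print("All required tools found on PATH.")
```

The `which` parameter exists only so the lookup can be swapped out in tests; on a real VM the default `shutil.which` is used.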