Can you try a machine image with CUDA (if you need it) and Docker pre-installed?
Thanks @<1523701070390366208:profile|CostlyOstrich36> , I was ultimately able to get it to run but had to build my own image, which included:
- The right version of python
- Docker
- CUDA + NVIDIA drivers + kernel headers for the NVIDIA install
- Virtual env
- The right GPU machine types, region with availability
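For anyone building a similar image, a quick sanity check is to confirm the expected tools are on the PATH before pointing the autoscaler at it. This is a minimal sketch, not part of ClearML: the tool names (`python3`, `docker`, `nvidia-smi`) are assumptions based on the list above, and `missing_tools` is a hypothetical helper.

```python
import shutil

# Tools the VM image is assumed to need; adjust to your own setup.
REQUIRED_TOOLS = ["python3", "docker", "nvidia-smi"]


def missing_tools(tools, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing from image:", ", ".join(missing))
    else:
        print("All required tools found.")
```

Running this on a VM booted from the image gives a fast signal before launching any tasks. It only checks that the executables exist, not that the versions or drivers are correct.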
Here is the log from the GCP VM for extra context
Is it possible the image you used doesn't have docker? Did you find any errors in the log?
Well, this Machine Image (projects/debian-cloud/global/images/debian-10-buster-v20210721) was the default value set by the WebUI (not something we specified). According to the docs it should be an optional field? However, when I clear the field and try to re-launch I get this error:
googleapiclient.errors.HttpError: <HttpError 400 when requesting
returned "Invalid value for field 'resource.disks[0].initializeParams.sourceImage': ''. The URL is malformed.". Details: "[{'message': "Invalid value for field 'resource.disks[0].initializeParams.sourceImage': ''. The URL is malformed.", 'domain': 'global', 'reason': 'invalid'}]">
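Judging by the message, the request appears to have gone out with an empty string as `sourceImage`. A minimal sketch of a check that would catch this before the API call, assuming the image is given in the `projects/<project>/global/images/<image>` form used by the default value (optionally with a `family/` segment or the full `https://www.googleapis.com/compute/v1/` prefix). `validate_source_image` is a hypothetical helper, not part of ClearML or the Google API client.

```python
import re

# Assumed accepted shapes: a partial or full Compute Engine image path.
_IMAGE_RE = re.compile(
    r"^(?:https://www\.googleapis\.com/compute/v1/)?"
    r"projects/[^/\s]+/global/images/(?:family/)?[^/\s]+$"
)


def validate_source_image(image):
    """Return the stripped image path, or raise ValueError if it is unusable."""
    image = (image or "").strip()
    if not image:
        raise ValueError("Machine image is empty; a source image path is required.")
    if not _IMAGE_RE.match(image):
        raise ValueError(f"Machine image does not look like an image path: {image!r}")
    return image
```

For example, `validate_source_image("projects/debian-cloud/global/images/debian-10-buster-v20210721")` returns the path unchanged, while an empty field raises a `ValueError` locally instead of producing an HTTP 400.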
@<1523701070390366208:profile|CostlyOstrich36> , any thoughts on what might be happening with the autoscaler here?