Hi EmbarrassedSpider34, what do you see in the log of the experiment you're trying to run? Or are you checking this from the GCP console?
Thanks! I've asked the autoscaler devs about this and it might be a bug; you are the second person to report it. One of them is checking and we'll get back to you!
My task runs just fine.
But no GPU.
(When it requires a GPU, it crashes.)
Looking at the VM features on GCP UI it seems no GPU was defined for the VM.
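To double-check this from inside the VM rather than from the GCP UI, something like the following could serve as a quick sanity check. It is only a minimal sketch: `visible_gpus` is a hypothetical helper name, and it assumes that `nvidia-smi` is on the PATH whenever a GPU and its driver are actually present.

```python
import shutil
import subprocess


def visible_gpus():
    """Return the GPU names reported by nvidia-smi, or [] if none are visible."""
    # Without the NVIDIA driver tools there is nothing to query.
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    # A non-zero exit code usually means the driver cannot reach any device.
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = visible_gpus()
    print(gpus if gpus else "No GPU visible on this VM")
```

An empty result here would be consistent with the VM having been created without a GPU attached, as opposed to a driver or framework problem inside the task.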
It's a private image (based off of this image).
` ======================================
Welcome to the Google Deep Learning VM
Version: pytorch-gpu.1-11.m91
Based on: Debian GNU/Linux 10 (buster) (GNU/Linux 4.19.0-21-cloud-amd64 x86_64) `
I am leaving the docker line empty, so I assume there's no Docker container spun up for my agent.
Hi EmbarrassedSpider34, would you mind showing us a screenshot of your machine configuration? Can you check for any output logs that ClearML might have given you? Depending on the region, there may have been no GPUs available, so could you also check whether you can manually spin up a GPU VM?
I don't think it's related to the region.
I do have the log of the autoscaler.
We also have an autoscaler that was implemented from scratch before ClearML had the autoscaler application.
I wouldn't want to share the autoscaler log in this channel.
Also, can you share which machine image you're using?