Still no good, managed to apply with errors only.
Nope. It gives me errors.
Just like the guy that replied in the thread I linked in my previous reply here.
Adding the flags he added also didn't help
I got the same issue as well last night.
It's a private image (based off of this image).
` ======================================
Welcome to the Google Deep Learning VM
Version: pytorch-gpu.1-11.m91
Based on: Debian GNU/Linux 10 (buster) (GNU/Linux 4.19.0-21-cloud-amd64 x86_64) `
I am leaving the docker line empty, so I assume no Docker container is spun up for my agent.
My task runs just fine.
But no GPU.
(When it demands GPU it collapses).
Looking at the VM features on GCP UI it seems no GPU was defined for the VM.
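One quick way to confirm this from inside the instance is to ask the NVIDIA driver directly. This is only a minimal sketch, assuming `nvidia-smi` would be on the PATH if a driver were installed (as it normally is on a GPU Deep Learning VM image); `GPU_STATUS` is a made-up variable name for illustration.

```shell
# Sketch: check whether the NVIDIA driver can see any GPU on this machine.
# GPU_STATUS is a hypothetical variable name, not something ClearML reads.
if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1; then
    GPU_STATUS="present"
    nvidia-smi -L   # list the GPUs the driver reports
else
    GPU_STATUS="missing"
    echo "No usable NVIDIA GPU/driver visible on this machine"
fi
```

If this reports "missing" on the VM itself, the problem is in how the instance was created rather than in the task or the agent.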
I should note that it works when I run the container locally (with no external env variables).
CostlyOstrich36
Thank you,
Solved,
I messaged with Alon from your team and he will upload an update to the old repository.
Thanks,
solved.
I deleted ~/clearml.conf (apparently it already existed)
and reran clearml-init
The environment setting you added to your vault is only applied inside the instance when the agent starts running there, not as part of the command that starts the instance.
The most common DevOps practice for keeping these kinds of variables in the init script without fully exposing them to the naked eye is to add something like
export MY_ENV_VAR=$(echo '<base64-encoded secret>' | base64 --decode)
to the init script.
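For completeness, here is a rough sketch of the full round trip: producing the encoded string locally and decoding it the way the init script line above does. The secret value is a made-up placeholder, and `secret` / `encoded` are illustrative variable names. Keep in mind that base64 is only an encoding, not encryption, so this hides the value from a casual glance and nothing more.

```shell
# Hypothetical placeholder value; substitute the real secret on your own machine.
secret='example-token-123'

# Encode once, locally. printf avoids the trailing newline that echo would add.
encoded=$(printf '%s' "$secret" | base64)

# This mirrors the line that goes into the init script, with the encoded
# string pasted in place of "$encoded".
export MY_ENV_VAR=$(echo "$encoded" | base64 --decode)
```

After this runs, `MY_ENV_VAR` holds the original value again, and only the encoded form ever appears in the script text.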
I tried.
it looks like this,
sudo apt update
sudo apt install amazon-ecr-credential-helper
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin ****
But my problem is that I can't even tell whether my init script runs properly. I tried adding a print statement, but I cannot see its output anywhere (neither in the autoscaler nor in the task).
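One possible workaround for the missing output, sketched under the assumption that you can SSH into the instance afterwards: have the init script append its own progress to a file. The log path and the `log` helper below are hypothetical names picked for illustration, not anything ClearML provides.

```shell
# Hypothetical log location; any path writable by the init script would do.
INIT_LOG="${INIT_LOG:-/tmp/init_script_debug.log}"

# Small helper that appends a timestamped message to the log file.
log() {
    printf '%s %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$*" >>"$INIT_LOG"
}

log "init script started"

# Real commands (apt, docker login, ...) would go inside the braces so that
# both their stdout and stderr end up in the same file.
{
    echo "placeholder for the real init commands"
} >>"$INIT_LOG" 2>&1

log "init script finished"
```

If the file never appears on the instance, the script most likely did not run at all. If it stops before "init script finished", the last entries show where it failed.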
You are right, Idan.
I consulted our private ClearML channel:
you cannot set these environment variables anywhere else,
only in the init script.
Here is the full quote:
It is important to note that I am running my instances on GCP, but the container is on ECR (AWS).