Hi Jake, thank you for your response. Good to know that credentials_json supports direct decoding. This should be mentioned in the storage documentation.
For the GCP Autoscaler, I think access for the "Service Account Email" provided in each instance configuration should be governed by IAM rules. Right now the VM's access scope does not let the user extend this service account's permissions.
That is, if you select a "Service Account Email" other than the default, the VM should be created with the full access scope, like this:
gcloud compute instances create VM_NAME --service-account=SERVICE_ACCOUNT_EMAIL --scopes=
This way, access for SERVICE_ACCOUNT_EMAIL is controlled entirely by IAM rules, and this is also how Google handles it when you create a VM in the Cloud Console.
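A minimal Python sketch of the proposed rule, for illustration only: when a non-default service account is configured, request the full-access `cloud-platform` scope (`https://www.googleapis.com/auth/cloud-platform`) and let IAM decide what the account can actually do. It only builds the equivalent gcloud command line; it is not how the ClearML autoscaler creates VMs (its source is not available here), and the names `build_create_command` and `default_sa` are hypothetical.

```python
import shlex

# Full-access scope: with it, IAM roles alone govern what the service account can do.
CLOUD_PLATFORM_SCOPE = "https://www.googleapis.com/auth/cloud-platform"


def build_create_command(vm_name, service_account_email, default_sa=None):
    """Hypothetical helper: build a `gcloud compute instances create` command.

    If a non-default service account is given, request the full
    cloud-platform scope so access is controlled only by IAM rules.
    Otherwise, leave the service account and scopes to gcloud's defaults.
    """
    cmd = ["gcloud", "compute", "instances", "create", vm_name]
    if service_account_email and service_account_email != default_sa:
        cmd.append(f"--service-account={service_account_email}")
        cmd.append(f"--scopes={CLOUD_PLATFORM_SCOPE}")
    return cmd


if __name__ == "__main__":
    # Example values are placeholders, not real resources.
    example = build_create_command("clearml-worker-1", "clearml-sa@my-project.iam.gserviceaccount.com")
    print(shlex.join(example))
```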
Also, please bear in mind that this is not always the use case. If I understand correctly, you'd like any new instance to be able to read from and write to your GCS buckets; however, many people still want to keep a separation and control read/write access to buckets solely through the ClearML SDK configuration.
Hi @ItchyDuck87, did you manage to register directly via the SDK?
I could not find the source code for the GCP Autoscaler, but I am very confident that this is the issue. Can you please help, @channel?
Did you follow the instructions in the docs?
Now I tried to set up the GCP Auto Scaler. There is no easy way to get Google Cloud Storage working with it. I think it would be good if the service account file were mounted automatically for the agent in docker mode.
I really like ClearML, and the documentation is good for getting started, but a lot of things felt like trial and error once I wanted to go beyond the basics. I still think it is a great tool, but the documentation lacks some details. Some examples:
- How to add a Google service account to the agent in docker mode
- In the Keras examples, which callbacks get patched and actually do something? Is it only TensorBoard and ModelCheckpoint? The magic is cool, but developers need to know the details.
- On the GCP Auto Scaler: what are the requirements for the VM image (docker, nvidia-container-tools, python3, pip)? The default image (debian-buster) does not work out of the box.
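The requirements list in the question above is a guess, not a confirmed list. Under that assumption, a rough preflight check like the sketch below reports which of the guessed tools are missing on a candidate VM image. The binary names are assumptions, in particular `nvidia-container-cli` as a stand-in for the NVIDIA container tools.

```python
import shutil

# Assumed requirements, taken from the question above; not an official list.
# nvidia-container-cli is used as a stand-in for the NVIDIA container tools.
ASSUMED_REQUIRED_TOOLS = ["docker", "nvidia-container-cli", "python3", "pip"]


def missing_tools(tools=ASSUMED_REQUIRED_TOOLS):
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing on this image:", ", ".join(missing))
    else:
        print("All assumed requirements found.")
```

Run it on a VM started from the image before pointing the Auto Scaler at it.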
If you mean the storage section, I did. But it is not very clear where google.storage should be added. Adding it to the sdk section is the obvious choice, but I'm not sure whether I need to do more in the agent section. Please see my configuration above.
A working workaround is this: agent.extra_docker_arguments: ["-v","/home/cboden/clearml_service.json:/root/clearml_service.json","-e","GOOGLE_APPLICATION_CREDENTIALS=/root/clearml_service.json",]
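The workaround above can be generated for any key path with a small sketch like this one. It emits the same `agent.extra_docker_arguments` list; a JSON array is also valid HOCON, so the printed line can be pasted into clearml.conf. The function names and the container path default are illustrative.

```python
import json


def docker_args_for_gcp_key(host_key_path, container_key_path="/root/clearml_service.json"):
    """Build docker arguments that mount a GCP service account key into the
    task container and point GOOGLE_APPLICATION_CREDENTIALS at it."""
    return [
        "-v", f"{host_key_path}:{container_key_path}",
        "-e", f"GOOGLE_APPLICATION_CREDENTIALS={container_key_path}",
    ]


def clearml_conf_line(host_key_path):
    """Render the arguments as an agent.extra_docker_arguments entry for clearml.conf."""
    args = docker_args_for_gcp_key(host_key_path)
    return "agent.extra_docker_arguments: " + json.dumps(args)


if __name__ == "__main__":
    print(clearml_conf_line("/home/cboden/clearml_service.json"))
```

Note that this still depends on the key file existing on the host, which is exactly what the Auto Scaler VMs lack.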
Error:
2024-02-26 09:11:43,799 - clearml.storage - ERROR - Failed uploading: 403 POST
: {
  "error": {
    "code": 403,
    "message": "Access denied.",
    "errors": [
      {
        "message": "Access denied.",
        "domain": "global",
        "reason": "forbidden"
      }
    ]
  }
}
The same task with the same credentials works fine on a local agent in docker mode, but not with the GCP Auto Scaler.
I tried starting a VM manually with the same image and service account, installed clearml-agent manually, and connected it to my workspace. Everything worked fine. I really need help, as the GCP Auto Scaler sets the wrong scope on VM creation.
Also, please note that starting with version 1.13.2, the ClearML SDK can decode JSON directly from the credentials_json argument if it fails to load it as a file, which means you don't actually need to mount any file.
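Since the SDK (1.13.2 and later, per the note above) falls back to decoding credentials_json as JSON, the key can be inlined instead of mounted. A minimal sketch, assuming the setting lives at `sdk.google.storage.credentials_json` in clearml.conf: it takes the parsed service account key and renders a line whose value is the key's JSON as a quoted string. HOCON quoted strings follow JSON string rules, so `json.dumps` produces a valid value. The helper names are hypothetical.

```python
import json

# Assumed config path for the inlined key.
CONF_KEY = "sdk.google.storage.credentials_json"


def credentials_json_line(key_info):
    """Render a clearml.conf line that inlines a service account key (a dict).

    The key is serialized compactly, then quoted with json.dumps so that it
    becomes a single HOCON quoted string.
    """
    compact = json.dumps(key_info, separators=(",", ":"))
    return f"{CONF_KEY}: {json.dumps(compact)}"


def credentials_json_line_from_file(key_path):
    """Same as credentials_json_line, reading the key from a JSON file."""
    with open(key_path, encoding="utf-8") as f:
        return credentials_json_line(json.load(f))  # fails early on invalid JSON


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        print(credentials_json_line_from_file(sys.argv[1]))
```

Keep in mind that this puts the private key into clearml.conf in plain text, so that file needs the same protection as the key file itself.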
Am I missing something, or should it generally work this way? Or should I set agent.google.storage {}?
OK, for the GCP Auto Scaler it is even more complicated to get Google Cloud Storage write access. It seems that VMs are started with the default access scope, which means the VM only has read access to GCS and cannot write. I think the only way to change this is at VM creation.
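One way to confirm this diagnosis is to look at the scopes the VM actually received. A minimal sketch: it reads them from the Compute Engine metadata server (`/computeMetadata/v1/instance/service-accounts/default/scopes`, which requires the `Metadata-Flavor: Google` header and only answers from inside a VM) and checks for a scope that permits GCS writes. The default Compute Engine scopes include only `devstorage.read_only` for storage; the set of write-capable scopes below is an assumption based on Google's scope names. Scopes only cap access, so the service account still needs the matching IAM role.

```python
import urllib.request

METADATA_SCOPES_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/scopes"
)

# Scopes assumed to permit writing to GCS (the default devstorage.read_only does not).
GCS_WRITE_SCOPES = {
    "https://www.googleapis.com/auth/devstorage.read_write",
    "https://www.googleapis.com/auth/devstorage.full_control",
    "https://www.googleapis.com/auth/cloud-platform",
}


def fetch_vm_scopes(timeout=2.0):
    """Read the default service account's scopes from the metadata server.

    Only works when run on a Compute Engine VM.
    """
    req = urllib.request.Request(METADATA_SCOPES_URL, headers={"Metadata-Flavor": "Google"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8").split()


def can_write_gcs(scopes):
    """Return True if any of the given scopes allows writing to GCS."""
    return any(scope in GCS_WRITE_SCOPES for scope in scopes)
```

On an Auto Scaler VM, `can_write_gcs(fetch_vm_scopes())` returning False would match the behavior described above; on the manually created VM it should return True.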
Yes, if I run the experiment directly via the SDK, cloud access works fine.
Hi @ItchyDuck87, what exact setting in the GCP spec for a new VM needs to be changed for this to work?