Answered
Hi, I Am Trying To Figure Out How To Get Cloud Storage Access Working In The Agent. I Am Running The Agent Locally In Docker Mode. I Set Up Gcp Storage In The Clearml.Json But It Seems Not To Get Passed To The Agent. Also Tried To Add Agent.Extra_Docker_A

Hi, I am trying to figure out how to get cloud storage access working in the agent. I am running the agent locally in docker mode. I set up GCP Storage in the clearml.json, but it does not seem to get passed to the agent. I also tried adding agent.extra_docker_arguments to mount the service account file, but that did not help. Any ideas?

sdk.storage.cache.default_base_dir = ~/.clearml/cache
sdk.storage.cache.size.min_free_bytes = 10GB
sdk.storage.direct_access.0.url = file://*
sdk.metrics.file_history_size = 100
sdk.metrics.matplotlib_untitled_history_size = 100
sdk.metrics.images.format = JPEG
sdk.metrics.images.quality = 87
sdk.metrics.images.subsampling = 0
sdk.metrics.tensorboard_single_series_per_graph = false
sdk.network.metrics.file_upload_threads = 4
sdk.network.metrics.file_upload_starvation_warning_sec = 120
sdk.network.iteration.max_retries_on_server_error = 5
sdk.network.iteration.retry_backoff_factor_sec = 10
sdk.network.file_upload_retries = 3
sdk.aws.s3.key = 
sdk.aws.s3.region = 
sdk.aws.s3.use_credentials_chain = false
sdk.aws.boto3.pool_connections = 512
sdk.aws.boto3.max_multipart_concurrency = 16
sdk.aws.boto3.multipart_threshold = 8388608
sdk.aws.boto3.multipart_chunksize = 8388608
sdk.google.storage.project = sw-ai-220514
sdk.google.storage.credentials_json = /home/cboden/clearml_service.json
sdk.log.null_log_propagate = false
sdk.log.task_log_buffer_capacity = 66
sdk.log.disable_urllib3_info = true
sdk.development.task_reuse_time_window_in_hours = 72.0
sdk.development.vcs_repo_detect_async = true
sdk.development.store_uncommitted_code_diff = true
sdk.development.support_stopping = true
sdk.development.default_output_uri = 
sdk.development.force_analyze_entire_repo = false
sdk.development.suppress_update_message = false
sdk.development.detect_with_pip_freeze = false
sdk.development.worker.report_period_sec = 2
sdk.development.worker.ping_period_sec = 30
sdk.development.worker.log_stdout = true
sdk.development.worker.report_global_mem_used = false
sdk.development.worker.report_event_flush_threshold = 100
sdk.development.worker.console_cr_flush_period = 10
sdk.apply_environment = false
sdk.apply_files = false
agent.worker_id = 
agent.worker_name = cboden-staige
agent.force_git_ssh_protocol = false
agent.python_binary = 
agent.package_manager.type = pip
agent.package_manager.pip_version.0 = <20.2 ; python_version < '3.10'
agent.package_manager.pip_version.1 = <22.3 ; python_version >\= '3.10'
agent.package_manager.system_site_packages = false
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = pytorch
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 = defaults
agent.package_manager.priority_optional_packages.0 = pygobject
agent.package_manager.torch_nightly = false
agent.package_manager.poetry_files_from_repo_working_dir = false
agent.venvs_dir = /home/cboden/.clearml/venvs-builds
agent.venvs_cache.max_entries = 10
agent.venvs_cache.free_space_threshold_gb = 2.0
agent.venvs_cache.path = ~/.clearml/venvs-cache
agent.vcs_cache.enabled = true
agent.vcs_cache.path = /home/cboden/.clearml/vcs-cache
agent.venv_update.enabled = false
agent.pip_download_cache.enabled = true
agent.pip_download_cache.path = /home/cboden/.clearml/pip-download-cache
agent.translate_ssh = true
agent.reload_config = false
agent.docker_pip_cache = /home/cboden/.clearml/pip-cache
agent.docker_apt_cache = /home/cboden/.clearml/apt-cache
agent.docker_force_pull = false
agent.default_docker.image = nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04
agent.enable_task_env = false
agent.hide_docker_command_env_vars.enabled = true
agent.hide_docker_command_env_vars.parse_embedded_urls = true
agent.abort_callback_max_timeout = 1800
agent.docker_internal_mounts.sdk_cache = /clearml_agent_cache
agent.docker_internal_mounts.apt_cache = /var/cache/apt/archives
agent.docker_internal_mounts.ssh_folder = ~/.ssh
agent.docker_internal_mounts.ssh_ro_folder = /.ssh
agent.docker_internal_mounts.pip_cache = /root/.cache/pip
agent.docker_internal_mounts.poetry_cache = /root/.cache/pypoetry
agent.docker_internal_mounts.vcs_cache = /root/.clearml/vcs-cache
agent.docker_internal_mounts.venv_build = ~/.clearml/venvs-builds
agent.docker_internal_mounts.pip_download = /root/.clearml/pip-download-cache
agent.apply_environment = true
agent.apply_files = true
agent.custom_build_script = 
agent.disable_task_docker_override = false
agent.git_user = x-token-auth
agent.extra_docker.arguments = -v /home/cboden/clearml_service.json:/service.json -e GOOGLE_APPLICATION_CREDENTIALS\=/service.json
agent.default_python = 3.11
agent.cuda_version = 123
agent.cudnn_version = 0
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
api.http.wait_on_maintenance_forever = true
api.http.pool_maxsize = 512
api.http.pool_connections = 512
api.api_server = 

api.web_server = 

api.files_server = 

api.credentials.access_key = V08TLWYCLG6VLD8XA3CD
api.host = 

Error:

2024-02-22 17:01:47,780 - clearml.storage - ERROR - Failed creating storage object 
 Reason: Your default credentials were not found. To set up Application Default Credentials, see 
 for more information.
Traceback (most recent call last):
  File "bin/swtr_train_test.py", line 2, in <module>
    Task.init()
  File "/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/clearml/task.py", line 628, in init
    task.output_uri = task.get_project_object().default_output_destination
  File "/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/clearml/task.py", line 1205, in output_uri
    raise ValueError("Could not get access credentials for '{}' "
ValueError: Could not get access credentials for '
' , check configuration file ~/clearml.conf
  
  
Posted 10 months ago


Also, please note that starting with version 1.13.2, the ClearML SDK supports directly decoding JSON from the credentials_json argument if it fails to load it as a file, which means you don't actually need to mount any file.
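
To make the fallback described above concrete, here is a minimal sketch of the "file first, then inline JSON" idea. It is not ClearML's actual implementation; `load_credentials` is a hypothetical helper written only for illustration, and the real SDK logic may differ in its details.

```python
import json


def load_credentials(credentials_json: str) -> dict:
    """Hypothetical helper (not part of ClearML).

    Treat the value as a path to a service account file first; if the file
    cannot be opened, fall back to decoding the value itself as JSON.
    """
    try:
        with open(credentials_json, "r", encoding="utf-8") as handle:
            return json.load(handle)
    except OSError:
        # Not a readable file, so assume the string holds the JSON document.
        return json.loads(credentials_json)
```

With behaviour like this, the value of sdk.google.storage.credentials_json could be either a path such as /home/cboden/clearml_service.json or the contents of that file, so nothing would have to be mounted into the container.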

  
  
Posted 10 months ago

Hi @<1671689442621919232:profile|ItchyDuck87> , what is the exact setting that needs to be changed for this to work as far as the GCP spec for a new VM is concerned?

  
  
Posted 10 months ago

If you are referring to the storage section, I did. But it is not very clear where google.storage should be added. It's obvious to add it in the sdk section, but I am not sure whether I need to do more in the agent section. Please see my configuration above.
A working workaround is this: agent.extra_docker_arguments: ["-v","/home/cboden/clearml_service.json:/root/clearml_service.json","-e","GOOGLE_APPLICATION_CREDENTIALS=/root/clearml_service.json",]
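
The workaround above can be generated for any key file. The following is a small sketch, assuming the same layout as in the workaround (key file mounted under /root with its original file name); `gcs_docker_args` is a hypothetical helper, not a ClearML function.

```python
from pathlib import PurePosixPath


def gcs_docker_args(host_key_path: str, container_dir: str = "/root") -> list:
    """Hypothetical helper: build the docker arguments from the workaround.

    Mounts the service account file into the container and points
    GOOGLE_APPLICATION_CREDENTIALS at the mounted copy.
    """
    file_name = PurePosixPath(host_key_path).name
    container_path = str(PurePosixPath(container_dir) / file_name)
    return [
        "-v", f"{host_key_path}:{container_path}",
        "-e", f"GOOGLE_APPLICATION_CREDENTIALS={container_path}",
    ]


if __name__ == "__main__":
    # Prints the list to paste into agent.extra_docker_arguments.
    print(gcs_docker_args("/home/cboden/clearml_service.json"))
```

The path inside the container has to match in both arguments; the environment variable must point at the mounted location, not at the host path.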

  
  
Posted 10 months ago

OK, for the GCP Auto Scaler it is even more complicated to get Google Cloud Storage write access. It seems that VMs are started with the default access scope. This means that the VM will only have read access to GCS but is unable to write. I think the only way to change this is on VM creation.
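
One way to confirm this from inside a running VM is to ask the Compute Engine metadata server which access scopes the instance was created with. The sketch below assumes it runs on the VM itself; `fetch_scopes` and `can_write_gcs` are hypothetical helper names, while the metadata URL, the Metadata-Flavor header and the scope URLs are the standard Google ones.

```python
import urllib.request

# Metadata endpoint listing the access scopes of the VM's default service account.
METADATA_SCOPES_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/scopes"
)

# Scopes that permit writing to Cloud Storage (IAM permissions still apply).
WRITE_SCOPES = {
    "https://www.googleapis.com/auth/cloud-platform",
    "https://www.googleapis.com/auth/devstorage.read_write",
    "https://www.googleapis.com/auth/devstorage.full_control",
}


def fetch_scopes(timeout: float = 2.0) -> list:
    """Return the access scopes of the current VM (only works on GCE)."""
    request = urllib.request.Request(
        METADATA_SCOPES_URL, headers={"Metadata-Flavor": "Google"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").split()


def can_write_gcs(scopes) -> bool:
    """True if at least one scope allows writing to Cloud Storage."""
    return any(scope in WRITE_SCOPES for scope in scopes)
```

On an autoscaler VM, `can_write_gcs(fetch_scopes())` returning False would be consistent with the instance having been created with only the read-only storage scope.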

  
  
Posted 10 months ago

I tried starting a VM manually with the same image and service account, installed clearml-agent manually, and connected it to my workspace. Everything was working fine. I really need help, as the GCP Auto Scaler is setting the wrong scope on VM creation:
image

  
  
Posted 10 months ago

I could not find the source code for the GCP autoscaler, but I am very confident that this is the issue. Can you please help, @channel?

  
  
Posted 10 months ago

@<1673863823901069312:profile|BraveToad81>

  
  
Posted 9 months ago

Hi Jake, thank you for your response. Good to know that credentials_json supports direct decoding. This should be mentioned in the storage documentation.
For the GCP Autoscaler, I think that the "Service Account Email" provided for each instance configuration should restrict access based on IAM rules. Right now the scope will not allow the user to add additional permissions to this service account.
I.e., if you select a "Service Account Email" other than the default, the VM should be created with the full access scope, like this:

gcloud compute instances create VM_NAME --service-account=SERVICE_ACCOUNT_EMAIL --scopes=

This way the SERVICE_ACCOUNT_EMAIL will have full control over IAM rules, and this is also how Google handles it if you use the Cloud Console for VM creation.
image

  
  
Posted 10 months ago

Or should I set agent.google.storage {}?

Did you follow the instructions in the docs?

  
  
Posted 10 months ago

Hi @<1671689442621919232:profile|ItchyDuck87> , did you manage to register directly via the SDK?

  
  
Posted 10 months ago

Yes, if I run the experiment directly via the SDK, the cloud access works fine.

  
  
Posted 10 months ago

Can you please help me figure out how to deal with this?

  
  
Posted 10 months ago

Error:

2024-02-26 09:11:43,799 - clearml.storage - ERROR - Failed uploading: 403 POST 
: {
  "error": {
    "code": 403,
    "message": "Access denied.",
    "errors": [
      {
        "message": "Access denied.",
        "domain": "global",
        "reason": "forbidden"
      }
    ]
  }
}

The same task with the same credentials works fine on a local agent in docker mode, but not with the GCP Auto Scaler.

  
  
Posted 10 months ago

Also, please bear in mind that this is not always the use case. If I understand correctly, you'd like any new instance to be able to read from and write to your GCS buckets; however, many people still want to maintain a separation and control read/write access to buckets using the ClearML SDK configuration (i.e. solely by the SDK).

  
  
Posted 10 months ago

Am I missing something or should it generally work this way? Or should I set agent.google.storage {}?

  
  
Posted 10 months ago

Now I tried to set up the GCP Auto Scaler. There is no easy way to get Google Cloud Storage working with it. I think it would be good if the service account file were mounted automatically for an agent in docker mode.
I really like ClearML and the documentation is good for getting started, but I feel a lot of things were trial and error when I wanted to do something beyond the basics. I still think it is a great tool, but the documentation lacks some detail. Some examples:

  • How to add Google service account to agent docker mode
  • In the Keras examples, which callbacks get patched and will actually do something? Is it only TensorBoard and ModelCheckpoint? The magic is cool, but developers need to know the details.
  • On the GCP Auto Scaler: what are the requirements for the VM image (docker, nvidia-container-tools, python3, pip)? The default image (debian-buster) is not working out of the box.
  
  
Posted 10 months ago