Answered
I'M Running Into A Perplexing Issue. I Have Several Agents Running On A Workstation, I Also Am Directly Running Code From The Same Workstation. There Are Several Projects On The Workstation But One Of The Projects Is Struggling With Authentication To Git

I'm running into a perplexing issue.

I have several agents running on a workstation, and I also run code directly from the same workstation. There are several projects on the workstation, but one of them is struggling with authentication to GitHub only when running from an agent (it works fine if I run it locally on the same system). I'm using SSH-based authentication. The strange thing is that the other projects don't have any auth issues and, at some point, neither did this project. I thought it might be related to a commit ID that doesn't exist on the remote, but I triple-checked that all commits are pushed to the remote.

The relevant lines from the clearml.conf file:

    # Set GIT user/pass credentials (if user/pass are set, GIT protocol will be set to https)
    # leave blank for GIT SSH credentials (set force_git_ssh_protocol=true to force SSH protocol)
    # **Notice**: GitHub personal token is equivalent to password, you can put it directly into `git_pass`
    # To learn how to generate git token GitHub/Bitbucket/GitLab:
    # 

    # 

    # 

    # git_user: ""
    # git_pass: ""
    # Limit credentials to a single domain, for example: github.com,
    # all other domains will use public access (no user/pass). Default: always send user/pass for any VCS domain
    # git_host: ""

    # Force GIT protocol to use SSH regardless of the git url (Assumes GIT user/pass are blank)
    force_git_ssh_protocol: true
    # Force a specific SSH port when converting http to ssh links (the domain is kept the same)
    # force_git_ssh_port: 0
    # Force a specific SSH username when converting http to ssh links (the default username is 'git')
    force_git_ssh_user: git
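
To make the "converting http to ssh links" comments above concrete, here is a minimal Python sketch of what such a rewrite could look like. It is an illustration only, not clearml-agent's actual implementation; the function name `https_to_ssh` and the `example-org` repository are made up for the example.

```python
from typing import Optional
from urllib.parse import urlparse


def https_to_ssh(url: str, user: str = "git", port: Optional[int] = None) -> str:
    """Rewrite an http(s) Git URL as an SSH URL, keeping the domain unchanged.

    Hypothetical helper for illustration; not taken from clearml-agent.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        # Already an SSH-style URL (or something else): leave it untouched.
        return url
    host = parsed.hostname
    path = parsed.path.lstrip("/")
    if port:
        # An explicit port needs the ssh:// form.
        return f"ssh://{user}@{host}:{port}/{path}"
    # scp-like form: user@host:path
    return f"{user}@{host}:{path}"


if __name__ == "__main__":
    print(https_to_ssh("https://github.com/example-org/forecasting.git"))
```

The `user` and `port` parameters play the role of `force_git_ssh_user` and `force_git_ssh_port` in the config; whether the agent applies them in exactly this way is an assumption.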

The error I get:

Using cached repository in "/home/XXXXXXXXXX/.clearml/vcs-cache/forecasting.git.427114357fcbfbbb592480babeebfdaa/forecasting.git"
remote: Invalid username or password.
fatal: Authentication failed for '
'
error: Could not fetch origin
Repository cloning failed: Command '['git', 'fetch', '--all', '--recurse-submodules']' returned non-zero exit status 1.
clearml_agent: ERROR: Failed cloning repository. 
1) Make sure you pushed the requested commit:
(repository='git@github.com:XXXXXXXXXX/forecasting.git', branch='main', commit_id='bcd14d745299f2fe91ba4b573de064d6b52ccdab', tag='', docker_cmd=None, entry_point='src/train_process.py', working_dir='.')
2) Check if remote-worker has valid credentials [see worker configuration file]

My confusion: it shouldn't be related to the credentials on the system, because they work with all of the other projects. It doesn't seem to be related to the commit ID, because I pushed everything and also tried another branch. It even works fine on the same server as long as I'm running locally rather than through an agent. Is it possible the clearml git cache is messing things up? I had a strange cache issue before.

  
  
Posted 9 months ago

Answers 14


Yes, it does. And I can clone this repo and branch to the server outside of clearml just using git clone .... as I've added ssh keys and authenticated.

  
  
Posted 8 months ago

1707128614082 bigbrother:gpu0 INFO task 59d23c5919b04fd6947c1e463fa8c78c pulled from 9890a035b8f84872ab18d7ff207c26c6 by worker bigbrother:gpu0

Current configuration (clearml_agent v1.7.0, location: /tmp/.clearml_agent.vo_oc47r.cfg):
----------------------
agent.worker_id = bigbrother:gpu0
agent.worker_name = bigbrother
agent.force_git_ssh_protocol = true
agent.python_binary = /home/natephysics/anaconda3/bin/python
agent.package_manager.type = pip
agent.package_manager.pip_version.0 = <20.2 ; python_version < '3.10'
agent.package_manager.pip_version.1 = <22.3 ; python_version >= '3.10'
agent.package_manager.system_site_packages = false
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = pytorch
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 = defaults
agent.package_manager.priority_optional_packages.0 = pygobject
agent.package_manager.torch_nightly = false
agent.package_manager.poetry_files_from_repo_working_dir = false
agent.package_manager.priority_packages.0 = hydra-core
agent.package_manager.priority_packages.1 = omegaconf
agent.venvs_dir = /home/xxxxxxxx/.clearml/venvs-builds
agent.venvs_cache.max_entries = 10
agent.venvs_cache.free_space_threshold_gb = 2.0
agent.venvs_cache.path = ~/.clearml/venvs-cache
agent.vcs_cache.enabled = true
agent.vcs_cache.path = /home/xxxxxxxx/.clearml/vcs-cache
agent.venv_update.enabled = false
agent.pip_download_cache.enabled = true
agent.pip_download_cache.path = /home/xxxxxxxxx/.clearml/pip-download-cache
agent.translate_ssh = true
agent.reload_config = false
agent.docker_pip_cache = /home/xxxxxxxxx/.clearml/pip-cache
agent.docker_apt_cache = /home/xxxxxxxxxx/.clearml/apt-cache
agent.docker_force_pull = false
agent.default_docker.image = nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04
agent.enable_task_env = false
agent.hide_docker_command_env_vars.enabled = true
agent.hide_docker_command_env_vars.parse_embedded_urls = true
agent.abort_callback_max_timeout = 1800
agent.docker_internal_mounts.sdk_cache = /clearml_agent_cache
agent.docker_internal_mounts.apt_cache = /var/cache/apt/archives
agent.docker_internal_mounts.ssh_folder = ~/.ssh
agent.docker_internal_mounts.ssh_ro_folder = /.ssh
agent.docker_internal_mounts.pip_cache = /root/.cache/pip
agent.docker_internal_mounts.poetry_cache = /root/.cache/pypoetry
agent.docker_internal_mounts.vcs_cache = /root/.clearml/vcs-cache
agent.docker_internal_mounts.venv_build = ~/.clearml/venvs-builds
agent.docker_internal_mounts.pip_download = /root/.clearml/pip-download-cache
agent.apply_environment = true
agent.apply_files = true
agent.custom_build_script = 
agent.disable_task_docker_override = false
agent.force_git_ssh_user = git
agent.default_python = 3.11
agent.cuda_version = 123
agent.cudnn_version = 0
sdk.storage.cache.default_base_dir = ~/.clearml/cache
sdk.storage.cache.size.min_free_bytes = 10GB
sdk.storage.direct_access.0.url = file://*
sdk.metrics.file_history_size = 100
sdk.metrics.matplotlib_untitled_history_size = 100
sdk.metrics.images.format = JPEG
sdk.metrics.images.quality = 87
sdk.metrics.images.subsampling = 0
sdk.metrics.tensorboard_single_series_per_graph = false
sdk.network.metrics.file_upload_threads = 4
sdk.network.metrics.file_upload_starvation_warning_sec = 120
sdk.network.iteration.max_retries_on_server_error = 5
sdk.network.iteration.retry_backoff_factor_sec = 10
sdk.network.file_upload_retries = 3
sdk.aws.s3.key = 
sdk.aws.s3.region = eu-west-1
sdk.aws.boto3.pool_connections = 512
sdk.aws.boto3.max_multipart_concurrency = 16
sdk.log.null_log_propagate = false
sdk.log.task_log_buffer_capacity = 66
sdk.log.disable_urllib3_info = true
sdk.development.task_reuse_time_window_in_hours = 72.0
sdk.development.vcs_repo_detect_async = true
sdk.development.store_uncommitted_code_diff = true
sdk.development.support_stopping = true
sdk.development.default_output_uri = 
sdk.development.force_analyze_entire_repo = false
sdk.development.suppress_update_message = false
sdk.development.detect_with_pip_freeze = false
sdk.development.worker.report_period_sec = 2
sdk.development.worker.ping_period_sec = 30
sdk.development.worker.log_stdout = true
sdk.development.worker.report_global_mem_used = false
sdk.development.worker.report_event_flush_threshold = 100
sdk.development.worker.console_cr_flush_period = 10
sdk.apply_environment = false
sdk.apply_files = false
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
api.http.wait_on_maintenance_forever = true
api.http.pool_maxsize = 512
api.http.pool_connections = 512
api.api_server = 

api.web_server = 

api.files_server = 

api.credentials.access_key = xxxxxxxxxxxxxx
api.host = 

environment.SOPS_AGE_KEY_FILE = ****

Executing task id [59d23c5919b04fd6947c1e463fa8c78c]:
repository = git@github.com:xxxxxxxxx/forecasting.git
branch = main
version_num = bcd14d745299f2fe91ba4b573de064d6b52ccdab
tag = 
docker_cmd = 
entry_point = src/train_process.py
working_dir = .

created virtual environment CPython3.10.12.final.0-64 in 101ms
  creator CPython3Posix(dest=/home/xxxxxxxxx/.clearml/venvs-builds/3.10, clear=False, no_vcs_ignore=False, global=False)
  seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/home/xxxxxxxxs/.local/share/virtualenv)
    added seed packages: pip==23.3.2, setuptools==69.0.3, wheel==0.42.0
  activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator


Using cached repository in "/home/xxxxxxxxxxx/.clearml/vcs-cache/forecasting.git.427114357fcbfbbb592480babeebfdaa/forecasting.git"
remote: Invalid username or password.
fatal: Authentication failed for '
'
error: Could not fetch origin
Repository cloning failed: Command '['git', 'fetch', '--all', '--recurse-submodules']' returned non-zero exit status 1.

clearml_agent: ERROR: Failed cloning repository. 
1) Make sure you pushed the requested commit:
(repository='git@github.com:xxxxxxxxc/forecasting.git', branch='main', commit_id='bcd14d745299f2fe91ba4b573de064d6b52ccdab', tag='', docker_cmd=None, entry_point='src/train_process.py', working_dir='.')
2) Check if remote-worker has valid credentials [see worker configuration file]


1707128621614 bigbrother:gpu0 DEBUG Process failed, exit code 1
  
  
Posted 9 months ago

Hi @EnthusiasticCow4, can you include a more complete log of the failed task?

  
  
Posted 9 months ago

Thanks for always checking in @SuccessfulKoala55 😛

  
  
Posted 9 months ago

Is it possible the cached repository was cloned before you changed your agent settings?

Which settings are you referring to? I can't remember if I was using https auth when the project would have been first cached. Would that make a difference?

Also, did you set agent.enable_git_ask_pass: true ?

The only instance of it in the config is commented out.

    # if set, use GIT_ASKPASS to pass user/pass when cloning / fetch repositories
    # it solves passing user/token to git submodules.
    # this is a safer way to ensure multiple users using the same repository will
    # not accidentally leak credentials
    # Only supported on Linux systems, it will be the default in future releases
    # enable_git_ask_pass: false

It looks like the default changed from False to True in the v1.7.0 update. I'll try rerunning it with that set to False.

If that doesn't work I'll try clearing the cache folder.

  
  
Posted 9 months ago

Results:

I first tried uncommenting enable_git_ask_pass: false but it didn't resolve the issue.

I then cleared the cache in the vcs-cache folder, and that did fix the issue. This is the second time the cache seemed to have been the root cause of the problem. At some point I did move from token-based auth to ssh keys. Would this require clearing the cache for any project that was cached prior to the auth change?
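
One way to check this without wiping the whole folder is to look at the origin URL stored in each cached clone. The "Invalid username or password" message in the log is what GitHub returns for HTTPS fetches, which hints that the cached clone may still point at an https origin. The sketch below is a rough Python helper written under that assumption; the function names are made up, and the `<entry>/<name>.git` layout is inferred from the path in the log rather than from clearml-agent documentation.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def origin_url(repo: Path) -> str:
    """Return remote.origin.url for a clone, or '' if it cannot be read."""
    try:
        result = subprocess.run(
            ["git", "-C", str(repo), "config", "--get", "remote.origin.url"],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:  # git itself is not installed
        return ""
    return result.stdout.strip()


def is_http_url(url: str) -> bool:
    """True if the URL uses http or https rather than SSH."""
    return url.startswith(("http://", "https://"))


def stale_entries(cache_root: Path) -> List[Path]:
    """Top-level cache folders holding a clone whose origin is http(s)."""
    stale = []
    for entry in sorted(p for p in cache_root.iterdir() if p.is_dir()):
        # Assumed layout (from the log): <entry>/<name>.git
        clones = [c for c in entry.iterdir() if c.is_dir()]
        if any(is_http_url(origin_url(c)) for c in clones):
            stale.append(entry)
    return stale


def remove_entries(entries: List[Path], dry_run: bool = True) -> None:
    """Print what would be removed; delete only when dry_run is False."""
    for entry in entries:
        print(("would remove " if dry_run else "removing ") + str(entry))
        if not dry_run:
            shutil.rmtree(entry)


if __name__ == "__main__":
    cache = Path.home() / ".clearml" / "vcs-cache"
    if cache.is_dir():
        remove_entries(stale_entries(cache), dry_run=True)
```

If the listed entries do turn out to have an https origin, that would be consistent with the cache predating the switch to SSH keys; if they don't, the cause is probably something else.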

  
  
Posted 9 months ago

Do you start the clearml agents on the server with the same user that has the credentials saved?

  
  
Posted 8 months ago

Yes, I have a login for the clearml server and I've set up an agent using a docker image, and I've added the ssh keys there. I then start the agent with clearml-agent daemon --queue direct-relief-forecasting --cpu-only -d . But when I try to run the pipeline remotely, I get the error above.

  
  
Posted 8 months ago

Is it possible the cached repository was cloned before you changed your agent settings?

  
  
Posted 9 months ago

Also, did you set agent.enable_git_ask_pass: true ?

  
  
Posted 9 months ago

I think this error occurred for me because when I first authenticated with the project I was using username/password and later I transitioned to using ssh keys. That's why clearing the cache worked.

Did you validate that branch exists on remote?

  
  
Posted 8 months ago

I just cleared the cache, but still getting the error:

Python executable with version '3.10' requested by the Task, not found in path, using '/usr/local/bin/python3' (v3.9.9) instead
created virtual environment CPython3.9.9.final.0-64 in 661ms
  creator CPython3Posix(dest=/root/.clearml/venvs-builds/3.9, clear=False, no_vcs_ignore=False, global=True)
  seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/root/.local/share/virtualenv)
    added seed packages: pip==24.0, setuptools==69.1.0, wheel==0.42.0
  activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator
cloning: git@bitbucket.org:pendulum-systems-inc/repo.git
ssh -oBatchMode=yes: 1: ssh: not found
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
Repository cloning failed: Command '['clone', 'git@bitbucket.org:pendulum-systems-inc/repo.git', '/root/.clearml/vcs-cache/repo.git.92e80863adb216af60f1b9a6015fe061/repo.git', '--recursive', '--quiet']' returned non-zero exit status 128.
clearml_agent: ERROR: Failed cloning repository. 
1) Make sure you pushed the requested commit:
(repository='git@bitbucket.org:pendulum-systems-inc/repo.git', branch='feature/branch', commit_id='f359e2578b340edecc46baf2fbd183366e753e34', tag='', docker_cmd=image_name', entry_point='clearml_pipeline/pipeline.py', working_dir='.')
2) Check if remote-worker has valid credentials [see worker configuration file]
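
Note that this log differs from the original one: the line `ssh: not found` suggests that no ssh client could be found in the environment where the agent ran the clone (apparently as root, judging by the /root/ paths), so the failure may not be about cached credentials at all. A small Python check along these lines, run in that same environment, could confirm it; `find_tools` is a made-up helper name.

```python
import shutil
from typing import Dict, Optional, Sequence


def find_tools(names: Sequence[str] = ("git", "ssh")) -> Dict[str, Optional[str]]:
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

If `ssh` comes back as not found, installing an SSH client in the image the agent uses would be the first thing to try before touching the cache again.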
  
  
Posted 8 months ago

I found this issue, but I'm not sure if it's the same thing, as it's from a while back: None. Also, the link I'm trying to clone doesn't start with https but with git@bitbucket

  
  
Posted 8 months ago

Hi there, I am getting this exact same error. I have set force_git_ssh_protocol: true and have commented out git_user and git_pass, but I'm still getting the error. Is the only way to fix this to clear the cache? Do you have to do this every time then?

  
  
Posted 8 months ago