I found this issue, but I'm not sure it's the same thing since it's from a while back: None. Also, the URL I'm trying to clone doesn't start with https but with git@bitbucket.
Do you start the ClearML agents on the server as the same user that has the credentials saved?
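One quick way to check this on the machine that runs the agent is to see which user the process runs as and whether that user's ~/.ssh holds a key. Below is a rough Python sketch; find_ssh_keys is a hypothetical helper, the list of key file names is an assumption (keys configured under other names in ~/.ssh/config won't show up), and it can't tell you whether the key is actually registered with the Git host. It also only looks at the host, not inside any Docker container the agent starts.

```python
import os
from pathlib import Path

# Common OpenSSH default private-key names (assumption: no custom
# IdentityFile entries in ~/.ssh/config).
DEFAULT_KEY_NAMES = ("id_rsa", "id_ecdsa", "id_ed25519", "id_dsa")


def find_ssh_keys(home=None):
    """Return the default-named private keys found in <home>/.ssh, sorted."""
    ssh_dir = (Path(home) if home is not None else Path.home()) / ".ssh"
    return sorted(name for name in DEFAULT_KEY_NAMES if (ssh_dir / name).is_file())


if __name__ == "__main__":
    # Run this as the same user that launches clearml-agent.
    user = os.environ.get("USER", "<unknown>")
    print(f"user={user} home={Path.home()} keys={find_ssh_keys() or 'none'}")
```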
Results:
I first tried uncommenting enable_git_ask_pass: false, but it didn't resolve the issue.
I then cleared the cache in the vcs-cache folder, and that did fix it. This is the second time the cache seems to have been the root cause of the problem. At some point I moved from token-based auth to SSH keys. Would that require clearing the cache for any project that was cached before the auth change?
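Clearing the cache for a single repo can be scripted. A minimal Python sketch, assuming the folder layout seen in the log posted in this thread (<vcs-cache>/<repo>.git.<hash>/<repo>.git); clear_vcs_cache is a hypothetical helper, not part of clearml-agent, and it's probably safest to run it while no agent is using the cache.

```python
import shutil
import tempfile
from pathlib import Path


def clear_vcs_cache(cache_dir, repo_name):
    """Delete cached clones whose folder name starts with '<repo_name>.'.

    Returns the names of the removed folders.
    """
    cache = Path(cache_dir).expanduser()
    if not cache.is_dir():
        return []
    removed = []
    for entry in sorted(cache.iterdir()):
        if entry.is_dir() and entry.name.startswith(repo_name + "."):
            shutil.rmtree(entry)
            removed.append(entry.name)
    return removed


if __name__ == "__main__":
    # Demo on a throwaway directory that mimics the cache layout from the log.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "forecasting.git.427114357fcbfbbb592480babeebfdaa" / "forecasting.git").mkdir(parents=True)
        (Path(tmp) / "other.git.0123").mkdir()
        print(clear_vcs_cache(tmp, "forecasting.git"))
        # prints ['forecasting.git.427114357fcbfbbb592480babeebfdaa']
```

For a real cleanup you would call something like clear_vcs_cache("~/.clearml/vcs-cache", "forecasting.git"), using the agent.vcs_cache.path from your own config.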
Hi @<1545216070686609408:profile|EnthusiasticCow4>, can you include a more complete log of the failed task?
Thanks for always checking in @<1523701087100473344:profile|SuccessfulKoala55> 😛
Is it possible the cached repository was cloned before you changed your agent settings?
Also, did you set agent.enable_git_ask_pass: true?
1707128614082 bigbrother:gpu0 INFO task 59d23c5919b04fd6947c1e463fa8c78c pulled from 9890a035b8f84872ab18d7ff207c26c6 by worker bigbrother:gpu0
Current configuration (clearml_agent v1.7.0, location: /tmp/.clearml_agent.vo_oc47r.cfg):
----------------------
agent.worker_id = bigbrother:gpu0
agent.worker_name = bigbrother
agent.force_git_ssh_protocol = true
agent.python_binary = /home/natephysics/anaconda3/bin/python
agent.package_manager.type = pip
agent.package_manager.pip_version.0 = <20.2 ; python_version < '3.10'
agent.package_manager.pip_version.1 = <22.3 ; python_version >\= '3.10'
agent.package_manager.system_site_packages = false
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = pytorch
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 = defaults
agent.package_manager.priority_optional_packages.0 = pygobject
agent.package_manager.torch_nightly = false
agent.package_manager.poetry_files_from_repo_working_dir = false
agent.package_manager.priority_packages.0 = hydra-core
agent.package_manager.priority_packages.1 = omegaconf
agent.venvs_dir = /home/xxxxxxxx/.clearml/venvs-builds
agent.venvs_cache.max_entries = 10
agent.venvs_cache.free_space_threshold_gb = 2.0
agent.venvs_cache.path = ~/.clearml/venvs-cache
agent.vcs_cache.enabled = true
agent.vcs_cache.path = /home/xxxxxxxx/.clearml/vcs-cache
agent.venv_update.enabled = false
agent.pip_download_cache.enabled = true
agent.pip_download_cache.path = /home/xxxxxxxxx/.clearml/pip-download-cache
agent.translate_ssh = true
agent.reload_config = false
agent.docker_pip_cache = /home/xxxxxxxxx/.clearml/pip-cache
agent.docker_apt_cache = /home/xxxxxxxxxx/.clearml/apt-cache
agent.docker_force_pull = false
agent.default_docker.image = nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04
agent.enable_task_env = false
agent.hide_docker_command_env_vars.enabled = true
agent.hide_docker_command_env_vars.parse_embedded_urls = true
agent.abort_callback_max_timeout = 1800
agent.docker_internal_mounts.sdk_cache = /clearml_agent_cache
agent.docker_internal_mounts.apt_cache = /var/cache/apt/archives
agent.docker_internal_mounts.ssh_folder = ~/.ssh
agent.docker_internal_mounts.ssh_ro_folder = /.ssh
agent.docker_internal_mounts.pip_cache = /root/.cache/pip
agent.docker_internal_mounts.poetry_cache = /root/.cache/pypoetry
agent.docker_internal_mounts.vcs_cache = /root/.clearml/vcs-cache
agent.docker_internal_mounts.venv_build = ~/.clearml/venvs-builds
agent.docker_internal_mounts.pip_download = /root/.clearml/pip-download-cache
agent.apply_environment = true
agent.apply_files = true
agent.custom_build_script =
agent.disable_task_docker_override = false
agent.force_git_ssh_user = git
agent.default_python = 3.11
agent.cuda_version = 123
agent.cudnn_version = 0
sdk.storage.cache.default_base_dir = ~/.clearml/cache
sdk.storage.cache.size.min_free_bytes = 10GB
sdk.storage.direct_access.0.url = file://*
sdk.metrics.file_history_size = 100
sdk.metrics.matplotlib_untitled_history_size = 100
sdk.metrics.images.format = JPEG
sdk.metrics.images.quality = 87
sdk.metrics.images.subsampling = 0
sdk.metrics.tensorboard_single_series_per_graph = false
sdk.network.metrics.file_upload_threads = 4
sdk.network.metrics.file_upload_starvation_warning_sec = 120
sdk.network.iteration.max_retries_on_server_error = 5
sdk.network.iteration.retry_backoff_factor_sec = 10
sdk.network.file_upload_retries = 3
sdk.aws.s3.key =
sdk.aws.s3.region = eu-west-1
sdk.aws.boto3.pool_connections = 512
sdk.aws.boto3.max_multipart_concurrency = 16
sdk.log.null_log_propagate = false
sdk.log.task_log_buffer_capacity = 66
sdk.log.disable_urllib3_info = true
sdk.development.task_reuse_time_window_in_hours = 72.0
sdk.development.vcs_repo_detect_async = true
sdk.development.store_uncommitted_code_diff = true
sdk.development.support_stopping = true
sdk.development.default_output_uri =
sdk.development.force_analyze_entire_repo = false
sdk.development.suppress_update_message = false
sdk.development.detect_with_pip_freeze = false
sdk.development.worker.report_period_sec = 2
sdk.development.worker.ping_period_sec = 30
sdk.development.worker.log_stdout = true
sdk.development.worker.report_global_mem_used = false
sdk.development.worker.report_event_flush_threshold = 100
sdk.development.worker.console_cr_flush_period = 10
sdk.apply_environment = false
sdk.apply_files = false
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
api.http.wait_on_maintenance_forever = true
api.http.pool_maxsize = 512
api.http.pool_connections = 512
api.api_server =
api.web_server =
api.files_server =
api.credentials.access_key = xxxxxxxxxxxxxx
api.host =
environment.SOPS_AGE_KEY_FILE = ****
Executing task id [59d23c5919b04fd6947c1e463fa8c78c]:
repository = git@github.com:xxxxxxxxx/forecasting.git
branch = main
version_num = bcd14d745299f2fe91ba4b573de064d6b52ccdab
tag =
docker_cmd =
entry_point = src/train_process.py
working_dir = .
created virtual environment CPython3.10.12.final.0-64 in 101ms
creator CPython3Posix(dest=/home/xxxxxxxxx/.clearml/venvs-builds/3.10, clear=False, no_vcs_ignore=False, global=False)
seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/home/xxxxxxxxs/.local/share/virtualenv)
added seed packages: pip==23.3.2, setuptools==69.0.3, wheel==0.42.0
activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator
Using cached repository in "/home/xxxxxxxxxxx/.clearml/vcs-cache/forecasting.git.427114357fcbfbbb592480babeebfdaa/forecasting.git"
remote: Invalid username or password.
fatal: Authentication failed for '
'
error: Could not fetch origin
Repository cloning failed: Command '['git', 'fetch', '--all', '--recurse-submodules']' returned non-zero exit status 1.
clearml_agent: ERROR: Failed cloning repository.
1) Make sure you pushed the requested commit:
(repository='git@github.com:xxxxxxxxc/forecasting.git', branch='main', commit_id='bcd14d745299f2fe91ba4b573de064d6b52ccdab', tag='', docker_cmd=None, entry_point='src/train_process.py', working_dir='.')
2) Check if remote-worker has valid credentials [see worker configuration file]
1707128621614 bigbrother:gpu0 DEBUG Process failed, exit code 1
Is it possible the cached repository was cloned before you changed your agent settings?
Which settings are you referring to? I can't remember whether I was using HTTPS auth when the project was first cached. Would that make a difference?
Also, did you set agent.enable_git_ask_pass: true?
The only instance of it in the config is commented out.
# if set, use GIT_ASKPASS to pass user/pass when cloning / fetch repositories
# it solves passing user/token to git submodules.
# this is a safer way to ensure multiple users using the same repository will
# not accidentally leak credentials
# Only supported on Linux systems, it will be the default in future releases
# enable_git_ask_pass: false
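For context on what that setting refers to: GIT_ASKPASS is a standard git environment variable. When git needs a username or password (for example on an HTTPS remote), it runs the program named there with the prompt text as its first argument and reads the answer from stdout. Below is a minimal Python sketch of such a helper; MY_GIT_USER / MY_GIT_TOKEN and both functions are hypothetical names, and this is not how clearml-agent implements the option. As far as I know, SSH key auth never goes through this username/password prompt.

```python
import os
import stat
import subprocess
import sys
import tempfile

# Helper script body. Git runs the GIT_ASKPASS program with the prompt as
# argv[1] and reads the answer from stdout. MY_GIT_USER / MY_GIT_TOKEN are
# hypothetical variable names chosen for this sketch.
ASKPASS_SOURCE = """#!{python}
import os, sys
prompt = sys.argv[1] if len(sys.argv) > 1 else ""
if prompt.lower().startswith("username"):
    print(os.environ.get("MY_GIT_USER", ""))
else:
    print(os.environ.get("MY_GIT_TOKEN", ""))
"""


def write_askpass_script(directory):
    """Write the helper into `directory`, mark it executable, return its path."""
    path = os.path.join(directory, "askpass.py")
    with open(path, "w") as f:
        f.write(ASKPASS_SOURCE.format(python=sys.executable))
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


def answer_prompt(askpass_path, prompt, env=None):
    """Call the helper with a git-style prompt as argv[1] and return its reply."""
    result = subprocess.run(
        [sys.executable, askpass_path, prompt],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.rstrip("\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        helper = write_askpass_script(tmp)
        env = dict(os.environ, MY_GIT_USER="alice", MY_GIT_TOKEN="example-token")
        print(answer_prompt(helper, "Username for 'https://github.com': ", env))
        print(answer_prompt(helper, "Password for 'https://alice@github.com': ", env))
        # For a real fetch, export GIT_ASKPASS=<helper> plus the two variables
        # and run e.g. `git fetch --all` in that environment.
```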
It looks like the default changed from False to True in the v1.7.0 update. I'll try rerunning it with that set to False.
If that doesn't work, I'll try clearing the cache folder.
Yes, it does. And I can clone this repo and branch onto the server outside of ClearML just using git clone ..., since I've added SSH keys and authenticated.
Yes, I have a login for the ClearML server, and I've set up an agent using a Docker image. I've also added the SSH keys here. I then start the agent with clearml-agent daemon --queue direct-relief-forecasting --cpu-only -d
But when I try to run the pipeline remotely, I get the error above.
Hi there, I am getting this exact same error. I have set force_git_ssh_protocol: true and have commented out git_user and git_pass, but I'm still getting the error. Is the only way to fix this to clear the cache? Do you have to do this every time, then?
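Roughly, forcing the SSH protocol means turning an HTTPS repository URL into the scp-like SSH form, using the user from agent.force_git_ssh_user (git in the config posted above). A minimal Python sketch of that kind of rewrite, for illustration only: https_to_ssh is hypothetical, not the agent's own code, and it ignores custom ports. The task URLs in both logs in this thread are already in SSH form (git@github.com:..., git@bitbucket.org:...), so a rewrite like this would leave them unchanged.

```python
from urllib.parse import urlparse


def https_to_ssh(url, ssh_user="git"):
    """Rewrite an http(s) git URL into scp-like SSH form.

    Non-http(s) URLs are returned unchanged.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return url  # already SSH (e.g. git@bitbucket.org:...) or another scheme
    path = parsed.path.lstrip("/")
    # .hostname drops any user:token@ part embedded in the HTTPS URL.
    return f"{ssh_user}@{parsed.hostname}:{path}"


if __name__ == "__main__":
    print(https_to_ssh("https://github.com/owner/forecasting.git"))
    # prints git@github.com:owner/forecasting.git
```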
I think this error occurred for me because when I first authenticated with the project I was using username/password, and later I switched to SSH keys. That's why clearing the cache worked.
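That explanation would be consistent with the first log: the task URL is git@github.com:..., yet the fetch inside the cached clone fails with "remote: Invalid username or password.", which looks like an HTTPS-style auth failure. A rough Python sketch to spot cached clones whose origin still points at an HTTPS URL, assuming the non-bare layout seen in the logs (<vcs-cache>/<name>.<hash>/<name>/.git/config); both helpers are hypothetical. The printed URLs may contain embedded tokens, so be careful where you paste them.

```python
import re
import tempfile
from pathlib import Path

# Section headers look like [remote "origin"]; keys are "name = value".
_SECTION_RE = re.compile(r"^\s*\[(.+?)\]\s*$")
_URL_RE = re.compile(r"^\s*url\s*=\s*(.+?)\s*$")


def cached_origin_url(clone_dir):
    """Return the origin URL stored in <clone_dir>/.git/config, or None."""
    config = Path(clone_dir) / ".git" / "config"
    if not config.is_file():
        return None
    section = None
    for line in config.read_text().splitlines():
        header = _SECTION_RE.match(line)
        if header:
            section = header.group(1).strip()
            continue
        if section == 'remote "origin"':
            url = _URL_RE.match(line)
            if url:
                return url.group(1)
    return None


def find_https_clones(cache_dir):
    """List (path, url) for cached clones whose origin is an http(s) URL."""
    hits = []
    for clone in sorted(Path(cache_dir).expanduser().glob("*/*")):
        url = cached_origin_url(clone)
        if url and url.startswith(("http://", "https://")):
            hits.append((str(clone), url))
    return hits


if __name__ == "__main__":
    # Demo on a throwaway copy of the layout; point find_https_clones at the
    # vcs_cache path from your agent config for a real check.
    with tempfile.TemporaryDirectory() as tmp:
        git_dir = Path(tmp) / "forecasting.git.abc123" / "forecasting.git" / ".git"
        git_dir.mkdir(parents=True)
        (git_dir / "config").write_text(
            '[core]\n\tbare = false\n'
            '[remote "origin"]\n\turl = https://github.com/owner/forecasting.git\n'
        )
        print(find_https_clones(tmp))
```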
Did you verify that the branch exists on the remote?
I just cleared the cache, but I'm still getting the error:
Python executable with version '3.10' requested by the Task, not found in path, using '/usr/local/bin/python3' (v3.9.9) instead
created virtual environment CPython3.9.9.final.0-64 in 661ms
creator CPython3Posix(dest=/root/.clearml/venvs-builds/3.9, clear=False, no_vcs_ignore=False, global=True)
seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/root/.local/share/virtualenv)
added seed packages: pip==24.0, setuptools==69.1.0, wheel==0.42.0
activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator
cloning: git@bitbucket.org:pendulum-systems-inc/repo.git
ssh -oBatchMode=yes: 1: ssh: not found
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
Repository cloning failed: Command '['clone', 'git@bitbucket.org:pendulum-systems-inc/repo.git', '/root/.clearml/vcs-cache/repo.git.92e80863adb216af60f1b9a6015fe061/repo.git', '--recursive', '--quiet']' returned non-zero exit status 128.
clearml_agent: ERROR: Failed cloning repository.
1) Make sure you pushed the requested commit:
(repository='git@bitbucket.org:pendulum-systems-inc/repo.git', branch='feature/branch', commit_id='f359e2578b340edecc46baf2fbd183366e753e34', tag='', docker_cmd=image_name', entry_point='clearml_pipeline/pipeline.py', working_dir='.')
2) Check if remote-worker has valid credentials [see worker configuration file]
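One line in this log stands out: "ssh -oBatchMode=yes: 1: ssh: not found". That suggests there is no ssh client on PATH inside the container the task ran in, in which case the SSH clone can't start and clearing the cache wouldn't change anything. A quick Python check you could run inside the image (missing_tools is a hypothetical helper); if ssh does turn out to be missing, installing an SSH client in the image (openssh-client on Debian/Ubuntu-based images) would be the thing to try.

```python
import shutil


def missing_tools(tools=("git", "ssh")):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("not found on PATH:", ", ".join(missing))
    else:
        print("git and ssh are both available")
```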