Yes, it does. And I can clone this repo and branch to the server outside of clearml just using git clone ....
as I've added ssh keys and authenticated.
1707128614082 bigbrother:gpu0 INFO task 59d23c5919b04fd6947c1e463fa8c78c pulled from 9890a035b8f84872ab18d7ff207c26c6 by worker bigbrother:gpu0
Current configuration (clearml_agent v1.7.0, location: /tmp/.clearml_agent.vo_oc47r.cfg):
----------------------
agent.worker_id = bigbrother:gpu0
agent.worker_name = bigbrother
agent.force_git_ssh_protocol = true
agent.python_binary = /home/natephysics/anaconda3/bin/python
agent.package_manager.type = pip
agent.package_manager.pip_version.0 = <20.2 ; python_version < '3.10'
agent.package_manager.pip_version.1 = <22.3 ; python_version >\= '3.10'
agent.package_manager.system_site_packages = false
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = pytorch
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 = defaults
agent.package_manager.priority_optional_packages.0 = pygobject
agent.package_manager.torch_nightly = false
agent.package_manager.poetry_files_from_repo_working_dir = false
agent.package_manager.priority_packages.0 = hydra-core
agent.package_manager.priority_packages.1 = omegaconf
agent.venvs_dir = /home/xxxxxxxx/.clearml/venvs-builds
agent.venvs_cache.max_entries = 10
agent.venvs_cache.free_space_threshold_gb = 2.0
agent.venvs_cache.path = ~/.clearml/venvs-cache
agent.vcs_cache.enabled = true
agent.vcs_cache.path = /home/xxxxxxxx/.clearml/vcs-cache
agent.venv_update.enabled = false
agent.pip_download_cache.enabled = true
agent.pip_download_cache.path = /home/xxxxxxxxx/.clearml/pip-download-cache
agent.translate_ssh = true
agent.reload_config = false
agent.docker_pip_cache = /home/xxxxxxxxx/.clearml/pip-cache
agent.docker_apt_cache = /home/xxxxxxxxxx/.clearml/apt-cache
agent.docker_force_pull = false
agent.default_docker.image = nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04
agent.enable_task_env = false
agent.hide_docker_command_env_vars.enabled = true
agent.hide_docker_command_env_vars.parse_embedded_urls = true
agent.abort_callback_max_timeout = 1800
agent.docker_internal_mounts.sdk_cache = /clearml_agent_cache
agent.docker_internal_mounts.apt_cache = /var/cache/apt/archives
agent.docker_internal_mounts.ssh_folder = ~/.ssh
agent.docker_internal_mounts.ssh_ro_folder = /.ssh
agent.docker_internal_mounts.pip_cache = /root/.cache/pip
agent.docker_internal_mounts.poetry_cache = /root/.cache/pypoetry
agent.docker_internal_mounts.vcs_cache = /root/.clearml/vcs-cache
agent.docker_internal_mounts.venv_build = ~/.clearml/venvs-builds
agent.docker_internal_mounts.pip_download = /root/.clearml/pip-download-cache
agent.apply_environment = true
agent.apply_files = true
agent.custom_build_script =
agent.disable_task_docker_override = false
agent.force_git_ssh_user = git
agent.default_python = 3.11
agent.cuda_version = 123
agent.cudnn_version = 0
sdk.storage.cache.default_base_dir = ~/.clearml/cache
sdk.storage.cache.size.min_free_bytes = 10GB
sdk.storage.direct_access.0.url = file://*
sdk.metrics.file_history_size = 100
sdk.metrics.matplotlib_untitled_history_size = 100
sdk.metrics.images.format = JPEG
sdk.metrics.images.quality = 87
sdk.metrics.images.subsampling = 0
sdk.metrics.tensorboard_single_series_per_graph = false
sdk.network.metrics.file_upload_threads = 4
sdk.network.metrics.file_upload_starvation_warning_sec = 120
sdk.network.iteration.max_retries_on_server_error = 5
sdk.network.iteration.retry_backoff_factor_sec = 10
sdk.network.file_upload_retries = 3
sdk.aws.s3.key =
sdk.aws.s3.region = eu-west-1
sdk.aws.boto3.pool_connections = 512
sdk.aws.boto3.max_multipart_concurrency = 16
sdk.log.null_log_propagate = false
sdk.log.task_log_buffer_capacity = 66
sdk.log.disable_urllib3_info = true
sdk.development.task_reuse_time_window_in_hours = 72.0
sdk.development.vcs_repo_detect_async = true
sdk.development.store_uncommitted_code_diff = true
sdk.development.support_stopping = true
sdk.development.default_output_uri =
sdk.development.force_analyze_entire_repo = false
sdk.development.suppress_update_message = false
sdk.development.detect_with_pip_freeze = false
sdk.development.worker.report_period_sec = 2
sdk.development.worker.ping_period_sec = 30
sdk.development.worker.log_stdout = true
sdk.development.worker.report_global_mem_used = false
sdk.development.worker.report_event_flush_threshold = 100
sdk.development.worker.console_cr_flush_period = 10
sdk.apply_environment = false
sdk.apply_files = false
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
api.http.wait_on_maintenance_forever = true
api.http.pool_maxsize = 512
api.http.pool_connections = 512
api.api_server =
api.web_server =
api.files_server =
api.credentials.access_key = xxxxxxxxxxxxxx
api.host =
environment.SOPS_AGE_KEY_FILE = ****
Executing task id [59d23c5919b04fd6947c1e463fa8c78c]:
repository = git@github.com:xxxxxxxxx/forecasting.git
branch = main
version_num = bcd14d745299f2fe91ba4b573de064d6b52ccdab
tag =
docker_cmd =
entry_point = src/train_process.py
working_dir = .
created virtual environment CPython3.10.12.final.0-64 in 101ms
creator CPython3Posix(dest=/home/xxxxxxxxx/.clearml/venvs-builds/3.10, clear=False, no_vcs_ignore=False, global=False)
seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/home/xxxxxxxxs/.local/share/virtualenv)
added seed packages: pip==23.3.2, setuptools==69.0.3, wheel==0.42.0
activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator
Using cached repository in "/home/xxxxxxxxxxx/.clearml/vcs-cache/forecasting.git.427114357fcbfbbb592480babeebfdaa/forecasting.git"
remote: Invalid username or password.
fatal: Authentication failed for '
'
error: Could not fetch origin
Repository cloning failed: Command '['git', 'fetch', '--all', '--recurse-submodules']' returned non-zero exit status 1.
clearml_agent: ERROR: Failed cloning repository.
1) Make sure you pushed the requested commit:
(repository='git@github.com:xxxxxxxxc/forecasting.git', branch='main', commit_id='bcd14d745299f2fe91ba4b573de064d6b52ccdab', tag='', docker_cmd=None, entry_point='src/train_process.py', working_dir='.')
2) Check if remote-worker has valid credentials [see worker configuration file]
1707128621614 bigbrother:gpu0 DEBUG Process failed, exit code 1
Hi @<1545216070686609408:profile|EnthusiasticCow4> , can you include a more complete log of the failed task?
Thanks for always checking in @<1523701087100473344:profile|SuccessfulKoala55> 😛
Is it possible the cached repository was cloned before you changed your agent settings?
Which settings are you referring to? I can't remember if I was using https auth when the project would have been first cached. Would that make a difference?
Also, did you set
agent.enable_git_ask_pass: true
?
The only instance of it in the config is commented out.
# if set, use GIT_ASKPASS to pass user/pass when cloning / fetch repositories
# it solves passing user/token to git submodules.
# this is a safer way to ensure multiple users using the same repository will
# not accidentally leak credentials
# Only supported on Linux systems, it will be the default in future releases
# enable_git_ask_pass: false
It looks like the default changed from False
to True
in the v1.7.0 update . I'll try rerunning it with that set to False
.
If that doesn't work I'll try clearing the cache
folder.
Results:
I first tried uncommenting enable_git_ask_pass: false
but it didn't resolve the issue.
I then cleared the cache in the vcs-cache
folder, and that did fix the issue. This is the second time the cache seemed to have been the root cause of the problem. At some point I did move from token-based auth to ssh keys. Would this require clearing the cache for any project that was cached prior to the auth change?
Do you start the clearml agents on the server with the same user that has the credentials saved?
Yes, I have a log in for the clearml server and I've set up an agent using a docker image. And I've added the ssh keys here. I then start the agent with clearml-agent daemon --queue direct-relief-forecasting --cpu-only -d
But, then when I try run the pipeline remotely, I'm getting the error above.
Is it possible the cached repository was cloned before you changed your agent settings?
Also, did you set agent.enable_git_ask_pass: true
?
I think this error occurred for me because when I first authenticated with the project I was using username/password and later I transitioned to using ssh keys. That's why clearing the cache worked.
Did you validate that branch exists on remote?
I just cleared the cache, but still getting the error:
Python executable with version '3.10' requested by the Task, not found in path, using '/usr/local/bin/python3' (v3.9.9) instead
created virtual environment CPython3.9.9.final.0-64 in 661ms
creator CPython3Posix(dest=/root/.clearml/venvs-builds/3.9, clear=False, no_vcs_ignore=False, global=True)
seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/root/.local/share/virtualenv)
added seed packages: pip==24.0, setuptools==69.1.0, wheel==0.42.0
activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator
cloning: git@bitbucket.org:pendulum-systems-inc/repo.git
ssh -oBatchMode=yes: 1: ssh: not found
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
Repository cloning failed: Command '['clone', 'git@bitbucket.org:pendulum-systems-inc/repo.git', '/root/.clearml/vcs-cache/repo.git.92e80863adb216af60f1b9a6015fe061/repo.git', '--recursive', '--quiet']' returned non-zero exit status 128.
clearml_agent: ERROR: Failed cloning repository.
1) Make sure you pushed the requested commit:
(repository='git@bitbucket.org:pendulum-systems-inc/repo.git', branch='feature/branch', commit_id='f359e2578b340edecc46baf2fbd183366e753e34', tag='', docker_cmd=image_name', entry_point='clearml_pipeline/pipeline.py', working_dir='.')
2) Check if remote-worker has valid credentials [see worker configuration file]
I found this issue but not sure if it's the same thing as it's from awhile back: None And the link I'm trying to clone doesn't start with https
but with git@bitbucket
Hi there, I am getting this exact same error. I have set force_git_ssh_protocol: true
and have commented out the git_user
and git_pass
, but still getting the error. Is the only way to fix this to clear the cache? Do you have to do this every time then?