Answered

Hi All,

Details:
Both projects are using clearml v1.14.2rc0 (but I've tested it with other versions).
I'm using the web-app, so we're not hosting our own clearml-server.
We do have a server with several clearml-agents v1.7.0.

I'm running into a seemingly contradictory issue. I have two projects, neither of which currently uses Docker containers. Everything, local and remote, runs on the same server (including the agents). When I enqueue a task from one project, the agent can't clone the repo due to credential issues. When I enqueue a task from the other project, it clones the repo without issue. What's odd is that I didn't have this issue with the failing project previously: I used to run things remotely often without problems. Nothing in the repo has anything to do with the git credentials. I'm using SSH-based auth, and the SSH credentials are in the .ssh folder of my user, the same user that starts the agents.

Task 1: When I enqueue the task it errors, telling me it can't clone the repo due to credential issues:

2024-01-22 17:32:59
task e44d99265f654773aa636558188de7c7 pulled from 8a69a982f5824762aeac7b000fbf2161 by worker bigbrother:9
2024-01-22 17:33:06
Current configuration (clearml_agent v1.7.0, location: /tmp/.clearml_agent.0av8t0fq.cfg):
----------------------
agent.worker_id = bigbrother:9
agent.worker_name = bigbrother
agent.force_git_ssh_protocol = true
agent.python_binary = /home/natephysics/anaconda3/bin/python
agent.package_manager.type = pip
agent.package_manager.pip_version.0 = <20.2 ; python_version < '3.10'
agent.package_manager.pip_version.1 = <22.3 ; python_version >= '3.10'
agent.package_manager.system_site_packages = false
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = pytorch
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 = defaults
agent.package_manager.priority_optional_packages.0 = pygobject
agent.package_manager.torch_nightly = false
agent.package_manager.poetry_files_from_repo_working_dir = false
agent.package_manager.priority_packages.0 = hydra-core
agent.package_manager.priority_packages.1 = omegaconf
agent.venvs_dir = /home/natephysics/.clearml/venvs-builds.9
agent.venvs_cache.max_entries = 10
agent.venvs_cache.free_space_threshold_gb = 2.0
agent.venvs_cache.path = ~/.clearml/venvs-cache
agent.vcs_cache.enabled = true
agent.vcs_cache.path = /home/natephysics/.clearml/vcs-cache
agent.venv_update.enabled = false
agent.pip_download_cache.enabled = true
agent.pip_download_cache.path = /home/natephysics/.clearml/pip-download-cache
agent.translate_ssh = true
agent.reload_config = false
agent.docker_pip_cache = /home/natephysics/.clearml/pip-cache
agent.docker_apt_cache = /home/natephysics/.clearml/apt-cache.9
agent.docker_force_pull = false
agent.default_docker.image = nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04
agent.enable_task_env = false
agent.hide_docker_command_env_vars.enabled = true
agent.hide_docker_command_env_vars.parse_embedded_urls = true
agent.abort_callback_max_timeout = 1800
agent.docker_internal_mounts.sdk_cache = /clearml_agent_cache
agent.docker_internal_mounts.apt_cache = /var/cache/apt/archives
agent.docker_internal_mounts.ssh_folder = ~/.ssh
agent.docker_internal_mounts.ssh_ro_folder = /.ssh
agent.docker_internal_mounts.pip_cache = /root/.cache/pip
agent.docker_internal_mounts.poetry_cache = /root/.cache/pypoetry
agent.docker_internal_mounts.vcs_cache = /root/.clearml/vcs-cache
agent.docker_internal_mounts.venv_build = ~/.clearml/venvs-builds
agent.docker_internal_mounts.pip_download = /root/.clearml/pip-download-cache
agent.apply_environment = true
agent.apply_files = true
agent.custom_build_script = 
agent.disable_task_docker_override = false
agent.default_python = 3.11
agent.cuda_version = 123
agent.cudnn_version = 0
sdk.storage.cache.default_base_dir = ~/.clearml/cache
sdk.storage.cache.size.min_free_bytes = 10GB
sdk.storage.direct_access.0.url = file://*
sdk.metrics.file_history_size = 100
sdk.metrics.matplotlib_untitled_history_size = 100
sdk.metrics.images.format = JPEG
sdk.metrics.images.quality = 87
sdk.metrics.images.subsampling = 0
sdk.metrics.tensorboard_single_series_per_graph = false
sdk.network.metrics.file_upload_threads = 4
sdk.network.metrics.file_upload_starvation_warning_sec = 120
sdk.network.iteration.max_retries_on_server_error = 5
sdk.network.iteration.retry_backoff_factor_sec = 10
sdk.network.file_upload_retries = 3
sdk.aws.s3.key = 
sdk.aws.s3.region = eu-west-1
sdk.aws.boto3.pool_connections = 512
sdk.aws.boto3.max_multipart_concurrency = 16
sdk.log.null_log_propagate = false
sdk.log.task_log_buffer_capacity = 66
sdk.log.disable_urllib3_info = true
sdk.development.task_reuse_time_window_in_hours = 72.0
sdk.development.vcs_repo_detect_async = true
sdk.development.store_uncommitted_code_diff = true
sdk.development.support_stopping = true
sdk.development.default_output_uri = 
sdk.development.force_analyze_entire_repo = false
sdk.development.suppress_update_message = false
sdk.development.detect_with_pip_freeze = false
sdk.development.worker.report_period_sec = 2
sdk.development.worker.ping_period_sec = 30
sdk.development.worker.log_stdout = true
sdk.development.worker.report_global_mem_used = false
sdk.development.worker.report_event_flush_threshold = 100
sdk.development.worker.console_cr_flush_period = 10
sdk.apply_environment = false
sdk.apply_files = false
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
api.http.wait_on_maintenance_forever = true
api.http.pool_maxsize = 512
api.http.pool_connections = 512
api.api_server = 

api.web_server = 

api.files_server = 

api.credentials.access_key = S8T2YH1QWZCYNT1KNWP7
api.host = 

environment.SOPS_AGE_KEY_FILE = ****
Executing task id [b07058e79a2843a2b89dc78535529309]:
repository = git@github.com:TicketSwap/LTV.git
branch = DAT-2089-adapt-ltv-to-produce-both-buyer-and-seller-results
version_num = e292c1f050eaaddf0462fe29c0a9003d811909d6
tag = 
docker_cmd = 
entry_point = src/etl_process.py
working_dir = .
created virtual environment CPython3.10.12.final.0-64 in 187ms
  creator CPython3Posix(dest=/home/natephysics/.clearml/venvs-builds.5/3.10, clear=False, no_vcs_ignore=False, global=False)
  seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/home/natephysics/.local/share/virtualenv)
    added seed packages: pip==23.3.2, setuptools==69.0.2, wheel==0.42.0
  activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator
Using cached repository in "/home/natephysics/.clearml/vcs-cache/LTV.git.9f6cee68999631fb361e67ce3bfd19bb/LTV.git"
remote: Invalid username or password.
fatal: Authentication failed for '
'
error: Could not fetch origin
Repository cloning failed: Command '['git', 'fetch', '--all', '--recurse-submodules']' returned non-zero exit status 1.
clearml_agent: ERROR: Failed cloning repository. 
1) Make sure you pushed the requested commit:
(repository='git@github.com:TicketSwap/LTV.git', branch='DAT-2089-adapt-ltv-to-produce-both-buyer-and-seller-results', commit_id='e292c1f050eaaddf0462fe29c0a9003d811909d6', tag='', docker_cmd=None, entry_point='src/etl_process.py', working_dir='.')
2) Check if remote-worker has valid credentials [see worker configuration file]
2024-01-22 15:28:42
Process failed, exit code 1
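A note on reading this log: the task's repository is an SSH URL and `agent.force_git_ssh_protocol = true`, yet `remote: Invalid username or password` is the kind of message GitHub typically returns for HTTPS authentication, and it appears during `git fetch` in the cached repository rather than during a fresh clone. One possible explanation (not confirmed in this thread) is that the cached clone's `origin` still points at an HTTPS URL. Below is a minimal sketch for checking that. The helper names `remote_protocol` and `cached_origins` are made up for illustration, and the two-level cache layout is assumed from the path shown in the log above.

```python
import subprocess
from pathlib import Path


def remote_protocol(url: str) -> str:
    """Classify a git remote URL as 'https', 'ssh', or 'other'."""
    url = url.strip()
    # Plain http is grouped with https: both use password/token auth.
    if url.startswith(("https://", "http://")):
        return "https"
    # Explicit ssh:// or scp-like syntax (git@host:owner/repo.git).
    if url.startswith("ssh://") or ("@" in url and ":" in url.split("@", 1)[1]):
        return "ssh"
    return "other"


def cached_origins(cache_dir: Path) -> dict[str, str]:
    """Map each cached clone (assumed layout: <cache>/<name.hash>/<name>) to its origin URL."""
    origins = {}
    for git_dir in sorted(cache_dir.glob("*/*")):
        if not git_dir.is_dir():
            continue
        result = subprocess.run(
            ["git", "-C", str(git_dir), "remote", "get-url", "origin"],
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            origins[str(git_dir)] = result.stdout.strip()
    return origins


if __name__ == "__main__":
    # Default location assumed from agent.vcs_cache.path in the log above.
    cache = Path.home() / ".clearml" / "vcs-cache"
    for path, url in cached_origins(cache).items():
        print(f"{remote_protocol(url):5}  {url}  ({path})")
```

If a cached clone reports `https` while the task uses an SSH URL, that mismatch would be consistent with the error above.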
  
  
Posted 11 months ago

Answers 6


Looks good 😄

  
  
Posted 11 months ago

Hi Jake 👍,

Maybe the content is cached? The repo isn't big. I didn't realize the log was missing content. I believe I copied everything but I'll double check in a moment.

  
  
Posted 11 months ago

@EnthusiasticCow4 this second log seems to be missing quite a lot: the agent configuration ends abruptly (missing files_server etc.) and no repository cloning is included. How is it successful?
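For anyone comparing two agent logs like this, here is a rough sketch that lists which configuration keys appear in one log's config dump but not in the other. It assumes the `key = value` layout printed by the agent in the logs in this thread; `parse_config_dump` and `missing_keys` are hypothetical helper names, not part of ClearML.

```python
def parse_config_dump(log_text: str) -> dict[str, str]:
    """Collect 'key = value' lines whose key starts with agent., sdk. or api."""
    config = {}
    for line in log_text.splitlines():
        key, sep, value = line.partition(" =")
        key = key.strip()
        if sep and key.startswith(("agent.", "sdk.", "api.")):
            config[key] = value.strip()
    return config


def missing_keys(reference: str, other: str) -> list[str]:
    """Keys present in the reference log but absent from the other log."""
    ref = parse_config_dump(reference)
    oth = parse_config_dump(other)
    return sorted(k for k in ref if k not in oth)
```

Passing the text of the failing log as `reference` and the successful one as `other` would show which settings were cut off from the second paste.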

  
  
Posted 11 months ago

I'm not sure why the logs were incomplete. I think part of the reason it wasn't pulling from the repo was that it was pulling from cache. I cleared the clearml cache for that project and reran it. This should be the full log.

  
  
Posted 11 months ago

Project 2:

2024-01-22 17:21:56
task 6518c3cd13394aa4abbc8f0dc34eb763 pulled from 8a69a982f5824762aeac7b000fbf2161 by worker bigbrother:10
2024-01-22 17:22:03
Current configuration (clearml_agent v1.7.0, location: /tmp/.clearml_agent.bojpliyx.cfg):
----------------------
agent.worker_id = bigbrother:10
agent.worker_name = bigbrother
agent.force_git_ssh_protocol = true
agent.python_binary = /home/natephysics/anaconda3/bin/python
agent.package_manager.type = pip
agent.package_manager.pip_version.0 = <20.2 ; python_version < '3.10'
agent.package_manager.pip_version.1 = <22.3 ; python_version >= '3.10'
agent.package_manager.system_site_packages = false
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = pytorch
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 = defaults
agent.package_manager.priority_optional_packages.0 = pygobject
agent.package_manager.torch_nightly = false
agent.package_manager.poetry_files_from_repo_working_dir = false
agent.package_manager.priority_packages.0 = hydra-core
agent.package_manager.priority_packages.1 = omegaconf
agent.venvs_dir = /home/natephysics/.clearml/venvs-builds.10
agent.venvs_cache.max_entries = 10
agent.venvs_cache.free_space_threshold_gb = 2.0
agent.venvs_cache.path = ~/.clearml/venvs-cache
agent.vcs_cache.enabled = true
agent.vcs_cache.path = /home/natephysics/.clearml/vcs-cache
agent.venv_update.enabled = false
agent.pip_download_cache.enabled = true
agent.pip_download_cache.path = /home/natephysics/.clearml/pip-download-cache
agent.translate_ssh = true
agent.reload_config = false
agent.docker_pip_cache = /home/natephysics/.clearml/pip-cache
agent.docker_apt_cache = /home/natephysics/.clearml/apt-cache.10
agent.docker_force_pull = false
agent.default_docker.image = nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04
agent.enable_task_env = false
agent.hide_docker_command_env_vars.enabled = true
agent.hide_docker_command_env_vars.parse_embedded_urls = true
agent.abort_callback_max_timeout = 1800
agent.docker_internal_mounts.sdk_cache = /clearml_agent_cache
agent.docker_internal_mounts.apt_cache = /var/cache/apt/archives
agent.docker_internal_mounts.ssh_folder = ~/.ssh
agent.docker_internal_mounts.ssh_ro_folder = /.ssh
agent.docker_internal_mounts.pip_cache = /root/.cache/pip
agent.docker_internal_mounts.poetry_cache = /root/.cache/pypoetry
agent.docker_internal_mounts.vcs_cache = /root/.clearml/vcs-cache
agent.docker_internal_mounts.venv_build = ~/.clearml/venvs-builds
agent.docker_internal_mounts.pip_download = /root/.clearml/pip-download-cache
agent.apply_environment = true
agent.apply_files = true
agent.custom_build_script = 
agent.disable_task_docker_override = false
agent.default_python = 3.11
agent.cuda_version = 123
agent.cudnn_version = 0
sdk.storage.cache.default_base_dir = ~/.clearml/cache
sdk.storage.cache.size.min_free_bytes = 10GB
sdk.storage.direct_access.0.url = file://*
sdk.metrics.file_history_size = 100
sdk.metrics.matplotlib_untitled_history_size = 100
sdk.metrics.images.format = JPEG
sdk.metrics.images.quality = 87
sdk.metrics.images.subsampling = 0
sdk.metrics.tensorboard_single_series_per_graph = false
sdk.network.metrics.file_upload_threads = 4
sdk.network.metrics.file_upload_starvation_warning_sec = 120
sdk.network.iteration.max_retries_on_server_error = 5
sdk.network.iteration.retry_backoff_factor_sec = 10
sdk.network.file_upload_retries = 3
sdk.aws.s3.key = 
sdk.aws.s3.region = eu-west-1
sdk.aws.boto3.pool_connections = 512
sdk.aws.boto3.max_multipart_concurrency = 16
sdk.log.null_log_propagate = false
sdk.log.task_log_buffer_capacity = 66
sdk.log.disable_urllib3_info = true
sdk.development.task_reuse_time_window_in_hours = 72.0
sdk.development.vcs_repo_detect_async = true
sdk.development.store_uncommitted_code_diff = true
sdk.development.support_stopping = true
sdk.development.default_output_uri = 
sdk.development.force_analyze_entire_repo = false
sdk.development.suppress_update_message = false
sdk.development.detect_with_pip_freeze = false
sdk.development.worker.report_period_sec = 2
sdk.development.worker.ping_period_sec = 30
sdk.development.worker.log_stdout = true
sdk.development.worker.report_global_mem_used = false
sdk.development.worker.report_event_flush_threshold = 100
sdk.development.worker.console_cr_flush_period = 10
sdk.apply_environment = false
sdk.apply_files = false
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
api.http.wait_on_maintenance_forever = true
api.http.pool_maxsize = 512
api.http.pool_connections = 512
api.api_server = 

api.web_server = 

2024-01-22 17:22:19
Process completed successfully
  
  
Posted 11 months ago

Actually, clearing the cache on the other project might have fixed it. I just tested it out and it seems to be working.
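For reference, a minimal sketch of clearing the cached clone for a single repository instead of the whole cache. It assumes cache entries are directories named `<repo>.git.<hash>`, as in the path shown in the failing log; `clear_cached_repo` is a hypothetical helper, not a ClearML command, and it should only be run while no agent is using that cache.

```python
import shutil
from pathlib import Path


def clear_cached_repo(cache_dir: Path, repo_name: str) -> list[Path]:
    """Delete cached clones of repo_name under cache_dir and return what was removed."""
    removed = []
    # Assumed naming: <repo>.git.<hash>, e.g. LTV.git.9f6cee68...
    for entry in sorted(cache_dir.glob(f"{repo_name}.git.*")):
        if entry.is_dir():
            shutil.rmtree(entry)
            removed.append(entry)
    return removed


if __name__ == "__main__":
    # Location assumed from agent.vcs_cache.path in the logs.
    cache = Path.home() / ".clearml" / "vcs-cache"
    for path in clear_cached_repo(cache, "LTV"):
        print(f"removed {path}")
```

On the next run the agent should clone the repository fresh, using the URL and credentials from the task and agent configuration.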

  
  
Posted 11 months ago