Hi UnevenDolphin73 ,
which agent version are you using? Do you setup the env variable in the agentโs machine too?
- Can you set env var
CLEARML_DOCKER_SKIP_GPUS_FLAG
to true?
Regarding this - https://clearml.slack.com/archives/CTK20V944/p1657525402861009?thread_ts=1657291641.224139&cid=CTK20V944 - can you add some more info? maybe the log?
UnevenDolphin73fatal: could not read Username for '
': terminal prompts disabled .. fatal: clone of '
' into submodule path '/root/.clearml/vcs-cache/xxx.60db3666b11ac2df511a851e269817ef/xxx/xxx' failed
It seems it tries to clone a submodule and fails due to to missing keys for the submodule.
https://stackoverflow.com/questions/7714326/git-submodule-url-not-including-username
wdyt?
Odd; switching to virtual environment results infatal: could not read Username for '
': terminal prompts disabled
even though it does earlier show that:agent.git_user = xxx
I just set the git credentials in the
clearml.conf
and it works out of the box
git has issues with passing the user/token from the main repo to the submodules, hence my surprise that it is working out-of-the-box.
Do notice that if you are ussing ssh-key this is a none issue.
Nope, no
.netrc
defined anywhere, ...
If this is the case can you try to add the following to your "extra_vm_bash_script"echo machine example.com > ~/.netrc && echo login MY_USERNAME >> ~/.netrc && echo password MY_PASSWORD >> ~/.netrc
I'm trying, let's see; our infra person is away on holidays :X Thanks! Uh, which configuration exactly would you like to see? We're running using the helm charts on K8s, so I don't think I have direct access to the agent configuration/update it separately?
I'm using some old agent I fear, since our infra person decided to use chart 3.3.0 ๐
I'll try with the env var too. Do you personally recommend docker over the simple AMI + virtual environment?
More complete log does not add much information -Cloning into '/root/.clearml/venvs-builds/3.10/task_repository/xxx/xxx'... fatal: could not read Username for '
': terminal prompts disabled fatal: clone of '
' into submodule path '/root/.clearml/venvs-builds/3.10/task_repository/xxx/xxx' failed Failed to clone 'xxx'. Retry scheduled Cloning into '/root/.clearml/venvs-builds/3.10/task_repository/xxx/xxx'... fatal: could not read Username for '
': terminal prompts disabled fatal: clone of '
' into submodule path '/root/.clearml/venvs-builds/3.10/task_repository/xxx/xxx' failed Failed to clone 'xxx' a second time, aborting
Then the username and password would be visible in the autoscaler task ๐
But it should work out of the box, as it does work like that out of the box also regardless of ClearML. The user and personal access token are used as is and it propagates down to submodules, since those are simply another git repository.
I've further checks on a different machine and it works as well ๐ค
Can you verify by adding the the following to your extra_docker_shell_script:
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a516cab60e83/docs/clearml.conf#L152extra_docker_shell_script: ["echo machine example.com > ~/.netrc", "echo login MY_USERNAME >> ~/.netrc", "echo password MY_PASSWORD >> ~/.netrc"]
Follow-up; any ideas how to avoid PEP 517 with the auto scaler?
Takes a
long
time to build the wheels
enable venv caching ?
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a516cab60e83/docs/clearml.conf#L116
git config --system credential.helper 'store --file /root/.git-credentials'
Maybe we should use this hack for cloning with user/token in general ...
hmm this might help:
https://pip.pypa.io/en/stable/topics/configuration/#environment-variables
basically you might be able to define:PIP_NO_USE_PEP517=1
works seamlessly throughout and in our current on premise servers...
I'm assuming via something close to what I suggested above with .netrc ?
I just set the git credentials in the clearml.conf
and it works out of the box
But it should work out of the box ...
Yes it should ....
The user and personal access token are used as is and it propagates down to submodules, since those are simply another git repository.
Can you manually successfully run:git clone --recursive
https://user:token@github.com/company/repo_with_submodules
I think this is about maybe the credential.helper
used
Different AMI image/installing older Python instances that don't enforce this...
For future reference though, the environment variable should be PIP_USE_PEP517=false
That's enabled; I was aiming if there are flags to add to pip install
CLI, such as --no-use-pep517
Iโm using some old agent I fear, since our infra person decided to use chart 3.3.0
That could be the issue, can you update to the latest version so we can check if this is the issue?
Iโll try with the env var too. Do you personally recommend docker over the simple AMI + virtual environment?
Depends, with docker you know what youll get and you can control many, venv should be quicker and you can set it before
More complete log does not add much information -
Can you send the configuration? can you try it with the latest agent?
AgitatedDove14 The keys are there, and there is no specifically defined user in .gitmodules
:[submodule "xxx"] path = xxx url =
I believe this has to do with how ClearML sets up the git credentials perhaps?
We have a read-only user with personal access token for these things, works seamlessly throughout and in our current on premise servers... So perhaps something missing in the autoscaler definitions?
Nope, no .netrc
defined anywhere, really (+I've abandoned the use of docker for the autoscaler as it complicates things, at least for now)
TimelyPenguin76 I added pip install --update clearml-agent
to the extra_vm_bash_script
for the autoscaler, that should at least guarantee the latest clearml agent is used on the instance, right?
That was a good idea, unfortunately did not help too much, but I think I may have a found a work around, thanks!
Sounds like a nice idea ๐
Follow-up; any ideas how to avoid PEP 517 with the auto scaler? ๐ค Takes a long time to build the wheels
Hurrah! Addedgit config --system credential.helper 'store --file /root/.git-credentials'
to the extra_vm_bash_script
and now it works
(logs the given git credentials in the store file, which can then be used immediately for the recursive calls)
TimelyPenguin76 here's the full log (took a moment to anonynomize completely):
`
Using environment access key CLEARML_API_ACCESS_KEY=xxx
Using environment secret key CLEARML_API_SECRET_KEY=********
Current configuration (clearml_agent v1.3.0, location: /tmp/.clearml_agent.zs4e7egs.cfg):
sdk.storage.cache.default_base_dir = ~/.clearml/cache
sdk.storage.cache.size.min_free_bytes = 10GB
sdk.storage.direct_access.0.url = file://*
sdk.metrics.file_history_size = 100
sdk.metrics.matplotlib_untitled_history_size = 100
sdk.metrics.images.format = JPEG
sdk.metrics.images.quality = 87
sdk.metrics.images.subsampling = 0
sdk.metrics.tensorboard_single_series_per_graph = false
sdk.network.metrics.file_upload_threads = 4
sdk.network.metrics.file_upload_starvation_warning_sec = 120
sdk.network.iteration.max_retries_on_server_error = 5
sdk.network.iteration.retry_backoff_factor_sec = 10
sdk.aws.s3.key =
sdk.aws.s3.region =
sdk.aws.boto3.pool_connections = 512
sdk.aws.boto3.max_multipart_concurrency = 16
sdk.log.null_log_propagate = false
sdk.log.task_log_buffer_capacity = 66
sdk.log.disable_urllib3_info = true
sdk.development.task_reuse_time_window_in_hours = 72.0
sdk.development.vcs_repo_detect_async = true
sdk.development.store_uncommitted_code_diff = true
sdk.development.support_stopping = true
sdk.development.default_output_uri =
sdk.development.force_analyze_entire_repo = false
sdk.development.suppress_update_message = false
sdk.development.detect_with_pip_freeze = false
sdk.development.worker.report_period_sec = 2
sdk.development.worker.ping_period_sec = 30
sdk.development.worker.log_stdout = true
sdk.development.worker.report_global_mem_used = false
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
api.http.wait_on_maintenance_forever = true
api.http.pool_maxsize = 512
api.http.pool_connections = 512
api.api_server =
api.files_server =
api.web_server =
api.credentials.access_key = xxx
api.host =
agent.worker_id = dynamic_worker:aws_simple:t3a.xlarge:xxx
agent.worker_name = xxx
agent.force_git_ssh_protocol = false
agent.python_binary =
agent.package_manager.type = pip
agent.package_manager.pip_version = <20.2
agent.package_manager.system_site_packages = false
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = pytorch
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 = defaults
agent.package_manager.priority_optional_packages.0 = pygobject
agent.package_manager.torch_nightly = false
agent.venvs_dir = /root/.clearml/venvs-builds
agent.venvs_cache.max_entries = 10
agent.venvs_cache.free_space_threshold_gb = 2.0
agent.vcs_cache.enabled = true
agent.vcs_cache.path = /root/.clearml/vcs-cache
agent.venv_update.enabled = false
agent.pip_download_cache.enabled = true
agent.pip_download_cache.path = /root/.clearml/pip-download-cache
agent.translate_ssh = true
agent.reload_config = false
agent.docker_pip_cache = /root/.clearml/pip-cache
agent.docker_apt_cache = /root/.clearml/apt-cache
agent.docker_force_pull = false
agent.default_docker.image = nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04
agent.enable_task_env = false
agent.hide_docker_command_env_vars.enabled = true
agent.hide_docker_command_env_vars.parse_embedded_urls = true
agent.docker_internal_mounts.sdk_cache = /clearml_agent_cache
agent.docker_internal_mounts.apt_cache = /var/cache/apt/archives
agent.docker_internal_mounts.ssh_folder = /root/.ssh
agent.docker_internal_mounts.pip_cache = /root/.cache/pip
agent.docker_internal_mounts.poetry_cache = /root/.cache/pypoetry
agent.docker_internal_mounts.vcs_cache = /root/.clearml/vcs-cache
agent.docker_internal_mounts.venv_build = /root/.clearml/venvs-builds
agent.docker_internal_mounts.pip_download = /root/.clearml/pip-download-cache
agent.apply_environment = true
agent.apply_files = true
agent.custom_build_script =
agent.git_user = xxx
agent.default_python = 3.10
agent.cuda_version = 0
agent.cudnn_version = 0
Executing task id [3a666f3f2b9e46bcb4cd0d59b6a0e3af]:
repository =
branch = main
version_num = 96fd6c76446926333d445cfb4b176e0d29ab8aeb
tag =
docker_cmd =
entry_point = train.py
working_dir = .
Python executable with version '3.7' requested by the Task, not found in path, using '/clearml_agent_venv/bin/python3' (v3.10.4) instead
created virtual environment CPython3.10.4.final.0-64 in 690ms
creator CPython3Posix(dest=/root/.clearml/venvs-builds/3.10, clear=False, no_vcs_ignore=False, global=False)
seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/root/.local/share/virtualenv)
added seed packages: pip==22.1.2, setuptools==62.6.0, wheel==0.37.1
activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator
cloning:
fatal: could not read Username for ' ': terminal prompts disabled
fatal: clone of ' ' into submodule path '/root/.clearml/vcs-cache/xxx.60db3666b11ac2df511a851e269817ef/xxx/xxx' failed
Failed to clone 'xxx'. Retry scheduled
fatal: could not read Username for ' ': terminal prompts disabled
fatal: clone of ' ' into submodule path '/root/.clearml/vcs-cache/xxx.60db3666b11ac2df511a851e269817ef/xxx/xxx' failed
Failed to clone 'xxx' a second time, aborting
Repository cloning failed: Command '['clone', ' ', '/root/.clearml/vcs-cache/xxx.60db3666b11ac2df511a851e269817ef/xxx', '--quiet', '--recursive']' returned non-zero exit status 1.
clearml_agent: ERROR: Failed cloning repository.
- Make sure you pushed the requested commit:
(repository='', branch='main', commit_id='96fd6c76446926333d445cfb4b176e0d29ab8aeb', tag='', docker_cmd=None, entry_point='train.py', working_dir='.')
- Check if remote-worker has valid credentials [see worker configuration file] `