
I'm having a hard time with git cloning + cache for a private repo accessed via a personal access token. This happens 100% of the time, across both Bitbucket and GitHub.

I have a simple "hello world" task in a private repo.
The worker runs in a Docker container called worker, built from this Dockerfile:

FROM python:3.10.10
RUN useradd -u 1000 -ms /bin/bash user
RUN apt-get update \
    && apt-get install -yqq \
       graphviz \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/

RUN pip install clearml-agent  # optional
WORKDIR /home/user

ADD entrypoint.sh /home/user/entrypoint.sh
RUN chmod +x /home/user/entrypoint.sh
RUN chown user:user /home/user/entrypoint.sh
USER user
ENV PATH=/home/user/.local/bin:$PATH
CMD "./entrypoint.sh"

where entrypoint.sh is a modified version of the default one from agent-services:

#!/bin/sh +x

if [ -z "$CLEARML_API_ACCESS_KEY" ]; then
  echo "CLEARML_API_ACCESS_KEY was not provided, service will not be started"
  exit 0
fi


if [ -z "$CLEARML_FILES_HOST" ]; then




# DAEMON_OPTIONS=${CLEARML_AGENT_DAEMON_OPTIONS:---services-mode --create-queue}

if [ -z "$CLEARML_AGENT_NO_UPDATE" ]; then
  if [ -n "$CLEARML_AGENT_UPDATE_REPO" ]; then
    python3 -m pip install -q -U $CLEARML_AGENT_UPDATE_REPO
  else
    python3 -m pip install -q -U "clearml-agent${CLEARML_AGENT_UPDATE_VERSION:-$TRAINS_AGENT_UPDATE_VERSION}"
  fi
fi


Note: there are no volume mounts, so a new container starts from a completely fresh state. The docker-compose file:

version: "3.6"

x-worker_template: &worker_defaults
  image: worker
  cpu_count: 2
  deploy:
    restart_policy:
      condition: on-failure
  privileged: true
  env_file: .env

    <<: *worker_defaults
    container_name: worker01
    environment:
      CLEARML_WORKER_ID: "01hn23k9rr7zysp3scjbwhrppg-worker-01"

In default worker mode, this is what happens:
(first execution): clones the repo just fine and happily completes the task
(second execution): always throws the following error, because it's now trying to use the vcs-cache

repository = git@github.com:michael-build/nucleus-clearml.git
branch = main
version_num = 
tag = 
docker_cmd = python:3.10.10 --env-file=/root/.clearml/.env
entry_point = task_hello_world.py
working_dir = tasks
::: Using Cached environment /home/user/.clearml/venvs-cache/a61d870d71a2b3c4ca7f2a5a617a1242 :::
Using cached repository in "/home/user/.clearml/vcs-cache/nucleus-clearml.git.7a0bc5a5f52a1660a796b73c0d9ca015/nucleus-clearml.git"
fatal: could not read Username for '
': terminal prompts disabled
error: Could not fetch origin
Repository cloning failed: Command '['git', 'fetch', '--all', '--recurse-submodules']' returned non-zero exit status 1.
clearml_agent: ERROR: Failed cloning repository. 
1) Make sure you pushed the requested commit:
(repository='git@github.com:michael-build/nucleus-clearml.git', branch='main', commit_id='', tag='', docker_cmd='python:3.10.10 --env-file=/root/.clearml/.env', entry_point='task_hello_world.py', working_dir='tasks')
2) Check if remote-worker has valid credentials [see worker configuration file]

The credentials are definitely valid, and the Task (in the web UI) points to "Latest commit in main branch". Again, this happens consistently with both Bitbucket and GitHub, so it appears to be a git issue rather than a provider-specific one.

Posted 11 months ago

Answers 14

The clone is git's default (full, not shallow) clone; you can actually see the exact command in the log.

Posted 10 months ago

Yes, I've actually been able to turn caching back on since rc2 of the agent! It's been working much better.

Posted 10 months ago

I can see agent.vcs_cache.enabled = true as a printout in the console, but I cannot find docs on how to set this via an environment variable. I'm trying to keep these containers from needing a clearml.conf file (though I can generate one in the entrypoint script with a heredoc, <<EOF, if need be).
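If you do go the generated-file route, the file only needs the keys you want to override, and a dotted key such as agent.vcs_cache.enabled: false is accepted, as one of the answers in this thread confirms. A minimal sketch in Python, assuming flat dotted keys are all you need; render_conf and write_conf are hypothetical names, and the target path (commonly ~/clearml.conf) is an assumption to check against where your agent version reads its config.

```python
from pathlib import Path
from typing import Mapping, Union

Value = Union[bool, int, float, str]


def render_conf(settings: Mapping[str, Value]) -> str:
    """Render flat dotted keys as HOCON lines, e.g. 'agent.vcs_cache.enabled: false'."""
    lines = []
    for key, value in settings.items():
        if isinstance(value, bool):  # check bool before int: bool is an int subclass
            rendered = "true" if value else "false"
        elif isinstance(value, (int, float)):
            rendered = str(value)
        else:
            # Quote strings (escaping backslashes and quotes) so ':' or spaces are safe.
            escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
            rendered = f'"{escaped}"'
        lines.append(f"{key}: {rendered}")
    return "\n".join(lines) + "\n"


def write_conf(path: Path, settings: Mapping[str, Value]) -> None:
    """Write the rendered settings to `path`, creating parent directories if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_conf(settings))


if __name__ == "__main__":
    print(render_conf({"agent.vcs_cache.enabled": False}), end="")
    # prints: agent.vcs_cache.enabled: false
```

In the entrypoint this would be a one-liner such as write_conf(Path.home() / "clearml.conf", {...}) run before the agent starts; the heredoc approach does the same thing without Python.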

Posted 11 months ago

And for what it's worth, it seems I don't have anything special set for agent cloning.

I did find agent.vcs_cache.clone_on_pull_fail helpful, but yeah, updating the agent was the biggest fix.
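The option name suggests a fallback policy: try to update the cached checkout first, and only if that fails, discard it and clone from scratch, which would sidestep the failing git fetch in the log. The sketch shows that control flow with injected callables so it stays independent of git. It is an interpretation of the option name, not clearml-agent's code, and update_or_reclone is a hypothetical name.

```python
from typing import Callable


def update_or_reclone(
    has_cache: bool,
    fetch: Callable[[], bool],
    discard: Callable[[], None],
    clone: Callable[[], bool],
) -> str:
    """Prefer the cached checkout; if fetching into it fails, drop it and clone fresh.

    Hypothetical sketch of a "clone on pull fail" policy. fetch/clone return True
    on success; in practice they might wrap `git fetch --all --recurse-submodules`
    and `git clone`, and discard might delete the cache directory.
    """
    if has_cache:
        if fetch():
            return "fetched"
        discard()  # the cached copy is unusable; do not reuse its state
    if clone():
        return "cloned"
    raise RuntimeError("fetch and fresh clone both failed")


if __name__ == "__main__":
    # Simulate the failure in this thread: the cached fetch fails, a fresh clone works.
    print(update_or_reclone(True, fetch=lambda: False, discard=lambda: None, clone=lambda: True))
    # prints: cloned
```

Keeping git out of the function makes the policy easy to test; the real cost of the fallback is a full clone on every failed fetch, which is roughly what running with the cache disabled costs anyway.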

Posted 9 months ago

I ended up pinning the clearml-agent install in the Dockerfile to 1.18, but before that I was letting the entrypoint script do the install (so, latest).

Much appreciate the env var tip. That's more elegant than what I did.

Since turning off caching, I've had much better luck. Is what I'm experiencing a bug? (Neither the Bitbucket nor the GitHub private repository works on the second task per worker.)

Posted 11 months ago

So far it seems that turning off the cache like this is my "best option".

Posted 11 months ago

SmallTurkey79, did you solve this issue with "fatal: could not read Username"?

Posted 9 months ago

BTW, a new agent version has been released; I'd recommend trying it out.

Posted 10 months ago

Hi SmallTurkey79, indeed, you can turn it off by passing this configuration in the config file (agent.vcs_cache.enabled: false will also work). Using dynamic env vars, you can also set the same value with this env var: CLEARML_AGENT__AGENT__VCS_CACHE__ENABLED=false (see here for more details).
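The env var name appears to follow a mechanical rule: upper-case each segment of the dotted config key, join the segments with double underscores, and prepend CLEARML_AGENT. The sketch encodes that rule as inferred from this single example; the helper names are hypothetical, and the agent's dynamic env var documentation is the authority on the exact convention.

```python
def to_dynamic_env_var(config_key: str, prefix: str = "CLEARML_AGENT") -> str:
    """Map a dotted clearml.conf key to a dynamic env var name.

    Assumed rule, inferred from agent.vcs_cache.enabled ->
    CLEARML_AGENT__AGENT__VCS_CACHE__ENABLED: upper-case every dotted
    segment and join prefix + segments with double underscores.
    """
    segments = [part.upper() for part in config_key.split(".") if part]
    return "__".join([prefix, *segments])


def to_env_value(value: object) -> str:
    """Render a config value as an env var string (booleans lowercase, as in this thread)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


if __name__ == "__main__":
    print(f"{to_dynamic_env_var('agent.vcs_cache.enabled')}={to_env_value(False)}")
    # prints: CLEARML_AGENT__AGENT__VCS_CACHE__ENABLED=false
```

The same rule would presumably give CLEARML_AGENT__AGENT__VCS_CACHE__CLONE_ON_PULL_FAIL for the clone_on_pull_fail option mentioned elsewhere in the thread; that name is an extrapolation worth checking in the console printout.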

Posted 11 months ago

Okay, thank you so much.
But I think I solved the credentials problem by using clearml_agent v1.8.1rc2.
But now I get an issue with local Python modules 🫠
Even when I set

agent.skip_pip_venv_install = 1
agent.skip_python_env_install = /usr/bin/python

In worker logs I see:

Environment setup completed successfully
Starting Task Execution:

Posted 9 months ago

Yeah, I ended up figuring it out. I think we are in similar situations (private git repo with a token). I'll take a look at my config tomorrow, but from memory, you have to set your env variables and have an option in your config to force the HTTPS protocol if you're using a token.
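Forcing HTTPS matters here because the task's repository URL is SSH-style (git@github.com:michael-build/nucleus-clearml.git), while personal access tokens are used over HTTPS, not SSH. A hypothetical helper sketching the rewrite such an option implies: scp-style SSH URLs become https:// URLs, optionally carrying a user and token. This is not clearml-agent's implementation; in the agent the behavior is presumably tied to settings like agent.git_user / agent.git_pass, which should be confirmed in the agent's configuration reference.

```python
import re
from typing import Optional
from urllib.parse import quote

# scp-style SSH URL: optional "user@", host, ":", then a relative path.
_SCP_STYLE = re.compile(r"^(?:[^@/]+@)?(?P<host>[^:/]+):(?P<path>[^/].*)$")


def ssh_to_https(url: str, user: Optional[str] = None, token: Optional[str] = None) -> str:
    """Rewrite an scp-style SSH git URL as HTTPS, optionally embedding credentials.

    Hypothetical helper for illustration only. URLs that are not scp-style
    (e.g. already https://, or ssh://) are returned unchanged.
    """
    match = _SCP_STYLE.match(url)
    if match is None:
        return url
    auth = ""
    if user and token:
        # Percent-encode so characters such as '@' or ':' in a token cannot break the URL.
        auth = f"{quote(user, safe='')}:{quote(token, safe='')}@"
    return f"https://{auth}{match.group('host')}/{match.group('path')}"


if __name__ == "__main__":
    print(ssh_to_https("git@github.com:michael-build/nucleus-clearml.git"))
    # prints: https://github.com/michael-build/nucleus-clearml.git
```

Note that a token embedded in a clone URL is stored in plain text as the remote URL in the repository's .git/config, so it is usually better supplied through the agent's credential settings than baked into the URL.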

Posted 9 months ago

Update: ever since turning off git caching, I've had much more stability. I can't tell whether it's slowing down task execution, though. Is the clone a shallow one by default?

Posted 10 months ago

So, I got around this with env vars.

In my worker entrypoint script, I do:


Posted 9 months ago

By the way, which agent version are you using? Can you include the complete task log?

Posted 11 months ago