Answered
I'm having a hard time with git cloning + cache for a private repo accessed via personal access token

I'm having a hard time with git cloning + cache for a private repo accessed via personal access token. This happens 100% of the time, across both bitbucket + github.

I have a simple "hello world" task in a private repo.
The worker is running in a Docker container called worker, built from this Dockerfile:

FROM python:3.10.10
RUN useradd -u 1000 -ms /bin/bash user
RUN apt-get update \
    && apt-get install -yqq \
       graphviz \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/

RUN pip install clearml-agent  # optional
WORKDIR /home/user

ADD entrypoint.sh /home/user/entrypoint.sh
RUN chmod +x /home/user/entrypoint.sh
RUN chown user:user /home/user/entrypoint.sh
USER user
ENV PATH=/home/user/.local/bin:$PATH
CMD "./entrypoint.sh"

where entrypoint.sh is a modified version of the default one from agent-services:

#!/bin/sh +x

if [ -n "$SHUTDOWN_IF_NO_ACCESS_KEY" ] && [ -z "$CLEARML_API_ACCESS_KEY" ] && [ -z "$TRAINS_API_ACCESS_KEY" ]; then
  echo "CLEARML_API_ACCESS_KEY was not provided, service will not be started"
  exit 0
fi

export CLEARML_WORKER_ID=${CLEARML_WORKER_ID:-$HOSTNAME}
export CLEARML_FILES_HOST=${CLEARML_FILES_HOST:-$TRAINS_FILES_HOST}

if [ -z "$CLEARML_FILES_HOST" ]; then
    CLEARML_HOST_IP=${CLEARML_HOST_IP:-${TRAINS_HOST_IP:-$(curl -s 

fi

export CLEARML_FILES_HOST=${CLEARML_FILES_HOST:-${TRAINS_FILES_HOST:-"http://$CLEARML_HOST_IP:8081"}}
export CLEARML_WEB_HOST=${CLEARML_WEB_HOST:-${TRAINS_WEB_HOST:-"http://$CLEARML_HOST_IP:8080"}}
export CLEARML_API_HOST=${CLEARML_API_HOST:-${TRAINS_API_HOST:-"http://$CLEARML_HOST_IP:8008"}}

echo $CLEARML_FILES_HOST $CLEARML_WEB_HOST $CLEARML_API_HOST 1>&2

# DAEMON_OPTIONS=${CLEARML_AGENT_DAEMON_OPTIONS:---services-mode --create-queue}
DAEMON_OPTIONS=""
QUEUES=${CLEARML_AGENT_QUEUES:-services}

if [ -z "$CLEARML_AGENT_NO_UPDATE" ]; then
  if [ -n "$CLEARML_AGENT_UPDATE_REPO" ]; then
    python3 -m pip install -q -U $CLEARML_AGENT_UPDATE_REPO
  else
    python3 -m pip install -q -U "clearml-agent${CLEARML_AGENT_UPDATE_VERSION:-$TRAINS_AGENT_UPDATE_VERSION}"
  fi
fi

clearml-agent daemon $DAEMON_OPTIONS --queue $QUEUES --cpu-only ${CLEARML_AGENT_EXTRA_ARGS:-$TRAINS_AGENT_EXTRA_ARGS}

docker-compose.yml (note: no volume mounts, so a new container starts with a completely fresh state):

version: "3.6"

x-worker_template: &worker_defaults
  image: worker
  cpu_count: 2
  deploy:
    restart_policy:
      condition: on-failure
  privileged: true
  env_file: .env

services:
  worker_01:
    <<: *worker_defaults
    container_name: worker01
    environment:
      CLEARML_WORKER_ID: "01hn23k9rr7zysp3scjbwhrppg-worker-01"

In default worker mode this is what happens:
(first execution): clones the repo just fine, happily completes the task
(second execution): always throws the following error, because it is now trying to use the vcs-cache

repository = git@github.com:michael-build/nucleus-clearml.git
branch = main
version_num = 
tag = 
docker_cmd = python:3.10.10 --env-file=/root/.clearml/.env
entry_point = task_hello_world.py
working_dir = tasks
::: Using Cached environment /home/user/.clearml/venvs-cache/a61d870d71a2b3c4ca7f2a5a617a1242 :::
Using cached repository in "/home/user/.clearml/vcs-cache/nucleus-clearml.git.7a0bc5a5f52a1660a796b73c0d9ca015/nucleus-clearml.git"
fatal: could not read Username for '': terminal prompts disabled
error: Could not fetch origin
Repository cloning failed: Command '['git', 'fetch', '--all', '--recurse-submodules']' returned non-zero exit status 1.
clearml_agent: ERROR: Failed cloning repository. 
1) Make sure you pushed the requested commit:
(repository='git@github.com:michael-build/nucleus-clearml.git', branch='main', commit_id='', tag='', docker_cmd='python:3.10.10 --env-file=/root/.clearml/.env', entry_point='task_hello_world.py', working_dir='tasks')
2) Check if remote-worker has valid credentials [see worker configuration file]

The credentials are definitely valid, and the Task (in the web UI) points to "Latest commit in main branch". Again, this happens consistently with both Bitbucket and GitHub, so it appears to be entirely a git/caching issue rather than anything provider-specific.

  
  
Posted 9 days ago

5 Answers


So far it seems that turning off the cache like this is my "best option":
(screenshot attached)
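Roughly, what that amounts to on the worker (sketch; I'm assuming the clearml.conf route here, with the key name taken from the agent.vcs_cache.enabled console printout mentioned below):

# sketch: disable the agent's git cache in the worker's clearml.conf
agent.vcs_cache.enabled: false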

  
  
Posted 9 days ago

I can see agent.vcs_cache.enabled = true as a printout in the console, but I cannot find docs on how to set this via an environment variable, since I'm trying to keep these containers from needing a clearml.conf file (though I could generate one in the entrypoint script if need be, with a heredoc).
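For example, something along these lines near the top of entrypoint.sh (untested sketch; it assumes the agent picks up ~/clearml.conf and reuses the key name from the console printout):

# sketch: write a minimal clearml.conf before the daemon starts
cat > ~/clearml.conf <<EOF
agent.vcs_cache.enabled: false
EOF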

  
  
Posted 9 days ago

Hi @<1689446563463565312:profile|SmallTurkey79> , indeed, you can turn it off by setting this in the config file ( agent.vcs_cache.enabled: false will also work). Using dynamic env vars, you can also set the same value with this env var: CLEARML_AGENT__AGENT__VCS_CACHE__ENABLED=false (see here for more details)
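In your compose setup, for example, that could just be an extra line in the .env file the services already load via env_file (sketch; assumes the agent daemon inherits the container environment):

# .env (loaded by docker-compose via env_file)
CLEARML_AGENT__AGENT__VCS_CACHE__ENABLED=false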

  
  
Posted 8 days ago

By the way, which agent version are you using? Can you include the complete task log?

  
  
Posted 8 days ago

I ended up pinning the install in the Dockerfile to 1.18, but before that I was letting the entrypoint script do the install (so, latest).

Much appreciate the env var tip; that's more elegant than what I did.

Since I've turned off caching I've had much better luck. Is what I'm experiencing a bug? (Neither Bitbucket nor GitHub private repositories work on the second task per worker.)
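(Side note for anyone copying this setup: presumably the Dockerfile pin only sticks if the entrypoint's self-update step is skipped too, e.g. via the CLEARML_AGENT_NO_UPDATE variable the script already checks; sketch:)

# .env: skip the entrypoint's pip upgrade so the version baked into the image is kept
CLEARML_AGENT_NO_UPDATE=1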

  
  
Posted 8 days ago