Answered
Hi, I’m getting this error when I try to run a task on a remote agent with docker mode

Hi, I’m getting this error when I try to run a task on a remote agent in docker mode.

WEB UI:
` Please make sure you have the correct access rights
and the repository exists.
Repository cloning failed: Command '['clone', 'git@github.com:<MY_USER>/<MY_PUBLIC_REPO>.git', '/root/.clearml/vcs-cache/<MY_PUBLIC_REPO>.git.24ae26283cc719b2c0e9acc50a8a0e1c/<MY_PUBLIC_REPO>.git', '--quiet', '--recursive']' returned non-zero exit status 128.
clearml_agent: ERROR: Failed cloning repository.

  1. Make sure you pushed the requested commit:
    (repository='git@github.com:<MY_USER>/<MY_PUBLIC_REPO>.git', branch='main', commit_id='99314e1d3b5cce6a3cef0f48af2578631ee6a21a', tag='', docker_cmd='nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04', entry_point='test.py', working_dir='.')
  2. Check if remote-worker has valid credentials [see worker configuration file] `

The logs show that the key is mounted into the docker container:
` Executing: ['docker', 'run', '-t', '--gpus', 'all', .......... '-v', '/tmp/clearml_agent.ssh.kqzj9sky:/root/.ssh' ........] `

LOCAL MACHINE:

  1. The task was started by the clearml-task command

REMOTE MACHINE:

  1. git ssh key is located at ~/.ssh/id_rsa

  2. ~/clearml.conf:
    git_user=""
    git_pass=""
    git_host=""
    force_git_ssh_protocol: true

  3. ssh-keyscan -H github.com >> ~/.ssh/known_hosts

  4. command to start an agent:
    clearml-agent daemon --queue default --docker --force-current-version --detached

Can anyone help?

  
  
Posted one year ago

Answers 16


AgitatedDove14 Sorry for the mention, but I wanted to ask the same question. Did you get to the bottom of the issue above, where the .ssh folder can be copied only for the first task after the agent daemon has been started, but not for the following ones? (It complains about not being able to create a temp copy until I restart the agent.)

  
  
Posted one year ago

EnviousPanda91
I think you are missing a section in your clearml.conf:
agent.git_user=""
agent.git_pass=""
agent.git_host=""
agent.force_git_ssh_protocol: true

  
  
Posted one year ago

Hi CostlyOstrich36

How are you mounting the credentials?
Is this also mounted into the docker itself?

As I wrote above, it is mounted automatically:
'-v', '/tmp/clearml_agent.ssh.kqzj9sky:/root/.ssh'

What version of ClearML-Agent are you using?

1.3.0

  
  
Posted one year ago

Hi BurlyRaccoon64
Yes, we did. The latest clearml-agent solves the issue, please try:
'pip3 install -U --pre clearml-agent'

  
  
Posted one year ago

Logs shows me that key is mounted to the docker container

How are you mounting the credentials?
What version of ClearML-Agent are you using?

  
  
Posted one year ago

REMOTE MACHINE:

  1. git ssh key is located at ~/.ssh/id_rsa

Is this also mounted into the docker itself?

  
  
Posted one year ago

When I restart the agent, it works fine, but on the second launch docker does not mount the ssh keys folder:
'-v', '/tmp/clearml_agent.ssh.rbw8o0t7:/root/.ssh',
I don’t understand why. AgitatedDove14 JitteryCoyote63 could you explain the logic behind that? The CLEARML_AGENT_DISABLE_SSH_MOUNT variable is not set.

So it fails with this log message:
` ...
Using cached repository in "/root/.clearml/vcs-cache/<MY_REPO>.git.893c8c47c9813c27eb1fe8d0aeb77a11/<MY_REPO>.git"
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
error: Could not fetch origin
Repository cloning failed: Command '['git', 'fetch', '--all', '--recurse-submodules']' returned non-zero exit status 1.
clearml_agent: ERROR: Failed cloning repository.

  1. Make sure you pushed the requested commit:
    (repository='git@github.com:<GIT_USER>/<MY_REPO>.git', branch='master', commit_id='46c86354e58e50a811e870c7b163ea5734499a67', tag='', docker_cmd='nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04', entry_point='test.py', working_dir='bench')
  2. Check if remote-worker has valid credentials [see worker configuration file] `
  
  
Posted one year ago

EnviousPanda91 the agent on the host checks whether you have a .ssh folder on the machine. If you do, it copies the folder and mounts the copy into the container, then deletes the copy when the container is down.
Specifically /tmp/clearml_agent.ssh.rbw8o0t7 is the copy of the .ssh that the agent created, and now it is mounting it into the container
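
For illustration, the copy, mount and delete behaviour described above can be sketched roughly as follows. This is a minimal sketch and not the actual clearml-agent code: the helper name `temporary_ssh_mount` and its parameters are hypothetical, and only the `clearml_agent.ssh.` temp-folder prefix and the `/root/.ssh` mount target are taken from the logs in this thread.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def temporary_ssh_mount(ssh_dir, container_path="/root/.ssh"):
    """Yield docker '-v' arguments for a throwaway copy of ssh_dir.

    Hypothetical helper for illustration only; the real agent logic may differ.
    """
    ssh_dir = Path(ssh_dir).expanduser()
    if not ssh_dir.is_dir():
        # No .ssh folder on the host: nothing to copy or mount.
        yield []
        return
    # Creates a unique folder such as /tmp/clearml_agent.ssh.XXXXXXXX
    temp_copy = Path(tempfile.mkdtemp(prefix="clearml_agent.ssh."))
    try:
        shutil.copytree(ssh_dir, temp_copy, dirs_exist_ok=True)
        yield ["-v", f"{temp_copy}:{container_path}"]
    finally:
        # Remove the copy once the container is down.
        shutil.rmtree(temp_copy, ignore_errors=True)


if __name__ == "__main__":
    with temporary_ssh_mount("~/.ssh") as mount_args:
        # The image name is the one from the task log above.
        docker_cmd = ["docker", "run", "-t", *mount_args,
                      "nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04"]
        print(docker_cmd)
```

In this sketch a fresh copy is made for every container, so each run would be expected to show its own `-v` line. If the copy step fails, there is nothing to mount, which would be consistent with the missing `-v` line on the second launch reported above.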

  
  
Posted one year ago

AgitatedDove14

Specifically /tmp/clearml_agent.ssh.rbw8o0t7 is the copy of the .ssh that the agent created, and now it is mounting it into the container

But why is it mounted only once? The second and following containers do not mount the folder.

  
  
Posted one year ago

EnviousPanda91 Hi! Did you manage to solve the issue? I've encountered the same behavior: the agent can't create a temp copy of the .ssh folder for the second and all following tasks, and reports "Failed creating temporary copy of ~/.ssh for git credential".

  
  
Posted one year ago

AgitatedDove14 sorry, no, in fact my configuration looks like:

` ...

agent.git_user=""
agent.git_pass=""
agent.git_host=""

agent.package_manager.extra_index_url= [

]

agent {
worker_id: ""
worker_name: ""
force_git_ssh_protocol: true

... `

  
  
Posted one year ago

AgitatedDove14

Are you saying the second time this line is missing?

Yes.

Can you send the full Task log?

I will send the log in direct messages.

  
  
Posted one year ago

but why is it mounted only once?

Are you saying the second time this line is missing? This is very strange...
Can you send the full Task log?

  
  
Posted one year ago

Thanks! That worked. If you don't mind, could you point me to the commit that resolved it?

  
  
Posted one year ago