Answered
I’m Getting These Errors When Using Agent In Docker Mode

I’m getting these errors when using agent in docker mode
-bash: /etc/apt/apt.conf.d/docker-clean: Permission denied
chown: cannot access '/root/.cache/pip': Permission denied

  
  
Posted 3 years ago

Answers 17


Hi LazyTurkey38, how exactly are you running the agent? Also, are you using a custom base docker image, and if so, which one?

  
  
Posted 3 years ago

SuccessfulKoala55 I tried to make a docker image by combining one of our dockerfiles with this https://github.com/allegroai/clearml-agent/blob/master/docker/agent/Dockerfile . I modified the entrypoint to also be a combination of both.

Right now I’m not seeing that error, but the process seems to exit (as completed) after the docker run. I’m wondering if my Dockerfile is not set up properly and it’s exiting before the daemon is started.

  
  
Posted 3 years ago

it works if I run the same command manually.

What do you mean?
Can you do:
docker run -it <my container here> bash
and then immediately get an interactive bash?

  
  
Posted 3 years ago

might it be related to the docker socket not being mounted to the agent daemon running inside a docker container?

Oh yes, if the daemon is running inside a docker container, then you need both --privileged and the docker socket mounted to get it to work.
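
For reference, a minimal sketch of what "both --privileged and mounting of the docker socket" could look like when the agent daemon itself runs in a container. The function name `run_agent_container` and the image argument are placeholders made up for illustration, and /var/run/docker.sock is assumed to be the socket path on the host (the usual default on Linux).

```shell
# Sketch only: run_agent_container is a hypothetical helper, and the image
# name plus any trailing arguments are placeholders you supply yourself.
run_agent_container() {
  image="$1"
  shift
  # --privileged plus the host docker socket lets the containerized daemon
  # start sibling containers on the host docker engine.
  docker run -d \
    --privileged \
    -v /var/run/docker.sock:/var/run/docker.sock \
    "$image" "$@"
}
```

Calling it with your own agent image name as the first argument starts the container detached; anything after the image name is passed through as the container command.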

  
  
Posted 3 years ago

So you're running the agent daemon itself inside docker?

  
  
Posted 3 years ago

ugh, sudo actually makes it fail explicitly because
` error: Could not fetch origin
Repository cloning failed: Command '['git', 'fetch', '--all', '--recurse-submodules']' returned non-zero exit status 1.

  1. Make sure you pushed the requested commit:
    (repository='git@github.com:salimmj/clearml-demo.git', branch='main', commit_id='f76f3affd28d5558928d7ffd9a6797890ffdd708', tag='', docker_cmd='nvidia/cuda:11.4.0-runtime-ubuntu20.04', entry_point='mnist.py', working_dir='.')
  2. Check if remote-worker has valid credentials [see worker configuration file] `
  
  
Posted 3 years ago

clearml-agent daemon --detached --queue manual_jobs automated_jobs --docker --gpus 0

If the user running this command can run "docker run", then you should be fine.

  
  
Posted 3 years ago

AgitatedDove14 might it be related to the docker socket not being mounted to the agent daemon running inside a docker container?

  
  
Posted 3 years ago

LazyTurkey38 notice the assumption is that the docker entry-point ends with bash, and only then does the agent take charge. I'm assuming this is not the case, hence the agent spins up the docker, then the docker just ends. Could that be?
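
To illustrate the point about the entry-point, here is a rough sketch of a custom entrypoint script that does its own setup and then hands control to whatever command was passed to docker run (for the agent, that is the bash -c ... command). The file name entrypoint.sh and the setup step are hypothetical; this is not taken from the clearml-agent Dockerfile.

```shell
# Write a hypothetical entrypoint script; the setup step is a placeholder.
cat > entrypoint.sh <<'EOF'
#!/bin/sh
set -e
# Image-specific setup would go here (placeholder).
echo "custom setup done" >&2
# Hand control to the command given to `docker run`, instead of exiting.
exec "$@"
EOF
chmod +x entrypoint.sh
```

If the script ended without the final exec line, the container would exit as soon as the setup finished, which would look like a task that "completes" immediately.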

  
  
Posted 3 years ago

AgitatedDove14 should I try running the above command with privileged user?

  
  
Posted 3 years ago

AgitatedDove14 no I mean I can do:
docker run -t --gpus "device=1" -dit -e APP_ENV=kprod -e CLEARML_WORKER_ID=ada:gpu1 -e CLEARML_DOCKER_IMAGE=922531023312.dkr.ecr.us-west-2.amazonaws.com/jym-coach:202108080511.7e8d6d1 -v /home/smjahad/.gitconfig:/root/.gitconfig -v /tmp/.clearml_agent.kjx6r9oo.cfg:/root/clearml.conf -v /tmp/clearml_agent.ssh.l8cguj81:/root/.ssh -v /home/smjahad/.clearml/apt-cache.1:/var/cache/apt/archives -v /home/smjahad/.clearml/pip-cache:/root/.cache/pip -v /home/smjahad/.clearml/pip-download-cache:/root/.clearml/pip-download-cache -v /home/smjahad/.clearml/cache:/clearml_agent_cache -v /home/smjahad/.clearml/vcs-cache:/root/.clearml/vcs-cache --rm <IMAGELINK> bash -c echo \'Binary::apt::APT::Keep-Downloaded-Packages "true";\' > /etc/apt/apt.conf.d/docker-clean ; chown -R root /root/.cache/pip ; export DEBIAN_FRONTEND=noninteractive ; export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL libsm6 libxext6 libxrender-dev libglib2.0-0" ; [ ! -z $(which git) ] || export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL git" ; declare LOCAL_PYTHON ; for i in {10..5}; do which python3.$i && python3.$i -m pip --version && export LOCAL_PYTHON=$(which python3.$i) && break ; done ; [ ! -z $LOCAL_PYTHON ] || export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL python3-pip" ; [ -z "$CLEARML_APT_INSTALL" ] || (apt-get update && apt-get install -y $CLEARML_APT_INSTALL) ; [ ! -z $LOCAL_PYTHON ] || export LOCAL_PYTHON=python3 ; $LOCAL_PYTHON -m pip install -U "pip==21.2.3" ; $LOCAL_PYTHON -m pip install -U clearml-agent ; cp /root/clearml.conf /root/default_clearml.conf ; NVIDIA_VISIBLE_DEVICES=all $LOCAL_PYTHON -u -m clearml_agent execute --disable-monitoring --id 1be91a7331ca46b7ae4bc4024d93bb36
This is the same command that shows up in the experiment log (in the UI) before it ends as “complete”.

  
  
Posted 3 years ago

btw, I launch the agent daemon outside docker (with --docker), that’s the way it is supposed to work right?

Yep, that should work.
Is it?

  
  
Posted 3 years ago

I don’t think it’s that, because it works if I run the same command manually.

  
  
Posted 3 years ago

btw, AgitatedDove14 I launch the agent daemon outside docker (with --docker), that’s the way it is supposed to work, right?

$ clearml-agent daemon --detached --queue manual_jobs automated_jobs --docker --gpus 0
And then the worker itself will run the docker run command for me and start another non-daemon agent inside.
I guess the failure happens when it tries to switch to docker, because the same experiment works with agents not started with the --docker flag.

  
  
Posted 3 years ago

It seems to try to pull with SSH credentials. Add your user/pass (or better, an API key) to the clearml.conf
(look for git_user / git_pass).
That should solve the issue.
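
As a rough sketch, the relevant part of clearml.conf might look like the fragment below. The values are placeholders, and the assumption here is that git_user and git_pass sit under the agent section, as in the default configuration file.

```
agent {
    # Placeholders: use your own git user name and a personal access token
    # (or password) with read access to the repository.
    git_user: "my-git-user"
    git_pass: "my-personal-access-token"
}
```

After editing the file, restart the agent daemon so it picks up the new credentials.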

  
  
Posted 3 years ago

YEY!

  
  
Posted 3 years ago

I tried with and without. I’m having an issue where, if I run the task from the queue, it completes as soon as it goes into docker, but if I run the same docker run command myself, it works.

  
  
Posted 3 years ago