Hi all, I am starting to use clearml-agent. Run it with

Hi all,
I am starting to use clearml-agent.
I run it with:
```
clearml-agent daemon --foreground --gpus 3 --queue default --docker MyDockerImage:v0
```
Then I enqueued a new job in the UI,
but I get this error before the training job has even started:
cp: failed to access '/root/default_clearml.conf': Permission denied

What could it be?
Must the user inside the Docker container be the root user?
Or am I using it wrong?

  
  
Posted 3 years ago

Answers 30


I am creating this user

  
  
Posted 3 years ago

I am creating this user

Please explain; I think this is the culprit...

  
  
Posted 3 years ago

```
ARG USER_ID=1000
RUN useradd -m --no-log-init --system --uid ${USER_ID} appuser -g sudo
RUN echo '%sudo ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers
USER appuser
WORKDIR /home/appuser
```

  
  
Posted 3 years ago

Yes, this is definitely the issue: the agent assumes the Docker user is "root".
Let me check something
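One way to see which user an image will start as, before handing it to the agent, is to read the USER setting from the image metadata. The helper below is a hypothetical sketch (the name image_runs_as_root is made up for illustration); it only interprets the string printed by docker image inspect --format '{{.Config.User}}', where an empty value means no USER was set anywhere in the image and the container starts as root.

```shell
# Hypothetical helper: classify the value printed by
#   docker image inspect --format '{{.Config.User}}' IMAGE
# An empty value means no USER instruction was set, so the container runs as root.
image_runs_as_root() {
  local user="$1"
  case "$user" in
    "" | root | 0 | root:* | 0:*) return 0 ;;  # default, or root by name or uid 0
    *) return 1 ;;                              # e.g. "appuser" from "USER appuser"
  esac
}

# Example usage with the image from this thread (needs a running Docker daemon):
#   user=$(docker image inspect --format '{{.Config.User}}' MyDockerImage:v0)
#   image_runs_as_root "$user" || echo "image runs as '${user}', not root"
```

With the Dockerfile above, the inspected value would be "appuser", so the check fails, which is the situation the agent does not expect.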

  
  
Posted 3 years ago

Because my project uses it.

  
  
Posted 3 years ago

The issue itself is changing the default user.

USER appuser
WORKDIR /home/appuser

Any reason for it?

  
  
Posted 3 years ago

I did it only because FAIR did it in the detectron2 Dockerfile.

  
  
Posted 3 years ago

Let me check a sec

  
  
Posted 3 years ago

Are you inheriting from their Dockerfile?

  
  
Posted 3 years ago

I took theirs and added my project.

  
  
Posted 3 years ago

I can try not creating a new user.

  
  
Posted 3 years ago

But I think they did it for a reason, no?

  
  
Posted 3 years ago

But I think they did it for a reason, no?

Not a very good one: they just installed everything under that user and used --user with pip.
It really does not matter inside a Docker container. The only reason one might want to do that is if you are mounting other drives and want to make sure they are accessed not as the "root" user but as user id 1000.
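To make the uid point concrete: file ownership is stored as a number, so whoever writes into a bind-mounted directory leaves files owned by that uid on the host as well (assuming Docker's default setup without user-namespace remapping). The snippet is a minimal sketch that only shows a new file taking the uid of the process that created it; the GNU stat syntax is an assumption about the environment.

```shell
# Ownership is recorded as a numeric uid. In a root container new files get uid 0;
# as the detectron2-style "appuser" (created with --uid 1000) they would get 1000.
workdir=$(mktemp -d)
touch "$workdir/new_file"
owner_uid=$(stat -c '%u' "$workdir/new_file")  # GNU coreutils stat syntax
echo "new_file is owned by uid ${owner_uid}; this shell runs as uid $(id -u)"
rm -rf "$workdir"
```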

  
  
Posted 3 years ago

Let me check if we can hack something...

  
  
Posted 3 years ago

Not a very good one: they just installed everything under that user and used --user with pip.
It really does not matter inside a Docker container. The only reason one might want to do that is if you are mounting other drives and want to make sure they are accessed not as the "root" user but as user id 1000.

This sounds like a good reason haha 😄

Let me check if we can hack something...

Thanks 🙏

  
  
Posted 3 years ago

CooperativeFox72
Could you try running the Docker image and then, inside the container, run:
```
su root
whoami
```

  
  
Posted 3 years ago

Hi AgitatedDove14,
Sorry for the late response; it was late in my country 🙂.

This is what I am getting:
```
appuser@219886f802f0:~$ sudo su root
root@219886f802f0:/home/appuser# whoami
root
```

  
  
Posted 3 years ago

Okay, we have something 🙂
Add this to your clearml.conf:
```
agent.docker_preprocess_bash_script = [
    "su root",
    "cp -f /root/*.conf ~/",
]
```
Let's see if that works.
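Judging by the "Executing:" log line posted later in this thread, the entries of agent.docker_preprocess_bash_script end up joined with " ; " ahead of the agent's own setup commands inside a single bash -c string. The snippet is only a sketch that mimics that joining with the two entries above; it is not clearml-agent code.

```shell
# Mimic the joining seen in the agent log: each entry followed by " ; ",
# then the agent's own setup commands (represented by a placeholder here).
joined=""
for cmd in "su root" "cp -f /root/*.conf ~/"; do
  joined="${joined}${cmd} ; "
done
printf "bash -c '%s<agent setup commands>'\n" "$joined"
```

Because everything runs as one sequential chain, each entry must finish before the next one starts.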

  
  
Posted 3 years ago

Thanks, will try and let you know 😊

  
  
Posted 3 years ago

It is now stuck after:
```
2021-03-09 14:54:07 task 609a976a889748d6a6e4baf360ef93b4 pulled from 8e47f5b0694e426e814f0855186f560e by worker ov-01:gpu1
2021-03-09 14:54:08 running Task 609a976a889748d6a6e4baf360ef93b4 inside default docker image: MyDockerImage:v0
2021-03-09 14:54:08 Executing: ['docker', 'run', '-t', '--gpus', '"device=1"', '-e', 'CLEARML_WORKER_ID=ov-01:gpu1', '-e', 'CLEARML_DOCKER_IMAGE=MyDockerImage:v0', '-v', '/tmp/.clearml_agent.jvxowhq4.cfg:/root/clearml.conf', '-v', '/tmp/clearml_agent.ssh.n9gr_ou9:/root/.ssh', '-v', '/home/ophir/.clearml/apt-cache:/var/cache/apt/archives', '-v', '/home/ophir/.clearml/pip-cache:/root/.cache/pip', '-v', '/home/ophir/.clearml/pip-download-cache:/root/.clearml/pip-download-cache', '-v', '/home/ophir/.trains/cache:/clearml_agent_cache', '-v', '/home/ophir/.clearml/vcs-cache:/root/.clearml/vcs-cache', '--rm', 'MyDockerImage:v0', 'bash', '-c', 'sudo su root ; cp -f /root/*.conf ~/ ; echo \'Binary::apt::APT::Keep-Downloaded-Packages "true";\' > /etc/apt/apt.conf.d/docker-clean ; chown -R root /root/.cache/pip ; export DEBIAN_FRONTEND=noninteractive ; apt-get update ; apt-get install -y git libsm6 libxext6 libxrender-dev libglib2.0-0 ; declare LOCAL_PYTHON ; for i in {10..5}; do which python3.$i && python3.$i -m pip --version && export LOCAL_PYTHON=$(which python3.$i) && break ; done ; [ ! -z $LOCAL_PYTHON ] || apt-get install -y python3-pip ; [ ! -z $LOCAL_PYTHON ] || export LOCAL_PYTHON=python3 ; $LOCAL_PYTHON -m pip install -U "pip<20.2" ; $LOCAL_PYTHON -m pip install -U clearml-agent ; cp /root/clearml.conf /root/default_clearml.conf ; NVIDIA_VISIBLE_DEVICES=all $LOCAL_PYTHON -u -m clearml_agent execute --disable-monitoring --id 609a976a889748d6a6e4baf360ef93b4']
2021-03-09 14:54:13 root@edd13d234b4d:/home/appuser#
```
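A likely explanation for the hang: in a chain like "sudo su root ; cp ...", su root without -c starts a new root shell, and the commands after the ";" run only once that shell exits. The container was started with -t but not -i, so that shell most likely got a terminal with nothing typing into it and just sat at its prompt, which matches the last log line. Even after it exited, cp and the rest of the chain would run as the original user again, not as root. The sketch below reproduces only the ordering, with a plain bash fed one line on stdin standing in for the root shell.

```shell
# The inner "bash" stands in for the shell that "su root" opens: it runs the one
# line it reads from stdin, hits end of input, exits, and only then does the
# outer shell continue with the command after ";".
out=$(printf 'echo inside-new-shell\n' | bash -c 'bash ; echo after-new-shell')
echo "$out"
```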

  
  
Posted 3 years ago

So I asked my boss and DevOps, and they say that for now we can use the root user inside the Docker image...

  
  
Posted 3 years ago

So for now I am leaving this issue...
Thanks a lot 🙏 🙌

  
  
Posted 3 years ago

I have another question.
Now that I am using the root user, it looks better.
But my Docker image has all my code and all the packages it needs, so I don't understand why the agent needs to install all of them again.

  
  
Posted 3 years ago

I just need it to run the Docker image and run the command inside it, no?

  
  
Posted 3 years ago

Hi CooperativeFox72

But my Docker image has all my code and all the packages it needs, so I don't understand why the agent needs to install all of them again.

Based on the Dockerfile you posted earlier, I think all your Python packages are actually installed under "appuser" and not as system packages.
Basically, remove the "add user" part and the --user flag from the pip install.
For example:
```
FROM nvidia/cuda:10.1-cudnn7-devel

ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update && apt-get install -y \
    python3-opencv ca-certificates python3-dev git wget sudo ninja-build
RUN ln -sv /usr/bin/python3 /usr/bin/python

WORKDIR /root/

RUN wget && \
    python3 get-pip.py && \
    rm get-pip.py

# install dependencies
# See for other options if you use a different version of CUDA
RUN pip install tensorboard cmake   # cmake from apt-get is too old
RUN pip install torch==1.8 torchvision==0.9 -f

RUN pip install 'git+ '

# install detectron2
RUN git clone detectron2_repo

# set FORCE_CUDA because during docker build cuda is not accessible
ENV FORCE_CUDA="1"

# This will by default build detectron2 for all common cuda architectures and take a lot more time,
# because inside docker build, there is no way to tell which architecture will be used.
ARG TORCH_CUDA_ARCH_LIST="Kepler;Kepler+Tesla;Maxwell;Maxwell+Tegra;Pascal;Volta;Turing"
ENV TORCH_CUDA_ARCH_LIST="${TORCH_CUDA_ARCH_LIST}"

RUN pip install -e detectron2_repo

# Set a fixed model cache directory.
ENV FVCORE_CACHE="/tmp"
```

  
  
Posted 3 years ago

Thanks, I will make sure that all the Python packages are installed as root,
and I will let you know if it works.

  
  
Posted 3 years ago

👍

  
  
Posted 3 years ago

OK, it looks like it is starting the training...
Thanks 💯

  
  
Posted 3 years ago

YEY!

  
  
Posted 3 years ago
919 Views
30 Answers
3 years ago
one year ago