I am creating this user
Please explain, I think this is the culprit ...
` ARG USER_ID=1000
RUN useradd -m --no-log-init --system --uid ${USER_ID} appuser -g sudo
RUN echo '%sudo ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers
USER appuser
WORKDIR /home/appuser `
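One way to see what those directives do (a sketch, assuming a hypothetical image tag `myimage` built from this Dockerfile):

```shell
# The USER directive above makes every process in the container
# start as "appuser" instead of root:
docker run --rm myimage whoami
# expected: appuser

# The run-time user can still be overridden when starting the container:
docker run --rm --user root myimage whoami
# expected: root
```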
Yes, this is definitely the issue; the agent assumes the docker user is "root".
Let me check something
Thanks, I am basing my Dockerfile on https://github.com/facebookresearch/detectron2/blob/master/docker/Dockerfile
The issue itself is changing the default user.
USER appuser
WORKDIR /home/appuser
Any reason for it?
I did it just because FAIR did it in detectron2 Dockerfile
Are you inheriting from their Dockerfile?
But I think they did it for a reason, no?
Not a very good one, they just installed everything under the user and used --user for the pip.
It really does not matter inside a docker container; the only reason one might want to do that is if you are mounting other drives and you want to make sure they are not accessed as the "root" user, but as user id 1000.
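As a sketch of that mount scenario (the image tag `myimage` and the `./data` path are made up for illustration):

```shell
# Running as root inside the container: files written to the mounted
# volume show up on the host owned by root (uid 0):
docker run --rm -v "$PWD/data:/data" myimage touch /data/from_root

# Building the image with a uid matching the host user (the
# ARG USER_ID=1000 pattern from the Dockerfile above) keeps the
# files owned by your host user instead:
docker build --build-arg USER_ID="$(id -u)" -t myimage .
docker run --rm -v "$PWD/data:/data" myimage touch /data/from_appuser

# Compare ownership on the host:
ls -ln data/
```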
This sounds like a good reason haha 😄
Let me check if we can hack something...
Thanks 🙏
CooperativeFox72
Could you try to run the docker and then inside the docker try to do:
` su root
whoami `
Hi AgitatedDove14 ,
Sorry for the late response, it was late in my country 🙂
This is what I am getting:
` appuser@219886f802f0:~$ sudo su root
root@219886f802f0:/home/appuser# whoami
root `
Okay we have something 🙂
To your clearml.conf add:
` agent.docker_preprocess_bash_script = [
    "su root",
    "cp -f /root/*.conf ~/",
] `
Let's see if that works
It is now stuck after:
` 2021-03-09 14:54:07 task 609a976a889748d6a6e4baf360ef93b4 pulled from 8e47f5b0694e426e814f0855186f560e by worker ov-01:gpu1
2021-03-09 14:54:08 running Task 609a976a889748d6a6e4baf360ef93b4 inside default docker image: MyDockerImage:v0
2021-03-09 14:54:08 Executing: ['docker', 'run', '-t', '--gpus', '"device=1"', '-e', 'CLEARML_WORKER_ID=ov-01:gpu1', '-e', 'CLEARML_DOCKER_IMAGE=MyDockerImage:v0', '-v', '/tmp/.clearml_agent.jvxowhq4.cfg:/root/clearml.conf', '-v', '/tmp/clearml_agent.ssh.n9gr_ou9:/root/.ssh', '-v', '/home/ophir/.clearml/apt-cache:/var/cache/apt/archives', '-v', '/home/ophir/.clearml/pip-cache:/root/.cache/pip', '-v', '/home/ophir/.clearml/pip-download-cache:/root/.clearml/pip-download-cache', '-v', '/home/ophir/.trains/cache:/clearml_agent_cache', '-v', '/home/ophir/.clearml/vcs-cache:/root/.clearml/vcs-cache', '--rm', 'MyDockerImage:v0', 'bash', '-c', 'sudo su root ; cp -f /root/*.conf ~/ ; echo \'Binary::apt::APT::Keep-Downloaded-Packages "true";\' > /etc/apt/apt.conf.d/docker-clean ; chown -R root /root/.cache/pip ; export DEBIAN_FRONTEND=noninteractive ; apt-get update ; apt-get install -y git libsm6 libxext6 libxrender-dev libglib2.0-0 ; declare LOCAL_PYTHON ; for i in {10..5}; do which python3.$i && python3.$i -m pip --version && export LOCAL_PYTHON=$(which python3.$i) && break ; done ; [ ! -z $LOCAL_PYTHON ] || apt-get install -y python3-pip ; [ ! -z $LOCAL_PYTHON ] || export LOCAL_PYTHON=python3 ; $LOCAL_PYTHON -m pip install -U "pip<20.2" ; $LOCAL_PYTHON -m pip install -U clearml-agent ; cp /root/clearml.conf /root/default_clearml.conf ; NVIDIA_VISIBLE_DEVICES=all $LOCAL_PYTHON -u -m clearml_agent execute --disable-monitoring --id 609a976a889748d6a6e4baf360ef93b4']
2021-03-09 14:54:13 root@edd13d234b4d:/home/appuser# `
So I asked my boss and DevOps and they said for now we can use the root user inside the docker image...
So for now I am leaving this issue...
Thanks a lot 🙏 🙌
I have an other question.
Now that I am using the root user it looks better,
But my docker image has all my code and all the packages it needs, so I don't understand why the agent needs to install all of those again?
I just need it to run the docker and run the command inside it, no?
Hi CooperativeFox72
But my docker image has all my code and all the packages it needs, so I don't understand why the agent needs to install all of those again?
So based on the Dockerfile you previously posted, I think all your python packages are actually installed under "appuser" and not as system packages.
Basically remove the "add user" part and the --user from the pip install.
For example:
` FROM nvidia/cuda:10.1-cudnn7-devel

ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update && apt-get install -y \
    python3-opencv ca-certificates python3-dev git wget sudo ninja-build
RUN ln -sv /usr/bin/python3 /usr/bin/python

WORKDIR /root/
RUN wget && \
    python3 get-pip.py && \
    rm get-pip.py

# install dependencies
# See for other options if you use a different version of CUDA
RUN pip install tensorboard cmake   # cmake from apt-get is too old
RUN pip install torch==1.8 torchvision==0.9 -f
RUN pip install 'git+ '

# install detectron2
RUN git clone detectron2_repo
# set FORCE_CUDA because during docker build cuda is not accessible
ENV FORCE_CUDA="1"
# This will by default build detectron2 for all common cuda architectures and take a lot more time,
# because inside docker build, there is no way to tell which architecture will be used.
ARG TORCH_CUDA_ARCH_LIST="Kepler;Kepler+Tesla;Maxwell;Maxwell+Tegra;Pascal;Volta;Turing"
ENV TORCH_CUDA_ARCH_LIST="${TORCH_CUDA_ARCH_LIST}"
RUN pip install -e detectron2_repo

# Set a fixed model cache directory.
ENV FVCORE_CACHE="/tmp" `
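On top of that, if you want the agent to reuse the packages already baked into the image instead of reinstalling them, one option (hedged, please verify against your clearml-agent version and its documented settings) is to let the agent's virtualenv see the system site-packages via clearml.conf:

` agent {
    package_manager {
        # reuse the python packages already installed in the docker image
        # instead of reinstalling them inside the task's virtualenv
        system_site_packages: true
    }
} `

This only helps once the packages are installed as system packages (i.e., pip without --user, as above).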
Thanks, I will make sure that all the python packages are installed as root..
And will let you know if it works
Ok, looks like it is starting the training...
Thanks 💯