Hi SubstantialElk6
I can't see that it was removed. Could you send the full log?
I'm wondering why this is the case, as Docker best practices do indicate we should use a non-root user in production images.
Is the docker image for the service-agent not running as root?
Yup, in this case it wasn't root. Removing that USER and the --user in pip solves the problem. However, in our production images, we are required to remove root access.
` FROM nvidia/cuda:10.1-cudnn7-devel
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update && apt-get install -y \
    python3-opencv ca-certificates python3-dev git wget sudo ninja-build
RUN ln -sv /usr/bin/python3 /usr/bin/python
# create a non-root user
ARG USER_ID=1000
RUN useradd -m --no-log-init --system --uid ${USER_ID} appuser -g sudo
RUN echo '%sudo ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers
USER appuser
WORKDIR /home/appuser
ENV PATH="/home/appuser/.local/bin:${PATH}"
RUN wget && \
    python3 get-pip.py --user && \
    rm get-pip.py
# install dependencies
# See
# for other options if you use a different version of CUDA
RUN pip install --user tensorboard cmake # cmake from apt-get is too old
RUN pip install --user torch==1.8 torchvision==0.9 -f
RUN pip install --user 'git+ '
# install detectron2
RUN git clone detectron2_repo
# set FORCE_CUDA because during docker build cuda is not accessible
ENV FORCE_CUDA="1"
# This will by default build detectron2 for all common cuda architectures and take a lot more time,
# because inside docker build, there is no way to tell which architecture will be used.
ARG TORCH_CUDA_ARCH_LIST="Kepler;Kepler+Tesla;Maxwell;Maxwell+Tegra;Pascal;Volta;Turing"
ENV TORCH_CUDA_ARCH_LIST="${TORCH_CUDA_ARCH_LIST}"
RUN pip install --user -e detectron2_repo
# Set a fixed model cache directory.
ENV FVCORE_CACHE="/tmp"
WORKDIR /home/appuser/detectron2_repo
# run detectron2 under user "appuser":
# wget
# -O input.jpg
# python3 demo/demo.py \
#--config-file configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml \
#--input input.jpg --output outputs/ \
#--opts MODEL.WEIGHTS `
Hmm, I think the issue is here (the docker command mount): '-v', '/tmp/.clearml_agent.de0n48pm.cfg:/root/clearml.conf'
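To see at a glance which mounts in that docker command land under /root, here is a minimal Python sketch. The function name `root_mounts` is made up for illustration and is not part of clearml-agent; the argument list is a shortened copy of the one in the log. A non-root user in the image will typically not be able to reach any of these targets.

```python
from typing import List, Tuple


def root_mounts(docker_argv: List[str], prefix: str = "/root") -> List[Tuple[str, str]]:
    """Return (host_path, container_path) for every -v mount targeting `prefix`.

    Hypothetical helper for inspecting a `docker run` argument list.
    """
    mounts = []
    # Walk the list pairwise so each flag is seen together with its value.
    for flag, value in zip(docker_argv, docker_argv[1:]):
        if flag not in ("-v", "--volume"):
            continue
        parts = value.split(":")
        if len(parts) < 2:
            continue  # not a host:container bind mount
        host, target = parts[0], parts[1]
        if target == prefix or target.startswith(prefix + "/"):
            mounts.append((host, target))
    return mounts


# Shortened copy of the argument list printed in the agent log.
argv = [
    "docker", "run", "-t",
    "-v", "/home/jax/.gitconfig:/root/.gitconfig",
    "-v", "/tmp/.clearml_agent.de0n48pm.cfg:/root/clearml.conf",
    "-v", "/home/jax/.clearml/cache:/clearml_agent_cache",
    "--rm", "quay.io/jax79sg/detectron2:v3",
]

for host, target in root_mounts(argv):
    print(f"{host} -> {target}")
```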
I managed to find out why. The docker image I'm using is not set to run as the root user, hence the error. But I'm wondering why this is the case, as Docker best practices do indicate we should use a non-root user in production images.
I suspect it failed to create one on the host and then mount it into the docker container
SubstantialElk6 This seems to be the issue:
cp: failed to access '/root/default_clearml.conf': Permission denied
clearml_agent: ERROR: Could not find task id=024a421c0e174650a1c7ff64af756c26 (for host: )
Notice it seems it just cannot read the clearml.conf, wdyt?
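One way to confirm this would be to run a small probe inside the container as the image's default user. This is only a sketch: `probe` is a hypothetical helper, and the two paths are the ones that appear in the agent's command and error message.

```python
def probe(path: str) -> str:
    """Try to open `path` for reading and report what happened."""
    try:
        with open(path, "rb"):
            return "readable"
    except FileNotFoundError:
        return "missing"
    except PermissionError:
        return "permission denied"
    except OSError as exc:
        # Anything else (for example, the path is a directory).
        return f"error: {exc.strerror}"


# Paths taken from the agent's docker command and error output.
for path in ("/root/clearml.conf", "/root/default_clearml.conf"):
    print(path, "->", probe(path))
```

If the first path reports "permission denied" for the non-root user, the agent inside the container never gets to read its configuration, which would be consistent with the "Could not find task" error that follows.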
It's actually in your documentation. It's been removed since 0.17, apparently.
https://allegro.ai/clearml/docs/docs/release_notes/ver_0_17.html#clearml-agent-0-17-2
And these are my logs. It tried to install something and encountered permission denied. It wouldn't have if it had obeyed force_repo_requirements_txt.
1620664917916 Kahs-MacBook-Pro.local info ClearML Task: created new task id=024a421c0e174650a1c7ff64af756c26 ClearML results page:
`
1620664920359 Kahs-MacBook-Pro.local info ClearML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring
1620664922408 Kahs-MacBook-Pro.local info 2021-05-11 00:42:02,408 - clearml.Task - INFO - Waiting for repository detection and full package requirement analysis
1620664924248 Kahs-MacBook-Pro.local info 2021-05-11 00:42:04,248 - clearml.Task - INFO - Finished repository detection and package analysis
1620664940154 master-node:gpu0 INFO task 024a421c0e174650a1c7ff64af756c26 pulled from a7d4a4f258834c5694d1b787a7a86f29 by worker master-node:gpu0
1620664943526 master-node:gpu0 INFO Running Task 024a421c0e174650a1c7ff64af756c26 inside docker: quay.io/jax79sg/detectron2:v3 --env GIT_SSL_NO_VERIFY=true --env TRAINS_AGENT_GIT_USER=testuser --env TRAINS_AGENT_GIT_PASS=testuser
1620664944959 master-node:gpu0 INFO Executing: ['docker', 'run', '-t', '--gpus', '"device=0"', '--env', 'GIT_SSL_NO_VERIFY=true', '--env', 'TRAINS_AGENT_GIT_USER=testuser', '--env', 'TRAINS_AGENT_GIT_PASS=testuser', '-e', 'CLEARML_WORKER_ID=master-node:gpu0', '-e', 'CLEARML_DOCKER_IMAGE=quay.io/jax79sg/detectron2:v3 --env GIT_SSL_NO_VERIFY=true --env TRAINS_AGENT_GIT_USER=testuser --env TRAINS_AGENT_GIT_PASS=testuser', '-v', '/home/jax/.gitconfig:/root/.gitconfig', '-v', '/tmp/.clearml_agent.de0n48pm.cfg:/root/clearml.conf', '-v', '/tmp/clearml_agent.ssh.cw3fajtu:/root/.ssh', '-v', '/home/jax/.clearml/apt-cache:/var/cache/apt/archives', '-v', '/home/jax/.clearml/pip-cache:/root/.cache/pip', '-v', '/home/jax/.clearml/pip-download-cache:/root/.clearml/pip-download-cache', '-v', '/home/jax/.clearml/cache:/clearml_agent_cache', '-v', '/home/jax/.clearml/vcs-cache:/root/.clearml/vcs-cache', '--rm', 'quay.io/jax79sg/detectron2:v3', 'bash', '-c', 'echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/docker-clean ; chown -R root /root/.cache/pip ; export DEBIAN_FRONTEND=noninteractive ; apt-get update ; apt-get install -y git libsm6 libxext6 libxrender-dev libglib2.0-0 ; declare LOCAL_PYTHON ; for i in {10..5}; do which python3.$i && python3.$i -m pip --version && export LOCAL_PYTHON=$(which python3.$i) && break ; done ; [ ! -z $LOCAL_PYTHON ] || apt-get install -y python3-pip ; [ ! -z $LOCAL_PYTHON ] || export LOCAL_PYTHON=python3 ; $LOCAL_PYTHON -m pip install -U "pip<20.2" ; $LOCAL_PYTHON -m pip install -U clearml-agent ; cp /root/clearml.conf /root/default_clearml.conf ; NVIDIA_VISIBLE_DEVICES=all $LOCAL_PYTHON -u -m clearml_agent execute --disable-monitoring --id 024a421c0e174650a1c7ff64af756c26']
1620664951403 master-node:gpu0 DEBUG v3: Pulling from jax79sg/detectron2
Digest: sha256:e70459474711e1bdedcd42eae125af60f25ca77f048728c523923165fa822249
Status: Image is up to date for quay.io/jax79sg/detectron2:v3
quay.io/jax79sg/detectron2:v3
1620664957880 master-node:gpu0 DEBUG bash: /etc/apt/apt.conf.d/docker-clean: Permission denied
chown: cannot access '/root/.cache/pip': Permission denied
E: Could not open lock file /var/lib/apt/lists/lock - open (13: Permission denied)
E: Unable to lock directory /var/lib/apt/lists/
E: Could not open lock file /var/lib/dpkg/lock-frontend - open (13: Permission denied)
E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), are you root?
/usr/bin/python3.6
pip 21.0.1 from /home/appuser/.local/lib/python3.6/site-packages/pip (python 3.6)
Defaulting to user installation because normal site-packages is not writeable
Collecting pip<20.2
Downloading pip-20.1.1-py2.py3-none-any.whl (1.5 MB)
|################################| 1.5 MB 7.3 MB/s
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 21.0.1
Uninstalling pip-21.0.1:
Successfully uninstalled pip-21.0.1
Successfully installed pip-20.1.1
Defaulting to user installation because normal site-packages is not writeable
Collecting clearml-agent
Downloading clearml_agent-1.0.0-py3-none-any.whl (348 kB)
|################################| 348 kB 6.2 MB/s
Requirement already satisfied, skipping upgrade: pathlib2<2.4.0,>=2.3.0 in /home/appuser/.local/lib/python3.6/site-packages (from clearml-agent) (2.3.5)
Collecting virtualenv<21,>=16
Downloading virtualenv-20.4.6-py2.py3-none-any.whl (7.2 MB)
|################################| 7.2 MB 8.7 MB/s
Requirement already satisfied, skipping upgrade: pyparsing<2.5.0,>=2.0.3 in /home/appuser/.local/lib/python3.6/site-packages (from clearml-agent) (2.4.7)
Requirement already satisfied, skipping upgrade: python-dateutil<2.9.0,>=2.4.2 in /home/appuser/.local/lib/python3.6/site-packages (from clearml-agent) (2.8.1)
Requirement already satisfied, skipping upgrade: attrs<20.4.0,>=18.0 in /home/appuser/.local/lib/python3.6/site-packages (from clearml-agent) (20.3.0)
Collecting PyYAML<5.4.0,>=3.12
Downloading PyYAML-5.3.1.tar.gz (269 kB)
|################################| 269 kB 9.3 MB/s
Requirement already satisfied, skipping upgrade: requests<2.26.0,>=2.20.0 in /home/appuser/.local/lib/python3.6/site-packages (from clearml-agent) (2.25.1)
Collecting typing<3.8.0,>=3.6.4
Downloading typing-3.7.4.3.tar.gz (78 kB)
|################################| 78 kB 11.5 MB/s
1620664964039 master-node:gpu0 DEBUG Requirement already satisfied, skipping upgrade: pyjwt<2.1.0,>=1.6.4 in /home/appuser/.local/lib/python3.6/site-packages (from clearml-agent) (2.0.1)
Requirement already satisfied, skipping upgrade: furl<2.2.0,>=2.0.0 in /home/appuser/.local/lib/python3.6/site-packages (from clearml-agent) (2.1.0)
Requirement already satisfied, skipping upgrade: jsonschema<3.3.0,>=2.6.0 in /home/appuser/.local/lib/python3.6/site-packages (from clearml-agent) (3.2.0)
Collecting pyhocon<0.4.0,>=0.3.38
Downloading pyhocon-0.3.57.tar.gz (110 kB)
|################################| 110 kB 16.1 MB/s
Requirement already satisfied, skipping upgrade: six<1.16.0,>=1.11.0 in /home/appuser/.local/lib/python3.6/site-packages (from clearml-agent) (1.15.0)
Requirement already satisfied, skipping upgrade: urllib3<1.27.0,>=1.21.1 in /home/appuser/.local/lib/python3.6/site-packages (from clearml-agent) (1.26.4)
Requirement already satisfied, skipping upgrade: psutil<5.9.0,>=3.4.2 in /home/appuser/.local/lib/python3.6/site-packages (from clearml-agent) (5.8.0)
Requirement already satisfied, skipping upgrade: future<0.19.0,>=0.16.0 in /home/appuser/.local/lib/python3.6/site-packages (from clearml-agent) (0.18.2)
Requirement already satisfied, skipping upgrade: importlib-metadata>=0.12; python_version < "3.8" in /home/appuser/.local/lib/python3.6/site-packages (from virtualenv<21,>=16->clearml-agent) (3.10.0)
Collecting distlib<1,>=0.3.1
Downloading distlib-0.3.1-py2.py3-none-any.whl (335 kB)
|################################| 335 kB 24.4 MB/s
Collecting importlib-resources>=1.0; python_version < "3.7"
Downloading importlib_resources-5.1.2-py3-none-any.whl (25 kB)
Collecting appdirs<2,>=1.4.3
Downloading appdirs-1.4.4-py2.py3-none-any.whl (9.6 kB)
Collecting filelock<4,>=3.0.0
Downloading filelock-3.0.12-py3-none-any.whl (7.6 kB)
Requirement already satisfied, skipping upgrade: idna<3,>=2.5 in /home/appuser/.local/lib/python3.6/site-packages (from requests<2.26.0,>=2.20.0->clearml-agent) (2.10)
Requirement already satisfied, skipping upgrade: certifi>=2017.4.17 in /home/appuser/.local/lib/python3.6/site-packages (from requests<2.26.0,>=2.20.0->clearml-agent) (2020.12.5)
Requirement already satisfied, skipping upgrade: chardet<5,>=3.0.2 in /home/appuser/.local/lib/python3.6/site-packages (from requests<2.26.0,>=2.20.0->clearml-agent) (4.0.0)
Requirement already satisfied, skipping upgrade: orderedmultidict>=1.0.1 in /home/appuser/.local/lib/python3.6/site-packages (from furl<2.2.0,>=2.0.0->clearml-agent) (1.0.1)
Requirement already satisfied, skipping upgrade: pyrsistent>=0.14.0 in /home/appuser/.local/lib/python3.6/site-packages (from jsonschema<3.3.0,>=2.6.0->clearml-agent) (0.17.3)
Requirement already satisfied, skipping upgrade: setuptools in /home/appuser/.local/lib/python3.6/site-packages (from jsonschema<3.3.0,>=2.6.0->clearml-agent) (54.2.0)
Requirement already satisfied, skipping upgrade: typing-extensions>=3.6.4; python_version < "3.8" in /home/appuser/.local/lib/python3.6/site-packages (from importlib-metadata>=0.12; python_version < "3.8"->virtualenv<21,>=16->clearml-agent) (3.7.4.3)
Requirement already satisfied, skipping upgrade: zipp>=0.5 in /home/appuser/.local/lib/python3.6/site-packages (from importlib-metadata>=0.12; python_version < "3.8"->virtualenv<21,>=16->clearml-agent) (3.4.1)
Building wheels for collected packages: PyYAML, typing, pyhocon
Building wheel for PyYAML (setup.py) ... done
Created wheel for PyYAML: filename=PyYAML-5.3.1-cp36-cp36m-linux_x86_64.whl size=44621 sha256=f17ad615267d90082132b4ac0197674671ab924af9b1e1e28a08eab17a81e79c
Stored in directory: /home/appuser/.cache/pip/wheels/e5/9d/ad/2ee53cf262cba1ffd8afe1487eef788ea3f260b7e6232a80fc
Building wheel for typing (setup.py) ... done
Created wheel for typing: filename=typing-3.7.4.3-py3-none-any.whl size=26308 sha256=adc968fef1cf0e3c3915e310d365c7a46c1d8a4f1a894f0a32140f934733da2b
Stored in directory: /home/appuser/.cache/pip/wheels/5f/63/c2/b85489bbea28cb5d36cfe197244f898428004fa3caa7a23116
Building wheel for pyhocon (setup.py) ... done
Created wheel for pyhocon: filename=pyhocon-0.3.57-py3-none-any.whl size=18540 sha256=6b6c90ac444ad219853a743bb04758aabab52bde1c367b8cf09e4181bb25c2e5
Stored in directory: /home/appuser/.cache/pip/wheels/b7/7d/b4/16bc2a5f680bf122eac5278d8f20e96995eb94f5da45dcbdb0
Successfully built PyYAML typing pyhocon
Installing collected packages: distlib, importlib-resources, appdirs, filelock, virtualenv, PyYAML, typing, pyhocon, clearml-agent
Attempting uninstall: PyYAML
Found existing installation: PyYAML 5.4.1
Uninstalling PyYAML-5.4.1:
Successfully uninstalled PyYAML-5.4.1
Successfully installed PyYAML-5.3.1 appdirs-1.4.4 clearml-agent-1.0.0 distlib-0.3.1 filelock-3.0.12 importlib-resources-5.1.2 pyhocon-0.3.57 typing-3.7.4.3 virtualenv-20.4.6
WARNING: You are using pip version 20.1.1; however, version 21.1.1 is available.
You should consider upgrading via the '/usr/bin/python3.6 -m pip install --upgrade pip' command.
cp: failed to access '/root/default_clearml.conf': Permission denied
clearml_agent: ERROR: Could not find task id=024a421c0e174650a1c7ff64af756c26 (for host: ) `
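For a long log like this one, it can help to pull out just the paths that hit "Permission denied". The following is a rough Python sketch, not a ClearML tool; `denied_paths` is a hypothetical helper, and the sample lines are copied from the log above.

```python
import re
from typing import List

# Rough pattern for an absolute POSIX path; good enough for these log lines.
PATH_RE = re.compile(r"(/[\w./-]+)")


def denied_paths(log_text: str) -> List[str]:
    """Collect, in order and without duplicates, paths on 'Permission denied' lines."""
    paths: List[str] = []
    for line in log_text.splitlines():
        if "Permission denied" not in line:
            continue
        for path in PATH_RE.findall(line):
            if path not in paths:
                paths.append(path)
    return paths


sample = """bash: /etc/apt/apt.conf.d/docker-clean: Permission denied
chown: cannot access '/root/.cache/pip': Permission denied
E: Could not open lock file /var/lib/apt/lists/lock - open (13: Permission denied)
Successfully installed pip-20.1.1
cp: failed to access '/root/default_clearml.conf': Permission denied"""

for path in denied_paths(sample):
    print(path)
```

Every path it reports here sits under /etc, /var or /root, locations a non-root user normally cannot write to, which fits the explanation that the container user is not root.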
So does the clearml-agent daemon need higher privileges?
Which clearml.conf is it referring to? I'm executing on my client, and the task is then executed remotely by the agent. Both of them have ~/clearml.conf.
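Going by the docker command in the log, the file in question looks like the agent's generated config, mounted into the container at /root/clearml.conf. The sketch below shows why that location matters for a non-root user. It assumes the usual lookup order (the `CLEARML_CONFIG_FILE` environment variable if set, otherwise `~/clearml.conf`); `resolve_conf` is a hypothetical helper, not ClearML's actual code.

```python
import os
from typing import Mapping


def resolve_conf(env: Mapping[str, str]) -> str:
    """Guess which clearml.conf a process with environment `env` would pick up.

    Assumption: CLEARML_CONFIG_FILE overrides the default ~/clearml.conf.
    """
    override = env.get("CLEARML_CONFIG_FILE")
    if override:
        return override
    # "~" expands to the user's home directory, taken here from HOME.
    return os.path.join(env.get("HOME", "/"), "clearml.conf")


print(resolve_conf({"HOME": "/root"}))          # /root/clearml.conf
print(resolve_conf({"HOME": "/home/appuser"}))  # /home/appuser/clearml.conf
```

So when the container runs as appuser, `~/clearml.conf` points at /home/appuser, while the agent mounted its config under /root.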