Are you running inside a zero-trust environment like Zscaler?
It feels like the issue is not ClearML itself, but HTTPS/SSL and the certificate from your zero-trust system
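One way to check this is to look at who issued the certificate your client actually receives. This is a minimal sketch, assuming `openssl` is installed. The function name `show_issuer` is made up for illustration, and the host in the example call is an assumption, so substitute your own ClearML API server.

```shell
# Hypothetical helper: print the issuer and subject of the certificate
# presented by a host. Behind a TLS-inspecting proxy, the issuer is usually
# the proxy's own CA rather than a public certificate authority.
show_issuer() {
  host="$1"
  port="${2:-443}"
  openssl s_client -connect "${host}:${port}" -servername "${host}" </dev/null 2>/dev/null \
    | openssl x509 -noout -issuer -subject
}

# Example call (the host is an assumption; use your own server):
# show_issuer api.clear.ml
```

If the issuer names your company or the zero-trust vendor, the client needs that root CA in its trust store.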
This is the latest situation
I managed to build a Docker image that works:
FROM pytorch/pytorch:2.2.1-cuda12.1-cudnn8-runtime
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
libgl1-mesa-glx \
libglib2.0-0 \
git \
curl \
&& rm -rf /var/lib/apt/lists/*
# Install dependencies
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt
# Update CA certificates
COPY hme_root_CA.crt /usr/local/share/ca-certificates/hme_root_CA.crt
RUN update-ca-certificates
ENV SSL_CERT_FILE=/usr/local/share/ca-certificates/hme_root_CA.crt
ENV REQUESTS_CA_BUNDLE=/usr/local/share/ca-certificates/hme_root_CA.crt
ENV PIP_CONFIG_FILE=/root/conda.conf
ENV CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=true
ENV CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=true
RUN pip config set global.cert /usr/local/share/ca-certificates/hme_root_CA.crt
RUN pip install --trusted-host pypi.python.org --trusted-host files.pythonhosted.org --trusted-host pypi.org --upgrade pip
RUN conda config --set ssl_verify /usr/local/share/ca-certificates/hme_root_CA.crt
COPY conda.conf /root/conda.conf
# Configure ClearML
COPY clearml.conf /root/clearml.conf
RUN clearml-init
# Start ClearML worker
CMD [ "clearml-agent", "daemon", "--queue", "default", "--foreground" ]
But it only works if I use clearml-agent daemon --gpus 0 --queue default --foreground
If I try to use clearml-agent daemon --gpus 0 --queue default --foreground --docker (on the host),
then when it starts the Docker image mentioned above, it always tries to install various apt packages and some conda packages, and then it stalls
Do you know if it is possible to avoid all the installations and just run everything in the Docker image's environment?
You are using CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL the wrong way
CLEARML_AGENT_SKIP_PIP_VENV_INSTALL needs to be a path
while CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL may need to be 1
instead of true
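As a rough sketch of the values described above. The interpreter path is an assumption: point it at whichever Python already has your dependencies installed.

```shell
# Skip building a Python environment and reuse what is already installed.
export CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1

# Expects a path, not a boolean. Assumption: the first python3 on PATH is the
# interpreter that already has the requirements installed.
export CLEARML_AGENT_SKIP_PIP_VENV_INSTALL="$(command -v python3)"

# Print both values so they can be checked before starting the agent.
echo "CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=${CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL}"
echo "CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=${CLEARML_AGENT_SKIP_PIP_VENV_INSTALL}"
```

In a Dockerfile the same values would be written as ENV lines instead of export.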
@<1576381444509405184:profile|ManiacalLizard2>
I changed that
But it still tries to install the following packages:
The following additional packages will be installed:
binutils binutils-common binutils-x86-64-linux-gnu build-essential cpp cpp-9
dpkg-dev fakeroot file g++ g++-9 gcc gcc-9 gcc-9-base git-man krb5-locales
less libalgorithm-diff-perl libalgorithm-diff-xs-perl
libalgorithm-merge-perl libasan5 libatomic1 libbinutils libbrotli1 libbsd0
libc-dev-bin libc6 libc6-dev libcbor0.6 libcc1-0 libcrypt-dev libctf-nobfd0
libctf0 libcurl3-gnutls libdpkg-perl libedit2 liberror-perl libexpat1
libexpat1-dev libfakeroot libfido2-1 libfile-fcntllock-perl libgcc-9-dev
libgdbm-compat4 libgdbm6 libgomp1 libgssapi-krb5-2 libisl22 libitm1
libk5crypto3 libkeyutils1 libkrb5-3 libkrb5support0 liblocale-gettext-perl
liblsan0 libmagic-mgc libmagic1 libmpc3 libmpdec2 libmpfr6 libnghttp2-14
libperl5.30 libpsl5 libpython3-dev libpython3-stdlib libpython3.8
libpython3.8-dev libpython3.8-minimal libpython3.8-stdlib libquadmath0
librtmp1 libssh-4 libstdc++-9-dev libtsan0 libubsan1 libx11-6 libx11-data
libxau6 libxcb1 libxdmcp6 libxext6 libxmuu1 linux-libc-dev make manpages
manpages-dev mime-support netbase openssh-client patch perl perl-base
perl-modules-5.30 publicsuffix python-pip-whl python3 python3-dev
python3-distutils python3-lib2to3 python3-minimal python3-pkg-resources
python3-setuptools python3-wheel python3.8 python3.8-dev python3.8-minimal
xauth xz-utils zlib1g-dev
(this installation shows some errors or warnings along the way)
Then it installs packages with pip, but it stalls after it finishes and does not continue any further
Using cached zipp-3.20.1-py3-none-any.whl (9.0 kB)
Installing collected packages: attrs, six, orderedmultidict, furl, urllib3, rpds-py, referencing, zipp, importlib-resources, jsonschema-specifications, pkgutil-resolve-name, jsonschema, python-dateutil, psutil, PyYAML, pyjwt, pyparsing, certifi, charset-normalizer, idna, requests, filelock, platformdirs, distlib, virtualenv, pathlib2, clearml-agent
I am using the following Dockerfile for the worker
FROM pytorch/pytorch:2.2.1-cuda12.1-cudnn8-runtime
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
libgl1-mesa-glx \
libglib2.0-0 \
git \
curl \
&& rm -rf /var/lib/apt/lists/*
# Install dependencies
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt
# Update CA certificates
COPY hme_root_CA.crt /usr/local/share/ca-certificates/hme_root_CA.crt
RUN update-ca-certificates
ENV SSL_CERT_FILE=/usr/local/share/ca-certificates/hme_root_CA.crt
ENV REQUESTS_CA_BUNDLE=/usr/local/share/ca-certificates/hme_root_CA.crt
ENV PIP_CONFIG_FILE=/root/conda.conf
ENV CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1
RUN pip config set global.cert /usr/local/share/ca-certificates/hme_root_CA.crt
RUN pip install --trusted-host pypi.python.org --trusted-host files.pythonhosted.org --trusted-host pypi.org --upgrade pip
RUN conda config --set ssl_verify /usr/local/share/ca-certificates/hme_root_CA.crt
COPY conda.conf /root/conda.conf
# Configure ClearML
COPY clearml.conf /root/clearml.conf
RUN clearml-init
# Start ClearML worker
CMD [ "clearml-agent", "daemon", "--queue", "default", "--foreground" ]
Could it be something related to conda in the Docker image?
Hi @<1576381444509405184:profile|ManiacalLizard2>, in the end we got it working
It shows a lot of warnings, but it is able to run the experiments hahaha
Thanks for your help