Try simply removing the entrypoint from the original image instead of setting it to bash - see here
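A minimal sketch of that suggestion, assuming an empty exec-form `ENTRYPOINT []` is enough to clear whatever the base image defines. The temporary directory and the image tag in the comment are hypothetical; only the base image name comes from this thread.

```shell
# Write a small Dockerfile that inherits everything from the base image
# but resets its ENTRYPOINT instead of pointing it at bash.
workdir="$(mktemp -d)"
cat > "$workdir/Dockerfile" <<'EOF'
FROM scrin/dev-spconv:latest
# An empty exec-form ENTRYPOINT clears the one inherited from the base image,
# so `docker run <image> bash -c '...'` starts bash directly.
ENTRYPOINT []
EOF

# Build it under a tag of your choice (this tag is hypothetical):
#   docker build -t my-spconv-no-entrypoint "$workdir"
echo "Dockerfile written to $workdir/Dockerfile"
```

The extra `RUN pip install ...` and `RUN apt-get install ...` lines can go below the `ENTRYPOINT []` line as before.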
Hey @<1523701087100473344:profile|SuccessfulKoala55> , I played with the Dockerfile a bit but can't get it working. Locally, I can access the docker image and everything runs as expected, but if I create the ClearML task, it fails, at least with a new error. The Dockerfile looks like this:
# Use the base image
FROM scrin/dev-spconv:latest
ENTRYPOINT ["/bin/bash"]
# Install required Python packages
RUN pip install open3d
RUN pip install --no-index torch-scatter -f None
RUN pip install strictyaml
RUN pip install clearml
RUN pip install "boto3>=1.9"
# Update package information (continue even if it fails)
RUN apt-get update || true
# Install required system libraries
RUN apt-get install -y libx11-6
RUN apt-get install -y libgl1-mesa-glx
The thing is that the agent is designed to provide you with maximum flexibility, meaning you can use a docker image that works differently and sets itself up in the entrypoint. So the agent never overrides the entrypoint - in your specific case, that's an issue 🙂
Hi @<1670964687132430336:profile|SpicyFrog56> , I think this is because of the entrypoint of this docker image - note the format of the docker run command used by the agent - it's basically passing a command and args, but I guess the entrypoint messes that up? You can easily check by trying a similar docker run command yourself and checking how the container behaves
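One way to run that check by hand, as a rough sketch: the two helper function names are made up for illustration, and the `docker run` call only mimics the shape of the agent's command (image name followed by `bash -c '<script>'`), not its full argument list.

```shell
# Print the ENTRYPOINT baked into an image (null or [] means none is set).
show_entrypoint() {
    docker image inspect --format '{{json .Config.Entrypoint}}' "$1"
}

# Start the image the way the agent does: image name, then `bash -c '<script>'`.
# If the entrypoint swallows or mangles the command, the echo never runs.
try_agent_style_run() {
    docker run --rm "$1" bash -c 'echo "container reached my command"'
}
```

Usage would be `show_entrypoint scrin/dev-spconv:latest` followed by `try_agent_style_run scrin/dev-spconv:latest`; if the second call drops into a prompt or errors out instead of printing the message, the entrypoint is the likely culprit.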
@<1523701087100473344:profile|SuccessfulKoala55> Is there any way for me to override this behavior? I don't have access to the original Dockerfile but I need the docker image (aka it makes my life much easier) :D
I see, that should work, thank you! I guess I was hoping to find a solution with some clearml args rather than creating a new docker image
If you ask Bash to run Bash you might get some issues 🙂
And this is my log: 1708442371374 0aa73e67e07c info ClearML Task: overwriting (reusing) task id=3d5d4e989c7a4fbcaceed1e6c92d1d40
ClearML results page: XXXXX/projects/c2187a1a5e654360a3d565a14d0dc1b0/experiments/3d5d4e989c7a4fbcaceed1e6c92d1d40/output/log
1708442371974 0aa73e67e07c info 1
2024-02-20 10:19:31,990 - clearml.Task - INFO - Waiting for repository detection and full package requirement analysis
1708442373378 0aa73e67e07c info 2024-02-20 10:19:33,378 - clearml.Task - INFO - Finished repository detection and package analysis
1708442384226 YYYYY:gpu0 INFO task 3d5d4e989c7a4fbcaceed1e6c92d1d40 pulled from 08f659b9bda740c782176dd13001ac39 by worker YYYYY:gpu0
1708442384303 YYYYY:gpu0 INFO Running Task 3d5d4e989c7a4fbcaceed1e6c92d1d40 inside docker: danielbogdoll/spconv_1_ood_lidar:latest arguments: ['-e', 'NVIDIA_DRIVER_CAPABILITIES=all']
1708442384326 YYYYY:gpu0 INFO Executing: ['docker', 'run', '-t', '--gpus', '"device=0"', '-e', 'NVIDIA_DRIVER_CAPABILITIES=all', '-v', '/home/clearml-agent/.ssh/known_hosts:/root/.ssh/known_hosts', '--memory-swap=28G', '--memory=28G', '--shm-size=28G', '-e NVIDIA_DRIVER_CAPABILITIES=all', '-l', 'clearml-worker-id=YYYYY:gpu0', '-l', 'clearml-parent-worker-id=YYYYY:gpu0', '-e', 'CLEARML_WORKER_ID=YYYYY:gpu0', '-e', 'CLEARML_DOCKER_IMAGE=danielbogdoll/spconv_1_ood_lidar:latest -e NVIDIA_DRIVER_CAPABILITIES=all', '-e', 'CLEARML_TASK_ID=3d5d4e989c7a4fbcaceed1e6c92d1d40', '-v', '/tmp/.clearml_agent.o8882m7z.cfg:/tmp/clearml.conf', '-e', 'CLEARML_CONFIG_FILE=/tmp/clearml.conf', '-v', '/tmp/clearml_agent.ssh.3yistmo7:/.ssh', '-v', '/home/clearml-agent/.clearml/apt-cache:/var/cache/apt/archives', '-v', '/home/clearml-agent/.clearml/pip-cache:/root/.cache/pip', '-v', '/home/clearml-agent/.clearml/pip-download-cache:/root/.clearml/pip-download-cache', '-v', '/home/clearml-agent/.clearml/cache:/clearml_agent_cache', '-v', '/home/clearml-agent/.clearml/vcs-cache:/root/.clearml/vcs-cache', '-v', '/home/clearml-agent/.clearml/venvs-cache:/root/.clearml/venvs-cache', '--rm', 'danielbogdoll/spconv_1_ood_lidar:latest', 'bash', '-c', 'echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/docker-clean ; chown -R root /root/.cache/pip ; export DEBIAN_FRONTEND=noninteractive ; export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL libsm6 libxext6 libxrender-dev libglib2.0-0" ; cp -Rf /.ssh -T ~/.ssh ; [ ! -z $(which git) ] || export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL git" ; declare LOCAL_PYTHON ; [ ! -z $LOCAL_PYTHON ] || for i in {15..5}; do which python3.$i && python3.$i -m pip --version && export LOCAL_PYTHON=$(which python3.$i) && break ; done ; [ ! -z $LOCAL_PYTHON ] || export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL python3-pip" ; [ -z "$CLEARML_APT_INSTALL" ] || (apt-get update -y ; apt-get install -y $CLEARML_APT_INSTALL) ; [ ! 
-z $LOCAL_PYTHON ] || export LOCAL_PYTHON=python3 ; $LOCAL_PYTHON -m pip install -U "pippip" ; $LOCAL_PYTHON -m pip install -U clearml-agent==1.5.2rc0 ; cp /tmp/clearml.conf ~/default_clearml.conf ; NVIDIA_VISIBLE_DEVICES=all $LOCAL_PYTHON -u -m clearml_agent execute --disable-monitoring --id 3d5d4e989c7a4fbcaceed1e6c92d1d40']
1708442389368 YYYYY:gpu0 DEBUG /usr/bin/bash: /usr/bin/bash: cannot execute binary file
1708442389388 YYYYY:gpu0 DEBUG Process failed, exit code 126
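The exit code 126 in this log can likely be reproduced without Docker. With `ENTRYPOINT ["/bin/bash"]`, the agent's `bash -c '<script>'` is appended as arguments to the entrypoint, so the container effectively runs `bash bash -c '<script>'`. The sketch below assumes only that `bash` is on `PATH`:

```shell
# Run from an empty directory so no local file named "bash" gets picked up.
cd "$(mktemp -d)"

# The outer bash treats its first argument, "bash", as a script file to run.
# It finds the bash binary on PATH and refuses to interpret it as a script.
status=0
output="$(bash bash -c 'echo hello' 2>&1)" || status=$?

echo "exit code: $status"
echo "message:   $output"
```

So the inner `bash` is never started as a shell at all, which is why clearing the entrypoint works and setting it to bash does not.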
Hi @<1670964687132430336:profile|SpicyFrog56> , can you please add the full log?
Well, the obvious solution would be to build your own docker image from that docker image (using the FROM Dockerfile directive) and only override the entrypoint
1708426202645 4a9490578787 info ClearML Task: created new task id=406a4d3f372347faa9b7ba02bf993d47
ClearML results page: XXXXX/projects/c2187a1a5e654360a3d565a14d0dc1b0/experiments/406a4d3f372347faa9b7ba02bf993d47/output/log
1708426203801 4a9490578787 info 2024-02-20 05:50:03,801 - clearml.Task - INFO - Waiting for repository detection and full package requirement analysis
2024-02-20 05:50:04,638 - clearml.Task - INFO - Finished repository detection and package analysis
1708426214556 YYYYY:gpu1 INFO task 406a4d3f372347faa9b7ba02bf993d47 pulled from 08f659b9bda740c782176dd13001ac39 by worker YYYYY:gpu1
1708426214642 YYYYY:gpu1 INFO Running Task 406a4d3f372347faa9b7ba02bf993d47 inside docker: scrin/dev-spconv:latest arguments: ['-e', 'NVIDIA_DRIVER_CAPABILITIES=all']
custom_setup_bash_script:
pip install open3d
pip install --no-index torch-scatter -f None
pip install strictyaml
sudo apt-get update
sudo apt-get install -y libx11-6
sudo apt-get install -y libgl1-mesa-glx
1708426214666 YYYYY:gpu1 INFO Executing: ['docker', 'run', '-t', '--gpus', '"device=1"', '-e', 'NVIDIA_DRIVER_CAPABILITIES=all', '-v', '/home/clearml-agent/.ssh/known_hosts:/root/.ssh/known_hosts', '--memory-swap=28G', '--memory=28G', '--shm-size=28G', '-e NVIDIA_DRIVER_CAPABILITIES=all', '-l', 'clearml-worker-id=YYYYY:gpu1', '-l', 'clearml-parent-worker-id=YYYYY:gpu1', '-e', 'CLEARML_WORKER_ID=YYYYY:gpu1', '-e', 'CLEARML_DOCKER_IMAGE=scrin/dev-spconv:latest -e NVIDIA_DRIVER_CAPABILITIES=all', '-e', 'CLEARML_TASK_ID=406a4d3f372347faa9b7ba02bf993d47', '-v', '/tmp/.clearml_agent.wpanxpf8.cfg:/tmp/clearml.conf', '-e', 'CLEARML_CONFIG_FILE=/tmp/clearml.conf', '-v', '/tmp/clearml_agent.ssh.x3n8s40k:/.ssh', '-v', '/home/clearml-agent/.clearml/apt-cache.1:/var/cache/apt/archives', '-v', '/home/clearml-agent/.clearml/pip-cache:/root/.cache/pip', '-v', '/home/clearml-agent/.clearml/pip-download-cache:/root/.clearml/pip-download-cache', '-v', '/home/clearml-agent/.clearml/cache:/clearml_agent_cache', '-v', '/home/clearml-agent/.clearml/vcs-cache:/root/.clearml/vcs-cache', '-v', '/home/clearml-agent/.clearml/venvs-cache:/root/.clearml/venvs-cache', '--rm', 'scrin/dev-spconv:latest', 'bash', '-c', 'echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/docker-clean ; chown -R root /root/.cache/pip ; export DEBIAN_FRONTEND=noninteractive ; export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL libsm6 libxext6 libxrender-dev libglib2.0-0" ; cp -Rf /.ssh -T ~/.ssh ; [ ! -z $(which git) ] || export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL git" ; declare LOCAL_PYTHON ; [ ! -z $LOCAL_PYTHON ] || for i in {15..5}; do which python3.$i && python3.$i -m pip --version && export LOCAL_PYTHON=$(which python3.$i) && break ; done ; [ ! -z $LOCAL_PYTHON ] || export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL python3-pip" ; [ -z "$CLEARML_APT_INSTALL" ] || (apt-get update -y ; apt-get install -y $CLEARML_APT_INSTALL) ; [ ! 
-z $LOCAL_PYTHON ] || export LOCAL_PYTHON=python3 ; $LOCAL_PYTHON -m pip install -U "pippip" ; $LOCAL_PYTHON -m pip install -U clearml-agent==1.5.2rc0 ; pip install open3d ; pip install --no-index torch-scatter -f None ; pip install strictyaml ; sudo apt-get update ; sudo apt-get install -y libx11-6 ; sudo apt-get install -y libgl1-mesa-glx ; cp /tmp/clearml.conf ~/default_clearml.conf ; NVIDIA_VISIBLE_DEVICES=all $LOCAL_PYTHON -u -m clearml_agent execute --disable-monitoring --id 406a4d3f372347faa9b7ba02bf993d47']
1708426219710 YYYYY:gpu1 DEBUG [74G[ OK ]
]0;root@be3ea49471b7: ~ root@be3ea49471b7:~#
1708426304971 YYYYY:gpu1 ERROR User aborted: stopping task (1)
1708426305008 YYYYY:gpu1 DEBUG Process aborted by user
That worked, interesting. Thanks! Not sure if I fully understand why...:D