Answered


Hey everyone, I am facing a weird issue with a Docker image. When I use it as my base_docker, ClearML basically "opens a terminal" inside of it and does not continue any further. I have never had this behavior with any other Docker image. With the same code base, I also used "ubuntu:18.04" as my base_docker and ClearML worked as usual.

I am using None

A little research suggested that the entrypoint might be the issue. Is that the case, and if so, can I fix it?

  				
Posted one year ago by SpicyFrog56


Answers 13

Try simply removing the entrypoint from the original image instead of setting it to bash (see here).

  				
Posted one year ago by SuccessfulKoala55

I see, that should work, thank you! I guess I was hoping to find a solution with some ClearML args rather than creating a new Docker image.

  				
Posted one year ago by SpicyFrog56

Hi SpicyFrog56, I think this is because of the entrypoint of this Docker image. Note the format of the docker run command used by the agent: it's basically passing a command and args, but I guess the entrypoint messes that up? You can easily check by trying a similar docker run command yourself and checking how the container behaves.
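One way to run that check is to look at the entrypoint the image declares, then start the container with a trivial command in the same shape the agent uses (image name followed by `bash -c ...`). This is a minimal sketch that only builds and prints the two commands; the helper names are made up for illustration, and the image name is the one that appears in the agent log.

```python
import shlex

IMAGE = "scrin/dev-spconv:latest"  # image name as shown in the agent log


def inspect_entrypoint_cmd(image):
    """Command that prints the image's configured ENTRYPOINT as JSON."""
    return ["docker", "inspect", "--format", "{{json .Config.Entrypoint}}", image]


def probe_cmd(image, script="echo container-ran-my-command"):
    """Mimic the agent's call shape: docker run <image> bash -c <script>."""
    return ["docker", "run", "--rm", image, "bash", "-c", script]


# Print shell-quoted commands that can be pasted into a terminal.
for cmd in (inspect_entrypoint_cmd(IMAGE), probe_cmd(IMAGE)):
    print(shlex.join(cmd))
```

If the second command leaves you at a prompt or fails instead of printing the marker text, that suggests the entrypoint is intercepting the command the agent passes.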

  				
Posted one year ago by SuccessfulKoala55

What is the error?

  				
Posted one year ago by SuccessfulKoala55

The thing is that the agent is designed to give you maximum flexibility, meaning you can use a Docker image that works differently and sets itself up in its entrypoint. For that reason the agent never overrides the entrypoint. In your specific case, that's an issue 🙂

  				
Posted one year ago by SuccessfulKoala55

That worked, interesting. Thanks! Not sure if I fully understand why...:D

  				
Posted one year ago by SpicyFrog56

Well, the obvious solution would be to build your own Docker image from that image (using the FROM Dockerfile directive), overriding only the entrypoint.
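A minimal sketch of such a derived image, written as a small Python helper that renders the Dockerfile text so it can be reviewed before building. `ENTRYPOINT []` clears the entrypoint inherited from the base image; the helper name and the package list are only illustrative.

```python
def render_dockerfile(base_image, pip_packages=()):
    """Render a Dockerfile that inherits from base_image and clears its entrypoint."""
    lines = [
        f"FROM {base_image}",
        "",
        "# Clear the inherited entrypoint so the agent's `bash -c ...` runs as given",
        "ENTRYPOINT []",
    ]
    if pip_packages:
        lines += ["", "RUN pip install " + " ".join(pip_packages)]
    return "\n".join(lines) + "\n"


# Base image taken from the thread; the packages are a subset of those mentioned.
dockerfile = render_dockerfile("scrin/dev-spconv:latest", ["open3d", "strictyaml", "clearml"])
print(dockerfile)
```

Saving the printed text as a Dockerfile and building it with `docker build` should give an image whose entrypoint no longer interferes with the command the agent appends.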

  				
Posted one year ago by SuccessfulKoala55

If you ask Bash to run Bash you might get some issues 🙂
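To spell that out: when an image defines an exec-form ENTRYPOINT, Docker appends whatever follows the image name in `docker run` to it as arguments. The sketch below only models that rule in plain Python (it is not Docker's or the agent's code), using the `ENTRYPOINT ["/bin/bash"]` from the Dockerfile in this thread and a shortened stand-in for the agent's script.

```python
def effective_argv(entrypoint, command):
    """Model of Docker's rule: exec-form ENTRYPOINT followed by the command arguments."""
    return list(entrypoint or []) + list(command or [])


# What the agent passes after the image name (script shortened for illustration).
agent_command = ["bash", "-c", "echo setup ; python3 -m clearml_agent execute ..."]

# Image with ENTRYPOINT ["/bin/bash"]: bash receives "bash" as a file to run.
with_bash_entrypoint = effective_argv(["/bin/bash"], agent_command)

# Image with no entrypoint: the agent's command runs unchanged.
without_entrypoint = effective_argv(None, agent_command)

print(with_bash_entrypoint[:3])  # ['/bin/bash', 'bash', '-c']
print(without_entrypoint[:2])    # ['bash', '-c']
```

In the first case bash treats `bash` as the name of a script file rather than a program to start, which is consistent with the `cannot execute binary file` message and exit code 126 in the log.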

  				
Posted one year ago by SuccessfulKoala55

Hi SpicyFrog56, can you please add the full log?

  				
Posted one year ago by CostlyOstrich36

SuccessfulKoala55 Is there any way for me to override this behavior? I don't have access to the original Dockerfile but need (aka makes my life much easier) the docker image :D

  				
Posted one year ago by SpicyFrog56

Hey SuccessfulKoala55 , I played with the Dockerfile a bit but can't get it working. Locally, I can access the docker image and everything runs as expected, but if I create the ClearML task, it fails, at least with a new error. The Dockerfile looks like this:

# Use the base image
FROM scrin/dev-spconv:latest

ENTRYPOINT ["/bin/bash"]

# Install required Python packages
RUN pip install open3d
RUN pip install --no-index torch-scatter -f None
RUN pip install strictyaml
RUN pip install clearml
RUN pip install "boto3>=1.9"

# Update package information (continue even if it fails)
RUN apt-get update || true

# Install required system libraries
RUN apt-get install -y libx11-6
RUN apt-get install -y libgl1-mesa-glx

  				
Posted one year ago by SpicyFrog56

And this is my log:

1708442371374 0aa73e67e07c info ClearML Task: overwriting (reusing) task id=3d5d4e989c7a4fbcaceed1e6c92d1d40
ClearML results page: XXXXX/projects/c2187a1a5e654360a3d565a14d0dc1b0/experiments/3d5d4e989c7a4fbcaceed1e6c92d1d40/output/log
1708442371974 0aa73e67e07c info 1
2024-02-20 10:19:31,990 - clearml.Task - INFO - Waiting for repository detection and full package requirement analysis
1708442373378 0aa73e67e07c info 2024-02-20 10:19:33,378 - clearml.Task - INFO - Finished repository detection and package analysis
1708442384226 YYYYY:gpu0 INFO task 3d5d4e989c7a4fbcaceed1e6c92d1d40 pulled from 08f659b9bda740c782176dd13001ac39 by worker YYYYY:gpu0

1708442384303 YYYYY:gpu0 INFO Running Task 3d5d4e989c7a4fbcaceed1e6c92d1d40 inside docker: danielbogdoll/spconv_1_ood_lidar:latest arguments: ['-e', 'NVIDIA_DRIVER_CAPABILITIES=all']

1708442384326 YYYYY:gpu0 INFO Executing: ['docker', 'run', '-t', '--gpus', '"device=0"', '-e', 'NVIDIA_DRIVER_CAPABILITIES=all', '-v', '/home/clearml-agent/.ssh/known_hosts:/root/.ssh/known_hosts', '--memory-swap=28G', '--memory=28G', '--shm-size=28G', '-e NVIDIA_DRIVER_CAPABILITIES=all', '-l', 'clearml-worker-id=YYYYY:gpu0', '-l', 'clearml-parent-worker-id=YYYYY:gpu0', '-e', 'CLEARML_WORKER_ID=YYYYY:gpu0', '-e', 'CLEARML_DOCKER_IMAGE=danielbogdoll/spconv_1_ood_lidar:latest -e NVIDIA_DRIVER_CAPABILITIES=all', '-e', 'CLEARML_TASK_ID=3d5d4e989c7a4fbcaceed1e6c92d1d40', '-v', '/tmp/.clearml_agent.o8882m7z.cfg:/tmp/clearml.conf', '-e', 'CLEARML_CONFIG_FILE=/tmp/clearml.conf', '-v', '/tmp/clearml_agent.ssh.3yistmo7:/.ssh', '-v', '/home/clearml-agent/.clearml/apt-cache:/var/cache/apt/archives', '-v', '/home/clearml-agent/.clearml/pip-cache:/root/.cache/pip', '-v', '/home/clearml-agent/.clearml/pip-download-cache:/root/.clearml/pip-download-cache', '-v', '/home/clearml-agent/.clearml/cache:/clearml_agent_cache', '-v', '/home/clearml-agent/.clearml/vcs-cache:/root/.clearml/vcs-cache', '-v', '/home/clearml-agent/.clearml/venvs-cache:/root/.clearml/venvs-cache', '--rm', 'danielbogdoll/spconv_1_ood_lidar:latest', 'bash', '-c', 'echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/docker-clean ; chown -R root /root/.cache/pip ; export DEBIAN_FRONTEND=noninteractive ; export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL libsm6 libxext6 libxrender-dev libglib2.0-0" ; cp -Rf /.ssh -T ~/.ssh ; [ ! -z $(which git) ] || export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL git" ; declare LOCAL_PYTHON ; [ ! -z $LOCAL_PYTHON ] || for i in {15..5}; do which python3.$i && python3.$i -m pip --version && export LOCAL_PYTHON=$(which python3.$i) && break ; done ; [ ! -z $LOCAL_PYTHON ] || export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL python3-pip" ; [ -z "$CLEARML_APT_INSTALL" ] || (apt-get update -y ; apt-get install -y $CLEARML_APT_INSTALL) ; [ ! 
-z $LOCAL_PYTHON ] || export LOCAL_PYTHON=python3 ; $LOCAL_PYTHON -m pip install -U "pippip" ; $LOCAL_PYTHON -m pip install -U clearml-agent==1.5.2rc0 ; cp /tmp/clearml.conf ~/default_clearml.conf ; NVIDIA_VISIBLE_DEVICES=all $LOCAL_PYTHON -u -m clearml_agent execute --disable-monitoring --id 3d5d4e989c7a4fbcaceed1e6c92d1d40']

1708442389368 YYYYY:gpu0 DEBUG /usr/bin/bash: /usr/bin/bash: cannot execute binary file

1708442389388 YYYYY:gpu0 DEBUG Process failed, exit code 126

  				
Posted one year ago by SpicyFrog56

1708426202645 4a9490578787 info ClearML Task: created new task id=406a4d3f372347faa9b7ba02bf993d47
ClearML results page: XXXXX/projects/c2187a1a5e654360a3d565a14d0dc1b0/experiments/406a4d3f372347faa9b7ba02bf993d47/output/log
1708426203801 4a9490578787 info 2024-02-20 05:50:03,801 - clearml.Task - INFO - Waiting for repository detection and full package requirement analysis
2024-02-20 05:50:04,638 - clearml.Task - INFO - Finished repository detection and package analysis
1708426214556 YYYYY:gpu1 INFO task 406a4d3f372347faa9b7ba02bf993d47 pulled from 08f659b9bda740c782176dd13001ac39 by worker YYYYY:gpu1

1708426214642 YYYYY:gpu1 INFO Running Task 406a4d3f372347faa9b7ba02bf993d47 inside docker: scrin/dev-spconv:latest arguments: ['-e', 'NVIDIA_DRIVER_CAPABILITIES=all']
custom_setup_bash_script:
pip install open3d
pip install --no-index torch-scatter -f None
pip install strictyaml
sudo apt-get update
sudo apt-get install -y libx11-6
sudo apt-get install -y libgl1-mesa-glx
1708426214666 YYYYY:gpu1 INFO Executing: ['docker', 'run', '-t', '--gpus', '"device=1"', '-e', 'NVIDIA_DRIVER_CAPABILITIES=all', '-v', '/home/clearml-agent/.ssh/known_hosts:/root/.ssh/known_hosts', '--memory-swap=28G', '--memory=28G', '--shm-size=28G', '-e NVIDIA_DRIVER_CAPABILITIES=all', '-l', 'clearml-worker-id=YYYYY:gpu1', '-l', 'clearml-parent-worker-id=YYYYY:gpu1', '-e', 'CLEARML_WORKER_ID=YYYYY:gpu1', '-e', 'CLEARML_DOCKER_IMAGE=scrin/dev-spconv:latest -e NVIDIA_DRIVER_CAPABILITIES=all', '-e', 'CLEARML_TASK_ID=406a4d3f372347faa9b7ba02bf993d47', '-v', '/tmp/.clearml_agent.wpanxpf8.cfg:/tmp/clearml.conf', '-e', 'CLEARML_CONFIG_FILE=/tmp/clearml.conf', '-v', '/tmp/clearml_agent.ssh.x3n8s40k:/.ssh', '-v', '/home/clearml-agent/.clearml/apt-cache.1:/var/cache/apt/archives', '-v', '/home/clearml-agent/.clearml/pip-cache:/root/.cache/pip', '-v', '/home/clearml-agent/.clearml/pip-download-cache:/root/.clearml/pip-download-cache', '-v', '/home/clearml-agent/.clearml/cache:/clearml_agent_cache', '-v', '/home/clearml-agent/.clearml/vcs-cache:/root/.clearml/vcs-cache', '-v', '/home/clearml-agent/.clearml/venvs-cache:/root/.clearml/venvs-cache', '--rm', 'scrin/dev-spconv:latest', 'bash', '-c', 'echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/docker-clean ; chown -R root /root/.cache/pip ; export DEBIAN_FRONTEND=noninteractive ; export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL libsm6 libxext6 libxrender-dev libglib2.0-0" ; cp -Rf /.ssh -T ~/.ssh ; [ ! -z $(which git) ] || export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL git" ; declare LOCAL_PYTHON ; [ ! -z $LOCAL_PYTHON ] || for i in {15..5}; do which python3.$i && python3.$i -m pip --version && export LOCAL_PYTHON=$(which python3.$i) && break ; done ; [ ! -z $LOCAL_PYTHON ] || export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL python3-pip" ; [ -z "$CLEARML_APT_INSTALL" ] || (apt-get update -y ; apt-get install -y $CLEARML_APT_INSTALL) ; [ ! 
-z $LOCAL_PYTHON ] || export LOCAL_PYTHON=python3 ; $LOCAL_PYTHON -m pip install -U "pippip" ; $LOCAL_PYTHON -m pip install -U clearml-agent==1.5.2rc0 ; pip install open3d ; pip install --no-index torch-scatter -f None ; pip install strictyaml ; sudo apt-get update ; sudo apt-get install -y libx11-6 ; sudo apt-get install -y libgl1-mesa-glx ; cp /tmp/clearml.conf ~/default_clearml.conf ; NVIDIA_VISIBLE_DEVICES=all $LOCAL_PYTHON -u -m clearml_agent execute --disable-monitoring --id 406a4d3f372347faa9b7ba02bf993d47']

1708426219710 YYYYY:gpu1 DEBUG [ OK ]
root@be3ea49471b7:~#

1708426304971 YYYYY:gpu1 ERROR User aborted: stopping task (1)

1708426305008 YYYYY:gpu1 DEBUG Process aborted by user

  				
Posted one year ago by SpicyFrog56
