Answered
Issue using an agent without a GPU: the task still runs with the nvidia/cuda docker image

Hi everyone!

I'm having an issue using an agent without a GPU. I'm running it in docker mode (to allow SSH). I changed the default docker image in the config file to python:3.9.6, but it seems it is still trying to use the nvidia image.
The error message goes as follows:
task f3681e0b3ce8405a9c00e80dc4ca064b pulled from 78a774d78008487bae4ea3b9944e9ff8 by worker free-e2-micro-1:0
2022-12-15 10:11:15 Running Task f3681e0b3ce8405a9c00e80dc4ca064b inside docker: nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04 arguments: []
2022-12-15 10:11:15 Executing: ['docker', 'run', '-t', '--gpus', 'all', '-v', '/clearml-agent-free-e2-micro-1:/clearml-agent-free-e2-micro-1', '-e', 'SSH_AUTH_SOCK=/clearml-agent-free-e2-micro-1', '-l', 'clearml-worker-id=free-e2-micro-1:0', '-l', 'clearml-parent-worker-id=free-e2-micro-1:0', '-e', 'CLEARML_WORKER_ID=free-e2-micro-1:0', '-e', 'CLEARML_DOCKER_IMAGE=nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04', '-e', 'CLEARML_TASK_ID=f3681e0b3ce8405a9c00e80dc4ca064b', '-v', '/root/.gitconfig:/root/.gitconfig', '-v', '/tmp/.clearml_agent.vquygxsh.cfg:/tmp/clearml.conf', '-e', 'CLEARML_CONFIG_FILE=/tmp/clearml.conf', '-v', '/root/.clearml/apt-cache:/var/cache/apt/archives', '-v', '/root/.clearml/pip-cache:/root/.cache/pip', '-v', '/root/.clearml/pip-download-cache:/root/.clearml/pip-download-cache', '-v', '/root/.clearml/cache:/clearml_agent_cache', '-v', '/root/.clearml/vcs-cache:/root/.clearml/vcs-cache', '--rm', 'nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04', 'bash', '-c', 'echo \'Binary::apt::APT::Keep-Downloaded-Packages "true";\' > /etc/apt/apt.conf.d/docker-clean ; chown -R root /root/.cache/pip ; export DEBIAN_FRONTEND=noninteractive ; export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL libsm6 libxext6 libxrender-dev libglib2.0-0" ; [ ! -z $(which git) ] || export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL git" ; declare LOCAL_PYTHON ; [ ! -z $LOCAL_PYTHON ] || for i in {15..5}; do which python3.$i && python3.$i -m pip --version && export LOCAL_PYTHON=$(which python3.$i) && break ; done ; [ ! -z $LOCAL_PYTHON ] || export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL python3-pip" ; [ -z "$CLEARML_APT_INSTALL" ] || (apt-get update -y ; apt-get install -y $CLEARML_APT_INSTALL) ; [ ! -z $LOCAL_PYTHON ] || export LOCAL_PYTHON=python3 ; $LOCAL_PYTHON -m pip install -U "pip<20.2" ; $LOCAL_PYTHON -m pip install -U clearml-agent ; cp /tmp/clearml.conf ~/default_clearml.conf ; NVIDIA_VISIBLE_DEVICES=all $LOCAL_PYTHON -u -m clearml_agent execute --disable-monitoring --id f3681e0b3ce8405a9c00e80dc4ca064b']
2022-12-15 10:11:20 docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
2022-12-15 10:11:20 Process failed, exit code 125

I am getting this error when running with the default docker image that I specified, and when I try --docker python:3.9. In both cases, when the agent is up, it says it is running with the correct image.
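For context, this is roughly what I mean by changing the default docker image in the config file; a minimal sketch of the relevant clearml.conf section on the agent machine and the command used to start the agent (the queue name here is just an example):

# clearml.conf on the agent machine: default container used when a task does not specify its own
agent {
    default_docker {
        image: "python:3.9.6"
    }
}

# agent started in docker mode
clearml-agent daemon --queue default --docker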

  
  
Posted 2 years ago

Answers 3


Managed to fix it. I cloned the task again and it pulled the correct docker image. However, it still tried to use the GPU. To fix this I had to spin up the agent using the --cpu-only flag (--docker --cpu-only).
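For anyone hitting the same thing, the command I mean is roughly the following (the queue name is just an example):

# docker mode, but without requesting GPU access from the docker daemon
clearml-agent daemon --queue default --docker --cpu-only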

  
  
Posted 2 years ago

BTW: is it better to post the long error message on a reply to avoid polluting the channel?

  
  
Posted 2 years ago

Hi GrotesqueOctopus42,

BTW: is it better to post the long error message on a reply to avoid polluting the channel?

Yes, that is appreciated 🙂
Basically, post the logs in the thread under the initial message.

To fix this I had to spin up the agent using the --cpu-only flag (--docker --cpu-only)

Yes, if you do not specify --cpu-only, the agent will default to trying to access GPUs (a simplified comparison is sketched below).
Nice!
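To illustrate, a simplified view of the docker command the agent builds in each case (most of the mounts and environment variables from the log above are omitted):

# without --cpu-only: the agent asks docker for GPUs, which fails when the nvidia runtime is not installed
docker run -t --gpus all ... nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04 ...

# with --cpu-only: the --gpus flag is not added, so a plain docker install can run the task image
docker run -t ... python:3.9.6 ...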

  
  
Posted 2 years ago