Hi Everyone! I Have A Question Regarding Running A Script Inside Docker Container With Clearml: I Build An Image Containing All Requirements To Run Some Python Script That Is Getting Arguments Via Argparse. When I Build A Wrapper Script That Will Run This

Answered

Hi everyone!
I have a question about running a script inside a Docker container with ClearML.
I built an image containing all the requirements to run a Python script that gets its arguments via argparse.
When I built a wrapper script to run this container with ClearML, it failed with:
clearml_agent: ERROR: Could not install task requirements!
This happens when the agent tries to install PyTorch and packaging, with this exact error message:

 Found PyTorch version torch==1.13.1 matching CUDA version 0
Collecting torch==1.13.1
  ERROR: HTTP error 403 while getting


  ERROR: Could not install requirement torch==1.13.1 from

 because of error 403 Client Error: Forbidden for url:


ERROR: Could not install requirement torch==1.13.1 from

 because of HTTP error 403 Client Error: Forbidden for url:

 for URL


clearml_agent: ERROR: Could not download wheel name of "

"
Requirement already satisfied: PyYAML==6.0 in /usr/local/lib/python3.6/site-packages (from -r /tmp/cached-reqsub0jbcq6.txt (line 1)) (6.0)
Requirement already satisfied: fastnumbers==3.2.1 in /usr/local/lib/python3.6/site-packages (from -r /tmp/cached-reqsub0jbcq6.txt (line 3)) (3.2.1)
Collecting lockfile==0.12.2
  Using cached lockfile-0.12.2-py2.py3-none-any.whl (13 kB)
ERROR: Could not find a version that satisfies the requirement packaging==23.1 (from -r /tmp/cached-reqsub0jbcq6.txt (line 5)) (from versions: 14.0, 14.1, 14.2, 14.3, 14.4, 14.5, 15.0, 15.1, 15.2, 15.3, 16.0, 16.1, 16.2, 16.3, 16.4, 16.5, 16.6, 16.7, 16.8, 17.0, 17.1, 18.0, 19.0, 19.1, 19.2, 20.0, 20.1, 20.2, 20.3, 20.4, 20.5, 20.6, 20.7, 20.8, 20.9, 21.0, 21.1, 21.2, 21.3)
ERROR: No matching distribution found for packaging==23.1 (from -r /tmp/cached-reqsub0jbcq6.txt (line 5))
clearml_agent: ERROR: Could not install task requirements!

I wanted to know why it is trying to install these packages. My image is fully ready to run the required script. It does not need to run on a GPU anyway, so these installations are unnecessary.
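
One way to read the last two errors in the log: the site-packages path suggests the container runs Python 3.6, and pip there offers packaging only up to 21.3, while the task's recorded requirements pin packaging==23.1 (presumably captured from the environment where the task was first run). The sketch below only parses that error line to make the mismatch explicit. The helper names are hypothetical, and the version list is shortened from the log for brevity.

```python
import re

# Error line taken from the log above; the version list is shortened here.
PIP_ERROR = (
    "ERROR: Could not find a version that satisfies the requirement "
    "packaging==23.1 (from -r /tmp/cached-reqsub0jbcq6.txt (line 5)) "
    "(from versions: 14.0, 20.9, 21.0, 21.3)"
)


def parse_pip_error(line):
    """Return (package, requested_version, available_versions) from a pip
    'Could not find a version' error line."""
    requirement = re.search(r"requirement (\S+)==(\S+)", line)
    versions = re.search(r"\(from versions: ([^)]*)\)", line)
    if not requirement or not versions:
        raise ValueError("not a recognised pip version error")
    available = [v.strip() for v in versions.group(1).split(",") if v.strip()]
    return requirement.group(1), requirement.group(2), available


def as_tuple(version):
    """Turn a purely numeric version such as '21.3' into (21, 3) for comparison."""
    return tuple(int(part) for part in version.split("."))


package, requested, available = parse_pip_error(PIP_ERROR)
newest = max(available, key=as_tuple)
if as_tuple(requested) > as_tuple(newest):
    print(f"{package}=={requested} was requested, but the newest version "
          f"pip can see in this interpreter is {newest}")
```

If that reading is right, the pinned versions come from the task's recorded requirements rather than from the image, which matches the explanation in the answers that the agent rebuilds the environment inside the container.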

  				
Posted one year ago by DangerousMole43


Answers 10

I can see from the console in the UI that part of the command it's trying to run is:
'echo \'Binary::apt::APT::Keep-Downloaded-Packages "true";\' > /etc/apt/apt.conf.d/docker-clean'

along with some other commands, and I'm trying to understand why my agent receives them. I've been going back and forth on the ClearML config, but nothing I change seems to have any effect.

  				
Posted one year ago by DangerousMole43

CostlyOstrich36
You're right, I do use a custom entry point in my Dockerfile.
Could you tell me whether you think this would work:

1. Set up an environment that can run this task in full (the script will include Task.init).
2. Create a new image from which I remove the customised run command (FYI, the Dockerfile does not contain a clearml/clearml-agent installation).
3. Run the task from a Python script, which will publish the task to the ClearML UI.
4. Clone the task and use the agent command (as written in my previous message) to spin up an agent that uses the newly created Docker image to run it.
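
A minimal sketch of the script described in these steps, assuming the `clearml` package is installed and configured wherever it is first run. The argument names, project name, and task name are hypothetical placeholders, not taken from the thread. `Task.init` is imported lazily so the argument parsing can be tried without a ClearML server.

```python
import argparse


def build_parser():
    """Argument parser for the script; the arguments here are placeholders."""
    parser = argparse.ArgumentParser(description="Script tracked by ClearML")
    parser.add_argument("--input-path", default="data.csv")
    parser.add_argument("--epochs", type=int, default=1)
    return parser


def main(argv=None):
    # Lazy import: only needed when the task is actually registered.
    from clearml import Task

    # Registers the run as a task in the ClearML UI (hypothetical names).
    task = Task.init(project_name="examples", task_name="containerised script")

    args = build_parser().parse_args(argv)
    print(f"task {task.id}: input={args.input_path}, epochs={args.epochs}")
    # ... the actual work of the script goes here ...


# Call main() at the end of the file when running against a configured
# ClearML server:
# main()
```

Keeping `Task.init` in the same script that does the work means the task recorded in the UI points at code the agent can re-run, rather than at a separate wrapper.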

  				
Posted one year ago by DangerousMole43

It's a bit of a problem to do this, because I'm using a subprocess to run a Python script in the container, and the paths on my local machine differ from those inside the container.

  				
Posted one year ago by DangerousMole43

The agent does this automatically. It does not support running your custom entry point.

  				
Posted one year ago by SuccessfulKoala55

Thank you, CostlyOstrich36 and SuccessfulKoala55!
I managed to get what I wanted using your input!

  				
Posted one year ago by DangerousMole43

Well done! Out of curiosity, what did you end up doing?

  				
Posted one year ago by CostlyOstrich36

Hi DangerousMole43, how are you running the agent? By default, the agent does not use pre-packaged Docker images with a built-in script. The whole concept is for the agent to recreate the correct environment inside the container (hence installing the packages and cloning the code) and re-run your task there.

  				
Posted one year ago by SuccessfulKoala55

I would suggest structuring everything around the Task object. After you clone and enqueue the task, the agent can handle all the required packages and environment setup. You can even set environment variables so that it won't try to create a new environment but will use the existing one in the Docker container.
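
The thread does not name the environment variables. As an assumption, the sketch below uses `CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL` and `CLEARML_AGENT_SKIP_PIP_VENV_INSTALL`, which is how I recall them from the clearml-agent documentation, so check the names and their behaviour in Docker mode against your agent version. The helper functions are hypothetical. The daemon command is the one quoted elsewhere in this thread, and the sketch only builds it rather than running it.

```python
import os


def build_agent_command(queue, image):
    """Assemble the daemon command quoted in this thread."""
    return ["clearml-agent", "daemon", "--queue", queue,
            "--docker", image, "--detached", "--cpu-only"]


def build_agent_env(base_env=None, python_binary=None):
    """Copy the environment and add the (assumed) skip-install variables."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: tells the agent to skip the Python environment setup and
    # rely on what the image already provides.
    env["CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL"] = "1"
    if python_binary:
        # Assumption: points the agent at an existing interpreter instead of
        # creating a fresh virtual environment.
        env["CLEARML_AGENT_SKIP_PIP_VENV_INSTALL"] = python_binary
    return env


command = build_agent_command("maytar_test_q", "docker_image")
env = build_agent_env(base_env={})
print("would run:", " ".join(command))
print("with extra environment:", env)
# To actually start the agent:
# import subprocess; subprocess.run(command, env=build_agent_env(), check=True)
```

Building the environment as a copy keeps the change scoped to the agent process instead of altering the shell that launches it.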

  				
Posted one year ago by CostlyOstrich36

DangerousMole43, I think you're trying to use the agent for something it wasn't intended to do. As SuccessfulKoala55 mentioned, the agent does not support running custom entry points. The idea is to clone tasks in the system and enqueue them; the agent is then capable of creating the required environment and running the code by cloning the repo.
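
The clone-and-enqueue flow can also be driven from Python instead of the UI. This is a rough sketch assuming the `clearml` package and a configured server. The task ID is a placeholder, the helper names are hypothetical, and the `Args/` prefix reflects my understanding of how ClearML names argparse parameters, so treat it as an assumption.

```python
def to_args_section(params):
    """Prefix plain argparse names with the 'Args/' section (assumed naming)."""
    return {f"Args/{name}": str(value) for name, value in params.items()}


def clone_and_enqueue(source_task_id, queue_name, overrides=None):
    """Clone an existing task, optionally override arguments, and enqueue it."""
    # Lazy import so the helper above can be used without ClearML installed.
    from clearml import Task

    source = Task.get_task(task_id=source_task_id)
    cloned = Task.clone(source_task=source, name=f"{source.name} (clone)")
    if overrides:
        cloned.set_parameters(to_args_section(overrides))
    Task.enqueue(cloned, queue_name=queue_name)
    return cloned.id


# Example call (placeholder task ID, queue name from this thread):
# clone_and_enqueue("<source-task-id>", "maytar_test_q", {"epochs": 3})
```

Cloning leaves the original task untouched, so a failed run on the agent does not overwrite the successful local one.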

  				
Posted one year ago by CostlyOstrich36

Hi SuccessfulKoala55,
First, I start the agent using this command:
clearml-agent daemon --queue maytar_test_q --docker docker_image --detached --cpu-only

As for the task itself: I have a bash script inside the container that executes a Python script (also located inside the container), which gets its arguments via argparse (so far, no ClearML involved). To initiate the task, I run a very basic Python script (outside the container) that initialises a ClearML task and takes the same arguments via argparse. Then, once I have the task in the ClearML UI (where it completes successfully), I reset it and enqueue it to maytar_test_q. This is where it fails.

  				
Posted one year ago by DangerousMole43
