Hi all, looking for some help when executing pipelines with custom Docker images. I have a component defined and I expect its Python runtime environment to be managed by a custom Docker image…

Answered

Hi all,

Looking for some help when executing pipelines with custom Docker images.

I have a component defined, and I expect its Python runtime environment to be managed by a custom Docker image (`foobar`):

`@PipelineDecorator.component(docker='foobar', ...)`

As a result, I don’t want the Agent to parse which imports are being used or to install any dependencies whatsoever; the assumption should be that the Python runtime environment is already handled by the Docker image.

How can I achieve this? Thanks!

  				
Posted 2 years ago by WickedStarfish97


Answers 13

Hi WickedStarfish97

> As a result, I don’t want the Agent to parse what imports are being used / install dependencies whatsoever

Nothing to worry about here: even if the agent detects the Python packages, they are installed on top of the preexisting packages inside the Docker image. That said, if you want to override it, you can also pass `packages=[]`.
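A rough way to picture "installed on top of the preexisting packages": with pip as the package manager, a requirement that the image already satisfies is skipped rather than reinstalled. The stdlib sketch below models only that check, for bare names and exact `==` pins. `requirements_to_install` is a hypothetical helper, not part of ClearML or the agent.

```python
from typing import Dict, List, Optional, Tuple


def _normalize(name: str) -> str:
    # Compare names case-insensitively and treat '_' like '-', roughly as pip does
    return name.strip().lower().replace("_", "-")


def parse_requirement(req: str) -> Tuple[str, Optional[str]]:
    """Split 'name==version' or 'name' into (normalized_name, version_or_None)."""
    name, sep, version = req.partition("==")
    return _normalize(name), (version.strip() if sep else None)


def requirements_to_install(installed: Dict[str, str], requested: List[str]) -> List[str]:
    """Return the requested requirements that the preinstalled packages do NOT satisfy.

    Hypothetical model: a bare name is satisfied by any installed version,
    an exact pin only by that exact version.
    """
    have_versions = {_normalize(k): v for k, v in installed.items()}
    missing = []
    for req in requested:
        name, pinned = parse_requirement(req)
        have = have_versions.get(name)
        if have is None or (pinned is not None and have != pinned):
            missing.append(req)
    return missing


if __name__ == "__main__":
    preinstalled = {"numpy": "1.22.1", "scipy": "1.7.3"}  # what the image already has
    print(requirements_to_install(preinstalled, ["numpy", "scipy==1.7.3", "clearml==1.1.6"]))
    # prints ['clearml==1.1.6']
```

In this model only requirements the image cannot satisfy would trigger an install, which is why a detected package list is usually harmless.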

  				
Posted 2 years ago by AgitatedDove14

Hmm, maybe a different numpy version? (`numpy==1.22.1`; maybe the Task needs a different version.) Can you post the Task log?
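To see which versions the image actually provides, one option is to run a small script with the container's own Python interpreter. This sketch uses only `importlib.metadata` (Python 3.8+); `installed_versions` is a hypothetical helper name, and the distributions listed are just the ones mentioned in this thread.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(dists: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for dist in dists:
        try:
            versions[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            versions[dist] = None
    return versions


if __name__ == "__main__":
    # Run with the interpreter inside the image to see what it really ships
    for name, version in installed_versions(["numpy", "scipy", "clearml"]).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```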

  				
Posted 2 years ago by AgitatedDove14

Pretty standard global install

https://gist.github.com/stevenhoelscher/0d345e26630e7d16ab76802871c39bd5

  				
Posted 2 years ago by WickedStarfish97

For anyone following along, the lesson for me was to configure the `clearml-agent daemon` with the `--docker` flag, which instructs it to spawn tasks in containers (and to use the `docker` argument passed through to my Pipeline component).
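The daemon invocation described above can be assembled as an argument list, for example to launch it from a provisioning script. This is a sketch assuming a queue named `default` and that `--docker` optionally accepts a fallback image; as I understand it, an image set on the task itself (such as the component's `docker='foobar'`) takes precedence over that fallback. `agent_daemon_command` is a hypothetical helper; check `clearml-agent daemon --help` for the flags your agent version supports.

```python
from typing import List, Optional


def agent_daemon_command(queue: str, default_image: Optional[str] = None) -> List[str]:
    """Build the argv for a clearml-agent daemon that runs tasks inside Docker."""
    cmd = ["clearml-agent", "daemon", "--queue", queue, "--docker"]
    if default_image:
        # Assumed fallback image for tasks that do not specify their own container image
        cmd.append(default_image)
    return cmd


if __name__ == "__main__":
    # Printed only; pass the list to subprocess.run(..., check=True) to actually start it
    print(" ".join(agent_daemon_command("default")))
```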

  				
Posted 2 years ago by WickedStarfish97

Thank you! I adjusted my pipeline logic so that the component uses `packages=[]`.

Funny enough, I’m running into a new issue now. Does this mean I need to configure the Agent’s runtime environment so it has the necessary dependencies to execute the Pipeline script?

```
# Agent Logs
Starting Task Execution:

Traceback (most recent call last):
  File "/Users/developer/.clearml/venvs-builds/3/code/train_and_evaluate.py", line 1, in <module>
    from clearml import Task, TaskTypes
ModuleNotFoundError: No module named 'clearml'
```

```
$ head -10 ~/.clearml/venvs-builds/3/code/train_and_evaluate.py
from clearml import Task, TaskTypes
from clearml.automation.controller import PipelineDecorator

def train_and_evaluate():
    _train_and_evaluate()

if __name__ == '__main__':
    task = Task.init()
```
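The traceback means the interpreter running the script cannot import `clearml` at all. A quick way to confirm what is importable, assuming you can run the same interpreter the agent uses inside the container, is `importlib.util.find_spec`. `missing_modules` is a hypothetical helper name, and it expects top-level module names only.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(modules: Iterable[str]) -> List[str]:
    """Return the top-level modules that this interpreter cannot import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["clearml", "numpy"])
    print("Not importable here:", missing if missing else "none")
```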

  				
Posted 2 years ago by WickedStarfish97

Right, my only complaint is that it appears to be using cached wheels and building them (for packages like `numpy`, `scipy`, etc.) even though numpy is already available in the Python runtime environment.

  				
Posted 2 years ago by WickedStarfish97

> What’s interesting to me (as a ClearML newbie) is it’s clearly compiling that wheel using my host machine (macOS).

Hmm, kind of, and kind of not.

If you take a look at the Tasks created (regardless of how they were created: pipeline, manually, etc.), each has a list of the Python packages required by the code, as they were detected at runtime (i.e. when the code was first executed, on the development machine). When creating a Pipeline controller (runner), the pipeline Tasks' package requirements are just lists, and the package versions are listed based on the machine running the initial pipeline (in your case, the Mac). The reason is so that we at least have a version of each package (if one exists) that is known to work for you.

Yes, you are correct, there should not be a connection between the runner machine and the remote machine. That said, we do want to be able to specify the required packages, and Python packages are usually available on most OS distros. If we were not auto-detecting them, you would have had to specify them manually, which you can also do, and that list will override the detected packages. Does that make sense?

> Just threw a new file into the gist above

Not sure what I'm seeing there, but it definitely does not include the error.
If it helps, you can DM me the full log (btw: all passwords/secrets are automatically masked in the log, but I would double check just in case 😉)
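To make "auto-detecting" the required packages concrete, here is a minimal stdlib sketch that statically collects the top-level modules a script imports. It is an illustration only, not ClearML's detection logic, which also records the installed version of each package on the machine that ran the code. Import names and pip distribution names can differ (for example, `PIL` is installed as `Pillow`). `detect_imports` is a hypothetical name; the stdlib filter relies on `sys.stdlib_module_names`, so on Python versions before 3.10 nothing is filtered.

```python
import ast
import sys
from typing import Set

# sys.stdlib_module_names exists on Python 3.10+; on older versions nothing is filtered
_STDLIB = getattr(sys, "stdlib_module_names", frozenset())


def detect_imports(source: str) -> Set[str]:
    """Collect top-level module names imported anywhere in `source` (stdlib excluded)."""
    found: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import a.b as c` -> "a"
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # `from a.b import c` -> "a"; relative imports (level > 0) are local code
            found.add(node.module.split(".")[0])
    return {name for name in found if name not in _STDLIB}


if __name__ == "__main__":
    script = (
        "from clearml import Task, TaskTypes\n"
        "from clearml.automation.controller import PipelineDecorator\n"
        "import os\n"
        "def step():\n"
        "    import numpy as np\n"
    )
    print(sorted(detect_imports(script)))
```

Imports inside function bodies are found too, since `ast.walk` visits every node, which mirrors why imports inside a pipeline function matter for the detected list.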

  				
Posted 2 years ago by AgitatedDove14

Could it be that these packages (e.g. numpy) are not installed as system packages in the Docker image (i.e. they live inside a venv inside the container)?
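One way to answer that from inside the image: compare `sys.prefix` with `sys.base_prefix` (they differ inside a PEP 405 venv) and print where `numpy` would be imported from. A sketch with hypothetical helper names; conda environments and very old `virtualenv` releases are not detected by this check.

```python
import importlib.util
import sys
from typing import Optional


def in_virtualenv() -> bool:
    """True when this interpreter runs inside a venv rather than the base install."""
    return sys.prefix != sys.base_prefix


def module_location(name: str) -> Optional[str]:
    """Where the module would be imported from, or None if it is not importable."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside venv:", in_virtualenv())
    print("numpy from:", module_location("numpy"))
```

If `numpy` resolves only when the image's venv is activated, a system interpreter started by the agent would not see it.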

  				
Posted 2 years ago by AgitatedDove14

> Funny enough I’m running into a new issue now.

Sorry, my bad, I should have known 😉 Yes, it should probably be `packages=["clearml==1.1.6"]`.
BTW: do you have any imports inside the pipeline function itself? If you do not, there is no need to pass `packages` at all; it will just add clearml.
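The rule described in this thread can be summarized as a small model: an explicit `packages` list replaces the auto-detected one (so `packages=[]` drops `clearml` too, which matches the `ModuleNotFoundError` in the agent log earlier), while omitting `packages` keeps the detected imports and adds `clearml`. This is a hypothetical sketch of the rule as stated here, not ClearML source code; `effective_requirements` is an invented name.

```python
from typing import List, Optional


def effective_requirements(
    detected: List[str], explicit: Optional[List[str]] = None, clearml_req: str = "clearml"
) -> List[str]:
    """Hypothetical model of the behavior described in this thread.

    - An explicit `packages` list replaces whatever was auto-detected.
    - Otherwise the detected requirements are used, and clearml is added if absent.
    """
    if explicit is not None:
        return list(explicit)
    reqs = list(detected)
    if not any(r.split("==")[0].strip().lower() == "clearml" for r in reqs):
        reqs.append(clearml_req)
    return reqs


if __name__ == "__main__":
    print(effective_requirements(["numpy"], explicit=[]))  # prints []
    print(effective_requirements(["numpy"], explicit=["clearml==1.1.6"]))  # prints ['clearml==1.1.6']
```

In decorator terms, that corresponds to either omitting `packages` or passing `@PipelineDecorator.component(docker='foobar', packages=["clearml==1.1.6"])`.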

  				
Posted 2 years ago by AgitatedDove14

Thanks. My pipeline script only depends on clearml and an internal library (a local Python module installed into the Docker image) that provides the `_train_and_evaluate` function shown above.

  				
Posted 2 years ago by WickedStarfish97

If that is the case, there is nothing you need to change; just provide the Docker image (no need to pass `packages`).

  				
Posted 2 years ago by AgitatedDove14

Even if you did have other packages, I'm pretty sure there is nothing for you to worry about: it will just list them, and if they are preinstalled, the preinstalled versions will be used.

  				
Posted 2 years ago by AgitatedDove14

> Just threw a new file into the gist above

It doesn’t look like it even gets to the point where it installs from the numpy wheel (because it errors out installing Pillow elsewhere).

What’s interesting to me (as a ClearML newbie) is it’s clearly compiling that wheel using my host machine (macOS).

I would have expected there to be a separation between the “pipeline runner”, if you will, and the task. I would expect the pipeline runner to need only a dependency on ClearML, and the task to be spawned as a container with numpy installed (Linux in this case).

  				
Posted 2 years ago by WickedStarfish97
