How Can I Ensure Tasks In A Pipeline Have The Same Environment As The Pipeline Itself? It Seems A Bit Counter-Intuitive That The Pipeline (Executed Remotely) Captures The Local Environment, But The Tasks (Executed Remotely) Do Not Use That Same Environment?

Answered

How can I ensure tasks in a pipeline have the same environment as the pipeline itself? It seems a bit counter-intuitive that the pipeline (executed remotely) captures the local environment, but the tasks (executed remotely) do not use that same environment?

  				
Posted one year ago by UnevenDolphin73

Answers 42

"What format should I specify it in?"

requirements.txt format, e.g. ["package >= 1.2.3"]

"Would this enforce that package on various components?"

This is a per-component control, so you can have different packages / containers for each component.

"Would it then no longer capture import statements?"

This replaces the auto-detected packages, but obviously the auto-detection fails to detect your internal repo package, which is the main issue here.
How is the "internal package" installed? In other words, can you send the pip freeze of the machine creating the pipeline? That is where the packages are detected (if a package is not installed, you cannot infer the actual package name or the version just from the import statement).
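To illustrate the point about detection, here is a minimal sketch of how a requirements.txt-style entry could be built from what is installed locally. It assumes the internal package was installed from git, so that pip recorded a `direct_url.json` file (PEP 610) next to its metadata. The helper names are hypothetical and this is not ClearML's own detection code.

```python
import json
from importlib import metadata
from typing import Optional


def format_requirement(name: str, version: str, direct_url: Optional[dict]) -> str:
    """Return a requirements.txt-style line for one installed distribution."""
    vcs = (direct_url or {}).get("vcs_info")
    if vcs:
        # Installed from version control: pin the exact commit, as pip freeze does.
        return f"{name} @ {vcs['vcs']}+{direct_url['url']}@{vcs['commit_id']}"
    # Anything else is pinned by version in this sketch.
    return f"{name}=={version}"


def requirement_for(dist_name: str) -> str:
    """Look up an installed distribution; raises PackageNotFoundError if it is absent."""
    dist = metadata.distribution(dist_name)
    raw = dist.read_text("direct_url.json")  # None when not installed from a URL
    direct_url = json.loads(raw) if raw else None
    return format_requirement(dist.metadata["Name"], dist.version, direct_url)
```

A list of such strings is in the requirements.txt format mentioned above, so it could be passed as the per-component package list. If the package is not installed on the machine creating the pipeline, the lookup fails, which matches the remark that name and version cannot be inferred from an import statement alone.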

  				
Posted one year ago by AgitatedDove14

One bit of information I forgot to mention: we named our package foo-mod in pyproject.toml. On installation, that hyphen gets rewritten to an underscore, as in foo_mod-x.y.z.dist-info.

foo-mod @ git+
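The hyphen/underscore mismatch can be sketched as follows, assuming the usual packaging conventions: PEP 503 name normalization for comparing project names, and an underscore-escaped name in the metadata directory. Both function names are hypothetical, and older installers may preserve the original casing.

```python
import re


def normalize_name(name: str) -> str:
    """PEP 503 normalization: runs of '-', '_' and '.' compare equal, ignoring case."""
    return re.sub(r"[-_.]+", "-", name).lower()


def dist_info_dirname(name: str, version: str) -> str:
    """Metadata directory name an installer typically writes into site-packages."""
    escaped = normalize_name(name).replace("-", "_")
    return f"{escaped}-{version}.dist-info"
```

Any tool that matches an imported module to an installed distribution has to compare names in normalized form, otherwise `foo-mod` and `foo_mod` look like different packages.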

  				
Posted one year ago by UnevenDolphin73

PricklyRaven28 That would be my fallback; it would make development much slower (having to build containers with every small change).

  				
Posted one year ago by UnevenDolphin73

"Yes. Though again, just highlighting that the naming of foo-mod is arbitrary. The actual module simply has a folder structured with an implicit namespace:"

Yep, I think this is exactly why it fails to detect it, let me check that.

"And it’s failing on type hints for functions passed in pipe.add_function_step(…, helper_functions=[…]) … I guess those aren’t being removed like the wrapped function step?"

Can you provide the log? I think I'm missing what exactly was added into the decorator that somehow fails the Task creation.

  				
Posted one year ago by AgitatedDove14

Hi UnevenDolphin73

"How can I ensure tasks in a pipeline have the same environment as the pipeline itself?
...
but the tasks (executed remotely) do not use that same environment?"

Just verifying, we are talking about pipeline decorators?

"We also wanted this; we preferred to create a Docker image with everything we need, and let the pipeline steps use that image."

You can specify the Docker image on the decorator itself.
Regarding capturing the packages: if you import them inside the decorated function, they will be captured based on what is installed in the local (i.e. initial) environment. The idea is that the components are not the same as the logic. The pipeline logic should not have any real package requirements; only the components (which actually do something) should. What am I missing?
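A minimal sketch of what setting the image on the decorator could look like, assuming the `docker` and `packages` arguments of `PipelineDecorator.component`. The image name, the package pin and the step function are hypothetical. The ClearML import is kept inside the function so the snippet only needs ClearML when the step is actually defined.

```python
# Hypothetical per-component environment: a prebuilt image plus explicit packages.
STEP_ENV = {
    "docker": "my-registry/pipeline-base:latest",
    "packages": ["pandas>=1.5"],
}


def define_step():
    """Create a pipeline component that runs in the environment above."""
    # Requires `pip install clearml`; imported lazily on purpose.
    from clearml.automation.controller import PipelineDecorator

    @PipelineDecorator.component(return_values=["total"], **STEP_ENV)
    def add(a, b):
        # Imports needed by the step belong inside the function body.
        return a + b

    return add
```

Because the setting is per component, each step can receive its own dictionary, for example a GPU image for training and a slim image for preprocessing.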

  				
Posted one year ago by AgitatedDove14

"Then the type hints are not removed from helper and the code immediately crashes when being run"

Oh yes, I see your point, that does make sense (btw, removing the type hints will solve the issue).
Regardless, let me make sure this is solved.
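One plausible way such a crash can arise, shown as a self-contained sketch: it assumes the helper's source ends up in a standalone script that lacks the import its type hint refers to. The helper and the `pd` alias are hypothetical, and this does not reproduce ClearML's actual code generation.

```python
import textwrap

# Hypothetical helper source as it might look once copied into a standalone
# script that lacks the `import pandas as pd` the annotation depends on.
HELPER_WITH_HINT = textwrap.dedent('''
    def helper(frame: pd.DataFrame):
        return frame
''')

# Same helper with the hint written as a string, so it is stored, not evaluated.
HELPER_WITH_STRING_HINT = textwrap.dedent('''
    def helper(frame: "pd.DataFrame"):
        return frame
''')


def survives_without_imports(source: str) -> bool:
    """Run the source in an empty namespace and report whether its hints cause a NameError."""
    namespace = {}
    try:
        exec(source, namespace)
        # Before Python 3.14 an unresolved hint already fails at the `def` line;
        # newer versions defer evaluation until the annotations are read.
        namespace["helper"].__annotations__
    except NameError:
        return False
    return True
```

Besides removing the hints, writing them as strings or adding `from __future__ import annotations` to the module keeps them from being evaluated eagerly, which may serve as a workaround until the hints are stripped from helper functions as well.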

  				
Posted one year ago by AgitatedDove14

How or why is this the issue? I guess something is getting lost in translation :D
On the local machine, we have all the packages needed. The code gets sent for remote execution, and all the local packages are frozen correctly with pip.
The pipeline controller task is then generated and executed remotely, and it has all the relevant packages.
Each component it launches, however, is missing the internal packages available earlier :(

  				
Posted one year ago by UnevenDolphin73

I have no idea what’s the difference, but it does not log the internal repository 😞 If I knew why, I would be able to solve it myself… hehe

  				
Posted one year ago by UnevenDolphin73

Pinging about this still, unresolved 🤔

ClearML does not capture our internal libraries and so our functions (pipeline steps) crash with missing modules.

  				
Posted one year ago by UnevenDolphin73

Alternatively, it would be good to specify both some requirements and auto-detect 🤔
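Until something like that exists, the combination could be approximated on the user side. The sketch below merges a detected list with explicit entries, letting an explicit entry win when both name the same project. It assumes well-formed requirement strings; the function names are hypothetical and this is not a ClearML feature.

```python
import re


def _key(requirement: str) -> str:
    """Normalized project name at the start of a requirement string."""
    name = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement).group(1)
    return re.sub(r"[-_.]+", "-", name).lower()


def merge_requirements(detected, explicit):
    """Combine two requirement lists; explicit entries replace detected ones."""
    merged = {_key(r): r for r in detected}
    merged.update({_key(r): r for r in explicit})
    return sorted(merged.values(), key=_key)
```

The merged list stays in requirements.txt format, so it could be handed to a step as its package list.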

  				
Posted one year ago by UnevenDolphin73

Using the PipelineController with add_function_step

  				
Posted one year ago by UnevenDolphin73

We’d be happy if ClearML captures that (since it uses e.g. pip, then we have the git + commit hash for reproducibility), as it claims it would 😅

Any thoughts CostlyOstrich36 ?

  				
Posted one year ago by UnevenDolphin73
