How Can I Ensure Tasks In A Pipeline Have The Same Environment As The Pipeline Itself? It Seems A Bit Counter-Intuitive That The Pipeline (Executed Remotely) Captures The Local Environment, But The Tasks (Executed Remotely) Do Not Use That Same Environment

Answered

How can I ensure tasks in a pipeline have the same environment as the pipeline itself? It seems a bit counter-intuitive that the pipeline (executed remotely) captures the local environment, but the tasks (executed remotely) do not use that same environment?

  				
Posted one year ago by UnevenDolphin73


Answers 42

For example:

my-repo @ git+

  				
Posted one year ago by UnevenDolphin73

I think this is the main issue. Is this reproducible? How can we test it?

  				
Posted one year ago by AgitatedDove14

If you use this one for example, will the component have pandas as part of its requirements?

def step_two(...):
    import pandas as pd
    # do stuff

If so (and it should), what's the difference? How is "internal.repo" different from pandas?

  				
Posted one year ago by AgitatedDove14

Alternatively, it would be good to be able to both specify some requirements and use auto-detection 🤔
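
One way to picture combining explicit requirements with auto-detected ones is a merge where an explicit entry overrides a detected entry for the same package. This is only a sketch under that assumption, not how ClearML handles packages; canonical_name, merge_requirements, and the sample requirement strings (including the example.com URL) are made up for illustration.

```python
import re


def canonical_name(requirement: str) -> str:
    """Return the normalized project name at the start of a requirement line."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    if match is None:
        raise ValueError(f"cannot find a package name in {requirement!r}")
    # PEP 503 normalization: runs of '-', '_' and '.' compare equal, ignoring case.
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def merge_requirements(detected, explicit):
    """Combine two requirement lists; explicit entries replace detected ones."""
    merged = {canonical_name(req): req for req in detected}
    merged.update({canonical_name(req): req for req in explicit})
    return sorted(merged.values(), key=canonical_name)


if __name__ == "__main__":
    detected = ["pandas==2.1.0", "numpy==1.26.0"]
    # Hypothetical explicit entries: a looser pandas pin and an internal package.
    explicit = ["Pandas>=2.0", "foo-mod @ git+https://example.com/org/foo-mod.git"]
    print(merge_requirements(detected, explicit))
```

The merged list could then be written to a requirements file or passed wherever a list of packages is accepted.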

  				
Posted one year ago by UnevenDolphin73

It is. In what format should I specify it? Would this enforce that package on various components? Would it then no longer capture import statements?

  				
Posted one year ago by UnevenDolphin73

… And it’s failing on type hints for functions passed in pipe.add_function_step(…, helper_functions=[…]) … I guess those aren’t being removed like they are for the wrapped function step?

  				
Posted one year ago by UnevenDolphin73

Still; anyone? 🥹 @CostlyOstrich36 @AgitatedDove14

  				
Posted one year ago by UnevenDolphin73

Well the individual tasks do not seem to have the expected environment.

  				
Posted one year ago by UnevenDolphin73

Yes, for example.

  				
Posted one year ago by UnevenDolphin73

How or why is this the issue? I guess something is getting lost in translation :D
On the local machine, we have all the packages needed. The code gets sent for remote execution, and all the local packages are frozen correctly with pip.
The pipeline controller task is then generated and executed remotely, and it has all the relevant packages.
Each component it launches, however, is missing the internal packages available earlier :(

  				
Posted one year ago by UnevenDolphin73

Is this repo installed on the machine creating the pipeline?
You can also manually add it here: `packages=["link_to_internal_python_package"]`

  				
Posted one year ago by AgitatedDove14

We can change the project names of course, if there’s a suggestion/guide that will make them see past the namespace…

  				
Posted one year ago by UnevenDolphin73

It is installed on the machine creating the pipeline.
I have no idea why it did not automatically detect it 😞

  				
Posted one year ago by UnevenDolphin73

Exactly, it should have auto-detected the package.

  				
Posted one year ago by UnevenDolphin73

“This then looks for a module called foo, even though it’s just a namespace”

I think this is the issue. Are you using Python package namespaces?
(This is a PEP feature that is really rarely used, and I have seen it break too many times.)
Assuming you have from foo.mod import, what are you seeing in pip freeze? I'd like to see if we can fix this and better support namespaces.
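
To make the namespace question concrete, here is a small stdlib-only check for whether a top-level name is an implicit namespace package (a folder without its own __init__.py). It is a sketch for reproducing the situation locally, not part of ClearML; is_namespace_package, build_demo_tree, and the nsdemo_foo name are hypothetical.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path


def is_namespace_package(module_name: str) -> bool:
    """True if module_name resolves to an implicit namespace package (PEP 420)."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return False
    # A regular package has its __init__.py as origin; a namespace package has
    # search locations but no origin file (Python 3.7+).
    return spec.origin is None and spec.submodule_search_locations is not None


def build_demo_tree(root: Path, top: str = "nsdemo_foo") -> str:
    """Create <root>/<top>/mod/__init__.py with no __init__.py in <top>."""
    package_dir = root / top / "mod"
    package_dir.mkdir(parents=True)
    (package_dir / "__init__.py").write_text("VALUE = 1\n")
    return top


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        top = build_demo_tree(Path(tmp))
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        print(top, is_namespace_package(top))
        print(top + ".mod", is_namespace_package(top + ".mod"))
        sys.path.remove(tmp)
```

A tool that maps the first component of an import to an installed distribution has nothing obvious to match when that component is only a namespace, which may be why such packages get missed.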

  				
Posted one year ago by AgitatedDove14

This example?

  				
Posted one year ago by AgitatedDove14

And is this repo installed on the machine creating the pipeline?
Basically I'm asking: how come it did not automatically detect it?

  				
Posted one year ago by AgitatedDove14

The only thing I could think of is that the output of pip freeze would be a URL?
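
For reference, pip freeze prints a package installed from a URL as a "name @ url" line rather than "name==version". Below is a minimal sketch that tells the two shapes apart; parse_freeze_line and FrozenLine are hypothetical helpers, the example.com URL is invented, and other line types (such as editable "-e" installs) are deliberately rejected.

```python
from typing import NamedTuple, Optional


class FrozenLine(NamedTuple):
    name: str
    version: Optional[str]  # set for "name==version" lines
    url: Optional[str]      # set for "name @ url" lines


def parse_freeze_line(line: str) -> FrozenLine:
    """Split one pip freeze line into a name plus either a version or a URL."""
    line = line.strip()
    if " @ " in line:
        name, url = line.split(" @ ", 1)
        return FrozenLine(name.strip(), None, url.strip())
    if "==" in line:
        name, version = line.split("==", 1)
        return FrozenLine(name.strip(), version.strip(), None)
    raise ValueError(f"unsupported freeze line: {line!r}")


if __name__ == "__main__":
    print(parse_freeze_line("pandas==2.1.0"))
    print(parse_freeze_line("foo-mod @ git+https://example.com/org/foo-mod.git@abc123"))
```

Running pip freeze locally and feeding its lines through something like this shows quickly whether the internal package is recorded as a URL.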

  				
Posted one year ago by UnevenDolphin73

Yes. Though again, just highlighting that the naming of foo-mod is arbitrary. The actual module simply has a folder structure with an implicit namespace:

foo/
  mod/
    __init__.py
    # stuff

FWIW, for the time being I’m just setting the packages to all the packages the pipeline task sees with:

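    # Assumption: this helper lives in ClearML's bundled pigar module, and `task` is the current clearml Task
    from clearml.utilities.pigar.reqs import get_installed_pkgs_detail

    # Every package installed where the pipeline task runs, pinned as name==version
    packages = get_installed_pkgs_detail()
    packages = [f"{name}=={version}" if version else name for name, version in packages.values()]
    # Prefer the requirements ClearML already recorded for the task (pip, else poetry)
    packages = task.data.script.requirements.get('pip', task.data.script.requirements.get('poetry')) or packages
    print(f"Task requirements:\n{packages}")
    tmp_requirements_file = "tmp_reqs.txt"
    with open(tmp_requirements_file, "w") as f:
        # The recorded requirements may be one string rather than a list
        f.write("\n".join(packages) if isinstance(packages, list) else packages)
    
    # ...
    
    pipe.add_function_step(..., packages=tmp_requirements_file)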

  				
Posted one year ago by UnevenDolphin73

There's no decorator, just e.g.

def helper(foo: Optional[Any] = None):
    return foo

def step_one(...):
    # stuff

Then the type hints are not removed from helper and the code immediately crashes when being run
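
To illustrate what stripping the hints from a helper would involve, here is a sketch using the standard ast module (Python 3.9+ for ast.unparse). It is not ClearML's actual implementation; strip_type_hints is a hypothetical name, and it only handles function signatures.

```python
import ast


class _StripAnnotations(ast.NodeTransformer):
    """Remove parameter and return annotations from every function definition."""

    def visit_FunctionDef(self, node):
        self.generic_visit(node)  # also handle nested functions
        node.returns = None
        arguments = node.args
        every_arg = arguments.posonlyargs + arguments.args + arguments.kwonlyargs
        for extra in (arguments.vararg, arguments.kwarg):
            if extra is not None:
                every_arg.append(extra)
        for arg in every_arg:
            arg.annotation = None
        return node

    # Async functions carry annotations in the same fields.
    visit_AsyncFunctionDef = visit_FunctionDef


def strip_type_hints(source: str) -> str:
    """Return source with function signature annotations removed."""
    tree = _StripAnnotations().visit(ast.parse(source))
    return ast.unparse(tree)


if __name__ == "__main__":
    helper_source = "def helper(foo: Optional[Any] = None):\n    return foo\n"
    print(strip_type_hints(helper_source))
```

With the annotations gone, the helper source no longer needs Optional or Any to be imported when it is executed in a standalone script.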

  				
Posted one year ago by UnevenDolphin73

Pinging about this still, unresolved 🤔

ClearML does not capture our internal libraries and so our functions (pipeline steps) crash with missing modules.

  				
Posted one year ago by UnevenDolphin73

I have no idea what the difference is, but it does not log the internal repository 😞 If I knew why, I would be able to solve it myself… hehe

  				
Posted one year ago by UnevenDolphin73

There's code that strips the type hints from the component function, I just think it should be applied to the helper functions too :)

  				
Posted one year ago by UnevenDolphin73

I’d like to refrain from manually specifying the dependencies, since it adds a lot of overhead to extend

  				
Posted one year ago by UnevenDolphin73

So from foo.mod import "translates" to foo-mod @ git+ .. ?

  				
Posted one year ago by AgitatedDove14

Using the PipelineController with add_function_step

  				
Posted one year ago by UnevenDolphin73

Hey @AgitatedDove14, thanks for the reply!

We would like to avoid dockerizing all our repositories. And for the time being we have not used the decorators, but we can do that too.
The pipeline is instead built dynamically at the moment.

The issue is that the components do not have their dependencies. For example:

def step_one(...):
    from internal.repo import private
    # do stuff

When step_one is added as a component to the pipeline, it does not include the “internal.repo” as a package dependency, so it crashes.
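
As a rough picture of what detecting requirements from import statements means, the sketch below lists the modules imported inside a step's source with the standard ast module. It is an illustration only and makes no claim about how ClearML analyzes components; imported_modules and STEP_SOURCE are hypothetical.

```python
import ast
import textwrap


def imported_modules(function_source: str) -> set:
    """Collect the dotted module names imported anywhere in the given source."""
    tree = ast.parse(textwrap.dedent(function_source))
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level > 0 marks a relative import, which names no installable package
            found.add(node.module)
    return found


STEP_SOURCE = '''
def step_one():
    from internal.repo import private
    import pandas as pd
    return private, pd
'''

if __name__ == "__main__":
    modules = imported_modules(STEP_SOURCE)
    print(sorted(modules))
    # Top-level names are what usually get matched against installed packages.
    print(sorted({name.split(".")[0] for name in modules}))
```

Both pandas and internal.repo show up as imports here, so any difference between them would have to come from the later step that maps an import name to an installed package.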

  				
Posted one year ago by UnevenDolphin73

Hi UnevenDolphin73, when you say the pipeline itself, do you mean the controller? The controller is only in charge of handling the components. Let's say you have a pipeline with many parts. If you have a global environment, it will force a lot of redundant installations throughout the pipeline. What is your use case?

  				
Posted one year ago by CostlyOstrich36

We’d be happy if ClearML captures that (since it uses e.g. pip, then we have the git + commit hash for reproducibility), as it claims it would 😅

Any thoughts CostlyOstrich36 ?

  				
Posted one year ago by UnevenDolphin73

not sure about this, we really like being in control of reproducibility and not depend on the invoking machine… maybe that’s not what you intend

  				
Posted one year ago by PricklyRaven28


49K Views
