How Can I Ensure Tasks In A Pipeline Have The Same Environment As The Pipeline Itself? It Seems A Bit Counter-Intuitive That The Pipeline (Executed Remotely) Captures The Local Environment, But The Tasks (Executed Remotely) Do Not Use That Same Environment

Answered

How can I ensure tasks in a pipeline have the same environment as the pipeline itself? It seems a bit counter-intuitive that the pipeline (executed remotely) captures the local environment, but the tasks (executed remotely) do not use that same environment?

  				
Posted 2 years ago by UnevenDolphin73


Answers 42

How or why is this the issue?

The main issue is a missing requirement on the Task component, and this is why it is failing.
You can, however, manually specify the packages (and I'm assuming this will solve the issue), but it should have been auto-detected, no?

  				
Posted 2 years ago by AgitatedDove14

not sure about this, we really like being in control of reproducibility and not depend on the invoking machine… maybe that’s not what you intend

  				
Posted 2 years ago by PricklyRaven28

Pinging about this still, unresolved 🤔

ClearML does not capture our internal libraries and so our functions (pipeline steps) crash with missing modules.

  				
Posted 2 years ago by UnevenDolphin73

what format should I specify it

requirements.txt format e.g. ["package >= 1.2.3"]

Would this enforce that package on various components

This is a per-component control, so you can have different packages / containers based on the component

Would it then no longer capture import statements?

This is replacing the auto detected packages, but obviously this fails to detect your internal repo package, which is the main issue here.
How is the "internal package" installed? In other words, can you send the pip freeze of the machine creating the pipeline? That is where the packages are detected (if a package is not installed, you cannot infer the actual package name or the version just from the import statement)
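To make the "pip freeze" point concrete, here is a minimal sketch of how installed-package metadata can be turned into requirement lines using only the standard library. It is not ClearML's or pigar's detection code; the function name `freeze_like_requirements` is made up for illustration, and the URL branch assumes the installer wrote a PEP 610 `direct_url.json` file for packages installed from a VCS or URL.

```python
import json
from importlib import metadata


def freeze_like_requirements():
    """Return requirement lines for every distribution visible to this interpreter."""
    lines = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if not name:
            continue  # skip distributions with incomplete metadata
        # direct_url.json is only present for URL/VCS installs (PEP 610)
        raw = dist.read_text("direct_url.json")
        if raw:
            info = json.loads(raw)
            url = info.get("url", "")
            vcs = info.get("vcs_info")
            if vcs:
                # VCS install, e.g. "<name> @ git+<url>@<commit>"
                commit = vcs.get("commit_id", "")
                lines.append(f"{name} @ {vcs.get('vcs', '')}+{url}@{commit}")
            else:
                # Plain URL or local directory install
                lines.append(f"{name} @ {url}")
        else:
            # Regular index install: pin the exact version
            lines.append(f"{name}=={dist.version}")
    return sorted(set(lines), key=str.lower)
```

Running this on the machine that creates the pipeline and checking whether the internal package shows up (and under which name) may help narrow down why it is not detected.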

  				
Posted 2 years ago by AgitatedDove14

I have no idea what’s the difference, but it does not log the internal repository 😞 If I knew why, I would be able to solve it myself… hehe

  				
Posted 2 years ago by UnevenDolphin73

The only thing I could think of is that the output of pip freeze would be a URL?

  				
Posted 2 years ago by UnevenDolphin73

Still; anyone? 🥹 CostlyOstrich36 AgitatedDove14

  				
Posted 2 years ago by UnevenDolphin73

Using the PipelineController with add_function_step

  				
Posted 2 years ago by UnevenDolphin73

Yes, for example.

  				
Posted 2 years ago by UnevenDolphin73

it does not include the “internal.repo” as a package dependency, so it crashes.

understood

And for the time being we have not used the decorators,

So how are you building the pipeline component ?

  				
Posted 2 years ago by AgitatedDove14

How or why is this the issue? I guess something is getting lost in translation :D
On the local machine, we have all the packages needed. The code gets sent for remote execution, and all the local packages are frozen correctly with pip.
The pipeline controller task is then generated and executed remotely, and it has all the relevant packages.
Each component it launches, however, is missing the internal packages available earlier :(

  				
Posted 2 years ago by UnevenDolphin73

Alternatively, it would be good to specify both some requirements and auto-detect 🤔
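Until something like that exists, one workaround is to merge the two lists yourself before passing them on. The sketch below is only an illustration: the helper names are hypothetical, the requirement parsing is deliberately simple, and the repository URL in the usage example is a placeholder.

```python
import re


def canonical(name):
    """Normalise a distribution name as PEP 503 does (foo_mod, Foo.Mod -> foo-mod)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(line):
    """Extract the distribution name from a requirements.txt-style line."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", line)
    if not match:
        raise ValueError(f"cannot parse requirement: {line!r}")
    return canonical(match.group(1))


def merge_requirements(auto_detected, manual):
    """Combine both lists; a manual entry replaces an auto-detected one of the same name."""
    merged = {requirement_name(line): line.strip() for line in auto_detected}
    merged.update({requirement_name(line): line.strip() for line in manual})
    return [merged[name] for name in sorted(merged)]


# Usage with made-up inputs (the URL is a placeholder, not a real repository)
auto = ["numpy==1.24.0", "pandas==2.0.1"]
manual = ["foo-mod @ git+https://example.invalid/repo.git", "numpy>=1.25"]
combined = merge_requirements(auto, manual)
```

The resulting list is in requirements.txt format, so it could presumably be handed to the `packages` argument discussed in this thread.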

  				
Posted 2 years ago by UnevenDolphin73

Then the type hints are not removed from helper and the code immediately crashes when being run

Oh yes I see your point, that does make sense (btw removing the type hints will solve the issue)
regardless let me make sure this is solved

  				
Posted 2 years ago by AgitatedDove14

So a missing bit of information that I see I forgot to mention is that we named our packages as foo-mod in pyproject.toml. That hyphen then gets rewritten as foo_mod-x.y.z.dist-info.

foo-mod @ git+

  				
Posted 2 years ago by UnevenDolphin73

PricklyRaven28 That would be my fallback, it would make development much slower (having to build containers with every small change)

  				
Posted 2 years ago by UnevenDolphin73

It is. In what format should I specify it? Would this enforce that package on various components? Would it then no longer capture import statements?

  				
Posted 2 years ago by UnevenDolphin73

For example:

my-repo @ git+

  				
Posted 2 years ago by UnevenDolphin73

I’ve tracked it down further; it seems the pigar utility does not apply any smart logic there.
The case we have is the following:

We have a monorepo, but all modules/libs share a common namespace foo; so e.g. when working on module mod, we use from foo.mod import …
This then looks for a module called foo, even though it’s just a namespace.
In the dist-info directory name, it seems any hyphen, dot, etc. is swapped for an underscore, so our site-packages represents this as foo_mod-x.y.z.dist-info
This ends up showing that the available package is foo_mod.
Specifically, since foo is not generated, it is assumed to be local and dropped 🤔
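The mismatch described above can be reproduced in a few lines. This is a simplified sketch, not pigar's actual logic: `naive_lookup` is a hypothetical matcher that resolves an import by its top-level name only, and `dist_info_stem` follows the description in this post (runs of hyphens, dots and underscores become a single underscore).

```python
import re
from typing import Dict, Optional


def dist_info_stem(dist_name: str) -> str:
    """Name as used in the dist-info directory: runs of -, _ and . become _."""
    return re.sub(r"[-_.]+", "_", dist_name).lower()


def top_level_import(statement: str) -> str:
    """First component of the module path in an import statement."""
    match = re.match(r"\s*(?:from|import)\s+([A-Za-z_]\w*)", statement)
    if not match:
        raise ValueError(f"not an import statement: {statement!r}")
    return match.group(1)


def naive_lookup(statement: str, installed: Dict[str, str]) -> Optional[str]:
    """Resolve an import by its top-level name only (assumed, simplified matcher)."""
    return installed.get(top_level_import(statement))


# The distribution foo-mod is indexed under foo_mod ...
installed = {dist_info_stem("foo-mod"): "foo-mod"}
# ... but the import statement only exposes the namespace name foo
missed = naive_lookup("from foo.mod import something", installed)
found = naive_lookup("import foo_mod", installed)
```

On Python 3.10+, `importlib.metadata.packages_distributions()` returns a mapping from top-level import names to distribution names, which may be a useful way to check how `foo` is resolved in a given environment.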

  				
Posted 2 years ago by UnevenDolphin73

Well the individual tasks do not seem to have the expected environment.

  				
Posted 2 years ago by UnevenDolphin73

I think this is the main issue, is this reproducible ? How can we test that?

  				
Posted 2 years ago by AgitatedDove14

Yes. Though again, just highlighting the naming of foo-mod is arbitrary. The actual module simply has a folder structured with an implicit namespace:

foo/
  mod/
    __init__.py
    # stuff

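FWIW, for the time being I’m just setting the packages to all the packages the pipeline task sees with:

    # Assumed import, not shown in the original post:
    # from clearml.utilities.pigar.reqs import get_installed_pkgs_detail
    # `task` is assumed to be the current (controller) Task object.

    # Fallback: every installed package, pinned where a version is known
    packages = get_installed_pkgs_detail()
    packages = [f"{name}=={version}" if version else name for name, version in packages.values()]
    # Prefer the requirements already recorded on the task (pip, else poetry)
    packages = task.data.script.requirements.get('pip', task.data.script.requirements.get('poetry')) or packages
    print(f"Task requirements:\n{packages}")
    # Write them to a file so the step can be given a requirements file path
    tmp_requirements_file = "tmp_reqs.txt"
    with open(tmp_requirements_file, "w") as f:
        f.write("\n".join(packages) if isinstance(packages, list) else packages)
    
    # ...
    
    pipe.add_function_step(..., packages=tmp_requirements_file)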

  				
Posted 2 years ago by UnevenDolphin73

Hi UnevenDolphin73, when you say the pipeline itself, do you mean the controller? The controller is only in charge of handling the components. Let's say you have a pipeline with many parts. If you have a global environment, it will force a lot of redundant installations throughout the pipeline. What is your use case?

  				
Posted 2 years ago by CostlyOstrich36

This example ?

  				
Posted 2 years ago by AgitatedDove14

This then looks for a module called foo , even though it’s just a namespace

I think this is the issue. Are you using Python namespace packages?
(This is a PEP feature that is rarely used, and I have seen it break too many times.)
Assuming you have from foo.mod import, what are you seeing in pip freeze? I'd like to see if we can fix this and better support namespaces.

  				
Posted 2 years ago by AgitatedDove14

There's no decorator, just e.g.

def helper(foo: Optional[Any] = None):
    return foo

def step_one(...):
    # stuff

Then the type hints are not removed from helper and the code immediately crashes when being run
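A rough illustration of the failure and of what stripping the hints does, using only the standard library. This is not ClearML's implementation. It assumes the crash is a NameError raised because the helper's source ends up in a standalone script where `Optional` and `Any` were never imported; the thread does not show the actual traceback.

```python
import ast
import sys

# Source of a helper as it might be copied into a standalone script,
# without the "from typing import Any, Optional" line of the original module
HELPER_SOURCE = '''
def helper(foo: Optional[Any] = None):
    return foo
'''

# Python 3.14+ defers annotation evaluation, so the NameError would not occur there
ANNOTATIONS_EAGER = sys.version_info < (3, 14)


def strip_annotations(source):
    """Return the source with parameter and return annotations removed."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            node.returns = None
            args = node.args
            for arg in args.posonlyargs + args.args + args.kwonlyargs:
                arg.annotation = None
            for arg in (args.vararg, args.kwarg):
                if arg is not None:
                    arg.annotation = None
    return ast.unparse(tree)  # ast.unparse requires Python 3.9+


def defines_cleanly(source):
    """Execute the source in a fresh namespace, as a standalone script would."""
    try:
        exec(source, {})
        return True
    except NameError:
        return False


stripped = strip_annotations(HELPER_SOURCE)
```

Here `defines_cleanly(stripped)` succeeds, while the unstripped source fails on interpreters that evaluate annotations eagerly. Adding the `typing` import inside the helper's module-level source, or removing the hints by hand, would have the same effect.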

  				
Posted 2 years ago by UnevenDolphin73

There's code that strips the type hints from the component function, just think it should be applied to the helper functions too :)

  				
Posted 2 years ago by UnevenDolphin73

Exactly, it should have auto-detected the package.

  				
Posted 2 years ago by UnevenDolphin73

Yes. Though again, just highlighting the naming of foo-mod is arbitrary. The actual module simply has a folder structured with an implicit namespace:

Yep I think this is exactly why it fails detecting it, let me check that

And it’s failing on typing hints for functions passed in pipe.add_function_step(…, helper_functions=[…]) … I guess those aren’t being removed like the wrapped function step?

Can you provide the log? I think I'm missing what exactly was added into the decorator that somehow fails the Task creation

  				
Posted 2 years ago by AgitatedDove14

We have an internal mono-repo and some of the packages are required - they’re all available correctly for the controller, only some are required for the individual tasks, but the “magic” doesn’t happen 😞
That is, the controller does not identify them as a requirement, so they’re not installed in the tasks environment.

  				
Posted 2 years ago by UnevenDolphin73

… And it’s failing on typing hints for functions passed in pipe.add_function_step(…, helper_functions=[…]) … I guess those aren’t being removed like the wrapped function step?

  				
Posted 2 years ago by UnevenDolphin73
