Answered
How Can I Ensure Tasks In A Pipeline Have The Same Environment As The Pipeline Itself? It Seems A Bit Counter-Intuitive That The Pipeline (Executed Remotely) Captures The Local Environment, But The Tasks (Executed Remotely) Do Not Use That Same Environment?

How can I ensure tasks in a pipeline have the same environment as the pipeline itself? It seems a bit counter-intuitive that the pipeline (executed remotely) captures the local environment, but the tasks (executed remotely) do not use that same environment?

  
  
Posted one year ago

Answers 42


Well the individual tasks do not seem to have the expected environment.

  
  
Posted one year ago

Is this repo installed on the machine creating the pipeline?
You can also manually add it here: `packages=["link_to_internal_python_package"]`

  
  
Posted one year ago

We can change the project names of course, if there’s a suggestion/guide that will make them see past the namespace…

  
  
Posted one year ago

There's no decorator, just e.g.

def helper(foo: Optional[Any] = None):
    return foo

def step_one(...):
    # stuff

Then the type hints are not removed from helper, and the code crashes immediately when it is run.

  
  
Posted one year ago

Still; anyone? 🥹 @CostlyOstrich36 @AgitatedDove14

  
  
Posted one year ago

There's code that strips the type hints from the component function, just think it should be applied to the helper functions too :)
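To make the suggestion concrete, here is a rough sketch of what stripping type hints from a helper function's source could look like. This is not ClearML's actual implementation; the names `strip_type_hints` and `HELPER_SOURCE` are made up for illustration, and it assumes Python 3.9+ for `ast.unparse`.

```python
import ast


class _StripSignatureAnnotations(ast.NodeTransformer):
    """Drop parameter and return annotations from every function definition."""

    def visit_FunctionDef(self, node):
        self.generic_visit(node)  # also handle nested functions
        node.returns = None
        args = node.args
        for arg in args.posonlyargs + args.args + args.kwonlyargs:
            arg.annotation = None
        if args.vararg is not None:
            args.vararg.annotation = None
        if args.kwarg is not None:
            args.kwarg.annotation = None
        return node

    # Async functions have the same signature layout.
    visit_AsyncFunctionDef = visit_FunctionDef


def strip_type_hints(source: str) -> str:
    """Return `source` with function signature annotations removed."""
    tree = _StripSignatureAnnotations().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


# Hypothetical helper, mirroring the one described in this thread.
HELPER_SOURCE = '''
def helper(foo: Optional[Any] = None):
    return foo
'''

stripped = strip_type_hints(HELPER_SOURCE)
print(stripped)
```

The stripped source no longer references `Optional` or `Any`, so it can run in a standalone script that never imports `typing`.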

  
  
Posted one year ago

I think this is the main issue, is this reproducible ? How can we test that?

  
  
Posted one year ago

Hi @UnevenDolphin73

How can I ensure tasks in a pipeline have the same environment as the pipeline itself?
...
but the tasks (executed remotely) do not use that same environment?

Just verifying, we are talking about pipeline decorators?

We also wanted this, we preferred to create a docker image with all we need, and let the pipeline steps use that docker image

You can specify the docker image on the decorator itself.
Regarding capturing the packages: if you import them inside the decorated function, they will be captured based on what is installed in the local (i.e. initial) environment. The idea is that the components are not the same as the logic. Basically, the logic of the pipeline should not have any real package requirements; only the components (which actually do something) should. What am I missing?

  
  
Posted one year ago

The only thing I could think of is that the output of pip freeze would be a URL?

  
  
Posted one year ago

I’ve tracked it down further, it seems the pigar utility does not apply any smart logic there.
The case we have is the following -

  • We have a monorepo, but all modules/libs share a common namespace foo; so e.g. when working on module mod, we use from foo.mod import …
  • This then looks for a module called foo, even though it’s just a namespace
  • In the dist-info directory name, it seems any hyphen, dot, etc. is swapped for an underscore, so our site-packages represents this as foo_mod-x.y.z.dist-info
  • This ends up showing that the available package is foo_mod
  • Specifically, since no package named foo is found, it is assumed to be local and dropped 🤔
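
The mismatch described in the bullets above can be illustrated with a small sketch. This is a simplified illustration, not pigar's actual logic; the function names are made up, and the normalization rule follows the wheel/dist-info naming convention (runs of `-`, `_` and `.` become a single underscore).

```python
import re


def dist_info_name(project_name: str) -> str:
    """Normalize a project name the way it appears in a .dist-info directory."""
    return re.sub(r"[-_.]+", "_", project_name).lower()


def top_level_import(import_path: str) -> str:
    """Return the first component of a dotted import path."""
    return import_path.split(".", 1)[0]


# Names as they would show up in site-packages for two installed projects.
installed = {dist_info_name("foo-mod"), dist_info_name("pandas")}

for import_path in ("foo.mod", "pandas"):
    top = top_level_import(import_path)
    status = "matched" if top in installed else "no match (treated as local?)"
    print(f"import {import_path!r}: top-level {top!r}: {status}")
```

A naive lookup by top-level import name finds pandas but not foo, because the only installed name is foo_mod.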
  
  
Posted one year ago

And is this repo installed on the machine creating the pipeline?
Basically I'm asking how come it did not automatically detect it?

  
  
Posted one year ago

It’s just that for the packages argument, ClearML says:

If not provided, packages are automatically added based on the imports used inside the wrapped function.

So… 🤔

  
  
Posted one year ago

Alternatively, it would be good to specify both some requirements and auto-detect 🤔

  
  
Posted one year ago

We’d be happy if ClearML captures that (since it uses e.g. pip, then we have the git + commit hash for reproducibility), as it claims it would 😅

Any thoughts CostlyOstrich36 ?

  
  
Posted one year ago

… And it’s failing on typing hints for functions passed in pipe.add_function_step(…, helper_functions=[…]) … I guess those aren’t being removed like the wrapped function step?

  
  
Posted one year ago

Yes, for example.

  
  
Posted one year ago

  • This then looks for a module called foo, even though it’s just a namespace

I think this is the issue. Are you using Python namespace packages?
(This is a PEP feature that is really rarely used, and I have seen it break too many times.)
Assuming you have from foo.mod import …, what are you seeing in pip freeze? I'd like to see if we can fix this, and better support namespaces.
  
  
Posted one year ago

It is installed on the machine creating the pipeline.
I have no idea why it did not automatically detect it 😞

  
  
Posted one year ago

If you use this one, for example, will the component have pandas as part of the requirements?

def step_two(...):
    import pandas as pd
    # do stuff

If so (and it should), what's the difference? How is "internal.repo" different from pandas?

  
  
Posted one year ago

We also wanted this, we preferred to create a docker image with all we need, and let the pipeline steps use that docker image

That way you don’t rely on clearml capturing the local env, and you can control what exists in the env

  
  
Posted one year ago

not sure about this, we really like being in control of reproducibility and not depend on the invoking machine… maybe that’s not what you intend

  
  
Posted one year ago

Yes. Though again, just highlighting the naming of foo-mod is arbitrary. The actual module simply has a folder structured with an implicit namespace:

Yep I think this is exactly why it fails detecting it, let me check that

And it’s failing on typing hints for functions passed in pipe.add_function_step(…, helper_functions=[…]) … I guess those aren’t being removed like the wrapped function step?

Can you provide the log? I think I'm missing what exactly was added into the decorator that somehow fails the Task creation

  
  
Posted one year ago

This example ?

  
  
Posted one year ago

I have no idea what’s the difference, but it does not log the internal repository 😞 If I knew why, I would be able to solve it myself… hehe

  
  
Posted one year ago

it does not include the “internal.repo” as a package dependency, so it crashes.

Understood.

And for the time being we have not used the decorators,

So how are you building the pipeline component?

  
  
Posted one year ago

Using the PipelineController with add_function_step

  
  
Posted one year ago

For example:

my-repo @ git+
  
  
Posted one year ago

How or why is this the issue? I guess something is getting lost in translation :D
On the local machine, we have all the packages needed. The code gets sent for remote execution, and all the local packages are frozen correctly with pip.
The pipeline controller task is then generated and executed remotely, and it has all the relevant packages.
Each component it launches, however, is missing the internal packages available earlier :(

  
  
Posted one year ago

Exactly, it should have auto-detected the package.

  
  
Posted one year ago

Hey @AgitatedDove14, thanks for the reply!

We would like to avoid dockerizing all our repositories. And for the time being we have not used the decorators, but we can do that too.
The pipeline is instead built dynamically at the moment.

The issue is that the components do not have their dependency. For example:

def step_one(...):
    from internal.repo import private
    # do stuff

When step_one is added as a component to the pipeline, it does not include the “internal.repo” as a package dependency, so it crashes.
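
For reference, a minimal sketch of the workaround raised in this thread: passing the requirement explicitly through the `packages` argument of `add_function_step`. The distribution name `internal-repo`, the pipeline and project names, and the step body are all placeholders. Going by the documentation quoted in this thread, auto-detection applies when `packages` is not provided, so the list probably needs to cover everything the step imports.

```python
# Hypothetical requirement for the internal package; replace with the real
# distribution name (or a pip-installable reference to it).
EXTRA_PACKAGES = ["internal-repo"]


def helper(foo=None):
    # Kept free of type hints, since hints on helper functions were
    # reported to crash in this thread.
    return foo


def step_one(value):
    # In the real step, the internal import would live here, e.g.
    # `from internal.repo import private` (omitted so the sketch runs anywhere).
    return helper(value)


def build_pipeline():
    """Create the controller and register step_one with explicit packages."""
    # Imported lazily so this module can be loaded without clearml installed.
    from clearml import PipelineController

    pipe = PipelineController(
        name="example-pipeline",  # placeholder name
        project="examples",       # placeholder project
        version="0.0.1",
    )
    pipe.add_function_step(
        name="step_one",
        function=step_one,
        function_kwargs={"value": 1},
        function_return=["result"],
        packages=EXTRA_PACKAGES,
        helper_functions=[helper],
    )
    return pipe
```

Calling `build_pipeline()` requires a configured ClearML environment; the step function itself can still be exercised locally first.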

  
  
Posted one year ago