Answered
How Can I Ensure Tasks In A Pipeline Have The Same Environment As The Pipeline Itself? It Seems A Bit Counter-Intuitive That The Pipeline (Executed Remotely) Captures The Local Environment, But The Tasks (Executed Remotely) Do Not Use That Same Environment?

How can I ensure tasks in a pipeline have the same environment as the pipeline itself? It seems a bit counter-intuitive that the pipeline (executed remotely) captures the local environment, but the tasks (executed remotely) do not use that same environment?

  
  
Posted 2 years ago

Answers 42


Alternatively, it would be good to specify both some requirements and auto-detect 🤔

  
  
Posted 2 years ago

Exactly, it should have auto-detected the package.

  
  
Posted 2 years ago

Hi UnevenDolphin73, when you say "pipeline itself", do you mean the controller? The controller is only in charge of handling the components. Let's say you have a pipeline with many parts. If you have a global environment, it will force a lot of redundant installations throughout the pipeline. What is your use case?

  
  
Posted 2 years ago

I have no idea what’s the difference, but it does not log the internal repository 😞 If I knew why, I would be able to solve it myself… hehe

  
  
Posted 2 years ago

Then the type hints are not removed from helper and the code immediately crashes when being run

Oh yes, I see your point, that does make sense (btw, removing the type hints will solve the issue).
Regardless, let me make sure this is solved.

  
  
Posted 2 years ago

For example:

my-repo @ git+
  
  
Posted 2 years ago

Yes, for example.

  
  
Posted 2 years ago

Hey AgitatedDove14, thanks for the reply!

We would like to avoid dockerizing all our repositories. And for the time being we have not used the decorators, but we can do that too.
The pipeline is instead built dynamically at the moment.

The issue is that the components do not have their dependency. For example:

def step_one(...):
    from internal.repo import private
    # do stuff

When step_one is added as a component to the pipeline, it does not include the “internal.repo” as a package dependency, so it crashes.

  
  
Posted 2 years ago

How or why is this the issue? I guess something is getting lost in translation :D
On the local machine, we have all the packages needed. The code gets sent for remote execution, and all the local packages are frozen correctly with pip.
The pipeline controller task is then generated and executed remotely, and it has all the relevant packages.
Each component it launches, however, is missing the internal packages available earlier :(

  
  
Posted 2 years ago

Is this repo installed on the machine creating the pipeline?
You can also manually add it here: `packages=["link_to_internal_python_package"]`

  
  
Posted 2 years ago

If you use this one, for example, will the component have pandas as part of its requirements?

def step_two(...):
    import pandas as pd
    # do stuff

If so (and it should), what's the difference? How is "internal.repo" different from pandas?

  
  
Posted 2 years ago

PricklyRaven28 That would be my fallback, it would make development much slower (having to build containers with every small change)

  
  
Posted 2 years ago

How or why is this the issue?

The main issue is a missing requirement on the Task component, and this is why it is failing.
You can, however, manually specify the package (and I'm assuming this will solve the issue), but it should have been auto-detected, no?

  
  
Posted 2 years ago

This example?

  
  
Posted 2 years ago

We have an internal mono-repo and some of the packages are required - they’re all available correctly for the controller, only some are required for the individual tasks, but the “magic” doesn’t happen 😞
That is, the controller does not identify them as a requirement, so they’re not installed in the tasks’ environment.

  
  
Posted 2 years ago

It’s just that for the packages argument, ClearML says:

If not provided, packages are automatically added based on the imports used inside the wrapped function.

So… 🤔
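
To make the quoted sentence concrete, here is a minimal sketch of what detecting packages "based on the imports used inside the wrapped function" could look like, using only the standard library. This is an illustration under assumptions, not ClearML's actual implementation; `top_level_imports` and `STEP_SOURCE` are hypothetical names.

```python
import ast
import textwrap


def top_level_imports(source: str) -> set[str]:
    """Return the top-level module names imported anywhere in `source`."""
    tree = ast.parse(textwrap.dedent(source))
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # `import a.b.c` -> "a"
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # `from a.b import c` -> "a"; relative imports are skipped
            found.add(node.module.split(".")[0])
    return found


# Hypothetical step source, modelled on the examples in this thread
STEP_SOURCE = '''
def step_one():
    from internal.repo import private
    import pandas as pd
'''

print(sorted(top_level_imports(STEP_SOURCE)))  # ['internal', 'pandas']
```

Note that this only yields import names. Turning an import name such as internal into an installable requirement is a separate lookup, and that is the step that can fail for private packages.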

  
  
Posted 2 years ago

I think this is the main issue. Is this reproducible? How can we test that?

  
  
Posted 2 years ago

… And it’s failing on type hints for functions passed in pipe.add_function_step(…, helper_functions=[…]) … I guess those aren’t being removed like the wrapped function step?

  
  
Posted 2 years ago

We also wanted this, we preferred to create a docker image with all we need, and let the pipeline steps use that docker image

That way you don’t rely on clearml capturing the local env, and you can control what exists in the env

  
  
Posted 2 years ago

So a missing bit of information that I see I forgot to mention is that we named our packages as foo-mod in pyproject.toml. That hyphen then gets rewritten as foo_mod.x.y.z-distinfo.

foo-mod @ git+
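
The hyphen/underscore mismatch described above can be sketched as follows. This assumes the standard name normalization from PEP 503 and uses `importlib.metadata.packages_distributions()` (Python 3.10+) to look up which installed distribution provides an import name. The helper names `normalize` and `distributions_for_import` are hypothetical, and whether ClearML performs an equivalent lookup is not confirmed in this thread.

```python
import re
from importlib import metadata


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def distributions_for_import(import_name, mapping=None):
    """Map a top-level import name to the distributions that provide it."""
    if mapping is None:
        # Python 3.10+: {"import_name": ["distribution-name", ...]}
        mapping = metadata.packages_distributions()
    return sorted({normalize(dist) for dist in mapping.get(import_name, [])})


# The project name and the import name normalize to the same string
print(normalize("foo-mod") == normalize("foo_mod"))  # True

# With an explicit (hypothetical) mapping instead of the real environment
print(distributions_for_import("foo_mod", {"foo_mod": ["foo-mod"]}))  # ['foo-mod']
```

Running `distributions_for_import("foo_mod")` without a mapping on the machine that creates the pipeline shows whether the import name resolves to an installed distribution at all.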
  
  
Posted 2 years ago

It is. In what format should I specify it? Would this enforce that package on various components? Would it then no longer capture import statements?
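
On the format question, one option is a pip-style requirement string of the `name @ git+...` kind shown elsewhere in this thread. The sketch below builds such strings from the `direct_url.json` metadata that pip records for URL and VCS installs (PEP 610). It assumes the `packages` argument accepts pip-style requirement strings, which this thread suggests but does not confirm. The function names and the example URL and commit are hypothetical.

```python
import json
from importlib import metadata


def requirement_from_direct_url(name: str, direct_url: dict) -> str:
    """Build a `name @ vcs+url@commit` requirement from PEP 610 data."""
    vcs_info = direct_url.get("vcs_info")
    if vcs_info:
        return f"{name} @ {vcs_info['vcs']}+{direct_url['url']}@{vcs_info['commit_id']}"
    # Archive or local-directory installs carry only a URL
    return f"{name} @ {direct_url['url']}"


def pinned_requirement(name: str) -> str:
    """Pin an installed distribution, preferring its recorded origin URL."""
    dist = metadata.distribution(name)  # raises PackageNotFoundError if absent
    raw = dist.read_text("direct_url.json")  # None unless installed from a URL
    if raw:
        return requirement_from_direct_url(dist.metadata["Name"], json.loads(raw))
    return f"{dist.metadata['Name']}=={dist.version}"


# Hypothetical PEP 610 record; the URL and commit are placeholders
example = {
    "url": "https://git.example.com/org/foo-mod.git",
    "vcs_info": {"vcs": "git", "commit_id": "0123abc"},
}
print(requirement_from_direct_url("foo-mod", example))
# foo-mod @ git+https://git.example.com/org/foo-mod.git@0123abc
```

A list such as `[pinned_requirement("foo-mod")]` could then be tried as the `packages` value for a step, keeping the commit hash for reproducibility.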

  
  
Posted 2 years ago

And is this repo installed on the machine creating the pipeline?
Basically, I'm asking how come it did not automatically detect it?

  
  
Posted 2 years ago

We’d be happy if ClearML captures that (since it uses e.g. pip, then we have the git + commit hash for reproducibility), as it claims it would 😅

Any thoughts CostlyOstrich36 ?

  
  
Posted 2 years ago

Still; anyone? 🥹 CostlyOstrich36 AgitatedDove14

  
  
Posted 2 years ago

We can change the project names of course, if there’s a suggestion/guide that will make them see past the namespace…

  
  
Posted 2 years ago

Using the PipelineController with add_function_step

  
  
Posted 2 years ago

Well the individual tasks do not seem to have the expected environment.

  
  
Posted 2 years ago

There's no decorator, just e.g.

def helper(foo: Optional[Any] = None):
    return foo

def step_one(...):
    # stuff

Then the type hints are not removed from helper and the code immediately crashes when being run

  
  
Posted 2 years ago

There's code that strips the type hints from the component function, just think it should be applied to the helper functions too :)
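
As a workaround until helpers are handled the same way, type hints can be stripped from helper source before it is passed along. The sketch below uses the standard `ast` module (`ast.unparse` needs Python 3.9+). It is an illustration only, not the stripping code ClearML uses; `StripAnnotations`, `strip_type_hints`, and `HELPER_SOURCE` are hypothetical names.

```python
import ast


class StripAnnotations(ast.NodeTransformer):
    """Remove parameter, return, and variable annotations."""

    def visit_FunctionDef(self, node):
        self.generic_visit(node)  # also visits every ast.arg of the function
        node.returns = None
        return node

    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_arg(self, node):
        node.annotation = None
        return node

    def visit_AnnAssign(self, node):
        if node.value is None:
            # A bare `x: int` has no runtime effect; keep the body non-empty
            return ast.copy_location(ast.Pass(), node)
        assign = ast.Assign(targets=[node.target], value=node.value)
        return ast.copy_location(assign, node)


def strip_type_hints(source: str) -> str:
    """Return `source` with all type annotations removed."""
    tree = StripAnnotations().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


# The helper from this thread, as source text
HELPER_SOURCE = "def helper(foo: Optional[Any] = None):\n    return foo\n"

stripped = strip_type_hints(HELPER_SOURCE)
print(stripped)

# The stripped helper runs without `from typing import Optional, Any`
namespace = {}
exec(stripped, namespace)
print(namespace["helper"](3))  # 3
```

Stripping annotations changes behaviour for code that reads them at runtime (dataclasses, for example), so this only suits plain helper functions like the one above.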

  
  
Posted 2 years ago

Not sure about this. We really like being in control of reproducibility and not depending on the invoking machine… maybe that’s not what you intend.

  
  
Posted 2 years ago