Examples: query, "exact match", wildcard*, wild?ard, wild*rd
Fuzzy search: cake~ (finds cakes, bake)
Term boost: "red velvet"^4, chocolate^2
Field grouping: tags:(+work -"fun-stuff")
Escaping: Escape characters +-&|!(){}[]^"~*?:\ with \, e.g. \+
Range search: properties.timestamp:[1587729413488 TO *] (inclusive), properties.title:{A TO Z}(excluding A and Z)
Combinations: chocolate AND vanilla, chocolate OR vanilla, (chocolate OR vanilla) NOT "vanilla pudding"
Field search: properties.title:"The Title" AND text
Answered
How Can I Ensure Tasks In A Pipeline Have The Same Environment As The Pipeline Itself? It Seems A Bit Counter-Intuitive That The Pipeline (Executed Remotely) Captures The Local Environment, But The Tasks (Executed Remotely) Do Not Use That Same Environmen

How can I ensure tasks in a pipeline have the same environment as the pipeline itself? It seems a bit counter-intuitive that the pipeline (executed remotely) captures the local environment, but the tasks (executed remotely) do not use that same environment?

  
  
Posted one year ago
Votes Newest

Answers 42


We’d be happy if ClearML captures that (since it uses e.g. pip, then we have the git + commit hash for reproducibility), as it claims it would 😅

Any thoughts CostlyOstrich36 ?

  
  
Posted one year ago

There's no decorator, just e.g.

def helper(foo: Optional[Any] = None):
    return foo

def step_one(...):
    # stuff

Then the type hints are not removed from helper and the code immediately crashes when being run

  
  
Posted one year ago

Then the type hints are not removed from helper and the code immediately crashes when being run

Oh yes I see your point, that does make sense (btw removing the type hints will solve the issue)
regardless let me make sure this is solved

  
  
Posted one year ago

Yes. Though again, just highlighting the naming of foo-mod is arbitrary. The actual module simply has a folder structured with an implicit namespace:

foo/
  mod/
    __init__.py
    # stuff

FWIW, for the time being I’m just setting the packages to all the packages the pipeline tasks sees with:

    packages = get_installed_pkgs_detail()
    packages = [f"{name}=={version}" if version else name for name, version in packages.values()]
    packages = task.data.script.requirements.get('pip', task.data.script.requirements.get('poetry')) or packages
    print(f"Task requirements:\n{packages}")
    tmp_requirements_file = "tmp_reqs.txt"
    with open(tmp_requirements_file, "w") as f:
        f.write("\n".join(packages) if isinstance(packages, list) else packages)
    
    # ...
    
    pipe.add_function_step(..., packages=tmp_requirements_file)
  
  
Posted one year ago

Yes. Though again, just highlighting the naming of

foo-mod

is arbitrary. The actual module simply has a folder structured with an implicit namespace:

Yep I think this is exactly why it fails detecting it, let me check that

And it’s failing on typing hints for functions passed in

pipe.add_function_step(…, helper_function=[…])

… I guess those aren’t being removed like the wrapped function step?

Can you provide the log? I think I'm missing what exactly was added into the decorator that somehow fails the Task creation

  
  
Posted one year ago

I think this is the main issue, is this reproducible ? How can we test that?

  
  
Posted one year ago

Hi UnevenDolphin73 , when you say pipeline itself you mean the controller? The controller is only in charge of handling the components. Lets say you have a pipeline with many parts. If you have a global environment then it will force a lot of redundant installations through the pipeline. What is your use case?

  
  
Posted one year ago

Well the individual tasks do not seem to have the expected environment.

  
  
Posted one year ago

We have an internal mono-repo and some of the packages are required - they’re all available correctly for the controller, only some are required for the individual tasks, but the “magic” doesn’t happen 😞
That is, the controller does not identify them as a requirement, so they’re not installed in the tasks environment.

  
  
Posted one year ago

We also wanted this, we preferred to create a docker image with all we need, and let the pipeline steps use that docker image

That way you don’t rely on clearml capturing the local env, and you can control what exists in the env

  
  
Posted one year ago

It’s just that for the packages argument, ClearML says:

If not provided, packages are automatically added based on the imports used inside the wrapped function.

So… 🤔

  
  
Posted one year ago

Pinging about this still, unresolved 🤔

ClearML does not capture our internal libraries and so our functions (pipeline steps) crash with missing modules.

  
  
Posted one year ago