There's code that strips the type hints from the component function, just think it should be applied to the helper functions too :)
There's no decorator, just e.g.
def helper(foo: Optional[Any] = None):
return foo
def step_one(...):
# stuff
Then the type hints are not removed from helper and the code immediately crashes when being run
Yes. Though again, just highlighting the naming of foo-mod
is arbitrary. The actual module simply has a folder structured with an implicit namespace:
foo/
mod/
__init__.py
# stuff
FWIW, for the time being I’m just setting the packages to all the packages the pipeline tasks sees with:
packages = get_installed_pkgs_detail()
packages = [f"{name}=={version}" if version else name for name, version in packages.values()]
packages = task.data.script.requirements.get('pip', task.data.script.requirements.get('poetry')) or packages
print(f"Task requirements:\n{packages}")
tmp_requirements_file = "tmp_reqs.txt"
with open(tmp_requirements_file, "w") as f:
f.write("\n".join(packages) if isinstance(packages, list) else packages)
# ...
pipe.add_function_step(..., packages=tmp_requirements_file)
So a missing bit of information that I see I forgot to mention, is that we named our packages as foo-mod
in pyproject.toml
. That hyphen then get’s rewritten as foo_mod.x.y.z-distinfo
.
foo-mod @ git+
We’d be happy if ClearML captures that (since it uses e.g. pip, then we have the git + commit hash for reproducibility), as it claims it would 😅
Any thoughts CostlyOstrich36 ?
Using the PipelineController with add_function_step
I think this is the main issue, is this reproducible ? How can we test that?
Then the type hints are not removed from helper and the code immediately crashes when being run
Oh yes I see your point, that does make sense (btw removing the type hints will solve the issue)
regardless let me make sure this is solved
what format should I specify it
requirements.txt format e.g. ["package >= 1.2.3"]
Would this enforce that package on various components
This is a per component control, so you can have different packages / containers based on the componnent
Would it then no longer capture import statements?
This is replacing the auto detected packages, but obviously this fails to detect your internal repo package, which is the main issue here.
How is "internal package" installed, in other words can you send the pip freeze of th machine creating the pipeline ? because this is where the packages are detected (if packages are not installed you cannot infer the actual package name nor the version just from the import statement)
not sure about this, we really like being in control of reproducibility and not depend on the invoking machine… maybe that’s not what you intend
… And it’s failing on typing hints for functions passed in pipe.add_function_step(…, helper_function=[…])
… I guess those aren’t being removed like the wrapped function step?