It is installed on the machine creating the pipeline.
I have no idea why it did not automatically detect it.
So `from foo.mod import` "translates" to `foo-mod @ git+` ..?
It is. In what format should I specify it? Would this enforce that package on various components? Would it then no longer capture import statements?
If you use this one for example, will the component have pandas as part of the requirement
def step_two(...):
    import pandas as pd
    # do stuff
If so (and it should), what's the difference? How is "internal.repo" different from pandas?
How or why is this the issue?
The main issue is a missing requirement on the Task component, and this is why it is failing.
You can however manually specify the package (and I'm assuming this will solve the issue), but it should have been auto-detected, no?
We also wanted this, we preferred to create a docker image with all we need, and let the pipeline steps use that docker image
That way you don't rely on clearml capturing the local env, and you can control what exists in the env
… And it's failing on type hints for functions passed in pipe.add_function_step(…, helper_functions=[…])
… I guess those aren't being removed like the wrapped function step?
I have no idea what the difference is, but it does not log the internal repository. If I knew why, I would be able to solve it myself… hehe
Yes. Though again, just highlighting the naming of `foo-mod` is arbitrary. The actual module simply has a folder structured with an implicit namespace:
foo/
    mod/
        __init__.py
        # stuff
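To make the implicit-namespace layout above concrete, here is a minimal sketch that recreates it in a temporary directory and inspects how Python sees each level. The helper names `make_namespace_layout` and `describe` are hypothetical and only for illustration; the point is that `foo` has no file of its own, which is presumably why a detector looking for a top-level `foo` package finds nothing to map to an installed distribution.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path


def make_namespace_layout(root: Path) -> None:
    """Create foo/mod/__init__.py under root, deliberately without foo/__init__.py."""
    pkg_dir = root / "foo" / "mod"
    pkg_dir.mkdir(parents=True)
    (pkg_dir / "__init__.py").write_text("VALUE = 42\n")


def describe(name: str) -> dict:
    """Report whether an importable name is a namespace or a regular package."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ImportError(f"cannot find {name!r}")
    # A namespace package has search locations but no origin file to load.
    is_namespace = (
        spec.origin in (None, "namespace")
        and spec.submodule_search_locations is not None
    )
    return {"name": name, "origin": spec.origin, "is_namespace": is_namespace}


tmp_root = Path(tempfile.mkdtemp())
make_namespace_layout(tmp_root)
sys.path.insert(0, str(tmp_root))
importlib.invalidate_caches()

namespace_info = describe("foo")      # the shared namespace
package_info = describe("foo.mod")    # the actual package
print(namespace_info)
print(package_info)
```

Here `foo` is reported as a namespace with no origin, while `foo.mod` points at its `__init__.py`.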
FWIW, for the time being I'm just setting the packages to all the packages the pipeline task sees with:
packages = get_installed_pkgs_detail()
packages = [f"{name}=={version}" if version else name for name, version in packages.values()]
packages = task.data.script.requirements.get('pip', task.data.script.requirements.get('poetry')) or packages
print(f"Task requirements:\n{packages}")
tmp_requirements_file = "tmp_reqs.txt"
with open(tmp_requirements_file, "w") as f:
    f.write("\n".join(packages) if isinstance(packages, list) else packages)
# ...
pipe.add_function_step(..., packages=tmp_requirements_file)
I'd like to refrain from manually specifying the dependencies, since it adds a lot of overhead to extend
Still; anyone? @CostlyOstrich36 @AgitatedDove14
is this repo installed on the machine creating the pipeline ?
You can also manually add it here: `packages=["link_to_internal_python_package"]`
We have an internal mono-repo and some of the packages are required. They're all available correctly for the controller, only some are required for the individual tasks, but the "magic" doesn't happen.
That is, the controller does not identify them as a requirement, so they're not installed in the task's environment.
There's no decorator, just e.g.
def helper(foo: Optional[Any] = None):
    return foo

def step_one(...):
    # stuff
Then the type hints are not removed from helper and the code immediately crashes when being run
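For illustration, stripping annotations from a helper's source could look roughly like the sketch below. This is not ClearML's actual implementation. It is a hypothetical `strip_type_hints` helper built on the standard `ast` module, it assumes Python 3.9+ (for `ast.unparse`), and it only handles function signatures, not annotated assignments inside the body.

```python
import ast


class _StripAnnotations(ast.NodeTransformer):
    """Remove parameter and return annotations from every function definition."""

    def visit_FunctionDef(self, node):
        self.generic_visit(node)  # handle nested functions first
        node.returns = None
        args = node.args
        for arg in args.posonlyargs + args.args + args.kwonlyargs:
            arg.annotation = None
        if args.vararg is not None:
            args.vararg.annotation = None
        if args.kwarg is not None:
            args.kwarg.annotation = None
        return node

    visit_AsyncFunctionDef = visit_FunctionDef


def strip_type_hints(source: str) -> str:
    """Return the source with function signature annotations removed."""
    tree = _StripAnnotations().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


helper_source = "def helper(foo: Optional[Any] = None):\n    return foo\n"
stripped = strip_type_hints(helper_source)
print(stripped)
```

The stripped source no longer references `Optional` or `Any`, so it can be executed in an environment where `typing` was never imported.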
Not sure about this, we really like being in control of reproducibility and not depending on the invoking machine… maybe that's not what you intend
Hi UnevenDolphin73, when you say pipeline itself you mean the controller? The controller is only in charge of handling the components. Let's say you have a pipeline with many parts. If you have a global environment then it will force a lot of redundant installations through the pipeline. What is your use case?
Yes. Though again, just highlighting the naming of `foo-mod` is arbitrary. The actual module simply has a folder structured with an implicit namespace:
Yep, I think this is exactly why it fails to detect it, let me check that
And it's failing on type hints for functions passed in pipe.add_function_step(…, helper_functions=[…])
… I guess those aren't being removed like the wrapped function step?
Can you provide the log? I think I'm missing what exactly was added into the decorator that somehow fails the Task creation
what format should I specify it
requirements.txt format e.g. ["package >= 1.2.3"]
Would this enforce that package on various components
This is a per-component control, so you can have different packages / containers based on the component
Would it then no longer capture import statements?
This is replacing the auto detected packages, but obviously this fails to detect your internal repo package, which is the main issue here.
How is "internal package" installed? In other words, can you send the pip freeze of the machine creating the pipeline? Because this is where the packages are detected (if packages are not installed you cannot infer the actual package name nor the version just from the import statement)
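As a rough way to see what the creating machine has installed, the sketch below lists distributions through the standard library's `importlib.metadata`. The helper name `installed_requirements` and its `prefix` filter are hypothetical, and the output is only an approximation of `pip freeze`: packages installed straight from git appear here as `name==version`, whereas `pip freeze` would print them as `name @ git+…`.

```python
from importlib import metadata
from typing import List


def installed_requirements(prefix: str = "") -> List[str]:
    """List installed distributions as 'name==version', optionally filtered by name prefix."""
    requirements = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name and name.lower().startswith(prefix.lower()):
            requirements.add(f"{name}=={dist.version}")
    return sorted(requirements)


# Everything visible to this interpreter:
all_requirements = installed_requirements()
print(f"{len(all_requirements)} distributions found")

# Only distributions whose name starts with "foo" (the shared namespace in this thread):
print(installed_requirements("foo"))
```

If the internal package does not show up in this list under any name, there is nothing for the import-based detection to map `foo.mod` to.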
There's code that strips the type hints from the component function, just think it should be applied to the helper functions too :)
I've tracked it down further, it seems the pigar utility does not apply any smart logic there.
The case we have is the following:
- We have a monorepo, but all modules/libs share a common namespace `foo`; so e.g. working on module `mod`, we use `from foo.mod import …`
- This then looks for a module called `foo`, even though it's just a namespace
- In the dist-info requirement, it seems any hyphen, dot, etc. are swapped for an underscore, so our site-packages represents this as `foo_mod-x.y.z-distinfo`
- This ends up showing the available package is `foo_mod`
- Specifically, since `foo` is not generated, it is assumed local and dropped
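The mismatch described in the list above can be sketched as follows. This is a hypothetical heuristic, not pigar's or ClearML's logic: `normalize` and `resolve_distribution` are made-up names, and the lookup assumes the distribution name happens to equal the dotted import path with separators normalized, which the thread notes is not guaranteed since the naming is arbitrary.

```python
import re
from typing import Iterable, Optional


def normalize(name: str) -> str:
    """Collapse runs of '-', '_' and '.' into a single '_' and lowercase the result."""
    return re.sub(r"[-_.]+", "_", name).lower()


def resolve_distribution(import_name: str, installed: Iterable[str]) -> Optional[str]:
    """Map a dotted import path to an installed distribution, longest prefix first."""
    by_normalized = {normalize(dist): dist for dist in installed}
    parts = import_name.split(".")
    for end in range(len(parts), 0, -1):
        candidate = normalize(".".join(parts[:end]))
        if candidate in by_normalized:
            return by_normalized[candidate]
    return None


installed = ["pandas", "foo_mod", "numpy"]
print(resolve_distribution("foo.mod", installed))            # foo_mod
print(resolve_distribution("pandas.core.frame", installed))  # pandas
print(resolve_distribution("foo", installed))                # None
```

Looking up only the top-level name `foo` finds nothing, which matches the behaviour reported here; trying longer dotted prefixes first is one way a detector could see past the namespace.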
Hey @AgitatedDove14, thanks for the reply!
We would like to avoid dockerizing all our repositories. And for the time being we have not used the decorators, but we can do that too.
The pipeline is instead built dynamically at the moment.
The issue is that the components do not have their dependency. For example:
def step_one(...):
    from internal.repo import private
    # do stuff
When `step_one` is added as a component to the pipeline, it does not include the "internal.repo" as a package dependency, so it crashes.
Pinging about this, still unresolved.
ClearML does not capture our internal libraries and so our functions (pipeline steps) crash with missing modules.
How or why is this the issue? I guess something is getting lost in translation :D
On the local machine, we have all the packages needed. The code gets sent for remote execution, and all the local packages are frozen correctly with pip.
The pipeline controller task is then generated and executed remotely, and it has all the relevant packages.
Each component it launches, however, is missing the internal packages available earlier :(
Using the PipelineController with add_function_step
it does not include the "internal.repo" as a package dependency, so it crashes.
understood
And for the time being we have not used the decorators,
So how are you building the pipeline component ?
- This then looks for a module called `foo`, even though it's just a namespace
I think this is the issue, are you using Python package namespaces?
(this is a PEP feature that is really rarely used, and I have seen break too many times)
Assuming you have `from foo.mod import`, what are you seeing in pip freeze? I'd like to see if we can fix this, and better support namespaces
It's just that for the `packages` argument, ClearML says:
If not provided, packages are automatically added based on the imports used inside the wrapped function.
So…
We'd be happy if ClearML captures that (since it uses e.g. pip, then we have the git + commit hash for reproducibility), as it claims it would.
Any thoughts CostlyOstrich36 ?
We can change the project names of course, if there's a suggestion/guide that will make them see past the namespace…