I think this is the main issue. Is this reproducible? How can we test that?
in what format should I specify it?
requirements.txt format, e.g. `["package >= 1.2.3"]`
Would this enforce that package on the various components?
This is a per-component control, so you can have different packages / containers based on the component.
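For instance, something along these lines (step names and package pins here are made up for illustration, not taken from your pipeline):

```python
from clearml import PipelineController

def preprocess():
    ...

def train():
    ...

pipe = PipelineController(name="demo-pipeline", project="examples", version="1.0")

# each component carries its own requirements; nothing is forced globally
pipe.add_function_step(
    name="preprocess",
    function=preprocess,
    packages=["pandas >= 1.2.3"],
)
pipe.add_function_step(
    name="train",
    function=train,
    packages=["pandas >= 1.2.3", "scikit-learn >= 1.0"],
)
```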
Would it then no longer capture import statements?
This replaces the auto-detected packages, but obviously the auto-detection fails to pick up your internal repo package, which is the main issue here.
How is "internal package" installed, in other words can you send the pip freeze of th machine creating the pipeline ? because this is where the packages are detected (if packages are not installed you cannot infer the actual package name nor the version just from the import statement)
How or why is this the issue? I guess something is getting lost in translation :D
On the local machine, we have all the packages needed. The code gets sent for remote execution, and all the local packages are frozen correctly with pip.
The pipeline controller task is then generated and executed remotely, and it has all the relevant packages.
Each component it launches, however, is missing the internal packages available earlier :(
Hi UnevenDolphin73, when you say "pipeline itself" do you mean the controller? The controller is only in charge of handling the components. Let's say you have a pipeline with many parts: if you have a global environment, it will force a lot of redundant installations through the pipeline. What is your use case?
is this repo installed on the machine creating the pipeline?
You can also manually add it here: `packages=["link_to_internal_python_package"]`
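Continuing the sketch above, that could look like this (`my_step` and the VCS link are placeholders for your actual step and wherever the internal package lives):

```python
pipe.add_function_step(
    name="step_with_internal_dep",
    function=my_step,
    # overrides the auto-detected requirements for this component only;
    # any requirements.txt-style line works, including a direct VCS link
    packages=["git+ssh://git@github.com/acme/monorepo.git#subdirectory=foo_mod"],
)
```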
Using the `PipelineController` with `add_function_step`
How or why is this the issue?
The main issue is a missing requirement on the Task component, and this is why it is failing.
You can, however, manually specify the package (and I'm assuming this will solve the issue), but it should have been auto-detected, no?
not sure about this; we really like being in control of reproducibility and not depending on the invoking machine… maybe that’s not what you intend
The only thing I could think of is that the output of `pip freeze` would be a URL?
it does *not* include the “internal.repo” as a package dependency, so it crashes.
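For reference, a package installed from a VCS URL normally shows up in `pip freeze` as a PEP 508 direct reference, something like this (hypothetical name and URL):

```text
foo-mod @ git+ssh://git@github.com/acme/monorepo.git@1.2.3#subdirectory=foo_mod
```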
understood
And for the time being we have not used the decorators.
So how are you building the pipeline component?
I’ve tracked it down further; it seems the `pigar` utility does not apply any smart logic there.
The case we have is the following:
- We have a monorepo, but all modules/libs share a common namespace `foo`; so e.g. when working on module `mod`, we use `from foo.mod import …`
- The detection then looks for a package called `foo`, even though it’s just a namespace
- In the dist-info requirement, any hyphen, dot, etc. is swapped for an underscore, so our site-packages represents this as `foo_mod-x.y.z.dist-info`
- This ends up showing the available package as `foo_mod`
- Specifically, since `foo` is not generated, it is assumed to be local and dropped 🤔
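A quick way to see this mapping locally, assuming Python 3.10+ and the placeholder names above:

```python
# Minimal sketch: the import name "foo" does not match any distribution name,
# which is what trips up import-based requirement detection.
from importlib.metadata import PackageNotFoundError, distribution, packages_distributions

mapping = packages_distributions()
print(mapping.get("foo"))  # e.g. ['foo-mod'] -- the namespace maps to the real dist

try:
    distribution("foo")  # there is no distribution literally named "foo"
except PackageNotFoundError:
    print("no dist named 'foo' -> a detector assumes a local module and drops it")
```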