Hi FierceHamster54, can you please elaborate on the process with a more specific example?
I can test it empirically, but I want to be sure what the expected behavior is so my pipeline doesn't get auto-magically broken after a patch
Well, given a file architecture looking like this:
|_ __init__.py
|_ my_pipeline.py
|_ my_utils.py
With the content of my_pipeline.py being:
` from clearml.automation.controller import PipelineDecorator
from clearml import Task, TaskTypes
from my_utils import do_thing

Task.force_store_standalone_script()

@PipelineDecorator.component(...)
def my_component(dataset_id: str):
    import pandas as pd
    from clearml import Dataset
    # use the component's own argument (was the undefined name input_dataset_id)
    dataset = Dataset.get(dataset_id=dataset_id)
    dataset_path = dataset.get_local_copy()
    dataset = do_thing(dataset)
    ... `
And the content of my_utils.py being:
def do_thing(df: pd.DataFrame) -> pd.DataFrame:
    """Just a simple shuffle this is an example not a CS course"""
    df = df.sample(frac=1)
    return df
Should I do an import pandas as pd in my_utils.py, given that the call to do_thing() is done within my component and thus in the scope of the component's pandas import? Will ClearML resolve that function, upload it, and propagate to it the dependencies of the component from which it is called?
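On the plain-Python side of that question, a minimal sketch may help. It says nothing about what ClearML itself uploads; it only shows that a function looks up names in the module where it was defined, not in the scope of whoever calls it. The in-memory modules, the load_module helper, and the use of random instead of pandas are all assumptions made so the snippet runs without extra packages.

```python
import types

def load_module(name, source):
    """Build an in-memory module from source text (stand-in for a real .py file)."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module

# Hypothetical my_utils.py that uses `random` without importing it.
broken_utils = load_module("my_utils_broken", """
def do_thing(items):
    shuffled = list(items)
    random.shuffle(shuffled)  # looked up in this module's globals, not the caller's
    return shuffled
""")

# Same helper, but the module imports what it uses.
fixed_utils = load_module("my_utils_fixed", """
import random

def do_thing(items):
    shuffled = list(items)
    random.shuffle(shuffled)
    return shuffled
""")

def my_component(utils, items):
    import random  # local to my_component; invisible to code defined elsewhere
    try:
        return utils.do_thing(items)
    except NameError as exc:
        return f"NameError: {exc}"

result = my_component(broken_utils, [1, 2, 3])
fixed_result = my_component(fixed_utils, [1, 2, 3])
print(result)
print(sorted(fixed_result))
```

So the import inside the component does not reach do_thing(); my_utils.py needs its own import pandas as pd. On Python versions that evaluate annotations eagerly, the pd.DataFrame annotation alone would already fail when my_utils.py is imported without it.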
Okay, looks like the call dependency resolver does not support cross-file calls and relies instead on the local repo cloning feature to handle multiple files, so Task.force_store_standalone_script() does not allow for a pipeline defined across multiple files (now that you think of it, it was kinda implied by the name). What is interesting is that calling an auxiliary function in the SAME file from a component also raises a NameError: <function_name> is not defined, that's kinda sad.
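The same-file NameError is consistent with the component being shipped as a standalone function without the rest of the file. A minimal sketch, assuming that is roughly what happens (this is an illustration with plain exec, not ClearML's actual packaging code; the source strings and names are hypothetical):

```python
# Source of a component and of a helper that lives in the same file.
component_src = """
def my_component(values):
    return helper(values)
"""

helper_src = """
def helper(values):
    return sorted(values)
"""

# Case 1: only the component's source is shipped, so `helper` is missing.
isolated = {}
exec(component_src, isolated)
try:
    isolated["my_component"]([3, 1, 2])
    error = None
except NameError as exc:
    error = str(exc)

# Case 2: the helper's source is shipped next to the component.
bundled = {}
exec(helper_src + component_src, bundled)
bundled_result = bundled["my_component"]([3, 1, 2])

print(error)
print(bundled_result)
```

The ClearML documentation for PipelineDecorator.component lists a helper_functions argument intended for this situation; whether it behaves as expected together with Task.force_store_standalone_script() is worth checking against the docs for your version.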
And is this compatible with the Task.force_store_standalone_script() option?
Hi FierceHamster54,
I think Task.force_store_standalone_script() is causing the issue: you are storing the entire script as a standalone without any git, so it breaks once you try to import other parts of the git repo. BTW, any specific reason for using it in your pipeline?
Would have been great if the ClearML resolver would just inline the code of locally defined vanilla functions and execute that inlined code under the import scope of the component from which it is called
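To make that wish concrete, here is a rough sketch of what such inlining could look like. It is not an existing ClearML feature; inline_helpers and the source strings are hypothetical, and it assumes the component's def header fits on one line with no decorator. The helper is pasted inside the component body, so it sees the component's local imports through normal closure rules.

```python
import textwrap
from typing import List

def inline_helpers(component_src: str, helper_srcs: List[str]) -> str:
    """Paste each helper's source right after the component's `def` line."""
    lines = textwrap.dedent(component_src).strip().splitlines()
    header, body = lines[0], lines[1:]
    inlined = [
        textwrap.indent(textwrap.dedent(src).strip(), "    ")
        for src in helper_srcs
    ]
    return "\n".join([header, *inlined, *body])

component_src = """
def my_component(values):
    import statistics
    return summarize(values)
"""

# The helper never imports `statistics`; it relies on the component's import.
helper_src = """
def summarize(values):
    return statistics.mean(values)
"""

standalone_src = inline_helpers(component_src, [helper_src])
namespace = {}
exec(standalone_src, namespace)
inlined_result = namespace["my_component"]([1, 2, 3, 6])

print(standalone_src)
print(inlined_result)
```

A real implementation would also have to handle decorators, multi-line signatures, and helpers that call other helpers, which is probably part of why a resolver would rather clone the repo.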
Well, it is also failing within the same file if you read until the end, but for the cross-file issue, it's mostly because my repo architecture is organized in a v1/v2 scheme and I didn't want to pull a lot of unused files and inject GitHub PATs, which frankly lack granularity, in the worker