The imports are inside the functions because each function becomes a stand-alone job running on a remote machine, not the entire pipeline code. It also lets ClearML automatically detect which packages to install on the remote machine. Make sense?
Yes, I mean this. Like you have a YAML manifest with a command in the YAML: python -m clearml_pipeline

```python
@PipelineDecorator.component(return_values=['data_frame'], cache=True, task_type=TaskTypes.data_processing)
def step_one(pickle_data_url: str, extra: int = 43):
    print('step_one')
    # make sure we have scikit-learn for this step, we need it to use to unpickle the object
    import sklearn  # noqa
    import pickle
    import pandas as pd
    from clearml import StorageManager

    local_iris_pkl = StorageManager.get_local_copy(remote_url=pickle_data_url)
    with open(local_iris_pkl, 'rb') as f:
        iris = pickle.load(f)
    data_frame = pd.DataFrame(iris['data'], columns=iris['feature_names'])
    data_frame.columns += ['target']
    data_frame['target'] = iris['target']
    return data_frame
```
I don't like imports inside functions
Regarding the YAML, how would you pass data? Like in the pipeline-from-tasks example?
You mean to design the entire pipeline from YAML?
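If the question is how data moves between steps, the pipeline-from-tasks approach passes artifact links through parameter overrides. A rough sketch along those lines (project names, task names, and the 'dataset' artifact name are placeholders, not exact code):

```python
from clearml.automation import PipelineController

# Controller that chains two pre-existing Tasks (placeholder project/task names)
pipe = PipelineController(name='pipeline demo', project='examples', version='1.0')

# First step: a Task that fetches data and uploads it as an artifact named 'dataset'
pipe.add_step(
    name='stage_data',
    base_task_project='examples',
    base_task_name='step 1 dataset artifact',
)

# Second step: receives the *link* to the first step's artifact as an injected
# hyperparameter, so the Task itself decides how to download and process it
pipe.add_step(
    name='stage_process',
    parents=['stage_data'],
    base_task_project='examples',
    base_task_name='step 2 process dataset',
    parameter_override={'General/dataset_url': '${stage_data.artifacts.dataset.url}'},
)

# Run everything on the local machine for a quick test
pipe.start_locally(run_pipeline_steps_locally=True)
```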
(this assumes your Tasks know how to process links to artifacts)
Is this what you are after?
(BTW: any reason for working with YAML files instead of coding it?)