Another question: can I mix steps with Task and Function?
According to  pipeline_from_functions.py  , it is easy to understand that step1 returns  data_frame   and I can use it as input of step2. But I have no idea what string reference could be used when steps come from Task?
code
pipe.add_function_step(
    name='step_one',
    function=step_one,
    function_kwargs=dict(pickle_data_url='${pipeline.url}'),
    function_return=['data_frame'],
    cache_executed_step=True,
)
pipe.add_function_step(
    name='step_two',
    # parents=['step_one'],  # the pipeline will automatically detect the dependencies based on the kwargs inputs
    function=step_two,
    function_kwargs=dict(data_frame='${step_one.data_frame}'),
    function_return=['processed_data'],
    cache_executed_step=True,
)
def step_one(pickle_data_url):
    # make sure we have scikit-learn for this step, we need it to use to unpickle the object
    import sklearn  # noqa
    import pickle
    import pandas as pd
    from clearml import StorageManager
    pickle_data_url = \
        pickle_data_url or \
        '
'
    local_iris_pkl = StorageManager.get_local_copy(remote_url=pickle_data_url)
    with open(local_iris_pkl, 'rb') as f:
        iris = pickle.load(f)
    data_frame = pd.DataFrame(iris['data'], columns=iris['feature_names'])
    data_frame.columns += ['target']
    data_frame['target'] = iris['target']
    return data_frame
def step_two(data_frame, test_size=0.2, random_state=42):
    # make sure we have pandas for this step, we need it to use the data_frame
    import pandas as pd  # noqa
    from sklearn.model_selection import train_test_split
    y = data_frame['target']
    X = data_frame[(c for c in data_frame.columns if c != 'target')]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state)
    return X_train, X_test, y_train, y_test
		But I have no idea what will be input of step2.
What do you mean by that? the assumption is that somehow the output of step 1 will be passed (a string reference) to step 2, what am I missing ?
I have no idea what string reference could be used when steps come from Task?
Oh I see, you are correct, when it comes to Tasks the assumption is your are passing strings (with selectors on the strings, i.e. the curly brackets) but there is no fancy serialization/deserialization as you have with pipelines from decorators / functions. The reason for that is that the Task itslef is a standalone, there is no way for the pipeline logic to actually "pull data" from it and "pass" it to the other Task. The assumption is the Task itself was designed with in/outs in the first place. Does that make sense ?
Are these fields of ClearML Task?
correct
can I mix steps with Task and Function?
Hmm interesting question, I think that in theory you should be able to, I have to admit that I have not tried yet, but it should work
I want to know the total fields(string reference mentioned above) which can be used as input of step2. Are these fields of ClearML Task?