Answered

@<1523701205467926528:profile|AgitatedDove14> @<1523701087100473344:profile|SuccessfulKoala55> Hi, I have a question about pipelines. I use a pipeline whose steps come from Tasks, but I have no idea what will be the input of step2. Here is an example from the documentation:

pipe.add_step(
    name="stage_data",
    base_task_project="examples",
    base_task_name="Pipeline step 1 dataset artifact",
    parameter_override={"General/dataset_url": "${pipeline.url}"},
)

pipe.add_step(
    name="stage_process",
    parents=["stage_data"],
    base_task_project="examples",
    base_task_name="Pipeline step 2 process dataset",
    parameter_override={
        "General/dataset_url": "${stage_data.artifacts.dataset.url}",
        "General/test_size": 0.25,
    },
    pre_execute_callback=pre_execute_callback_example,
    post_execute_callback=post_execute_callback_example,
)

The example overrides the General/dataset_url parameter of step2 with the output of step1. So my question is: what exactly is the output of step1? A ClearML Task?
If it is a ClearML Task, does that mean I can use any value of that Task as an input to step2?

  
  
Posted 2 years ago

Answers 10


I have no idea what string reference could be used when steps come from Task?

Oh I see, you are correct. When it comes to Tasks, the assumption is that you are passing strings (with selectors inside the strings, i.e. the curly brackets), but there is no fancy serialization/deserialization like you get with pipelines built from decorators/functions. The reason is that each Task is standalone: the pipeline logic has no way to actually "pull" data out of one Task and "pass" it to another. The assumption is that the Task itself was designed with its inputs and outputs in the first place. Does that make sense?
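To make the "strings with selectors" point concrete, here is a toy model of what happens to a Task step's parameter_override. It is only an illustration, not ClearML's actual resolver: `resolve_references`, the `context` dict and the example.com URLs are all hypothetical. The idea is that `${stage_data.artifacts.dataset.url}` is replaced by a plain string looked up from fields step1's Task already recorded; step2 receives that string and still has to fetch the data itself.

```python
import re

# Matches ${...} selectors such as ${stage_data.artifacts.dataset.url}
_REF = re.compile(r"\$\{([^}]+)\}")


def resolve_references(value: str, context: dict) -> str:
    """Hypothetical helper: replace every ${a.b.c} selector in `value`
    with str(context['a']['b']['c']). Toy model, not ClearML code."""

    def lookup(match: "re.Match") -> str:
        node = context
        for key in match.group(1).split("."):
            node = node[key]  # KeyError if the step never recorded this field
        return str(node)

    return _REF.sub(lookup, value)


# Hypothetical state after step1 finished: its Task registered an artifact
# named "dataset", and all the pipeline knows about it is a URL string.
context = {
    "pipeline": {"url": "https://example.com/raw.pkl"},
    "stage_data": {
        "artifacts": {"dataset": {"url": "https://example.com/artifacts/dataset.pkl"}}
    },
}

override = {
    "General/dataset_url": "${stage_data.artifacts.dataset.url}",
    "General/test_size": 0.25,
}
# Only string values can carry selectors; other values pass through unchanged
resolved = {
    k: resolve_references(v, context) if isinstance(v, str) else v
    for k, v in override.items()
}
print(resolved["General/dataset_url"])
```

In this model nothing is serialized: step2 gets a URL string and is expected to download the dataset itself (e.g. with StorageManager.get_local_copy, as step_one does further down the thread).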

  
  
Posted 2 years ago

According to pipeline_from_functions.py, it is easy to see that step_one returns data_frame and that I can use it as the input of step_two. But I have no idea what string reference could be used when the steps come from Tasks.


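# excerpt: in pipeline_from_functions.py these calls run after step_one and
# step_two are defined, with `pipe` being a PipelineController instance
pipe.add_function_step(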
    name='step_one',
    function=step_one,
    function_kwargs=dict(pickle_data_url='${pipeline.url}'),
    function_return=['data_frame'],
    cache_executed_step=True,
)
pipe.add_function_step(
    name='step_two',
    # parents=['step_one'],  # the pipeline will automatically detect the dependencies based on the kwargs inputs
    function=step_two,
    function_kwargs=dict(data_frame='${step_one.data_frame}'),
    function_return=['processed_data'],
    cache_executed_step=True,
)

def step_one(pickle_data_url):
    # make sure we have scikit-learn for this step, we need it to use to unpickle the object
    import sklearn  # noqa
    import pickle
    import pandas as pd
    from clearml import StorageManager
    # fall back to the default dataset URL (the URL itself is elided in this post)
    pickle_data_url = pickle_data_url or '...'
    local_iris_pkl = StorageManager.get_local_copy(remote_url=pickle_data_url)
    with open(local_iris_pkl, 'rb') as f:
        iris = pickle.load(f)
    data_frame = pd.DataFrame(iris['data'], columns=iris['feature_names'])
    data_frame['target'] = iris['target']
    return data_frame


def step_two(data_frame, test_size=0.2, random_state=42):
    # make sure we have pandas for this step, we need it to use the data_frame
    import pandas as pd  # noqa
    from sklearn.model_selection import train_test_split
    y = data_frame['target']
    X = data_frame.drop(columns=['target'])
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state)

    return X_train, X_test, y_train, y_test
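
The commented-out `parents=['step_one']` line says the pipeline detects dependencies from the kwargs inputs. A toy sketch of that idea (my own illustration; ClearML's real detection logic may differ) scans the kwargs for `${step.field}` selectors and treats every referenced step, except the special `pipeline` prefix used for pipeline parameters, as a parent. `infer_parents` is a hypothetical name.

```python
import re

# Captures the step name in selectors of the form ${step_name.field...}
_STEP_REF = re.compile(r"\$\{([^}.]+)\.[^}]*\}")


def infer_parents(function_kwargs: dict) -> set:
    """Hypothetical helper: return the step names referenced by
    ${step.field} selectors in string kwargs. 'pipeline' refers to
    pipeline parameters, not to a step, so it is skipped."""
    parents = set()
    for value in function_kwargs.values():
        if isinstance(value, str):
            for step in _STEP_REF.findall(value):
                if step != "pipeline":
                    parents.add(step)
    return parents


print(infer_parents(dict(pickle_data_url="${pipeline.url}")))    # set()
print(infer_parents(dict(data_frame="${step_one.data_frame}")))  # {'step_one'}
```

The difference from Task steps is what happens after the dependency is known: for function steps the returned object (here data_frame) is serialized and handed to step_two as a Python object, while a Task step only ever receives the resolved string.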
  
  
Posted 2 years ago

But I have no idea what will be the input of step2.

What do you mean by that? The assumption is that somehow the output of step 1 will be passed (as a string reference) to step 2. What am I missing?

  
  
Posted 2 years ago

can I mix steps with Task and Function?

Hmm, interesting question. In theory you should be able to; I have to admit I have not tried it yet, but it should work.

  
  
Posted 2 years ago

Another question: can I mix steps with Task and Function?

  
  
Posted 2 years ago


  
  
Posted 2 years ago

I want to know the full set of fields (the string references mentioned above) that can be used as input to step2. Are these fields of the ClearML Task?

  
  
Posted 2 years ago

Got it!

  
  
Posted 2 years ago

Thanks

  
  
Posted 2 years ago

Are these fields of ClearML Task?

correct
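As a rough illustration of "fields of the Task", the sketch below builds selector strings from a made-up Task record for the stage_data step. Only the `artifacts.<name>.url` form is confirmed by the documentation example in this thread; the `id` and `parameters.<Section/name>` forms follow my recollection of the `PipelineController.add_step` docstring (which, as far as I remember, also mentions model and configuration selectors), so treat the exact list as an assumption. `list_selectors` and the record values are hypothetical.

```python
def list_selectors(step_name: str, task: dict) -> list:
    """Hypothetical helper: build ${...} selector strings for a Task record's
    id, parameters and artifact URLs (forms assumed, see lead-in)."""
    selectors = [f"${{{step_name}.id}}"]
    for param in sorted(task.get("parameters", {})):
        selectors.append(f"${{{step_name}.parameters.{param}}}")
    for artifact in sorted(task.get("artifacts", {})):
        selectors.append(f"${{{step_name}.artifacts.{artifact}.url}}")
    return selectors


# Made-up record for the "stage_data" step of the documentation example
stage_data = {
    "id": "aabbcc",
    "parameters": {"General/dataset_url": "https://example.com/raw.pkl"},
    "artifacts": {"dataset": {"url": "https://example.com/artifacts/dataset.pkl"}},
}
for selector in list_selectors("stage_data", stage_data):
    print(selector)
# ${stage_data.id}
# ${stage_data.parameters.General/dataset_url}
# ${stage_data.artifacts.dataset.url}
```

Whatever the exact list is, each selector resolves to a string stored on step1's Task, so the step1 Task has to record (e.g. upload as an artifact) anything step2 needs.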

  
  
Posted 2 years ago