Answered

@AgitatedDove14 @SuccessfulKoala55 Hi, I have a question about using pipelines. I use a pipeline with steps from Tasks. But I have no idea what will be input of step2. Here is an example from the documentation.

pipe.add_step(
    name="stage_data",
    base_task_project="examples",
    base_task_name="Pipeline step 1 dataset artifact",
    parameter_override={"General/dataset_url": "${pipeline.url}"},
)

pipe.add_step(
    name="stage_process",
    parents=["stage_data"],
    base_task_project="examples",
    base_task_name="Pipeline step 2 process dataset",
    parameter_override={
        "General/dataset_url": "${stage_data.artifacts.dataset.url}",
        "General/test_size": 0.25,
    },
    pre_execute_callback=pre_execute_callback_example,
    post_execute_callback=post_execute_callback_example,
)

The example overrides the General/dataset_url parameter with the output of step1. So my question is: what is the output of step1? A ClearML Task?
If it is a ClearML Task, does that mean I can use any value in the ClearML Task as an input to step2?

  				
Posted 
	one year ago

FloppyDeer99


Answers 10


According to pipeline_from_functions.py, it is easy to understand that step1 returns data_frame and that I can use it as the input of step2. But I have no idea what string reference could be used when steps come from Task?

pipe.add_function_step(
    name='step_one',
    function=step_one,
    function_kwargs=dict(pickle_data_url='${pipeline.url}'),
    function_return=['data_frame'],
    cache_executed_step=True,
)
pipe.add_function_step(
    name='step_two',
    # parents=['step_one'],  # the pipeline will automatically detect the dependencies based on the kwargs inputs
    function=step_two,
    function_kwargs=dict(data_frame='${step_one.data_frame}'),
    function_return=['processed_data'],
    cache_executed_step=True,
)

def step_one(pickle_data_url):
    # make sure we have scikit-learn for this step, we need it to use to unpickle the object
    import sklearn  # noqa
    import pickle
    import pandas as pd
    from clearml import StorageManager
    pickle_data_url = \
        pickle_data_url or \
        ''  # the default dataset URL is missing from the post as quoted
    local_iris_pkl = StorageManager.get_local_copy(remote_url=pickle_data_url)
    with open(local_iris_pkl, 'rb') as f:
        iris = pickle.load(f)
    data_frame = pd.DataFrame(iris['data'], columns=iris['feature_names'])
    data_frame.columns += ['target']
    data_frame['target'] = iris['target']
    return data_frame


def step_two(data_frame, test_size=0.2, random_state=42):
    # make sure we have pandas for this step, we need it to use the data_frame
    import pandas as pd  # noqa
    from sklearn.model_selection import train_test_split
    y = data_frame['target']
    # select every feature column (everything except the label) with an explicit list
    X = data_frame[[c for c in data_frame.columns if c != 'target']]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state)

    return X_train, X_test, y_train, y_test

  				
FloppyDeer99

I have no idea what string reference could be used when steps come from Task?

Oh I see, you are correct. When it comes to Tasks, the assumption is that you are passing strings (with selectors on the strings, i.e. the curly brackets), but there is no fancy serialization/deserialization as you have with pipelines from decorators / functions. The reason is that the Task itself is standalone: there is no way for the pipeline logic to actually "pull data" from it and "pass" it to the other Task. The assumption is that the Task itself was designed with inputs and outputs in the first place. Does that make sense?
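
To make the "strings with selectors" idea concrete, here is a toy sketch of how a `${...}` reference could be turned into a plain string before the next step starts. This is not ClearML's actual implementation: the `resolve` helper, the `completed_steps` dictionary, and the `s3://` URL are all hypothetical and exist only for illustration.

```python
import re

# Toy stand-in for what a pipeline controller knows about finished steps.
# The id and URL are made-up placeholders, not real ClearML values.
completed_steps = {
    "stage_data": {
        "id": "hypothetical-task-id",
        "artifacts": {"dataset": {"url": "s3://hypothetical-bucket/dataset.pkl"}},
    }
}

# Matches selectors of the form ${step_name.path.to.field}
_SELECTOR = re.compile(r"\$\{([^}]+)\}")


def resolve(value, steps):
    """Replace every ${step.path.to.field} selector in a string with a plain string."""
    if not isinstance(value, str):
        return value  # non-string overrides (e.g. 0.25) pass through unchanged

    def lookup(match):
        step_name, *path = match.group(1).split(".")
        node = steps[step_name]
        for key in path:
            node = node[key]
        return str(node)

    return _SELECTOR.sub(lookup, value)


overrides = {
    "General/dataset_url": "${stage_data.artifacts.dataset.url}",
    "General/test_size": 0.25,
}
resolved = {name: resolve(value, completed_steps) for name, value in overrides.items()}
print(resolved)
```

The point of the sketch is that step 2 only ever receives a string (here, a URL). Downloading and deserializing whatever that string points to is the job of the step 2 Task's own code.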

  				
AgitatedDove14

But I have no idea what will be input of step2.

What do you mean by that? The assumption is that the output of step 1 will somehow be passed (as a string reference) to step 2. What am I missing?

  				
AgitatedDove14

can I mix steps with Task and Function?

Hmm, interesting question. In theory you should be able to. I have to admit that I have not tried it yet, but it should work.

  				
AgitatedDove14

I want to know all the fields (the string references mentioned above) that can be used as input of step2. Are these fields of ClearML Task?

  				
FloppyDeer99

Are these fields of ClearML Task?

Correct.
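
For reference, a few selector forms that, to the best of my recollection of the `PipelineController.add_step` documentation, can appear in `parameter_override` values. Treat this as a sketch to check against the docs for your ClearML version: the step name `stage_data`, the artifact name `dataset`, and the parameter `General/test_size` are taken from the example in the question, and the dictionary labels are only for readability.

```python
# Selector patterns referencing fields of a parent step's Task.
# Assumption: these forms follow the add_step docs; verify for your version.
step = "stage_data"

reference_examples = {
    "task id": "${%s.id}" % step,
    "artifact url": "${%s.artifacts.dataset.url}" % step,
    "hyperparameter": "${%s.parameters.General/test_size}" % step,
    "latest output model url": "${%s.models.output.-1.url}" % step,
}

for what, ref in reference_examples.items():
    print(f"{what:>24}: {ref}")
```

Each of these resolves to a string taken from the parent step's Task, so the receiving Task still has to interpret it (for example, download the artifact from the URL).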

  				
AgitatedDove14

Thanks

  				
FloppyDeer99

Got it!

  				
FloppyDeer99

Another question: can I mix steps with Task and Function?

  				
FloppyDeer99
