Answered

Hi,
I am creating a pipeline from functions with dynamically created steps. For example, if I pass the pipeline parameter tune_optimize='recall,precision', my pipeline creates 2 tune tasks/steps for each trained model. Everything works really nicely when I start the pipeline from the script: the pipeline is loaded and sent to the agents, and it also looks fine in the GUI.
The problem appears when I start a "NEW RUN" from the GUI and change the values of my key parameters.
When I add a new value to tune_optimize, I can see that the pipeline loaded it (by printing pipe._task.get_parameters_as_dict()['Args']), but the new step is not created.
When I reduce the tune_optimize value to just 'recall', pipeline execution fails with the message:
ValueError: Node 'tune_et_for_Precision', base_task_id is empty
Is this how it should work by design?
Why does it work fine when I execute the pipeline from the script, but not when it is started from the GUI?

My code:
```python
from clearml import PipelineController


def step_preprocessing():
    print('preprocessing time!')
    return


def step_training():
    print('training time!')
    return


def step_tune(model_task_id):
    print('tune time!')
    return


if __name__ == '__main__':

    pipe = PipelineController(
        name='TestFunction',
        project='Test',
        version='1.0.0',
        add_pipeline_tags=False,
    )

    # Dataset parameters
    pipe.add_parameter(name='models_list', description='model_list', default='et,dt')
    pipe.add_parameter(name='tune_optimize', description='optimize', default="Recall,Precision")

    print('pipe._task.get_parameters_as_dict():')
    print(pipe._task.get_parameters_as_dict())

    print('pipe._pipeline_args:')
    print(pipe._pipeline_args)

    # During pipeline initialisation pipeline_params is empty and we need to use default values.
    # When the pipeline starts the run, the params are loaded again, and then pipeline_params can be used.
    try:
        pipeline_params = pipe._task.get_parameters_as_dict()['Args']
    except KeyError:
        pipeline_params = pipe._pipeline_args

    print(f'Pipeline params:\n{pipeline_params}')
    pipe.add_function_step(
        name='preprocessing',
        function=step_preprocessing,
        function_return=['data_frame'],
        cache_executed_step=True,
    )

    # Split the comma-separated parameters into lists of names
    models_name = pipeline_params['models_list'].replace(" ", "").split(',')
    optimizers = pipeline_params['tune_optimize'].replace(" ", "").split(',')

    for model_name in models_name:
        tune_function_kwargs = {}
        # One training step per model
        pipe.add_function_step(
            name=f'{model_name}_training',
            project_name='Test/Research',
            task_name=f"train_{model_name}",
            task_type='training',
            parents=['preprocessing'],
            function=step_training,
            function_return=['out'],
            cache_executed_step=False,
        )
        tune_function_kwargs["model_task_id"] = f'${{{model_name}_training.id}}'
        counter = 1
        # One tune step per (model, optimizer) pair
        for item in optimizers:
            pipe.add_function_step(
                name=f'tune_{model_name}_for_{item}',
                project_name='Test/Research',
                task_name=f"tune_{model_name}_{item}",
                task_type='optimizer',
                function=step_tune,
                function_kwargs=tune_function_kwargs,
                function_return=['out'],
                cache_executed_step=False,
            )
            counter = counter + 1

    pipe.start(queue='TestQUEUE')
    print('process completed')
```
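
For reference, here is a minimal sketch of which step names the loops above end up registering for a given pair of parameter values. It is plain Python with no ClearML calls, and the helper names `split_csv` and `expected_step_names` are made up for illustration only.

```python
def split_csv(value):
    """Split a comma-separated parameter string the same way the script does."""
    return [item for item in value.replace(" ", "").split(",") if item]


def expected_step_names(models_list, tune_optimize):
    """List the step names the script would register (hypothetical helper)."""
    names = ["preprocessing"]
    for model in split_csv(models_list):
        names.append(f"{model}_training")
        for optimizer in split_csv(tune_optimize):
            names.append(f"tune_{model}_for_{optimizer}")
    return names


if __name__ == "__main__":
    for name in expected_step_names("et,dt", "Recall,Precision"):
        print(name)
```

With the defaults this gives one training step and two tune steps per model, so dropping 'Precision' from tune_optimize should remove the `tune_et_for_Precision` step named in the error.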

  				
Posted 3 years ago by HandsomeGiraffe70

Answers 3

Hi HandsomeGiraffe70

First:
`# During pipeline initialisation pipeline_params is empty and we need to use default values.`
`# When pipeline start the run, params are lunched again, and then pipeline_params can be used.`
Hmm, that should probably be fixed, maybe with a function on the pipeline to deal with it?
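
Roughly what I have in mind, as a plain-Python sketch only: `resolve_pipeline_params` is a hypothetical name, not an existing ClearML function, and the dictionaries stand in for what `get_parameters_as_dict()` and the pipeline defaults would return.

```python
def resolve_pipeline_params(task_params, defaults, section="Args"):
    """Return the defaults, overridden by any values stored under `section`.

    task_params: nested dict of parameters, possibly without `section`
    defaults:    dict of default parameter values
    """
    stored = task_params.get(section) or {}
    resolved = dict(defaults)   # copy, so the defaults are not mutated
    resolved.update(stored)     # stored values win over defaults
    return resolved


if __name__ == "__main__":
    defaults = {"models_list": "et,dt", "tune_optimize": "Recall,Precision"}
    # First run from the script: nothing stored yet, so the defaults are used.
    print(resolve_pipeline_params({}, defaults))
    # Later run: a stored value overrides the matching default.
    print(resolve_pipeline_params({"Args": {"tune_optimize": "Recall"}}, defaults))
```

Note that this merges the stored values over the defaults, whereas the try/except in the script picks one dictionary or the other.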

When I reduce tune_optimize value to just 'recall'. Pipeline execution failed with msg:
ValueError: Node 'tune_et_for_Precision', base_task_id is empty

I would imagine that it is failing to find the requested Task?
Specifically:
`project_name='Test/Research', task_name=f"tune_{model_name}_{item}",`
Obviously we should improve the error, but first, could that be the case?

  				
Posted 3 years ago by AgitatedDove14

Hi AgitatedDove14,
Ad1. Yes, I think this is a kind of bug. Using _task to get the pipeline input values is a little bit ugly 🙂
Ad2. I am not sure, so let me ask a question first:
When I run the pipeline from the script, a new pipeline is built from scratch (all steps etc.), but clicking "NEW RUN" in the GUI just reuses the existing pipeline. Is that correct?
If so, the answer to your question may be yes, and I can imagine an explanation for that. I am just wondering whether it works like that by design or not.

  				
Posted 3 years ago by HandsomeGiraffe70

Ad1. yes, think this is kind of bug. Using _task to get pipeline input values is a little bit ugly

Good point, let's fix it 🙂

new pipeline is built from scratch (all steps etc), but by clicking "NEW RUN" in GUI it just reuse existing pipeline. Is it correct?

Oh, I think I understand what happens. The way the pipeline logic is built, the "DAG" is created the first time the code runs; then, when you re-run the pipeline, it deserializes the DAG from the Task/backend.
The initial thinking was that at some point in the future we want to let you easily edit the DAG in the UI, hence the behavior.
But specifically here, we want the opposite.
As a temporary hack, you can add the following:
```python
print(pipe._task.get_parameters_as_dict())
# clear the stored DAG
pipe._task.set_configuration_object(name=pipe._config_section, config_text="")
```
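
To picture why a stored DAG would produce both symptoms, here is a toy model in plain Python. It is an assumption-laden illustration, not the actual ClearML implementation: dicts stand in for the DAG, and `build_dag` and `merge_with_stored` are hypothetical names. The assumed rule is that the stored node list wins, and only nodes the code re-creates in this run get a task id.

```python
def build_dag(models, optimizers):
    """Map step name -> task id, as the script would register the steps."""
    dag = {"preprocessing": "task-preprocessing"}
    for model in models:
        dag[f"{model}_training"] = f"task-{model}_training"
        for optimizer in optimizers:
            name = f"tune_{model}_for_{optimizer}"
            dag[name] = f"task-{name}"
    return dag


def merge_with_stored(stored_dag, fresh_dag):
    """Keep the stored node names; a node gets a task id only if re-created."""
    return {name: fresh_dag.get(name) for name in stored_dag}


def nodes_without_task(dag):
    """Names of nodes whose task id is missing."""
    return [name for name, task_id in dag.items() if task_id is None]


if __name__ == "__main__":
    stored = build_dag(["et"], ["Recall", "Precision"])  # first run from the script

    # "NEW RUN" with a shorter list: a stored node is left without a task.
    reduced = merge_with_stored(stored, build_dag(["et"], ["Recall"]))
    print(nodes_without_task(reduced))

    # "NEW RUN" with a longer list: the extra step never enters the DAG.
    extended = merge_with_stored(stored, build_dag(["et"], ["Recall", "Precision", "F1"]))
    print("tune_et_for_F1" in extended)
```

In this toy model, clearing the stored DAG corresponds to skipping `merge_with_stored` and using the freshly built DAG directly.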

  				
Posted 3 years ago by AgitatedDove14
