Answered
Hi!
I noticed a bug related to reusing the same component in a pipeline. I have prepared a mock example so that you can reproduce it:
`
from clearml.automation.controller import PipelineDecorator


@PipelineDecorator.component(
    return_values="mock_result", execution_queue="mock_components"
)
def mock_step(num_1, num_2=None):
    # Square num_1 when num_2 is omitted, otherwise multiply the two numbers.
    if num_2 is None:
        print("Number 1 squared")
        result = num_1 ** 2
    else:
        print("Number 1 multiplied by Number 2")
        result = num_1 * num_2
    return result


@PipelineDecorator.pipeline(
    name="Reusing component in the pipeline",
    project="Mocks",
    version="0.1",
    pipeline_execution_queue="mock_pipelines",
)
def mock_pipeline():
    NUMBER_1, NUMBER_2 = 3, 5
    # First call: both arguments are passed.
    first_result = mock_step(num_1=NUMBER_1, num_2=NUMBER_2)
    print("First result has been computed:", first_result)
    # Second call: num_2 is omitted, so the squared branch is expected.
    last_result = mock_step(num_1=NUMBER_1)
    print("Last result has been computed:", last_result)


if __name__ == "__main__":
    PipelineDecorator.debug_pipeline(execute_steps_as_functions=False)
    mock_pipeline()
`
Even though 'num_2' is not specified in the second call, its value is somehow stored in the step, and execution falls into the else branch of 'mock_step' again.
BTW, I'm using clearml 1.1.3rc0 and clearml-agent 1.1.0
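For comparison, here is a minimal sketch of the results one would expect from the two calls, with the same step logic run as a plain Python function. It uses no ClearML decorators, so it does not reproduce the bug; it only shows the intended output. The name `mock_step_plain` is hypothetical and not part of the original example.

```python
def mock_step_plain(num_1, num_2=None):
    """Same logic as 'mock_step', without the component decorator."""
    if num_2 is None:
        # No second number given: square the first one.
        return num_1 ** 2
    # Both numbers given: multiply them.
    return num_1 * num_2


first_result = mock_step_plain(num_1=3, num_2=5)
last_result = mock_step_plain(num_1=3)
print("First result:", first_result)  # 15
print("Last result:", last_result)    # 9
```

If the pipeline run reports 15 for both calls, the second step has picked up 'num_2' from the first one.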

  
  
Posted 3 years ago

Answers 16


Thanks GiganticTurtle0
So the bug is that "mock_step" retains the "NUMBER_2" argument value in the second call?

  
  
Posted 3 years ago

Oh right, I missed the fact that the helper functions are also decorated. Yes, it makes sense that we add the tags as well.
Regarding nested pipelines, I think my main question is: are they independent, or are we generating everything from the same code base?

  
  
Posted 3 years ago

... these nested components are not tagged with 'pipe: <pipeline_task_id>'. I assume this should not be like that, right?

Helper functions are not "components"; they are actually files that will be accessible when running the component itself.
Am I missing something?

  
  
Posted 3 years ago

Can you think of any other way to launch multiple pipelines concurrently? We have already seen that it is only possible to run a single PipelineController in a single Python process.

  
  
Posted 3 years ago

Building the pipeline at runtime from an external configuration is very cool!!
I think nested components are exactly the correct solution, and it is a great use case.

  
  
Posted 3 years ago

Since I am still in time, I would like to report another minor bug related to the 'add_pipeline_tags' parameter of PipelineDecorator.pipeline. It turns out that when the pipeline consists of components that in turn use other components (via 'helper_functions'), these nested components are not tagged with 'pipe: <pipeline_task_id>'. I assume this should not be like that, right?

  
  
Posted 3 years ago

Nested pipelines do not depend on each other. You can think of it as several models being trained or doing inference at the same time, but each one delivering results for a different client. So you don't use the output from one nested pipeline to feed another one running concurrently, if that's what you mean.

  
  
Posted 3 years ago

To sum up, we agree that it would be nice to enable tags for nested components. I will continue playing with the capabilities of nested components and keep reporting bugs as I come across them!

  
  
Posted 3 years ago

The thing is, I don't know in advance how many models there will be in the inference stage. My approach is to read the configurations of the operational models from a database in a for loop, and in that loop all the inference tasks would be enqueued (one task for each deployed model). For this I need the system to be able to run several pipelines at the same time. Since, as you told me, this is not possible for now because pipelines are based on singletons, my alternative is to use components.
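A rough sketch of that loop, written in plain Python so it runs without a ClearML server. Everything here is an assumption for illustration: `load_model_configs` stands in for the database query, `run_inference` stands in for a function that would be decorated as a component in the real pipeline, and the config keys are made up.

```python
def load_model_configs():
    """Hypothetical stand-in for reading deployed-model configs from a database."""
    return [
        {"model_name": "model_a", "client": "client_1"},
        {"model_name": "model_b", "client": "client_2"},
    ]


def run_inference(model_name, client):
    """Hypothetical stand-in for a component-decorated inference step."""
    return f"{model_name} predictions for {client}"


def inference_stage():
    # The number of models is only known at runtime,
    # so one inference step is created per config inside the loop.
    results = []
    for config in load_model_configs():
        results.append(run_inference(**config))
    return results


print(inference_stage())
```

In the real setup each `run_inference` call would be enqueued as its own task rather than executed inline.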

  
  
Posted 3 years ago

GiganticTurtle0 your timing is great. The plan is to wrap up efforts and release early next week (I'm assuming the GitHub fixes will be pushed tomorrow; I'll post here once they are there).

  
  
Posted 3 years ago

They share the same code (i.e. the same decorated functions), but using a different configuration.

  
  
Posted 3 years ago

Well, instead of plain functions or files I use components, because I need some of those steps to run on one machine and some on another. And it works perfectly fine (ignoring some minor bugs like this one). So I'm actually passing component-decorated functions to the 'helper_functions' parameter.

  
  
Posted 3 years ago

Yes 🙂

  
  
Posted 3 years ago

Hmm, I think the approach in general would be to create two pipeline tasks, then launch them from a third pipeline or trigger them externally? If, on the other hand, it makes sense to see both pipelines on the same execution graph, then nested components make a lot of sense. Wdyt?

  
  
Posted 3 years ago

BTW I would really appreciate it if you let me know when you get it fixed 🙏

  
  
Posted 3 years ago

Pushed 🙂

  
  
Posted 3 years ago