Answered
I Have A Logical Task That I Want To Split To Multiple Workers. The Task Involves Processing Media Files (Not Training). The Optimal Design For Me Would Be:

I have a logical task that I want to split to multiple workers. The task involves processing media files (not training).
The optimal design for me would be:
1. A “parent” task “spawns” multiple child tasks.
2. The parent enqueues each child task with different parameters (with the expectation that they will run on different agents).
3. The child tasks process their parts of the data.
4. The parent waits for all of them to finish.
5. The parent gathers their outputs (e.g. dataset IDs).
What’s the idiomatic way to achieve this?
Have the parent be a pipeline controller that dynamically generates tasks?
Automation job?
Something else?

  
  
Posted 2 years ago

Answers 5


AgitatedDove14, from what I gather there is a lightly documented concept of “multi_instance_support”: https://github.com/allegroai/clearml/blob/90854fa4a516fcb38ea0a5ec23894c5a3b6bbc4f/clearml/automation/controller.py#L3296
Do you think it can work?

  
  
Posted 2 years ago

Hi RoughTiger69
Interesting question, maybe something like:

```python
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(...)
def process_sub_list(things_to_do=[0,1,2]):
    r = []
    for i in things_to_do:
        print("doing", i)
        r.append("done{}".format(i))
    return r

@PipelineDecorator.pipeline(...)
def pipeline():
    # create some stuff to do:
    results = []
    for step in range(10):
        r = process_sub_list(list(range(step*10, (step+1)*10)))
        results.append(r)

    # push into one list with all results, this will actually wait for them to be completed
    merged = []
    for r in results:
        if bool(r):
            merged.extend(list(r))
    print(max(merged))
```
wdyt?
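
To see the fan-out and gather shape of the snippet above without a ClearML server, here is a minimal stand-in sketch using only the Python standard library. It assumes a thread pool in place of ClearML agents; `process_sub_list` mirrors the component above, while `run_parent`, `n_children`, `chunk_size` and `max_workers` are made-up names for illustration. This is not how ClearML schedules components, only the same control flow.

```python
from concurrent.futures import ThreadPoolExecutor


def process_sub_list(things_to_do):
    # Child work: same toy logic as the component above.
    return ["done{}".format(i) for i in things_to_do]


def run_parent(n_children=10, chunk_size=10, max_workers=4):
    # Fan out: submit one child per chunk; threads stand in for agents here.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(
                process_sub_list,
                list(range(step * chunk_size, (step + 1) * chunk_size)),
            )
            for step in range(n_children)
        ]
        # Gather: result() blocks until that particular child has finished.
        merged = []
        for future in futures:
            merged.extend(future.result())
    return merged


if __name__ == "__main__":
    merged = run_parent()
    print(len(merged), merged[0], merged[-1])
```

The point to notice is that the parent only blocks when it actually reads a child's result, which is the same place the pipeline version ends up waiting.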

  
  
Posted 2 years ago

AgitatedDove14 it’s pretty much similar to your proposal but with pipelines instead of tasks, right?
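
For comparison, a task-based version of the same idea might look roughly like the sketch below. It assumes a template task already exists and that each child stores its output as an artifact; `build_param_sets`, `spawn_and_gather`, the `General/files` parameter and the `result` artifact name are all hypothetical. The ClearML calls it relies on are `Task.clone`, `get_parameters`/`set_parameters`, `Task.enqueue`, `wait_for_status` and `artifacts`, and it has not been run against a live server.

```python
def build_param_sets(files, n_children):
    # Split the file list into n_children interleaved slices, one per child task.
    # Values are comma-joined strings (an assumption about how the child parses them).
    return [
        {"General/files": ",".join(files[i::n_children])}
        for i in range(n_children)
    ]


def spawn_and_gather(template_task_id, param_sets, queue_name):
    # Imported here so the helper above can be used without clearml installed.
    from clearml import Task

    children = []
    for index, overrides in enumerate(param_sets):
        # Clone the template, override its parameters, and hand it to a queue.
        child = Task.clone(source_task=template_task_id, name="child {}".format(index))
        params = child.get_parameters()
        params.update(overrides)
        child.set_parameters(params)
        Task.enqueue(child, queue_name=queue_name)
        children.append(child)

    outputs = []
    for child in children:
        # Blocks until the child completes; by default raises if it failed.
        child.wait_for_status()
        child.reload()
        # "result" is a hypothetical artifact name the child is assumed to upload.
        outputs.append(child.artifacts["result"].get())
    return outputs
```

With this shape the parent does the bookkeeping itself, which is what a pipeline controller would otherwise handle.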

  
  
Posted 2 years ago

from what I gather there is a lightly documented concept

Yes ... 😞 The reason for it is that one could actually do:
```python
@PipelineDecorator.pipeline(...)
def pipeline(i):
    ...

if __name__ == '__main__':
    pipeline(0)
    pipeline(1)
    pipeline(2)
```
Basically rerunning the pipeline 3 times.
This support was added because some users found a use case for it, but I think it would be a rare one.

  
  
Posted 2 years ago

Yes, exactly!

  
  
Posted 2 years ago