I have a logical task that I want to split across multiple workers. The task involves processing media files (not training). The optimal design for me would be:

- A “parent” task “spawns” multiple child tasks.
- The parent enqueues each child task with different parameters (with the expectation that they will run on different agents).
- The child tasks process their parts of the data.
- The parent waits for all of them to finish.
- The parent gathers their outputs (e.g. dataset IDs).

What’s the idiomatic way to achieve this?
Have the parent be a pipeline controller that dynamically generates tasks?
Automation job?
Something else?

Posted one year ago
Hi RoughTiger69
Interesting question, maybe something like:

` from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(...)
def process_sub_list(things_to_do=[0,1,2]):
    r = []
    for i in things_to_do:
        print("doing", i)
        r.append(i)  # placeholder: store the real output of processing item i
    return r

@PipelineDecorator.pipeline(...)
def pipeline():
    # create some stuff to do:
    results = []
    for step in range(10):
        r = process_sub_list(list(range(step*10, (step+1)*10)))
        results.append(r)

    # push into one list with all results, this will actually wait for them to be completed
    merged = []
    for r in results:
        if bool(r):
            merged.extend(r)
    print(max(merged)) `
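
The same fan-out / fan-in shape can be tried locally before wiring it into a pipeline. The following is only a minimal sketch under stated assumptions: it uses a standard-library thread pool as a stand-in for agents (it is not ClearML code), and `split_into_chunks`, `run_parent` and the `"done-{}"` output IDs are hypothetical names introduced for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split items into n_chunks contiguous parts whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for k in range(n_chunks):
        end = start + base + (1 if k < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def process_sub_list(things_to_do):
    # Stand-in for the real media processing: return one output ID per item.
    return ["done-{}".format(i) for i in things_to_do]


def run_parent(items, n_children):
    # Fan out: submit one child job per chunk, each with different parameters.
    with ThreadPoolExecutor(max_workers=n_children) as pool:
        futures = [
            pool.submit(process_sub_list, chunk)
            for chunk in split_into_chunks(items, n_children)
        ]
        # Fan in: result() blocks until that child has finished.
        merged = []
        for future in futures:
            merged.extend(future.result())
    return merged


if __name__ == "__main__":
    print(run_parent(list(range(7)), 3))
```

In the pipeline version above, `process_sub_list` would be the decorated component and the loop over `results` would play the role of the `future.result()` calls.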

Posted one year ago

AgitatedDove14 from what I gather there is a lightly documented concept of “multi_instance_support” https://github.com/allegroai/clearml/blob/90854fa4a516fcb38ea0a5ec23894c5a3b6bbc4f/clearml/automation/controller.py#L3296 .
Do you think it can work?

Posted one year ago

AgitatedDove14 it’s pretty much similar to your proposal but with pipelines instead of tasks, right?

Posted one year ago

Yes, exactly!

Posted one year ago

from what I gather there is a lightly documented concept

Yes ... 😞 the reason for it is that actually one could do:
` @PipelineDecorator.pipeline(...)
def pipeline(i):
    ...

if __name__ == '__main__':
    pipeline(0)
    pipeline(1)
    pipeline(2) `
Basically rerunning the pipeline 3 times.
This support was added as some users found a use case for it, but I think this would be a rare one

Posted one year ago
