Hi, I have a question about the Pipeline, especially about the parallelism part.

We are considering implementing a use case and are interested in knowing whether it can be efficiently managed using ClearML Pipeline.

Our use case involves a dataset that needs to be divided into 125 batches, and the environment is on the cloud. At the start of the pipeline, we intend to initiate a task on a CPU machine that will download data from a database and perform data preprocessing. Following this, the data will be split into 125 segments, each of which will be processed for inference on a separate GPU-enabled machine. Finally, we plan to use a machine with a larger memory capacity to aggregate all inference results and transfer the combined file to another machine dedicated to handling subsequent processing and push notifications.

Could you please advise if ClearML Pipeline can facilitate this workflow? If so, we would appreciate guidance on the setup and any best practices you might recommend. Thank you for your assistance.

Posted 3 months ago

Hi Jason, yes this can be done. Your pipeline code will look like this:

Execution of preprocessing task

for i in range(125):
Execution of data splitting and inference task(s); all 125 steps use the same base task, but each gets a unique step name, e.g. name = "inference_task_" + str(i)
<end loop>

ids = ["${inference_task_" + str(i) + ".id}" for i in range(125)]
Execution of aggregation task, with the ids passed in through parameter_override, e.g. "General/inference_ids": '[' + ','.join(ids) + ']'. The value arrives as a single string that the task script can parse itself.
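To make the outline above more concrete, here is a rough sketch in Python. It assumes the three base tasks already exist in ClearML; the project name, base task names, queue names, and the "General/batch_index" and "General/preprocess_id" parameters are all hypothetical placeholders, not something ClearML defines. The step definitions are built as plain dictionaries so they can be inspected before anything is submitted, then passed to PipelineController.add_step. The step names use the form "inference_task_0", "inference_task_1", and so on, so that they match the "${inference_task_0.id}" style references.

```python
from typing import Dict, List

NUM_BATCHES = 125  # number of inference batches, taken from the question

# Hypothetical placeholders: replace with your own project and queue names.
PROJECT = "my_project"
CPU_QUEUE, GPU_QUEUE, HIGHMEM_QUEUE = "cpu", "gpu", "highmem"


def build_steps(num_batches: int = NUM_BATCHES) -> List[Dict]:
    """Return one dict of add_step keyword arguments per pipeline step."""
    steps = [dict(
        name="preprocess",
        base_task_project=PROJECT,
        base_task_name="preprocess_base",  # hypothetical base task
        execution_queue=CPU_QUEUE,
    )]

    inference_names = [f"inference_task_{i}" for i in range(num_batches)]
    for i, name in enumerate(inference_names):
        steps.append(dict(
            name=name,
            parents=["preprocess"],  # every batch waits for preprocessing only
            base_task_project=PROJECT,
            base_task_name="inference_base",  # same base task for all batches
            parameter_override={
                "General/batch_index": i,  # hypothetical parameter
                "General/preprocess_id": "${preprocess.id}",  # hypothetical
            },
            execution_queue=GPU_QUEUE,
        ))

    # Placeholders that the pipeline resolves to real task ids at run time.
    ids = ["${" + name + ".id}" for name in inference_names]
    steps.append(dict(
        name="aggregate",
        parents=inference_names,  # runs once all inference steps are done
        base_task_project=PROJECT,
        base_task_name="aggregate_base",  # hypothetical base task
        parameter_override={"General/inference_ids": "[" + ",".join(ids) + "]"},
        execution_queue=HIGHMEM_QUEUE,
    ))
    return steps


def parse_inference_ids(raw: str) -> List[str]:
    """Inside the aggregation task: turn "[id1,id2,...]" back into a list."""
    inner = raw.strip().strip("[]")
    return [part.strip() for part in inner.split(",") if part.strip()]


def run_pipeline() -> None:
    # Needs the clearml package and a configured ClearML server.
    from clearml import PipelineController

    pipe = PipelineController(name="batch_inference", project=PROJECT, version="1.0.0")
    for step in build_steps():
        pipe.add_step(**step)
    pipe.start()  # the controller itself is enqueued; steps go to their own queues
```

Because each inference step lists only "preprocess" as its parent, the 125 steps have no dependencies on each other and can run in parallel, limited by how many agents are serving the GPU queue. The id list is not valid JSON (the ids are unquoted), which is why the sketch splits on commas rather than using a JSON parser.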

Let me know if you have any further questions; thanks!

Posted 3 months ago