Hey, I’m thinking of using a ClearML Pipeline to compile a dataset more efficiently.

My hope is that I won’t have to run every step for every data point every time, as the dataset is big and some of the steps are computationally intensive.
I am at a stage where I will be switching out models and algorithms rapidly to find the best combinations, and adding/removing Tasks (e.g. to create new Features), so it’s important to me that compiling the dataset is as quick and traceable as possible.

How would I set up a ClearML Pipeline/Tasks (Pipeline components) such that:
1. If the Task has been run before with the same code, model and input data, it is not run again; instead, its cached outputs (e.g. features) are passed on to the next Task(s) in the Pipeline.
2. If the code or model for a Task has been updated, all input data is reprocessed and the results are passed on to the downstream Task(s).
3. If the code and model for a Task have not changed but some input data has, the Task runs only on the new input data, and the newly processed outputs are combined with the (correct) previously computed and cached outputs.
4. If new Tasks are added to the Pipeline (e.g. the Tasks needed to create a new Feature in the final CSV), the existing Tasks still behave as in 1, 2 and 3.
Is there a good way to do this?
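A minimal, framework-agnostic sketch of the caching rule described in the four requirements above, assuming each data point has a stable ID and a JSON-serializable input. It does not use ClearML's API: `IncrementalStep`, `fingerprint`, `code_version` and `model_version` are hypothetical names, and the in-memory `cache` dict stands in for whatever persistent store would hold outputs between runs. The idea is to key every output on (step name, code version, model version, item input), so a changed code or model version misses the cache for every item, a changed or new item misses only for itself, and a newly added step gets its own keys without touching existing ones.

```python
import hashlib
import json
from typing import Any, Callable, Dict


def fingerprint(*parts: Any) -> str:
    """SHA-256 over the JSON encoding of the given parts (order matters).

    Assumes the parts are JSON-serializable; anything else falls back to str(),
    which may not be a faithful identity for large objects.
    """
    blob = json.dumps(parts, sort_keys=True, default=str)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


class IncrementalStep:
    """Hypothetical cached step: processes only items whose cache key changed."""

    def __init__(self, name: str, fn: Callable[[Any], Any],
                 code_version: str, model_version: str,
                 cache: Dict[str, Any]) -> None:
        self.name = name
        self.fn = fn
        self.code_version = code_version    # e.g. a git commit hash (assumption)
        self.model_version = model_version  # e.g. a model ID (assumption)
        self.cache = cache                  # key -> output store, shared by steps
        self.computed = 0                   # items actually processed on the last run

    def run(self, items: Dict[str, Any]) -> Dict[str, Any]:
        self.computed = 0
        results: Dict[str, Any] = {}
        for item_id, payload in items.items():
            # The key covers step identity, code, model and this item's input,
            # so changing any of them forces recomputation for this item only.
            key = fingerprint(self.name, self.code_version,
                              self.model_version, payload)
            if key not in self.cache:
                self.cache[key] = self.fn(payload)
                self.computed += 1
            # Cached and freshly computed outputs are merged into one result.
            results[item_id] = self.cache[key]
        return results


if __name__ == "__main__":
    cache: Dict[str, Any] = {}
    data = {"a": "alpha", "b": "beta"}

    length = IncrementalStep("length", len, "v1", "none", cache)
    length.run(data)
    print(length.computed)   # 2: first run, nothing cached yet
    length.run(data)
    print(length.computed)   # 0: same code, model and inputs, so all reused

    data["c"] = "gamma"
    print(length.run(data))  # {'a': 5, 'b': 4, 'c': 5}
    print(length.computed)   # 1: only the new item is processed

    length.code_version = "v2"
    length.run(data)
    print(length.computed)   # 3: code changed, so every item is reprocessed

    upper = IncrementalStep("upper", str.upper, "v1", "none", cache)
    upper.run(data)
    length.run(data)
    print(length.computed)   # 0: adding a new step leaves "length" cached
```

Chaining works the same way: a downstream step's input is the upstream step's output for that item, so only items whose upstream output actually changed are recomputed downstream. This sketch never evicts stale entries, which a real store would eventually need to handle.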

Posted one year ago