I Saw Some Talk Of ClearML + Kedro On Reddit. Is That A Good Approach?


Hi JealousParrot68

> ClearML tracking of experiments run through Kedro (similar to tracking with MLflow)

That's definitely very easy. I'm still not sure how Kedro scales on clusters, though. From what I saw (and I might have missed it), it seems more like a single instance with sub-processes, with no real ability to set up a different environment for each step in the pipeline. Is this correct?

> I think the challenge here is picking the right abstraction mapping. E.g. should a node in Kedro (usually one function, but it can also be more involved) be equivalent to a Task, or should a whole pipeline be a Task?

This actually ties in well with the next version of pipelines we are working on. Basically, like Kubeflow, you add a decorator to a function, making the function a step in the pipeline (and a Task in ClearML).
My thinking was to somehow separate short/simple steps (i.e. functions) from complicated steps (e.g. training with specific requirements).
Maybe Kedro can launch the "simple steps"? What do you think?
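
To make the decorator idea and the simple/complicated split concrete, here is a purely illustrative sketch. The `step` decorator, its `heavy` flag, `STEP_REGISTRY` and `plan` are hypothetical names, not part of ClearML or Kedro. Simple steps are the ones Kedro could run in-process; heavy steps are the ones that would each become a separate ClearML Task. ClearML's own decorator-based pipelines (`PipelineDecorator`) have their own API, so check its documentation rather than relying on this sketch.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical registry of pipeline steps, kept in declaration order.
STEP_REGISTRY: List[Dict[str, Any]] = []


def step(heavy: bool = False) -> Callable[[Callable], Callable]:
    """Hypothetical decorator that registers a function as a pipeline step.

    heavy=False: short/simple step, could be run in-process (e.g. by Kedro).
    heavy=True:  complicated step (e.g. training with specific requirements)
                 that would be launched as its own ClearML Task.
    """
    def decorator(func: Callable) -> Callable:
        STEP_REGISTRY.append({"name": func.__name__, "func": func, "heavy": heavy})
        return func  # the function itself is unchanged and still callable
    return decorator


def plan() -> Tuple[List[str], List[str]]:
    """Split registered steps into (run locally, run as separate Tasks)."""
    local = [s["name"] for s in STEP_REGISTRY if not s["heavy"]]
    as_tasks = [s["name"] for s in STEP_REGISTRY if s["heavy"]]
    return local, as_tasks


@step()
def clean(rows: List[str]) -> List[str]:
    return [r.strip() for r in rows]


@step()
def featurize(rows: List[str]) -> List[int]:
    return [len(r) for r in rows]


@step(heavy=True)
def train(features: List[int]) -> float:
    return sum(features) / len(features)


local, as_tasks = plan()
print(local)     # ['clean', 'featurize']
print(as_tasks)  # ['train']
```

Keeping the flag on the decorator means the split lives next to each function's definition, so moving a step between "run inline" and "run as its own Task" is a one-word change.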

> I am writing a small plugin for Kedro/ClearML at the moment that tries to link Kedro up with ClearML. It would be interesting to share experience and get input from the ClearML people at some point.

YES! Please share, that sounds great!

> Also, is it good practice to reuse Task IDs when running the same job twice during debugging, or to always create a new one?

Hmm, good point. This is why you can configure the behavior in clearml.conf (or disable it altogether). Currently we assume that if no artifacts/models were used and the last time you executed the Task was less than 72h ago, the Task ID will be reused (assuming you are running from the same machine).
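
The reuse rule above can be restated as a small predicate. This is a hypothetical sketch of the stated conditions only, not ClearML's internal implementation; `should_reuse_task_id` and its parameters are illustrative names, and treating a zero window as "disabled" is an assumption. To my knowledge the real knobs are the `sdk.development.task_reuse_time_window_in_hours` setting in clearml.conf and the `reuse_last_task_id` argument of `Task.init`, but verify both against the ClearML docs.

```python
from datetime import datetime, timedelta


def should_reuse_task_id(
    used_artifacts_or_models: bool,
    last_run: datetime,
    now: datetime,
    same_machine: bool,
    window_hours: float = 72.0,
) -> bool:
    """Hypothetical encoding of the Task ID reuse rule described above."""
    if window_hours <= 0:
        return False  # assumption: a zero window means reuse is disabled
    if used_artifacts_or_models or not same_machine:
        return False  # outputs were produced or different machine -> new Task
    # Reuse only if the previous run started less than window_hours ago.
    return now - last_run < timedelta(hours=window_hours)


now = datetime(2021, 5, 10, 12, 0)
print(should_reuse_task_id(False, now - timedelta(hours=5), now, True))   # True
print(should_reuse_task_id(True, now - timedelta(hours=5), now, True))    # False
print(should_reuse_task_id(False, now - timedelta(hours=80), now, True))  # False
```

For debugging loops this favors reuse (no clutter of near-identical Tasks), while any run that produced artifacts or models gets a fresh Task so its outputs are never overwritten.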

Posted 3 years ago