I saw some talk of ClearML + Kedro on Reddit. Is that a good approach?


AgitatedDove14

That's definitely very easy. I'm still not sure how Kedro scales on clusters, though. From what I saw (and I might have missed it), it seems more like a single instance with sub-processes, with no real ability to set up a different environment for the different steps in the pipeline. Is this correct?

Sub-processes are one option, but it supports much more: https://kedro.readthedocs.io/en/stable/10_deployment/01_deployment_guide.html. One can containerise the whole pipeline and run it pretty much anywhere, so I don't think the single-instance view is up to date.

This actually ties in well with the next version of pipelines we are working on. Basically, like Kubeflow, you add a decorator to a function, making that function a step in the pipeline (and a Task in ClearML); a sketch of the idea follows below.
My thinking was to somehow separate short/simple steps (i.e., plain functions) from complicated steps (e.g., training with specific requirements).
Maybe Kedro can launch the "simple steps"? What do you think?
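
For reference, a minimal sketch of what such a decorator-based pipeline could look like, modeled on the PipelineDecorator API that later shipped in ClearML. The project, queue names, and URL here are hypothetical:

```python
from clearml.automation.controller import PipelineDecorator

# A "simple step": the decorated function becomes a pipeline component
# (and a Task in ClearML) that can also be cloned and run on its own.
@PipelineDecorator.component(return_values=["data"])
def load_data(source_url):
    # imports inside the component travel with it when it runs remotely
    import pandas as pd
    return pd.read_csv(source_url)

# A "complicated step" can request its own execution queue,
# e.g. a GPU queue backed by agents with a different environment.
@PipelineDecorator.component(return_values=["score"], execution_queue="gpu")
def train(data):
    # placeholder for real training logic with specific requirements
    return float(len(data))

@PipelineDecorator.pipeline(name="kedro-style demo", project="examples", version="0.1")
def pipeline_logic(source_url):
    data = load_data(source_url)
    score = train(data)
    print(f"score: {score}")

if __name__ == "__main__":
    # run everything in the local process for debugging; without this
    # call, steps are dispatched to clearml-agent queues instead
    PipelineDecorator.run_locally()
    pipeline_logic(source_url="https://example.com/data.csv")
```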

I might be misunderstanding things. My thinking was that I could use one command to run all the steps locally while still registering all the nodes/functions/inputs/outputs, etc. with ClearML, so that I could later go into the interface, clone any of the individual steps, and run them again, completely independent of whether they are simple or hard steps. With another command I could also just pseudo-run the pipeline with Kedro locally to register everything in ClearML, and then run it on a ClearML agent. I thought that in both cases I would need to create a PipelineController Task at the end with the full pipeline included; then I could even just clone that one. The latter is not working yet, while the former (individual tasks) is already working, apart from some Python environment issues.
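
A minimal sketch of that last idea, assuming the individual node Tasks have already been registered in ClearML (the project and task names here are hypothetical):

```python
from clearml import PipelineController

# The controller is itself a Task, so once the pipeline has been
# registered it can also be cloned and re-run from the UI.
pipe = PipelineController(name="kedro pipeline", project="kedro-demo", version="0.1")
pipe.set_default_execution_queue("default")

# Reference the previously registered node Tasks by project/name;
# each add_step clones the base Task when the pipeline runs.
pipe.add_step(
    name="preprocess",
    base_task_project="kedro-demo",
    base_task_name="preprocess node",
)
pipe.add_step(
    name="train",
    parents=["preprocess"],
    base_task_project="kedro-demo",
    base_task_name="train node",
)

# start_locally() runs the controller logic in this process;
# start() would enqueue the controller on a clearml-agent queue instead.
pipe.start_locally()
```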

The other challenge I have come across is that Task.init really only works if it is run in the script file itself, right? If I want to use a hook system (Kedro, for example, provides hooks for running callbacks before and after nodes/tasks), I can create new tasks, but since Task.init() is not technically run in the script that contains the source code, the tracking is really challenging. Is there a way to use Task as a decorator at the function level?
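
One possible workaround, sketched under the assumption that Kedro's before_node_run/after_node_run hooks are used: have the hook create and close one Task per node explicitly, rather than relying on Task.init's automatic script detection. The Kedro hook signatures are trimmed here to the arguments actually used (pluggy allows declaring a subset), and the project name is hypothetical:

```python
from clearml import Task
from kedro.framework.hooks import hook_impl


class ClearMLHooks:
    """Create one ClearML Task per Kedro node via hooks."""

    def __init__(self):
        self._tasks = {}

    @hook_impl
    def before_node_run(self, node, inputs):
        # Task.init is called from the hook, not from the node's source
        # file, so automatic script capture points at the runner instead.
        task = Task.init(
            project_name="kedro-demo",  # hypothetical project name
            task_name=node.name,
            reuse_last_task_id=False,
        )
        # record the input dataset names (not the data itself) as parameters
        task.connect({"input_datasets": sorted(inputs)})
        self._tasks[node.name] = task

    @hook_impl
    def after_node_run(self, node, outputs):
        # close the node's Task so the next node can open its own
        task = self._tasks.pop(node.name, None)
        if task:
            task.close()
```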

All that said, I might be going too deep into how I want to integrate the two frameworks, in ways that are beyond the scope...

  
  