Answered
I am struggling a bit to understand the use case of a pipeline: say you have step1 -> step2 -> step3. What is the point of using the pipeline feature versus having a single task that does those steps one after another?

I am struggling a bit to understand the use case of a pipeline.
Let's say you have step1 -> step2 -> step3.
What is the point of using the pipeline feature versus having a single task that does those steps one after another?

  
  
Posted one year ago

Answers 9


Caching can be a reason. Say you do some heavy data loading / processing in step 1. Now you're developing step 2.

It'd be nice not to have to re-run step 1 every time you want to test a change to step 2.

You could find a way to simply write the output of step 1 to disk and do everything in one task, or you could let ClearML handle that caching for you, with the added benefit that others collaborating remotely can also use the outputs of steps you've cached with ClearML.
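To make the "write it to disk yourself" option concrete, here is a minimal sketch in plain Python. It does not use ClearML at all; the step functions, the JSON file name, and the call counter are all made up for illustration.

```python
import json
import tempfile
from pathlib import Path

calls = {"step1": 0}  # counts how often the expensive step really runs


def step1():
    """Stand-in for heavy data loading / processing (hypothetical)."""
    calls["step1"] += 1
    return [x * x for x in range(5)]


def cached_step1(cache_file: Path):
    """Run step1 once, then reuse its output from disk on later calls."""
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    result = step1()
    cache_file.write_text(json.dumps(result))
    return result


def step2(data):
    """The step under development; cheap to re-run."""
    return sum(data)


cache_file = Path(tempfile.mkdtemp()) / "step1_output.json"
first = step2(cached_step1(cache_file))   # step1 actually runs here
second = step2(cached_step1(cache_file))  # step1 output is read from disk
```

This works on one machine, but the cache file is local and nothing checks whether the code or inputs of step 1 changed, which is the bookkeeping a pipeline tool can take over.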

  
  
Posted one year ago

Oh, there's parallelization as well. You could have step 1 gather the data and then fan out to N parallel steps that each do something different with it, for example hyperparameter tuning.
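As a rough illustration of that fan-out shape, here is a sketch using only the Python standard library. Threads stand in for what would be separate pipeline steps, and the data, the scoring function, and the candidate values are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def gather_data():
    """Step 1: runs once and produces the shared data (hypothetical values)."""
    return [1.0, 2.0, 3.0, 4.0]


def evaluate(data, scale):
    """One fanned-out step: squared error of scale * x against the target 2 * x."""
    return sum((scale * x - 2.0 * x) ** 2 for x in data)


data = gather_data()
candidates = [0.5, 1.0, 2.0, 3.0]  # one parallel step per candidate value

# Fan out: every candidate is evaluated independently on the same data.
with ThreadPoolExecutor(max_workers=len(candidates)) as pool:
    scores = list(pool.map(lambda s: evaluate(data, s), candidates))

# Fan in: pick the candidate with the lowest error.
best_scale = candidates[scores.index(min(scores))]
```

In a single sequential task, the four evaluations would run one after another; splitting them into steps lets them run side by side.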

  
  
Posted one year ago

About the caching: how does it work? Does ClearML maintain its own cache and monitor whether any of your code changes? Even code that changes inside an import?

  
  
Posted one year ago

@<1576381444509405184:profile|ManiacalLizard2> , the rules for caching steps are as follows. First, you need to enable it. Then, if the inputs have not changed since the previous run AND the code has not changed, the output from the previous pipeline run is used. Code from imports shouldn't change, since requirements are logged from previous runs and used in subsequent runs.
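The rule above can be sketched as a toy cache in plain Python. This is only an illustration of the logic as described in this thread, not ClearML's actual implementation; the `StepCache` class, the `code_version` string, and the example step are hypothetical.

```python
import hashlib
import json


class StepCache:
    """Toy cache: reuse a step's output only if its inputs and code are unchanged."""

    def __init__(self):
        self._store = {}
        self.executions = 0  # how many times a step was really executed

    @staticmethod
    def _key(step_name, inputs, code_version):
        payload = json.dumps(
            {"step": step_name, "inputs": inputs, "code": code_version},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode()).hexdigest()

    def run(self, step_name, func, inputs, code_version, cache_enabled=True):
        key = self._key(step_name, inputs, code_version)
        # Caching must be enabled AND inputs AND code must match a previous run.
        if cache_enabled and key in self._store:
            return self._store[key]
        self.executions += 1
        result = func(**inputs)
        self._store[key] = result
        return result


def preprocess(n):
    return list(range(n))


cache = StepCache()
cache.run("preprocess", preprocess, {"n": 3}, code_version="commit-a")  # executes
cache.run("preprocess", preprocess, {"n": 3}, code_version="commit-a")  # reused
cache.run("preprocess", preprocess, {"n": 4}, code_version="commit-a")  # input changed
cache.run("preprocess", preprocess, {"n": 3}, code_version="commit-b")  # code changed
```

Here `code_version` is just a label standing in for whatever identifies the state of the code; any change to it, or to the inputs, produces a new key and forces the step to run again.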

  
  
Posted one year ago

I mean, what happens if I import and use a function from another .py file, and that function's code changes?
Or are you expecting the code to be frozen, with only parameters changing between runs?

  
  
Posted one year ago

The step is re-run if there is any change in the code: not just in the script itself, but also a different commit or different uncommitted changes in the repo. Makes sense?

  
  
Posted one year ago

OK, so if the git commit or the uncommitted changes differ from the previous run, then the cache is "invalidated" and the step will be run again?

  
  
Posted one year ago

Yep

  
  
Posted one year ago

Clear. Thanks @<1523701070390366208:profile|CostlyOstrich36>!

  
  
Posted one year ago
1K Views