
heyo,
after building some custom pipelining functionality on MLflow, I started looking for better software that can beat what I created - with a similar amount of effort. the problem has been that up till now, all I found could make things way better but also require way more effort. that's why I'm here - ClearML seems to meet my requirements well

  
  
Posted 2 years ago

Answers 11


That makes sense to me, what do you think about the following:
```python
from clearml import PipelineDecorator

class AbstractPipeline(object):
    def __init__(self):
        pass

    @PipelineDecorator.pipeline(...)
    def run(self, run_arg):
        data = self.step1(run_arg)
        final_model = self.step2(data)
        self.upload_model(final_model)

    @PipelineDecorator.component(...)
    def step1(self, arg_a):
        # do something
        return value

    @PipelineDecorator.component(...)
    def step2(self, arg_b):
        # do something
        return value
```

This would mean steps 1/2 are executed on different machines, where the data passed between them is automatically serialized. It also allows you to build the actual logic in `def run` that drives the different components.

wdyt?

  
  
Posted 2 years ago

Thanks ContemplativePuppy11 !

How would you pass data/args between one step of the pipeline to another?
Or are you saying the pipeline class itself stores all the components ?

  
  
Posted 2 years ago

looks promising. couple of questions:
wdym 'executed on different machines'? is there an mlclearish way of running a pipeline, ie something instead of implementing my own run method? i did my own run because i wanted to organize each pipeline into its own experiment folder and skip stages if they were already run, but it feels hacky and you folks probably have a better way of doing this

  
  
Posted 2 years ago

wdym 'executed on different machines'?

The assumption is that you have machines (i.e. clearml-agents) connected to ClearML, which would be running all the different components of the pipeline. Think out-of-the-box scale-up. Each component becomes a standalone job, and the data is passed (i.e. stored and loaded) automatically via the clearml-server (this can be configured to be external object storage as well). This means if you have a step that needs GPU it will be launched on a GPU machine vs steps that are cpu/logic. Make sense?

is there an mlclearish way of running a pipeline, ie something instead of implementing my own run method?

What do you mean by "i did my own run because i wanted"? Maybe a few ClearML examples would help?
https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_decorator.py
Does that help?

  
  
Posted 2 years ago

each child of Pipeline is a self-contained pipeline, e.g. ModelPipeline. each step of the pipeline is a method, with the order set in the attribute stage_handler_mapping. in the mlflow UI each stage, i.e. each method's results, is represented as a run within a fixed experiment
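Based on that description, the design might look roughly like this (a hypothetical sketch reconstructed from the post; class, method, and attribute names other than `Pipeline`, `ModelPipeline`, and `stage_handler_mapping` are made up, and the MLflow logging is omitted):

```python
# Hypothetical sketch of the described design: a base Pipeline drives
# stages in the order given by stage_handler_mapping; each child class
# is a self-contained pipeline whose steps are methods.

class Pipeline:
    # (stage name, handler) pairs, in execution order
    stage_handler_mapping = []

    def run(self, run_arg):
        data = run_arg
        for stage_name, handler in self.stage_handler_mapping:
            # in the real thing, each handler's result would be logged
            # as an mlflow run within a fixed experiment
            data = handler(self, data)
        return data

class ModelPipeline(Pipeline):
    def load(self, arg):
        return {"raw": arg}

    def train(self, data):
        return {"model": data["raw"]}

    stage_handler_mapping = [("load", load), ("train", train)]

print(ModelPipeline().run("dataset"))  # {'model': 'dataset'}
```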

  
  
Posted 2 years ago

sure AgitatedDove14. I boiled my pipeline down to bare-bones functionality and one file

  
  
Posted 2 years ago

I think my question is more about design: is a ModelPipeline class a self-contained pipeline (i.e. containing all the different steps), or is it a single step in a pipeline?

  
  
Posted 2 years ago

ContemplativePuppy11

yes, nice move. my question was to make sure that the steps are not run in parallel because each one builds upon the previous one

if they are "calling" one another (or passing data) then the pipeline logic will deduce they cannot run in parallel 🙂 basically it is automatic

so my takeaway is that if the funcs are class methods the decorators won't break, right?

In theory, but the idea of the decorator is that it tracks the return value so it "knows" how to pass the data between the functions (i.e. it passes a reference to the data that is actually being stored as an artifact). This same mechanism allows it to know which function depends on which output of another function. This means that instantiating a class will actually be less efficient, and in practice might not work. Does that make sense?
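The dependency-deduction idea can be illustrated with a toy decorator (plain Python, not ClearML code; all names are invented): each wrapped function returns a node tagged with its producer, so consuming a node records an edge in the dependency graph.

```python
# Toy sketch of how tracking return values lets a pipeline deduce
# dependencies: the wrapper sees which upstream functions produced
# its arguments, so execution order falls out automatically.

class Node:
    def __init__(self, producer, value):
        self.producer = producer  # name of the function that made this
        self.value = value

dependencies = {}  # function name -> set of upstream function names

def track(func):
    def wrapper(*args):
        upstream = {a.producer for a in args if isinstance(a, Node)}
        dependencies[func.__name__] = upstream
        values = [a.value if isinstance(a, Node) else a for a in args]
        return Node(func.__name__, func(*values))
    return wrapper

@track
def step1(x):
    return x + 1

@track
def step2(y):
    return y * 2

out = step2(step1(3))
print(dependencies)  # {'step1': set(), 'step2': {'step1'}}
print(out.value)     # 8
```

Since step2 consumed step1's node, the graph records that they cannot run in parallel, with no explicit ordering from the user.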

  
  
Posted one year ago

This means if you have a step that needs GPU it will be launched on a GPU machine vs steps that are cpu/logic. Make sense ?

yes, nice move. my question was to make sure that the steps are not run in parallel because each one builds upon the previous one

Maybe a few clearml example s would help?

i'd checked out that file, but now with your explanation it is clear to me how to do it. so my takeaway is that if the funcs are class methods the decorators won't break, right? i had had a problem once with another library and just wanted to be sure (i think it had to do with the whole class having to be serialized and not only the method)

  
  
Posted 2 years ago

AgitatedDove14 currently we use MLflow in some custom code to log and load artifacts

  
  
Posted 2 years ago

Hi ContemplativePuppy11
This is a really interesting point.
Maybe you can provide a pseudo-class abstract of your current pipeline design; this will help in understanding what you are trying to achieve and how to make it easier to get there

  
  
Posted 2 years ago