Answered
Is It Possible To Schedule Pipelines On Events Like Dataset Update?

Is it possible to schedule pipelines on events like dataset update?

  
  
Posted 3 years ago

Answers 17


How can i make it such that any update to the upstream database

What do you mean "upstream database"?

  
  
Posted 3 years ago

Hi TrickySheep9
So basically the idea is you can quickly code a scheduler with your own logic, then launch it on the "services" queue to run basically forever 🙂
This could be a good example:
https://github.com/allegroai/clearml/blob/master/examples/services/monitoring/slack_alerts.py

https://github.com/allegroai/clearml/blob/master/examples/automation/task_piping_example.py

  
  
Posted 3 years ago

Essentially, if I have a dataset on which I am performing transformations and then creating other downstream datasets

  
  
Posted 3 years ago

PipelineController creates another Task in the system, that you can later clone and enqueue to start a process (usually queuing it on the "services" queue)

  
  
Posted 3 years ago

Got it, thanks

  
  
Posted 3 years ago

AgitatedDove14 - thanks for the quick reply. Is automation.Monitor the abstraction I could use?

  
  
Posted 3 years ago

Now if dataset1 is updated, I want the process to update dataset2

  
  
Posted 3 years ago

It's a good abstraction for monitoring the state of the platform and for callbacks, if this is what you are after.
If you just need "simple" cron, then you can always just loop/sleep 🙂
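
For reference, the loop/sleep idea could look roughly like the sketch below. It is plain Python and not a ClearML API: `watch_for_updates`, `get_fingerprint` and `on_change` are hypothetical names made up for illustration. The assumption is that you can write a small function returning something that changes whenever the upstream dataset changes (a version id, a timestamp, a hash) and another function that runs your transformation.

```python
import time
from typing import Callable, Hashable, Optional


def watch_for_updates(
    get_fingerprint: Callable[[], Hashable],
    on_change: Callable[[Hashable], None],
    poll_seconds: float = 60.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll get_fingerprint() and call on_change(new_value) each time it changes.

    Returns how many times on_change fired. With max_polls=None the loop
    never exits, which is what you would want for a long-running service.
    """
    last_seen = get_fingerprint()  # baseline value; not treated as a change
    fired = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        sleep(poll_seconds)
        polls += 1
        current = get_fingerprint()
        if current != last_seen:
            on_change(current)  # e.g. run the dataset1 -> dataset2 step
            last_seen = current
            fired += 1
    return fired
```

In the dataset1 -> process -> dataset2 case, `get_fingerprint` might return the id of the latest dataset1 version and `on_change` might kick off the transformation. The `max_polls` and `sleep` parameters are only there so the loop can be tried out quickly without waiting.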

  
  
Posted 3 years ago

Not able to understand what's really happening in the links

  
  
Posted 3 years ago

Sorry, I meant upstream dataset

  
  
Posted 3 years ago

Trying to understand these, maybe playing around will help

  
  
Posted 3 years ago

Ohh, then yes, you can use the https://github.com/allegroai/clearml/blob/bd110aed5e902efbc03fd4f0e576e40c860e0fb2/clearml/automation/monitor.py#L10 class to monitor changes in the dataset/project

  
  
Posted 3 years ago

My question is - I have this in a notebook now. How can I make it such that any update to the upstream database triggers this data transformation step?

  
  
Posted 3 years ago

dataset1 -> process -> dataset2

  
  
Posted 3 years ago

AgitatedDove14 - where does automation.controller.PipelineController fit in?

  
  
Posted 3 years ago

Basically the idea is that you create the pipeline once (say, to debug it); then once you see it running, you have a Task of your pipeline in the system (with any custom logic you added). With a Task in the system you can always clone/modify it and launch it externally (i.e. from code or the UI). Make sense?
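
A rough sketch of the "clone and launch from code" part, assuming the `clearml` package is installed and configured, and that you already have the id of the pipeline Task from its first run. `relaunch_pipeline` and the `task_cls` parameter are hypothetical helpers for illustration (the latter only lets you swap in a stand-in class for a dry run); the calls it makes are `Task.get_task`, `Task.clone` and `Task.enqueue`.

```python
def relaunch_pipeline(template_task_id, queue_name="services", task_cls=None):
    """Clone an existing pipeline Task and enqueue the copy for execution.

    template_task_id: id of the pipeline Task created by the first (debug) run.
    queue_name: queue to launch on; "services" follows the suggestion in this thread.
    task_cls: hypothetical hook for testing; defaults to clearml.Task.
    """
    if task_cls is None:
        # Imported lazily so the sketch can be read and tested without clearml.
        from clearml import Task as task_cls

    template = task_cls.get_task(task_id=template_task_id)
    # The clone is a new draft Task; the name suffix is an arbitrary choice.
    cloned = task_cls.clone(
        source_task=template,
        name=template.name + " (triggered run)",
    )
    task_cls.enqueue(cloned, queue_name=queue_name)
    return cloned
```

Something like this could be called from whatever detects the dataset update, so each update starts a fresh copy of the pipeline rather than rerunning the original Task.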

  
  
Posted 3 years ago

Thanks let me try playing with these!

  
  
Posted 3 years ago