How can I make it such that any update to the upstream database...
What do you mean "upstream database"?
Hi TrickySheep9
So basically the idea is you can quickly code a scheduler with your own logic, then launch it on the "services" queue to run basically forever 🙂
This could be a good example:
https://github.com/allegroai/clearml/blob/master/examples/services/monitoring/slack_alerts.py
https://github.com/allegroai/clearml/blob/master/examples/automation/task_piping_example.py
Essentially, if I have a dataset on which I am performing transformations and then creating other downstream datasets
PipelineController creates another Task in the system, that you can later clone and enqueue to start a process (usually queuing it on the "services" queue)
AgitatedDove14 - thanks for the quick reply. Is automation.Monitor the abstraction I could use?
Now if dataset1 is updated, I want a process to update dataset2
It's a good abstraction for monitoring the state of the platform and call backs, if this is what you are after.
If you just need a "simple" cron, then you can always just loop/sleep 🙂
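The loop/sleep "simple cron" mentioned above can be sketched in a few lines of plain Python (no ClearML dependency; `job` and the interval are placeholders you would fill in with your own check/transform logic):

```python
import time


def run_cron(job, interval_seconds, max_iterations=None):
    """Call `job()` every `interval_seconds`.

    Runs forever when `max_iterations` is None (the typical
    "services queue" pattern); the cap is just for demos/tests.
    """
    results = []
    count = 0
    while max_iterations is None or count < max_iterations:
        results.append(job())
        count += 1
        if max_iterations is None or count < max_iterations:
            time.sleep(interval_seconds)
    return results


# Example: check the upstream data every hour (capped here for demo)
# run_cron(lambda: print("checking dataset..."), interval_seconds=3600, max_iterations=3)
```

Launched as a Task on the "services" queue, a loop like this just keeps polling until you abort it.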
Not able to understand what's really happening in the links
Trying to understand these, maybe playing around will help
Ohh, then yes, you can use the https://github.com/allegroai/clearml/blob/bd110aed5e902efbc03fd4f0e576e40c860e0fb2/clearml/automation/monitor.py#L10 class to monitor changes in the dataset/project
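The idea behind that Monitor class is a poll-and-callback loop. Here is a generic polling-monitor sketch of the same pattern in plain Python; the class name, `get_state`, and `on_change` are hypothetical placeholders (this is not the ClearML `automation.Monitor` API itself, where you would instead subclass and override its callbacks):

```python
import time


class DatasetMonitor:
    """Fire a callback when a watched value changes between polls.

    `get_state` might return, e.g., the latest dataset version/ID;
    `on_change` would trigger the downstream transformation step.
    """

    def __init__(self, get_state, on_change):
        self._get_state = get_state
        self._on_change = on_change
        self._last = get_state()  # remember the starting state

    def poll_once(self):
        current = self._get_state()
        if current != self._last:
            self._on_change(current)
            self._last = current
            return True
        return False

    def run(self, interval_seconds, max_polls=None):
        # Typical usage: max_polls=None, running forever on the services queue
        count = 0
        while max_polls is None or count < max_polls:
            self.poll_once()
            count += 1
            time.sleep(interval_seconds)
```

With ClearML you would plug a `Dataset.get(...)` lookup into `get_state` and a clone/enqueue of the transformation Task into `on_change`.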
My question is - I have this in a notebook now. How can I make it such that any update to the upstream database triggers this data transformation step?
AgitatedDove14 - where does automation.controller.PipelineController
fit in?
Basically the idea is that you create the pipeline once (say, in debug), then once you see it is running, you have a Task of your pipeline in the system (with any custom logic you added). With a Task in the system you can always clone/modify and launch it externally (i.e. from code/UI). Make sense?