Does ClearML have a good story for offline/batch inference in production? I worked in the Airflow world for 2 years and these are the general features we used to accomplish this. Are these possible with ClearML?

Does clearML have a good story for offline/batch inference in production? I worked in the Airflow world for 2 years and these are the general features we used to accomplish this. Are these possible with ClearML?

Triggering: We'd want to be able to trigger a batch inference:

  • (rarely) on a schedule
  • (often) via a trigger in an event-based system, like maybe from an AWS Lambda function

Parameters: We'd want to be able to pass parameters when we trigger the job, such as a start_date and end_date that the batch job can use to query the feature store to get the data to run inference on.

Retries/Alerts: retry a failed job a few times; alert if it fails every time.

Metrics/Alerts:

  • track how long each task and pipeline runs.
  • alert if a pipeline was supposed to run, but never did

Backfilling: every day, we might run inference on a day's worth of data. But for new pipelines, we'll need to run inference jobs on all of the data that existed before we created this pipeline.
  
  
Posted 2 years ago

Answers 2


Hi BattyCrocodile47

Does clearML have a good story for offline/batch inference in production?

Not sure I follow, you mean like a case study?

Triggering:

We'd want to be able to trigger a batch inference:

  • (rarely) on a schedule
  • (often) via a trigger in an event-based system, like maybe from an AWS Lambda function

(2) Yes, there is a great API for that. Check out the GitHub Actions examples; it is essentially the same idea (a REST API is also available).
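
Concretely, a minimal sketch of that event-driven flow, assuming a template batch-inference task already exists, an agent is listening on the queue, and all project/task/queue names below are placeholders:

    # Sketch: trigger a batch-inference run from an event handler (e.g. an AWS Lambda).
    # Assumes a template task "batch_inference" exists in project "demo" and an agent
    # is serving the "default" queue -- all names are placeholders.
    from clearml import Task

    def handler(event, context):
        # Find the template task that defines the batch-inference job
        template = Task.get_task(project_name="demo", task_name="batch_inference")
        # Clone it so every trigger gets its own run, then push it onto an execution queue
        run = Task.clone(source_task=template, name="batch_inference (triggered)")
        Task.enqueue(run, queue_name="default")
        return {"task_id": run.id}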

Parameters:

We'd want to be able to pass parameters when we trigger the job, such as a start_date and end_date that the batch job can use to query the feature store to get the data to run inference on.

Also available, see the manual remote execution example.
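
For illustration, a rough sketch of the job side of that pattern (same placeholder project/queue names as above; the feature-store query and inference code are stubbed out):

    # Sketch: batch_inference.py -- the job script itself (placeholder names)
    from clearml import Task

    task = Task.init(project_name="demo", task_name="batch_inference")

    # Connected parameters show up in the UI and can be overridden per run
    params = {"start_date": "1970-01-01", "end_date": "1970-01-02"}
    task.connect(params)

    # When run locally this re-launches the task on an agent; the remote copy
    # continues below with whatever parameter values were set for that run
    task.execute_remotely(queue_name="default", exit_process=True)

    # ... use params["start_date"] / params["end_date"] to query the feature store
    #     and run inference (your own code goes here)
    print("running inference for", params["start_date"], "to", params["end_date"])

On the trigger side, the connected values can then be overridden on the cloned task before enqueuing it, e.g. run.set_parameters({"General/start_date": "2024-01-01", "General/end_date": "2024-01-02"}); parameters connected this way land under the "General" section by default.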

Retries/Alerts:

retry a failed job a few times; alert if it fails every time.

Of course 🙂 I would check the Slack alerts example as a good reference for that.
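
As a rough sketch of the retry/alert idea (not the Slack alerts service itself; the webhook URL and the project, task and queue names are placeholders):

    # Sketch: retry a batch job a few times, post to a Slack webhook if every attempt fails
    import requests
    from clearml import Task

    SLACK_WEBHOOK = "https://hooks.slack.com/services/..."  # placeholder
    MAX_RETRIES = 3

    template = Task.get_task(project_name="demo", task_name="batch_inference")

    for attempt in range(MAX_RETRIES):
        run = Task.clone(source_task=template, name=f"batch_inference (attempt {attempt + 1})")
        Task.enqueue(run, queue_name="default")
        # Block until the run finishes one way or the other
        run.wait_for_status(
            status=(Task.TaskStatusEnum.completed, Task.TaskStatusEnum.failed),
            raise_on_status=(),       # inspect the status ourselves instead of raising
            check_interval_sec=30,
        )
        if run.get_status() == "completed":
            break
    else:
        # All attempts failed -- alert
        requests.post(SLACK_WEBHOOK, json={"text": f"batch_inference failed {MAX_RETRIES} times"})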

Metrics/Alerts:

  • track how long each task and pipeline runs.
  • alert if a pipeline was supposed to run, but never did

Sure thing. I would check the cleanup service, as it queries Tasks and can pull execution time and other metrics. Notice that in the end pipelines are also Tasks (of a certain type), so the same way you query a Task you would query a pipeline.
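
A small sketch of that kind of query (placeholder project/task names): pull the finished runs, compute their duration from the stored start/completed timestamps, and flag when nothing has completed in the last day:

    # Sketch: query run durations and detect a job/pipeline that never ran
    from datetime import datetime, timedelta, timezone
    from clearml import Task

    runs = Task.get_tasks(
        project_name="demo",
        task_name="batch_inference",
        task_filter={"status": ["completed", "failed"], "order_by": ["-last_update"]},
    )

    def as_utc(dt):
        # Normalise naive datetimes to UTC so comparisons are safe
        return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)

    # How long did each finished run take?
    for t in runs:
        if t.data.started and t.data.completed:
            print(t.id, t.get_status(), t.data.completed - t.data.started)

    # Alert if nothing completed successfully in the last 24 hours
    cutoff = datetime.now(timezone.utc) - timedelta(hours=24)
    if not any(t.get_status() == "completed" and t.data.completed
               and as_utc(t.data.completed) > cutoff for t in runs):
        print("ALERT: batch_inference has not completed in the last 24 hours")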
  
  
Posted 2 years ago

This is totally what I was looking for! Yeah, by "good story for offline batch" I meant, "good feature support for ..."

I bookmarked this comment. I think I'll be doing a POC trying to show this functionality within the next month.

  
  
Posted 2 years ago