Hi @<1541954607595393024:profile|BattyCrocodile47>
Does ClearML have a good story for offline/batch inference in production?
Not sure I follow, do you mean like a case study?
Triggering:
We'd want to be able to trigger a batch inference:
- (rarely) on a schedule
- (often) via a trigger in an event-based system, like maybe from an AWS Lambda function
Yes, there is a great API for that. Check out the GitHub Actions, it is essentially the same idea (a REST API is also available).
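To make the event-based trigger concrete, here is a minimal sketch of what a Lambda-style handler could look like with the ClearML Python SDK. It assumes a template Task already exists on the server and that the ClearML credentials are provided through the usual environment variables. The event keys, the `TEMPLATE_TASK_ID` and `CLEARML_QUEUE` variable names, and the function names are hypothetical, chosen only for illustration.

```python
import os


def resolve_trigger_settings(event, env=os.environ):
    """Pick the template task ID and queue from the event, falling back to env vars."""
    return {
        "template_task_id": event.get("template_task_id") or env.get("TEMPLATE_TASK_ID"),
        "queue": event.get("queue") or env.get("CLEARML_QUEUE", "default"),
    }


def handler(event, context):
    """Hypothetical AWS Lambda entry point: clone a template task and enqueue it."""
    # Imported lazily so the helper above works without clearml installed.
    from clearml import Task

    settings = resolve_trigger_settings(event)
    if not settings["template_task_id"]:
        raise ValueError("no template task ID in the event or in TEMPLATE_TASK_ID")

    # Clone the template so every trigger produces its own Task record.
    cloned = Task.clone(
        source_task=settings["template_task_id"],
        name="batch-inference (triggered)",
    )
    # Hand the clone to whichever agent is listening on the queue.
    Task.enqueue(cloned, queue_name=settings["queue"])
    return {"task_id": cloned.id}
```

The same `handler` body could be called from a cron-style scheduler for the rare scheduled case, since the clone-and-enqueue step does not care what invoked it.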
Parameters:
We'd want to be able to pass parameters when we trigger the job, such as a start_date and end_date that the batch job can use to query the feature store to get the data to run inference on.
Also available, see the manual remote execution example.
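As a rough sketch of the parameter passing (not the referenced example itself): the batch job connects a dict of defaults to its Task, and the triggering side overrides those values on a clone before enqueueing it. This assumes the dict is connected without a section name, in which case ClearML files it under the "General" section. The project name, task name, and function names below are hypothetical.

```python
from datetime import date


def build_overrides(start_date, end_date):
    """Validate the date window and return flat ClearML-style parameter overrides."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    if start > end:
        raise ValueError("start_date must not be after end_date")
    # Assumption: the job connects its dict without a name, so keys live under "General/".
    return {
        "General/start_date": start.isoformat(),
        "General/end_date": end.isoformat(),
    }


def launch_batch_job(template_task_id, start_date, end_date, queue="default"):
    """Triggering side: clone the template, override the dates, enqueue."""
    from clearml import Task  # lazy import, needs a configured ClearML server

    cloned = Task.clone(source_task=template_task_id)
    # update_parameters merges into the existing parameters instead of replacing them.
    cloned.update_parameters(build_overrides(start_date, end_date))
    Task.enqueue(cloned, queue_name=queue)
    return cloned.id


def batch_job():
    """Job side: defaults here are replaced by the overrides when an agent runs the clone."""
    from clearml import Task

    task = Task.init(project_name="batch-inference", task_name="score window")
    params = task.connect({"start_date": "2024-01-01", "end_date": "2024-01-02"})
    # Query the feature store with params["start_date"] and params["end_date"] here.
    return params
```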
Retries/Alerts:
Retry a failed job a few times. Alert if it fails every time.
Of course 🙂 I would check the Slack alert as a good reference for that.
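For the retry part, one possible shape (a sketch, not the Slack alert example) is a small loop that relaunches the job and only alerts once every attempt has failed. The `launch`, `is_success`, and `alert` callables are injected, so the alert could post to Slack or anywhere else. All function names here are hypothetical, and the status strings are the ones I would expect `get_status()` to return.

```python
import time


def run_with_retries(launch, is_success, max_attempts=3, alert=print):
    """Launch a job up to max_attempts times; alert only if every attempt fails."""
    for _attempt in range(max_attempts):
        job = launch()
        if is_success(job):
            return job
    alert(f"batch inference failed after {max_attempts} attempts")
    return None


def clearml_launch(template_task_id, queue="default"):
    """Clone the template Task and enqueue the clone (needs a ClearML server)."""
    from clearml import Task  # lazy import

    cloned = Task.clone(source_task=template_task_id)
    Task.enqueue(cloned, queue_name=queue)
    return cloned


def clearml_succeeded(task, poll_seconds=30):
    """Poll the Task status until it reaches a final state."""
    while True:
        status = task.get_status()
        if status in ("completed", "published"):
            return True
        if status in ("failed", "stopped", "closed"):
            return False
        time.sleep(poll_seconds)
```

Usage would be something like `run_with_retries(lambda: clearml_launch("<template id>"), clearml_succeeded, max_attempts=3, alert=my_alert)`, where `my_alert` is whatever notification function you already have.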
Metrics/Alerts:
- track how long each task and pipeline runs.
- alert if a pipeline was supposed to run, but never did
Sure thing, I would check the cleanup service, as it queries Tasks and can pull execution time and other metrics. Note that in the end pipelines are also Tasks (of a certain type), so you query a pipeline the same way you query a Task.
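A minimal sketch of the metrics side, assuming the Task objects returned by `Task.get_tasks` expose `started` and `completed` timestamps on `task.data` (this is not the cleanup service code). The helper names are hypothetical, and the "missed run" rule here is simply "no run started within the expected gap".

```python
from datetime import datetime, timedelta
from typing import Optional


def duration_seconds(started: Optional[datetime], completed: Optional[datetime]) -> Optional[float]:
    """Run time in seconds, or None if the task has not started or finished."""
    if started is None or completed is None:
        return None
    return (completed - started).total_seconds()


def missed_run(last_started: Optional[datetime], now: datetime, max_gap: timedelta) -> bool:
    """True if nothing has started within max_gap (or nothing has ever started)."""
    return last_started is None or (now - last_started) > max_gap


def collect_runs(project_name, task_name):
    """Query matching Tasks (pipelines included) and pull their timing data."""
    from clearml import Task  # lazy import, needs a configured ClearML server

    tasks = Task.get_tasks(project_name=project_name, task_name=task_name)
    return [
        {
            "id": t.id,
            "status": t.get_status(),
            "seconds": duration_seconds(t.data.started, t.data.completed),
            "started": t.data.started,
        }
        for t in tasks
    ]
```

When comparing against `now`, both datetimes should be either timezone-aware or naive, otherwise the subtraction raises a `TypeError`.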
This is totally what I was looking for! Yeah, by "good story for offline batch" I meant, "good feature support for ..."
I bookmarked this comment. I think I'll be doing a POC trying to show this functionality within the next month.