I suppose one way to do this is with a scheduler (https://clear.ml/docs/latest/docs/references/sdk/scheduler) that kicks off a health check task (one that checks the exit state of executed tasks). It seems more efficient to support a triggered response to task failure.
Hi PanickyMoth78
You mean like another Task? Or maybe a Slack message?
I suppose one way to perform this is with a scheduler that kicks off a health check task
Yes, that was my thinking.
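For illustration, the health check idea could look roughly like the sketch below. It is plain Python and does not call ClearML: the `statuses` mapping (task ID to status string), the function names, and the `on_failure` callback are all hypothetical, and treating only the "failed" status as a failure is an assumption. In practice the mapping would be filled from whatever the task backend reports.

```python
from typing import Callable, Dict, Iterable, List


def find_failed_tasks(
    statuses: Dict[str, str],
    failed_statuses: Iterable[str] = ("failed",),  # assumption: only "failed" counts
) -> List[str]:
    """Return the IDs of tasks whose status counts as a failure, sorted by ID."""
    failed = set(failed_statuses)
    return sorted(task_id for task_id, status in statuses.items() if status in failed)


def health_check(
    statuses: Dict[str, str],
    on_failure: Callable[[str], None],
) -> List[str]:
    """Call on_failure once per failed task and return the failed task IDs."""
    failed_ids = find_failed_tasks(statuses)
    for task_id in failed_ids:
        # The follow-up action (alert, cleanup, re-enqueue) is up to the caller.
        on_failure(task_id)
    return failed_ids
```

A scheduled job would call `health_check` periodically with fresh statuses. Because it runs outside the monitored tasks, it can still react when a task dies before or after the user code runs.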
It seems more efficient to support a triggered response to task failure.
Not sure I follow this one. I mean, the pipeline logic itself monitors the execution. If I'm not mistaken, a try/except will catch a step that fails, and a global one will catch the entire pipeline. Am I missing something?
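A minimal sketch of the try/except pattern described here, in plain Python rather than the ClearML pipeline API. The names `run_step`, `run_pipeline`, and both handler callbacks are hypothetical, and steps are assumed to be simple zero-argument callables.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], object]]


def run_step(name: str, func: Callable[[], object],
             on_step_failure: Callable[[str, Exception], None]) -> object:
    """Run one step; report a failure to the step handler, then re-raise."""
    try:
        return func()
    except Exception as exc:
        on_step_failure(name, exc)
        raise


def run_pipeline(steps: List[Step],
                 on_step_failure: Callable[[str, Exception], None],
                 on_pipeline_failure: Callable[[Exception], None]) -> bool:
    """Run steps in order; the outer try/except covers the whole pipeline."""
    try:
        for name, func in steps:
            run_step(name, func, on_step_failure)
    except Exception as exc:
        # Global handler: reached when any step raised and re-raised.
        on_pipeline_failure(exc)
        return False
    return True
```

Both handlers run inside the same process as the pipeline logic, so they only fire for failures that surface as Python exceptions while that process is alive.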
Yes.
Some mechanism that would allow for followup code execution. Ideally in a way that would not be susceptible to the same things that may cause a task to fail.
There may be cases where failure occurs before my code starts to run (and, perhaps, after it completes)
Yes, that makes sense, especially from an IT failure perspective.