There may be cases where failure occurs before my code starts to run (and, perhaps, after it completes).
Yes, that makes sense, especially from an IT-failure perspective.
Yes, that was my thinking.
Not sure I follow this one. I mean, the pipeline logic itself monitors the execution. If I'm not mistaken, a try/except will catch a step that fails, and a global one will catch the entire pipeline. Am I missing something?
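For illustration, here is a minimal plain-Python sketch of the step-level plus global try/except pattern. The functions `run_step` and `run_pipeline` and the step names are hypothetical stand-ins, not ClearML's pipeline API. It shows what gets caught: an exception raised by a step while the process is alive.

```python
# Hypothetical pipeline: plain functions stand in for pipeline steps.
# This is NOT ClearML's PipelineController API; all names are illustrative.


def run_step(name, fn, log):
    """Run one step; the step-level try/except records the failure and re-raises."""
    try:
        result = fn()
        log.append((name, "completed"))
        return result
    except Exception as exc:
        log.append((name, f"failed: {exc}"))
        raise


def run_pipeline(steps):
    """Run steps in order; the global try/except catches any step failure."""
    log = []
    try:
        for name, fn in steps:
            run_step(name, fn, log)
        status = "completed"
    except Exception:
        status = "failed"
    return status, log


def load():
    return [1, 2, 3]


def train():
    # Simulated in-process failure.
    raise RuntimeError("out of memory")


status, log = run_pipeline([("load", load), ("train", train), ("report", load)])
print(status)  # failed
print(log)     # [('load', 'completed'), ('train', 'failed: out of memory')]
```

Both handlers only run if the interpreter actually starts and is still running. A failure before the code starts (or after it completes), as raised earlier in the thread, never reaches either `except` block.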
I suppose one way to do this is with a https://clear.ml/docs/latest/docs/references/sdk/scheduler that kicks off a health-check task (one that checks the exit state of executed tasks). It seems more efficient to support a triggered response to task failure.
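A rough sketch of the health-check logic such a scheduled task could run. Assumptions: the list of task records would come from querying the ClearML server (for example via `Task.get_tasks`, not shown here). `TaskRecord`, `health_check` and the `notify` callback are hypothetical names, and the set of "bad" statuses is a configurable guess.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class TaskRecord:
    # Hypothetical minimal view of a task: its id and its last reported status.
    task_id: str
    status: str


def health_check(
    tasks: Iterable[TaskRecord],
    already_reported: Set[str],
    notify: Callable[[TaskRecord], None],
    bad_statuses: FrozenSet[str] = frozenset({"failed"}),
) -> List[str]:
    """Notify once per task whose exit state is in bad_statuses; return the newly reported ids."""
    newly_reported = []
    for task in tasks:
        if task.status in bad_statuses and task.task_id not in already_reported:
            notify(task)  # e.g. send a Slack message or enqueue a follow-up task
            already_reported.add(task.task_id)
            newly_reported.append(task.task_id)
    return newly_reported


alerts = []
reported = set()
snapshot = [
    TaskRecord("a1", "completed"),
    TaskRecord("b2", "failed"),
    TaskRecord("c3", "in_progress"),
]
first = health_check(snapshot, reported, alerts.append)   # ['b2']
second = health_check(snapshot, reported, alerts.append)  # [] (already reported)
```

Because the check runs in a separate, scheduled process, it still fires when a monitored task died before its own code ran. The trade-off is latency: failures are noticed only at the next polling interval, unlike a triggered response.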
Hi PanickyMoth78
You mean like another Task? or maybe Slack message?
Yes.
Some mechanism that would allow for follow-up code execution, ideally in a way that would not be susceptible to the same things that may cause a task to fail.