Hi, I have a pipeline with steps currently running on-prem. I want to use autoscaler with spot instances to replace the on-prem machine. My question regards identifying a task failure due to instance being terminated mid-task. Is there a way to differenti
Hi @<1639799308809146368:profile|TritePigeon86> , if a task (and its agent) is terminated mid-run, the system has no direct way to know that. It can only infer it by enforcing a timeout on tasks that have not reported for a given period of time. The ClearML server does have this functionality: tasks that have not reported for a predefined period of time (default is 2 hours) are marked as aborted, with the non-responsive status in the task status message.
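In case it helps, here is a rough sketch of how a pipeline controller could tell the watchdog abort apart from other stops by looking at the status message. The function names and the `"non-responsive"` marker string are my own assumptions (the exact wording of the server's message may differ between versions, so check it on one of your own timed-out tasks), and reading the message via `task.data.status_message` is how I'd expect it to be exposed in the SDK rather than something confirmed in this thread.

```python
def classify_stop(status, status_message, marker="non-responsive"):
    """Classify why a task ended, based on its status and status message.

    `marker` is an assumed substring of the message the server writes when
    its watchdog aborts a task that stopped reporting; adjust as needed.
    """
    text = (status_message or "").lower()
    # Aborted tasks may be reported as "stopped" or "aborted"; accept both.
    aborted = status in ("stopped", "aborted")
    if aborted and marker.lower() in text:
        return "non_responsive"   # likely instance/agent terminated mid-task
    if aborted:
        return "aborted_other"    # e.g. aborted manually by a user
    if status == "failed":
        return "failed"           # the task's own code raised an error
    return "other"


def classify_clearml_task(task_id):
    """Hypothetical helper: fetch a task from the server and classify it."""
    # Imported lazily so classify_stop() can be used without clearml installed.
    from clearml import Task

    task = Task.get_task(task_id=task_id)
    # Assumption: the server-side message is available as data.status_message.
    return classify_stop(task.get_status(), task.data.status_message)
```

A step that comes back as `"non_responsive"` could then be re-enqueued, while `"failed"` is treated as a real error. Keep in mind the classification only becomes available once the timeout has passed, so with the default of 2 hours the retry will be delayed by that long.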
8 months ago