Hello. I am looking for some input on how I could deal with a TriggerScheduler that should run all the time as a service.
I run ClearML inside Kubernetes (server, agents and workers). The problem I want to overcome is making sure the TriggerScheduler task can self-heal. At the moment, if the k8s worker pod running the TriggerScheduler task dies for any reason, the task keeps showing as "running" for a while until it gets aborted because of a timeout. Since this task is supposed to run as a service, I should not have to restart it manually every time something happens to the pod running it.
Is there any option to automatically restart the task if it fails/gets aborted?
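To make the question concrete, this is roughly the kind of external watchdog I would otherwise have to write myself. It is only a sketch under assumptions: the status strings (`"stopped"` for an aborted task, `"failed"`), the `services` queue name, and the helper names are mine, and I have not verified that a cloned TriggerScheduler picks up its previously registered triggers.

```python
import time

# Statuses assumed to mean the scheduler task is no longer alive.
# Assumption: the server reports an aborted task as "stopped".
DEAD_STATUSES = {"stopped", "failed"}


def should_restart(status: str) -> bool:
    """Return True if the reported task status means it needs a restart."""
    return status.lower() in DEAD_STATUSES


def restart_if_dead(task_id: str, queue_name: str) -> str:
    """Clone and enqueue the task if it is dead; return the ID to watch next."""
    # Imported lazily so should_restart() can be used without ClearML installed.
    from clearml import Task

    task = Task.get_task(task_id=task_id)
    if not should_restart(task.get_status()):
        return task_id

    # Clone the dead task and push the clone to the queue the agents serve.
    clone = Task.clone(source_task=task, name=task.name)
    Task.enqueue(clone, queue_name=queue_name)
    return clone.id


def watch(task_id: str, queue_name: str = "services", poll_seconds: int = 60) -> None:
    """Poll forever, following the clone whenever a restart happens."""
    while True:
        task_id = restart_if_dead(task_id, queue_name)
        time.sleep(poll_seconds)
```

I could run `watch("<scheduler task id>")` in its own small Kubernetes Deployment so that k8s restarts the watchdog itself, but I would much rather use a built-in option if one exists.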
Posted 5 months ago