Unanswered
Hello. I am looking for some input on how I could deal with a TriggerScheduler that should run all the time as a service.
I run ClearML inside Kubernetes (server, agents and workers). The problem I want to solve is making the TriggerScheduler task self-healing. At the moment, if the k8s worker pod running the TriggerScheduler task dies for any reason, the task keeps showing as "running" for a while until it is aborted on timeout. Since this task is supposed to run as a service, I should not have to restart it manually every time something happens to the pod running it.
Is there any option to automatically restart the task if it fails/gets aborted?
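For illustration only, here is a rough sketch of the kind of external watchdog that could cover this case. It is not a built-in ClearML restart option. It assumes the watchdog runs somewhere Kubernetes restarts on its own (for example a Deployment), that an aborted task reports the status `stopped`, and that cloning the task and enqueuing the clone is an acceptable way to bring the scheduler back. The helper names `needs_restart`, `watch` and `make_clearml_hooks`, and the task ID and queue name in the usage comment, are hypothetical.

```python
import time
from typing import Callable, Optional, Tuple

# Assumption: these are the statuses a dead service task ends up in.
# ClearML reports an aborted task as "stopped".
DEAD_STATUSES = {"stopped", "failed"}


def needs_restart(status: str) -> bool:
    """Return True when the reported task status means the task is no longer alive."""
    return status in DEAD_STATUSES


def watch(
    get_status: Callable[[], str],
    restart: Callable[[], None],
    poll_seconds: float = 60.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll the task status and call restart() whenever the task is dead.

    Runs forever when max_polls is None. Returns the number of restarts issued.
    """
    restarts = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        if needs_restart(get_status()):
            restart()
            restarts += 1
        polls += 1
        sleep(poll_seconds)
    return restarts


def make_clearml_hooks(task_id: str, queue_name: str) -> Tuple[Callable[[], str], Callable[[], None]]:
    """Build get_status/restart callables backed by the ClearML SDK."""
    from clearml import Task  # imported lazily so the polling logic has no hard dependency

    state = {"task_id": task_id}

    def get_status() -> str:
        return Task.get_task(task_id=state["task_id"]).get_status()

    def restart() -> None:
        # Clone the dead task and enqueue the clone; from now on, watch the clone.
        clone = Task.clone(source_task=state["task_id"])
        Task.enqueue(clone, queue_name=queue_name)
        state["task_id"] = clone.id

    return get_status, restart


# Usage (placeholder values, not real identifiers):
# get_status, restart = make_clearml_hooks("<trigger-scheduler-task-id>", "services")
# watch(get_status, restart, poll_seconds=60)
```

Cloning rather than re-enqueuing the original task keeps the aborted run and its logs intact for later inspection, at the cost of accumulating one task per restart.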