Hello. I am looking for some input on how I could deal with a TriggerScheduler that should run all the time as a service.

I run ClearML inside Kubernetes (server, agents and workers). The problem I want to overcome is making sure the TriggerScheduler task is able to self-heal, so to speak. At the moment, if the k8s worker pod running the TriggerScheduler task dies for any reason, the task still shows as "running" for a while until it gets aborted because of a timeout. Since this task is supposed to run as a service, I should not have to manually restart it every time something happens to the pod running it.

Is there any option to automatically restart the task if it fails/gets aborted?
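
To make the behaviour I am after concrete, here is a minimal sketch of an external watchdog, not a confirmed ClearML feature. It polls the task status and, if the task has failed or been aborted, clones it and enqueues the clone. The names `needs_restart`, `watchdog_step`, `clearml_get_status` and `clearml_relaunch` are hypothetical, the `"services"` queue name is an assumption, and I am assuming an aborted task is reported with status `"stopped"`. Whether a cloned TriggerScheduler keeps its registered triggers is something I have not verified.

```python
from typing import Callable

# Assumption: ClearML reports a failed task as "failed" and an aborted one as "stopped".
RESTART_STATUSES = frozenset({"failed", "stopped"})


def needs_restart(status: str) -> bool:
    """Return True when the reported status means the service task is dead."""
    return status.lower() in RESTART_STATUSES


def watchdog_step(
    task_id: str,
    get_status: Callable[[str], str],
    relaunch: Callable[[str], str],
) -> str:
    """Check the task once and return the ID of the task to watch next."""
    if needs_restart(get_status(task_id)):
        # relaunch() returns the ID of the replacement task.
        return relaunch(task_id)
    return task_id


def clearml_get_status(task_id: str) -> str:
    """Fetch the current status string of a task from the ClearML server."""
    from clearml import Task  # imported lazily so the pure logic above needs no ClearML

    return str(Task.get_task(task_id=task_id).get_status())


def clearml_relaunch(task_id: str, queue_name: str = "services") -> str:
    """Clone the dead task and enqueue the clone; the queue name is an assumption."""
    from clearml import Task

    clone = Task.clone(source_task=task_id)
    Task.enqueue(clone, queue_name=queue_name)
    return clone.id
```

Something like this would have to run outside the pod it watches, for example in a small loop or a Kubernetes CronJob that calls `watchdog_step(current_id, clearml_get_status, clearml_relaunch)` and stores the returned ID for the next check. I would prefer a built-in option over maintaining this myself.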

  				
Posted 2 years ago by DangerousDragonfly8
