Unanswered
Hello. I Have An Issue In Regards To A Task That I Run As A Service ( Should Always Run). I Run The Clearml Server And Agents In Kubernetes.
I Think This Is A Design Problem With The Way Clearml Agents Run On Kubernetes. The K8S Glue Will Launch A Worker
From k8s perspective a pod is ephemeral so if it’s gone for any reason it’s gone. Obviously there are structures that can ensure running state (like Deployments or Statefulsets) so if a pod dies, another one takes place. We didn;t go in this direction because pods are not idempotent so it’s not straightfoward to simply replace them. Btw this looks an interesting topic to me so I’d like to include SuccessfulKoala55 on this also because i’m involved more in infra side of the equation and I may miss something here.
197 Views
0
Answers
2 years ago
one year ago