Hello. I have an issue regarding a task that I run as a service (it should always be running). I run the ClearML server and agents in Kubernetes.
I think this is a design problem with the way ClearML agents run on Kubernetes. The k8s glue will launch a worker
With pipelines it is even more complicated. In my case, the pod for step 2 was evicted because it was using too much memory. The pod was terminated, but the task was not marked as failed/aborted. Because of that, the pipeline controller pod kept running, and the pipeline itself was not marked as aborted/failed either.
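To make the failure mode concrete, here is a minimal sketch of the check that is missing: find tasks that ClearML still considers running even though their pod has already been evicted. It is not ClearML's or the k8s glue's actual logic. It assumes each pod can be mapped to a task ID (the `task_id` field is hypothetical), and it relies on the fact that Kubernetes reports a kubelet-evicted pod with phase `Failed` and reason `Evicted`. In a real watchdog the pod list would come from the Kubernetes API, and the returned IDs would then be marked failed through the ClearML SDK (for example `Task.mark_failed`; check the SDK docs for its exact signature). That would let the pipeline controller notice the failed step instead of waiting forever.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class PodStatus:
    name: str
    phase: str              # Kubernetes pod phase, e.g. "Running", "Failed"
    reason: Optional[str]   # e.g. "Evicted" when the kubelet evicted the pod
    task_id: Optional[str]  # hypothetical: ClearML task ID the pod was launched for


def orphaned_tasks(pods: Iterable[PodStatus], running_task_ids: Set[str]) -> List[str]:
    """Return IDs of tasks still 'running' in ClearML whose pod was evicted."""
    orphans = []
    for pod in pods:
        if pod.task_id is None:
            continue  # pod not associated with any task
        evicted = pod.phase == "Failed" and pod.reason == "Evicted"
        if evicted and pod.task_id in running_task_ids:
            orphans.append(pod.task_id)
    return orphans
```

For the pipeline described above, the step 2 task would be returned here, while the controller task (whose pod is still `Running`) would not.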
2 years ago
one year ago