Hi everyone, I have deployed ClearML Server with Helm on AWS EKS. In the development environment the ClearML node runs on a spot instance, and I noticed that when the node is recreated, all experiments and credentials are lost. What is the best way to avoid this data loss (an option inside ClearML, an external DB, ...) when the ClearML node is recreated for some reason? Thank you!

Posted 2 years ago
Hi LovelyHamster1 ,
From your question I assume you're using the old single-node Helm chart, which is not suitable for EKS with preemptible nodes: it does not use PVs (PersistentVolumes) and requires all pods to run on the same node, where all data is stored, so the data is lost together with the node.
It seems like you need the new Helm chart; see https://github.com/allegroai/clearml-helm-charts .
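After switching charts, one way to confirm that data really lives on persistent volumes is to check that every PersistentVolumeClaim in the release namespace is bound. Below is a minimal sketch, not part of ClearML itself: the function name `unbound_pvcs` is hypothetical, and it only assumes the standard JSON shape printed by `kubectl get pvc -o json` (`items[].metadata.name` and `items[].status.phase`).

```python
import json


def unbound_pvcs(pvc_list_json: str) -> list[str]:
    """Return the names of PVCs whose status.phase is not 'Bound'.

    Hypothetical helper: expects the JSON text produced by
    `kubectl get pvc -n <namespace> -o json`.
    """
    items = json.loads(pvc_list_json).get("items", [])
    return [
        item["metadata"]["name"]
        for item in items
        if item.get("status", {}).get("phase") != "Bound"
    ]


# Illustrative input only; the PVC names here are made up.
sample = json.dumps({
    "items": [
        {"metadata": {"name": "data-a"}, "status": {"phase": "Bound"}},
        {"metadata": {"name": "data-b"}, "status": {"phase": "Pending"}},
    ]
})
print(unbound_pvcs(sample))
```

In practice you would feed it the real output, for example from `kubectl get pvc -n clearml -o json` (the `clearml` namespace is an assumption; use whatever namespace you installed into). An empty result means every claim is bound to a volume, so the data should survive the node being replaced.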

Posted 2 years ago