some suggestions:
start working just with clearml (no agent or serving; those will go in after clearml is working). try a first deploy without any override. if it works, start adding values to the override file (without copying everything into it, or it will be very difficult to debug; the override file should not contain what is not overridden). do helm upgrade and check problems one by one.
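The steps above could look roughly like this. It is only a sketch: the release name, chart and namespace are taken from the install command quoted in this thread, overrides.yaml is a hypothetical file name, and the run helper is just an illustrative dry-run wrapper (set DRY_RUN=0 to really execute the commands).

```shell
#!/bin/sh
# Names below are assumptions; adjust them to your setup.
RELEASE="${RELEASE:-clearmlai-1}"
NAMESPACE="${NAMESPACE:-clearml}"
CHART="allegroai-clearml/clearml"
OVERRIDES="${OVERRIDES:-overrides.yaml}"   # hypothetical file, only the values you change

# With DRY_RUN=1 (the default here) commands are printed, not executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# 1. first deploy with the chart defaults only
run helm install "$RELEASE" "$CHART" -n "$NAMESPACE" --create-namespace

# 2. check that the pods come up before changing anything
run kubectl get pods -n "$NAMESPACE"

# 3. add a few values to the override file, then upgrade
run helm upgrade "$RELEASE" "$CHART" -n "$NAMESPACE" -f "$OVERRIDES"

# 4. show only the values that were supplied by you
run helm get values "$RELEASE" -n "$NAMESPACE"
```

Repeating steps 3 and 4 after each small change keeps it obvious which value caused a problem.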
same, when i'm installing directly: sudo helm install clearmlai-1 allegroai-clearml/clearml -n clearml
NAME              NAMESPACE  REVISION  UPDATED                                  STATUS    CHART          APP VERSION
chart-1669648334  clearml    1         2022-11-28 17:12:14.781754603 +0200 IST  deployed  clearml-4.3.0  1.7.0
I’m not sure why in your case the liveness probe is trying to access a non-localhost IP. What is the version of the chart you are trying to install? Please share the output of helm list -A
this is a clear issue with the provisioner not handling the pvc request for any pod that has a pvc. It’s not related to the chart but to the provisioner you are using, which probably doesn’t support dynamic allocation. what provisioner are you using?
currently kubernetes.io/no-provisioner
with that in place k8s should be able to provision the pvc
seems like i didn't define a persistent volume
this is basic k8s management that is not strictly related to this chart. my suggestion is to have a default storageclass that can provide the right pv/pvc for any deployment you are going to have on the cluster. I suggest starting from here: https://kubernetes.io/docs/concepts/storage/storage-classes/
i want the storage to be on NFS eventually, the NFS is mounted to a local path on all the nodes (/data/nfs-ssd)
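Since kubernetes.io/no-provisioner does not create volumes on its own, every pending pvc needs a pv created by hand, and one pv binds to exactly one pvc. A minimal sketch of one such pv on the NFS mount follows. Assumptions: the storageclass is named standard, the pv name, the 50Gi size, the ReadWriteOnce access mode and the clearml-pv-1 subdirectory are all hypothetical and must match what the pvc actually requests.

```shell
#!/bin/sh
# Write a manifest for one statically provisioned pv.
# All names and sizes are assumptions, not values from the chart.
cat > clearml-static-pv.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: clearml-pv-1                     # hypothetical name
spec:
  capacity:
    storage: 50Gi                        # assumption: must be >= the size the pvc requests
  accessModes:
    - ReadWriteOnce                      # assumption: must include the mode the pvc requests
  persistentVolumeReclaimPolicy: Retain  # keep the data if the claim is deleted
  storageClassName: standard             # assumption: the class the pvc ends up using
  hostPath:
    # the NFS share is mounted at /data/nfs-ssd on every node;
    # the subdirectory is hypothetical and should exist beforehand
    path: /data/nfs-ssd/clearml-pv-1
EOF

echo "wrote clearml-static-pv.yaml"
# Review the file, then apply it with:
#   kubectl apply -f clearml-static-pv.yaml
```

hostPath is used here only because the share is already mounted at the same path on all nodes; a pv of type nfs pointing at the server directly would be the other option, but the server address is not known from this thread.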
can you also show output of kubectl get po
of the namespace where you installed clearml?
you need to investigate why it’s still in Pending state
i know what a storageclass is, but i don't think that this is the problem. i do have one, standard, but it seems that the pv claim is not picking it up
i set it as default, the results are the same
can you pls post the entire output of the helm list -A command?
Also the k8s distribution and version you are using would be useful
this is the problem the elastic pod shows:
Events:
  Type     Reason            Age                From               Message
  Warning  FailedScheduling  65s (x2 over 67s)  default-scheduler  0/6 nodes are available: 6 pod has unbound immediate PersistentVolumeClaims. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
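The scheduler message above means the pod's claims are not bound yet, so the next thing to look at is the pvc objects themselves. A small sketch for listing the claims that are stuck; the unbound_pvcs helper is hypothetical, and the namespace clearml is taken from the install command in this thread.

```shell
#!/bin/sh
# Print the names of claims whose STATUS column is not "Bound".
# Expects the output of `kubectl get pvc --no-headers` on stdin,
# where column 1 is NAME and column 2 is STATUS.
unbound_pvcs() {
  awk '$2 != "Bound" { print $1 }'
}

# Usage against a live cluster:
#   kubectl get pvc -n clearml --no-headers | unbound_pvcs
#   kubectl describe pvc <name> -n clearml   # its Events section says why it is not binding
#   kubectl get storageclass                 # shows each class and its provisioner
```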