The version we're using is: 1.1.1-135 • 1.1.1 • 2.14
Yes, one interesting piece of info would be: what dynamic storage provisioner are you using? (storageclass)
The storage provisioner is: cephfs.csi.ceph.com
PleasantGiraffe85, is there any change in the PVC between your version and the current version?
I wonder what we did to reach it, though... Could be we flooded it at some point.
Thank you both so much for the efforts to fix it 🙂
One of my colleagues once ran some training with tons of data in the git folder that was not .gitignored - so I suspect it's related to this.
ya sure, I was referring to creating a new PVC just for the test
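just a sketch of what I mean - the name, namespace and size below are placeholders, and it assumes the cephfs storage class mentioned in this thread; adjust to your setup:
```yaml
# hypothetical test PVC on the cephfs storage class - name/namespace/size are placeholders
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-debug-pvc
  namespace: clearml
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ceph-c2-prod-rz01-cephfs
  resources:
    requests:
      storage: 1Gi
```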
Hi SuccessfulKoala55, thanks for assisting - yes, we used the helm chart to install it. It isn't the latest version though; we installed it a month or two ago.
Also, which storage class are you using?
I'm just trying to understand if it's something related to ceph or to the clearml deployment
pretty weird; I've had some issues with ceph in the past but never something like that
I'll continue reporting if it happens again
hmm... the volume is already attached - already used by clearml-fileserver ... so it fails on this
Hi PleasantGiraffe85 , did you use the helm chart from https://github.com/allegroai/clearml-helm-charts/ ?
did you try to create a debug pod with a mount using the ceph storageclass? you can start from here https://downey.io/notes/dev/ubuntu-sleep-pod-yaml/ , then add the PVC and the mount. Then exec into the pod and try to write a dummy file on the mount; I suspect the problem is there
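something along these lines (only a sketch - pod name, namespace, image and mount path are placeholders, and it assumes the test PVC from above):
```yaml
# hypothetical debug pod that sleeps forever so you can exec in and test writes on the cephfs mount
apiVersion: v1
kind: Pod
metadata:
  name: ubuntu-sleep-debug
  namespace: clearml
spec:
  restartPolicy: Never
  containers:
    - name: ubuntu
      image: ubuntu:20.04
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: cephfs-test
          mountPath: /mnt/cephfs-test
  volumes:
    - name: cephfs-test
      persistentVolumeClaim:
        claimName: cephfs-debug-pvc   # the test PVC created just for this check
```
then something like `kubectl exec -it ubuntu-sleep-debug -n clearml -- touch /mnt/cephfs-test/dummy` should tell you whether writes go through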
and the storage class name (I hope that's what you meant, SuccessfulKoala55 ) is ceph-c2-prod-rz01-cephfs
it's pretty strange to me that you can't write to it
I have also no idea how it happened.
I managed to redeploy it and it seems to be accessible now