Did you try to create a debug pod with a mount using the Ceph storageclass? You can start from here https://downey.io/notes/dev/ubuntu-sleep-pod-yaml/ then add the PVC and the mount. Then you should exec into the pod and try to write a dummy file on the mount; I suspect the problem is there.
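In case it helps, here is a rough sketch of that debug pod, assuming Python is at hand to generate the manifest. The names `ceph-debug`, `ceph-debug-pvc` and `/mnt/test` are placeholders I made up, not values from this thread, and the claim is assumed to exist already. It prints JSON, which `kubectl apply -f -` accepts just like YAML.

```python
import json


def debug_pod_manifest(claim_name, mount_path="/mnt/test", name="ceph-debug"):
    """Build a sleeping Ubuntu pod that mounts an existing PVC.

    All names here are hypothetical placeholders; adjust them to your cluster.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": "ubuntu",
                    "image": "ubuntu:latest",
                    # Keep the container alive so we can exec into it.
                    "command": ["sleep", "infinity"],
                    "volumeMounts": [
                        {"name": "test-vol", "mountPath": mount_path}
                    ],
                }
            ],
            "volumes": [
                {
                    "name": "test-vol",
                    # Reference the PVC backed by the storage class under test.
                    "persistentVolumeClaim": {"claimName": claim_name},
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(debug_pod_manifest("ceph-debug-pvc"), indent=2))
```

Something like `python debug_pod.py | kubectl apply -f -` should create it; then `kubectl exec -it ceph-debug -- bash` and try `echo test > /mnt/test/dummy.txt` to see whether the write goes through.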
The storage provisioner is: cephfs.csi.ceph.com
Hi SuccessfulKoala55 , thanks for assisting. Yes, we used the Helm chart to install it. It isn't the latest version though; we installed it a month or two ago.
Hi PleasantGiraffe85 , did you use the helm chart from https://github.com/allegroai/clearml-helm-charts/ ?
and the storage class name (I hope that's what you meant, SuccessfulKoala55 ) is
Yes, one interesting piece of information would be: which dynamic storage provisioner are you using? (storageclass)
The version we're using is: 1.1.1-135 • 1.1.1 • 2.14
PleasantGiraffe85 is there any change in the PVC from your version to the current version?
Also, which storage class are you using?
JuicyFox94 do you have any idea?
That's very strange....
hmm... the volume is already attached - already used by clearml-fileserver ... so it fails on this
Yeah sure, what I was referring to was creating a new PVC just for the test
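A minimal sketch of such a throwaway PVC, again generated from Python as JSON. The claim name, the 1Gi size and the `ReadWriteOnce` access mode are my assumptions, and the storage class name is left as a placeholder since it has to come from your cluster.

```python
import json


def scratch_pvc_manifest(storage_class, name="ceph-debug-pvc",
                         size="1Gi", access_mode="ReadWriteOnce"):
    """Build a small PVC used only for the write test.

    The defaults are hypothetical; pass the real storage class name.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # Dynamic provisioning is triggered by the storage class.
            "storageClassName": storage_class,
            "accessModes": [access_mode],
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # Replace the placeholder with your actual storage class name.
    print(json.dumps(scratch_pvc_manifest("<your-storage-class>"), indent=2))
```

Because this claim is separate from the one used by clearml-fileserver, mounting it in a test pod should not run into the already-attached volume.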
(Multi-Attach error for volume)
this is interesting
I’m just trying to understand if it’s something related to Ceph or to the ClearML deployment
it’s pretty strange to me that you can’t write to it
I also have no idea how it happened.
I managed to redeploy it and it seems to be accessible now
I wonder what we did to get into that state, though... It could be that we flooded it at some point.
I'll report back if it happens again
pretty weird; I've had some issues with Ceph in the past but never something like that
good it’s solved 😄
Thank you both so much for the efforts to fix it 🙂
One of my colleagues once ran some training with tons of data in the git folder that was not .gitignored, so I suspect it's related to this.