Answered
Hey guys, I'm having the strangest error ever... I've installed ClearML using Helm on our K8s server, and now the fileserver complains that it has no permission to access the mounted volume. I have in the YAML the config:

Hey guys,
I'm having the strangest error ever... I've installed ClearML using Helm on our K8s server, and now the fileserver complains that it has no permission to access the mounted volume.
I have in the YAML the config:
volumeMounts:
  - mountPath: /mnt/fileserver
    name: fileserver-data

And still I get:
PermissionError: [Errno 13] Permission denied: '/mnt/fileserver/whatever/metrics/model_config/text_summary'

[2021-11-21 18:04:29,017] [7] [ERROR] [fileserver] Exception on / [POST]
Traceback (most recent call last):
  File "/usr/lib64/python3.6/pathlib.py", line 1248, in mkdir
    self._accessor.mkdir(self, mode)
  File "/usr/lib64/python3.6/pathlib.py", line 387, in wrapped
    return strfunc(str(pathobj), *args)
PermissionError: [Errno 13] Permission denied: '/mnt/fileserver/Training RoBERTa Language Model whatever/metrics/args/text_summary'

How can it be? Does the Docker execution user have no permission to access the folder? Should I add `securityContext` or something?
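One way to check whether the container user really lacks access is to compare the process UID/GID with the owner and mode of the mount. The snippet below is only a rough sketch for a POSIX system: `describe_access` is a hypothetical helper name, and `/mnt/fileserver` is simply the `mountPath` from the config above.

```python
import os
import stat
from pathlib import Path


def describe_access(path):
    """Report who the current process runs as and who owns `path`."""
    info = Path(path).stat()
    return {
        "process_uid": os.getuid(),
        "process_gid": os.getgid(),
        "owner_uid": info.st_uid,
        "owner_gid": info.st_gid,
        # Symbolic permission string, in the same style as `ls -l`.
        "mode": stat.filemode(info.st_mode),
        # True if the current process is allowed to write to `path`.
        "writable": os.access(path, os.W_OK),
    }


if __name__ == "__main__":
    mount = "/mnt/fileserver"  # mountPath from the question; adjust as needed
    if os.path.exists(mount):
        print(describe_access(mount))
    else:
        print(f"{mount} does not exist in this environment")
```

If the process UID differs from the owner and the mode does not grant write access to group or others, that would be consistent with the `PermissionError` above, though it would not by itself explain how the volume got into that state.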

  
  
Posted 3 years ago

Answers 24


I'm just trying to understand if it's something related to Ceph or to the ClearML deployment.

  
  
Posted 3 years ago

wow

  
  
Posted 3 years ago

Hi PleasantGiraffe85 , did you use the helm chart from https://github.com/allegroai/clearml-helm-charts/ ?

  
  
Posted 3 years ago

hmm... the volume is already attached - already used by clearml-fileserver ... so it fails on this

  
  
Posted 3 years ago

It's pretty strange to me that you can't write to it.

  
  
Posted 3 years ago

That's very strange....

  
  
Posted 3 years ago

I wonder what we did to reach this state, though... It could be that we flooded it at some point.

  
  
Posted 3 years ago

Also, which storage class are you using?

  
  
Posted 3 years ago

Pretty weird; I had some issues with Ceph in the past, but never something like that.

  
  
Posted 3 years ago

The version we're using is: 1.1.1-135 • 1.1.1 • 2.14

  
  
Posted 3 years ago

Yeah sure, I was referring to creating a new PVC just for the test.

  
  
Posted 3 years ago

(Multi-Attach error for volume)

  
  
Posted 3 years ago

good it’s solved 😄

  
  
Posted 3 years ago

this is interesting

  
  
Posted 3 years ago

Hi SuccessfulKoala55 , thanks for assisting. Yes, we used the Helm chart to install it. It isn't the latest version though. We installed it a month or two ago.

  
  
Posted 3 years ago

And the storage class name (I hope that's what you meant, SuccessfulKoala55 ) is ceph-c2-prod-rz01-cephfs

  
  
Posted 3 years ago

The storage provisioner is: cephfs.csi.ceph.com

  
  
Posted 3 years ago

Yes, one interesting piece of information would be: which dynamic storage provisioner are you using? (storageclass)

  
  
Posted 3 years ago

Did you try to create a debug pod with a mount using the Ceph storageclass? You can start from here https://downey.io/notes/dev/ubuntu-sleep-pod-yaml/ and then add the PVC and the mount. Then you should exec into the pod and try to write a dummy file on the mount; I suspect the problem is there.
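For the "write a dummy file" step, something like the following could be run inside the debug pod, assuming Python is available in the image. It is only a sketch: `probe_write` is a made-up helper name, and it imitates the nested `mkdir` seen in the traceback rather than reproducing the actual fileserver code.

```python
import sys
import uuid
from pathlib import Path


def probe_write(mount_path):
    """Try to create a nested directory and a dummy file under mount_path.

    Returns (True, "") on success or (False, error_text) on failure.
    """
    probe_root = Path(mount_path) / f"write-probe-{uuid.uuid4().hex}"
    probe_dir = probe_root / "nested"
    probe_file = probe_dir / "dummy.txt"
    try:
        # Nested mkdir, similar to the call that fails in the traceback.
        probe_dir.mkdir(parents=True, exist_ok=True)
        probe_file.write_text("hello")
        if probe_file.read_text() != "hello":
            return False, "read-back mismatch"
        return True, ""
    except OSError as exc:  # PermissionError is a subclass of OSError
        return False, f"{type(exc).__name__}: {exc}"
    finally:
        # Best-effort cleanup of whatever was created.
        for path in (probe_file, probe_dir, probe_root):
            try:
                if path.is_file():
                    path.unlink()
                elif path.is_dir():
                    path.rmdir()
            except OSError:
                pass


if __name__ == "__main__":
    # Pass the mount path as the first argument; /mnt/fileserver is the
    # mountPath from the question and is used here as a default.
    target = sys.argv[1] if len(sys.argv) > 1 else "/mnt/fileserver"
    ok, error = probe_write(target)
    print("write OK" if ok else f"write FAILED: {error}")
```

If this fails in a fresh pod with a fresh PVC from the same storageclass, the problem is more likely on the storage side than in the ClearML deployment.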

  
  
Posted 3 years ago

PleasantGiraffe85 is there any change in the PVC from your version to the current version?

  
  
Posted 3 years ago

Thank you both so much for the efforts to fix it 🙂
One of my colleagues once ran some training with tons of data in the git folder that was not .gitignored, so I suspect it's related to this.

  
  
Posted 3 years ago

I'll continue reporting if it happens again

  
  
Posted 3 years ago

JuicyFox94 do you have any idea?

  
  
Posted 3 years ago

I also have no idea how it happened.
I managed to redeploy it and it seems to be accessible now

  
  
Posted 3 years ago