uh, using clearml-task params 😄
so do you want to mount files into the agent pod?
ok, for a major version upgrade my suggestion is to back up the data somewhere and do a clean install after removing the pvc/pv
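something like this, assuming a standard helm release (release/namespace/chart names are examples, and back up the data first!):

```bash
# names are examples; make sure the data is backed up before any of this
helm uninstall clearml -n clearml
kubectl delete pvc --all -n clearml
kubectl get pv   # check for leftover Released PVs and delete them by name
# then a clean install of the new major version
helm install clearml <chart> -n clearml
```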
are there processes listening on host ports?
and add these 3 hostnames, pointing them to the external IP
(just to understand where the ingress rules are)
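for example, if you don't have DNS set up yet you can test with /etc/hosts entries like these (IP and domains are placeholders):

```
203.0.113.10  app.clearml.example.com
203.0.113.10  api.clearml.example.com
203.0.113.10  files.clearml.example.com
```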
did you try to create a debug pod with a mount using the ceph storageclass? you can start from here https://downey.io/notes/dev/ubuntu-sleep-pod-yaml/ then add the PVC and the mount. then you should exec into the pod and try to write a dummy file on the mount (sketch below); I suspect the problem is there
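a minimal sketch of what I mean, assuming your ceph storageclass is called `ceph-block` (all names are placeholders):

```yaml
# 1Gi test claim against the ceph storageclass
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ceph-test-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: ceph-block   # your ceph storageclass name here
  resources:
    requests:
      storage: 1Gi
---
# sleep pod that mounts the claim so you can exec into it
apiVersion: v1
kind: Pod
metadata:
  name: ceph-debug
spec:
  containers:
    - name: ubuntu
      image: ubuntu:20.04
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: test-vol
          mountPath: /mnt/test
  volumes:
    - name: test-vol
      persistentVolumeClaim:
        claimName: ceph-test-pvc
```

then something like `kubectl exec -it ceph-debug -- bash -c 'echo hi > /mnt/test/dummy && ls -l /mnt/test'` to see if the write goes through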
this is strange, I have a lot of clusters that went through node issues but I never lost data
Just one more piece of info: atm I've tested Elastic v7.10.*; I still haven't tested 7.11/7.12/7.13
btw, judging from the screenshots the services are ok but the pods are not up; elastic, redis and mongodb are Pending, which means k8s didn’t schedule them for some reason that you can find by describing these pods
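e.g. (namespace/pod names are examples; the Events section at the end of the describe output usually tells you why scheduling failed):

```bash
kubectl -n clearml get pods
kubectl -n clearml describe pod clearml-elastic-master-0 | tail -n 20
```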
about clearml-agent, just set resources in the basePodTemplate (cpu, gpu, ram) so you will have a specific definition
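a sketch of what I mean; the exact key paths depend on the chart version you're running:

```yaml
agentk8sglue:
  basePodTemplate:
    resources:
      requests:
        cpu: "2"
        memory: 8Gi
      limits:
        cpu: "4"
        memory: 16Gi
        nvidia.com/gpu: 1   # only if the task pods need a GPU
```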
what kind of storageclass are you using on this one?
ok so they are executed as expected
at task completion do you get the Completed state in the UI?
About nodeSelector you are right: one is for the agent pod while the other is used to spawn task pods
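roughly like this (key names may differ per chart version):

```yaml
nodeSelector:              # schedules the agent pod itself
  pool: services
agentk8sglue:
  basePodTemplate:
    nodeSelector:          # applied to the task pods the agent spawns
      pool: gpu
```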
if you instruct the apiserver to use s3, the fileserver will basically not be used anymore (I need SuccessfulKoala55 confirmation to be 100% sure, I'm more of an infra guy :D )
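the knob I have in mind is something like this in clearml.conf (just a sketch, not verified; bucket and credentials are placeholders):

```
api {
    # point clients at s3 instead of the built-in fileserver
    files_server: "s3://my-bucket/clearml"
}
sdk {
    aws {
        s3 {
            key: "<access key>"
            secret: "<secret key>"
        }
    }
}
```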
what kind of ClearML installation did you do on the machine? are there processes listening on these ports?
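e.g. to check for listeners on the default ClearML ports (8080 web, 8008 api, 8081 files):

```bash
sudo ss -tlnp | grep -E ':8080|:8008|:8081'
```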
mmmmm it should not be chart-related as far as I know, I’m going to ping SuccessfulKoala55; maybe he can chime in because I’m not sure why it’s happening
you can work around the issue by mounting the kubeconfig, but I guess the issue should be investigated anyway
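the workaround would look roughly like this in the agent pod spec (secret and path names are placeholders):

```yaml
spec:
  containers:
    - name: clearml-agent
      volumeMounts:
        - name: kubeconfig
          mountPath: /root/.kube   # kubectl's default config path
  volumes:
    - name: kubeconfig
      secret:
        secretName: agent-kubeconfig   # secret holding a `config` key
```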
but it’s just a quick guess, not sure if I’m right
this is the state of the cluster https://github.com/valeriano-manassero/mlops-k8s-infra
Exactly, these are system accounts
really weird, can you try to totally remove anything cookie-domain related?
is there anything specific you need?
ok next step is to exec into the mongodb pod and check the various collections (tables) to see if the data is still there
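something like this (pod/db/collection names are examples and may differ per version):

```bash
kubectl -n clearml exec -it clearml-mongodb-0 -- mongo backend \
  --eval 'printjson(db.getCollectionNames())'
kubectl -n clearml exec -it clearml-mongodb-0 -- mongo backend \
  --eval 'print(db.task.count())'   # quick sanity check on one collection
```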
basically a new helm chart 😄
```
❯ clearml-task --version
ClearML launch - launch any codebase on remote machine running clearml-agent
usage: clearml-task [-h] [--version] [--project PROJECT] --name NAME [--repo REPO] [--branch BRANCH]
                    [--commit COMMIT] [--folder FOLDER] [--script SCRIPT] [--cwd CWD] [--args [ARGS [ARGS ...]]]
                    [--queue QUEUE] [--requirements REQUIREMENTS] [--packages [PACKAGES [PACKAGES ...]]]
                    [--docker DOCKER] [--docker_args DOCKER_ARGS]
...
```
ya sure, I was referring to creating a new PVC just for the test
Ok, I’d like to test it more with you; the credentials exposed in the chart values are system ones and it’s better not to change them; let’s forget about them for now. If you create a new access key/secret key pair in the UI, you should use those in your agents (see the sketch below) and they should not get overwritten in any way; can you confirm it works without touching the credentials section?
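for reference, these are the env vars the agent reads for that key pair; in a helm deployment they usually map to values keys, which vary per chart version:

```bash
# use the access key/secret key pair you generated in the UI, not the system ones
export CLEARML_API_ACCESS_KEY="<access key from UI>"
export CLEARML_API_SECRET_KEY="<secret key from UI>"
```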