
If you instruct the apiserver to use S3, the fileserver will basically not be used anymore (I need SuccessfulKoala55's confirmation to be 100% sure, I'm more of an infra guy :D)
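For context, a minimal sketch of how that usually looks on the client side, assuming a clearml.conf and an S3 bucket of your own (bucket name and credentials below are placeholders):
```
# clearml.conf (client side) -- hedged sketch, placeholder bucket/credentials
sdk {
    development {
        # send artifacts/models to S3 instead of the fileserver
        default_output_uri: "s3://my-clearml-bucket/artifacts"
    }
    aws {
        s3 {
            key: "ACCESS_KEY"
            secret: "SECRET_KEY"
            region: "us-east-1"
        }
    }
}
```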
ok, for a major version upgrade my suggestion is to back up the data somewhere and do a clean install after removing the PVC/PV
you will need to upgrade the clearml helm chart
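A rough sketch of that flow, assuming the release and namespace are both called clearml and the repo alias is allegroai (adjust to your setup), and only after the data is safely backed up:
```
# back up first (e.g. with Velero, see below), then wipe and reinstall
helm uninstall clearml -n clearml
kubectl delete pvc --all -n clearml     # irreversible: removes the old volumes
helm repo update
helm install clearml allegroai/clearml -n clearml -f override.yaml
```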
I absolutely need to improve the persistence part of this chart π
This is specific K8s infra management, usually I use Velero for backup
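As a hedged example, assuming Velero is already installed and has a working backup location configured:
```
velero backup create clearml-backup --include-namespaces clearml
velero backup describe clearml-backup      # verify it completed
# on the new cluster / after the clean install:
velero restore create --from-backup clearml-backup
```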
there are workarounds tbh, but they are tricks that require a lot of k8s expertise and they are risky
but I will try to find something good for you
not sure if this is ok with your infra
so you are using docker-compose?
but I'm going to patch this soon so it will take the default storageclass automatically
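Until then, a possible workaround is to make sure your cluster has a default StorageClass, so charts that omit storageClassName pick it up; a sketch assuming the class is named standard:
```
kubectl get storageclass
kubectl patch storageclass standard \
  -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'
```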
some suggestions:
- start working with just clearml (no agent or serving; those go in after clearml is working)
- try a first deploy without any override
- if it works, start adding values to the override file (don't put everything in it or it will be very difficult to debug; the override file should only contain what you actually override)
- do helm upgrade (see the sketch below)
- check problems one by one
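A sketch of that workflow with helm (release, namespace and repo alias are assumptions; use the ones from your install):
```
# first deploy, no overrides
helm install clearml allegroai/clearml -n clearml --create-namespace
# once it works, keep only the values you really change in override.yaml, then:
helm upgrade clearml allegroai/clearml -n clearml -f override.yaml
kubectl get pods -n clearml        # check problems one by one
```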
Just a quick suggestion since I have some more insight on the situation. Maybe you can look at Velero; it should be able to migrate the data. If not, you can simply create a fresh new install, scale everything to zero, then create a debug pod mounting the old and new PVCs and copy the data between the two. It sounds more complex than it actually is.
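A hedged sketch of the debug-pod approach, with placeholder PVC names (old-clearml-pvc / new-clearml-pvc) and everything already scaled to zero:
```
kubectl apply -n clearml -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: pvc-copy
spec:
  containers:
    - name: copy
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - { name: old-data, mountPath: /old }
        - { name: new-data, mountPath: /new }
  volumes:
    - name: old-data
      persistentVolumeClaim: { claimName: old-clearml-pvc }
    - name: new-data
      persistentVolumeClaim: { claimName: new-clearml-pvc }
EOF
# copy everything from the old volume to the new one, then clean up
kubectl exec -n clearml pvc-copy -- sh -c "cp -a /old/. /new/"
kubectl delete pod pvc-copy -n clearml
```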
What Chart version are you trying to upgrade from?
Hi Ofir, ty for feedback
today is pretty busy for me, but I can try to help if needed; please put any questions here and I will try to answer when possible
With Helm we are not running in services mode. If a pod gets evicted or killed we should investigate the reason behind that; are there any logs from the killed pod that can help us better understand the situation?
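The usual places to look (pod name and namespace are placeholders):
```
kubectl get events -n clearml --sort-by=.lastTimestamp
kubectl describe pod <agent-pod> -n clearml      # check Last State / Reason (OOMKilled, Evicted, ...)
kubectl logs <agent-pod> -n clearml --previous   # logs from the previous (killed) container, if any
```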
this is the state of the cluster https://github.com/valeriano-manassero/mlops-k8s-infra
moreover, if you are using minikube you can try the official helm chart https://github.com/allegroai/clearml-server-helm
if you need a non-automated way to create the cluster, I suggest considering the helm chart only.
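Roughly, following the README of the chart linked above (repo alias and exact chart name below are assumptions; check helm search repo for the real name):
```
helm repo add allegroai https://allegroai.github.io/clearml-server-helm/
helm repo update
helm search repo allegroai                 # find the exact chart name
helm install clearml-server allegroai/<chart-name> -n clearml --create-namespace
```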
I'm not totally sure atm, but you can try to set the env var CLEARML_API_HOST_VERIFY_CERT="false"
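For example (a sketch; if I remember correctly the same can be done via clearml.conf with api.verify_certificate set to false):
```
# env var for the SDK / agent process
export CLEARML_API_HOST_VERIFY_CERT=false
```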
just my two cents
I suggest exec'ing into the pod and issuing the command kubectl delete pod -l=CLEARML=agent-74b23a8f --namespace=clearml --field-selector=status.phase!=Pending,status.phase!=Running --output name
so you can see the output from inside the pod. This should help us understand what is going on with the command
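Something like this (pod name is a placeholder):
```
kubectl exec -it <clearml-agent-pod> -n clearml -- /bin/bash
# then, inside the pod:
kubectl delete pod -l=CLEARML=agent-74b23a8f --namespace=clearml \
  --field-selector=status.phase!=Pending,status.phase!=Running --output name
```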
Just to be sure we are in sync
Ofc it's possible to add this to the chart but, as @<1523701205467926528:profile|AgitatedDove14> said, it's not recommended to go directly over the public internet with it. Regardless of this, @<1556812486840160256:profile|SuccessfulRaven86> do you have any PR to propose for it? It would be great to have something to discuss in GH.
iptables is used by Docker itself, so you need to be careful when making modifications: https://docs.docker.com/network/packet-filtering-firewalls/
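For example, per those docs, custom rules should go in the DOCKER-USER chain so Docker's own iptables management does not overwrite them (interface and subnet below are placeholders):
```
# drop external traffic not coming from the trusted subnet
iptables -I DOCKER-USER -i eth0 ! -s 192.168.1.0/24 -j DROP
```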