
kubectl get svc -n clearml
?
about minor releases: they are not breaking, so the upgrade should be linear
it should be ok; for the dependency charts (mongodb/elastic/redis) you need to check the values documented by each chart's owner (the links are in the values.yaml sections)
In this case I apologize for the confusion. If you are going for the AWS autoscaler it's better to follow the official way; the solution I proposed is for an on-premise cluster containing every component, without autoscaler. sorry for
moreover, if you are using minikube you can try the official helm chart https://github.com/allegroai/clearml-server-helm
this is the state of the cluster https://github.com/valeriano-manassero/mlops-k8s-infra
this is the chart with configurable groups of agents https://artifacthub.io/packages/helm/valeriano-manassero/clearml
clearml-agent is a pretty new chart so I expect some issues. Can you pls open an issue on GitHub for each problem you find?
it will be easier for me to track fixes
yes, it should be, will test this specific behaviour to be sure
this is clearly an issue with the provisioner not handling the pvc request, and it affects any pod that has a pvc. It’s not related to the chart but to the provisioner you are using, which probably doesn’t support dynamic allocation. what provisioner are you using?
it looks to me like the redis pod is not working as expected, but it’s just a guess
so do you want to mount files into the agent pod?
(and each queue has its own basepodtemplate)
Just one more piece of info: atm I have tested Elastic v7.10.* . I haven't tested 7.11, 7.12 or 7.13 yet
3.10.2 will be published in 30 minutes
pretty weird; I had some issues with ceph in the past but never something like that
from /
to /debug.ping
I’m just trying to understand if it’s something related to ceph or to the clearml deployment
I'm going to ask for an update to the docs
There’s an incomplete PR for this.
some suggestions:
- start working just with clearml (no agent or serving, these will go in after clearml is working)
- try a first deploy without any override
- if it works, start adding values to the override file (without copying everything or it will be very difficult to debug; the override file should not contain what is not overridden)
- do helm upgrade
- check problems one by one
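The steps above can be sketched roughly like this. It's only a sketch: the release name, namespace, chart reference and override file name are assumptions (the chart repo is assumed to be already added with `helm repo add`), so adjust them to your setup. By default it only prints the commands; set RUN=1 to actually execute them.

```shell
#!/bin/sh
# Assumed names, not taken from the chart docs: change them for your cluster.
RELEASE="clearml"
NAMESPACE="clearml"
CHART="allegroai/clearml"     # assumed chart reference, repo already added
OVERRIDES="overrides.yaml"    # hypothetical file holding ONLY the values you change

# Print each command by default; execute it only when RUN=1.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "+ $*"
  fi
}

# 1. First deploy with chart defaults only (no agent, no serving, no overrides).
run helm install "$RELEASE" "$CHART" --namespace "$NAMESPACE" --create-namespace

# 2. Check that the pods come up before changing anything.
run kubectl get pods --namespace "$NAMESPACE"

# 3. Add a few values to the override file, then upgrade and check again.
run helm upgrade "$RELEASE" "$CHART" --namespace "$NAMESPACE" -f "$OVERRIDES"
```

Repeat step 3 after each small change to the override file, so when something breaks you know which value caused it.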
I should add some more instructions on the GitHub page
if you instruct apiserver to use s3, fileserver will basically not be used anymore (I need SuccessfulKoala55's confirmation to be 100% sure, I'm more of an infra guy :D )
I don’t think it’s possible to set up queues in advance with any ClearML chart env var but I’m not 100% sure. SuccessfulKoala55 can you pls clarify this?
especially if it’s evicted, it should be due to increasing resource usage
Ty, I have other stuff that I'd like to send but it's better to get these merged first so I can proceed to shiny new PRs in the near future 😄