Hi DeliciousBluewhale87.
Which cloud provider are you using? If AWS, you can use the ClearML AWS Auto-Scaler service: https://allegro.ai/clearml/docs/docs/examples/services/aws_autoscaler/aws_autoscaler.html#clearml-aws-autoscaler-service
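For reference, a minimal sketch of launching that auto-scaler from the public clearml examples repository (the path and `--run` flag are taken from the example linked above; verify them against your installed version before relying on this):

```shell
# Sketch, assuming the example lives in the public clearml repo as documented.
git clone https://github.com/allegroai/clearml.git
cd clearml/examples/services/aws_autoscaler

# The script runs an interactive wizard (AWS credentials, instance types,
# budgets) and then starts the auto-scaler; --run executes it locally,
# typically you would enqueue it into the "services" queue instead.
python aws_autoscaler.py --run
```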
Can this do the trick?
We have to do it on-premise. Cloud providers are not allowed for the final implementation; of course, for now we use the cloud to test out our ideas.
Hi DeliciousBluewhale87, I'm already running an on-premise setup (following the GitOps paradigm) with a custom Helm chart; maybe this is interesting for you.
This is the chart, with various configurable groups of agents: https://artifacthub.io/packages/helm/valeriano-manassero/clearml
This is the state of the cluster: https://github.com/valeriano-manassero/mlops-k8s-infra
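If you just want to try that chart without the full GitOps setup, a minimal sketch of installing it with Helm (the repository URL and release/namespace names here are assumptions based on the chart's Artifact Hub page; check them there first):

```shell
# Sketch: install the community ClearML chart from Artifact Hub.
# Repo URL is an assumption; confirm it on the chart's Artifact Hub page.
helm repo add valeriano-manassero https://valeriano-manassero.github.io/helm-charts
helm repo update

# "clearml" release name and namespace are arbitrary choices for this example.
helm install clearml valeriano-manassero/clearml \
  --namespace clearml --create-namespace
```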
Thanks JuicyFox94.
I'm not really from a DevOps background; let me try to digest this.. 🙏
Today is pretty busy for me, but I can try to help if needed. Please post any questions here and I'll try to answer when possible.
Sure, I'll post some questions once I wrap my mind around it..
If you want a non-automated way to create the cluster, I suggest considering just the Helm chart.
Hi DeliciousBluewhale87 ,
As far as I know, JuicyFox94's charts do not yet deal with dynamic scaling of ClearML Agents (JuicyFox94, feel free to correct me 🙂).
This is currently supported in the AWS Auto-Scaler (which is both a working implementation and an example template for how to build such an auto-scaler, regardless of the platform used).
We do have plans to support this kind of scaling for K8s in the near future 🙂
SuccessfulKoala55 yes, there's no auto-scaler in that chart. Maybe I'm missing the point, but the request was for an "on-premise" setup, so I assumed no AWS. If I missed the point, everything I posted is not useful 😄
Hi SuccessfulKoala55, okie..
1) Actually, I am using AWS right now. I am trying to set up the ClearML server in K8s; however, the ClearML agents will just be another EC2 instance / Docker image.
2) For phase 2, I will try the ClearML AWS Auto-Scaler service.
3) At this point, I think I will have a crack at JuicyFox94's solution as well.
Maybe I should have stated our main goal earlier: we are data scientists who need an MLOps environment to track and also run our experiments..
Just to add: I am using minikube for now.
In this case, I apologize for the confusion. If you are going for the AWS auto-scaler, it's better to follow the official route; the solution I proposed is for an on-premise cluster containing every component, without an auto-scaler. Sorry for that.
Moreover, since you are using minikube, you can try the official Helm chart: https://github.com/allegroai/clearml-server-helm
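A minimal sketch of installing that official chart on minikube (the repo URL, chart name, and namespace below are taken from/guessed against that repository's README, which you should verify, since chart names have changed between releases):

```shell
# Sketch: install the official ClearML server chart on minikube.
# Confirm the repo URL and chart name in the linked repo's README first.
helm repo add allegroai https://allegroai.github.io/clearml-server-helm/
helm repo update

# Release and namespace names are arbitrary choices for this example.
helm install clearml-server allegroai/clearml-server-chart \
  --namespace clearml --create-namespace
```

Once the pods are up, the README describes how to expose the web, API, and file servers (e.g. via NodePort on minikube).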
Nice, this looks a bit more friendly 🙂 Let me try it. Thanks!