@<1523701118159294464:profile|ExasperatedCrab78> do you have any inputs for this one? 🙂
@<1523701205467926528:profile|AgitatedDove14> If you have any other insights, please do not hesitate! Thanks a lot
In the clearml helm-charts repo, can we use the clearml-serving chart alone?
@<1523701070390366208:profile|CostlyOstrich36> @<1523701205467926528:profile|AgitatedDove14> Any ideas on this one?
Is it a bug inside the AWS autoscaler??
No problem. I guess this might be a small visualisation bug, but I really have the impression that these workers still pick up tasks, which is strange. I should test again to be sure.
@<1523701070390366208:profile|CostlyOstrich36> The base docker image of the AWS autoscaler is nvidia/cuda:10.2-runtime-ubuntu18.04. As far as I can tell, the Python version is not set inside the image, but I might be wrong and it could indeed be the problem...?
Thanks! So regarding question 2, it means that I can spin up a K8s cluster with Triton enabled, and by specifying the model type while creating the endpoint, it will either use the Triton engine or not.
Related to that: does the Triton engine expect the TensorRT format, or is it just a performance improvement over other model weight formats?
Finally, last question (I swear 😛): what is the serving-on-Kubernetes flow supposed to look like? Is it something like this:
- Create en...
I do not remember, but I was afraid... Thanks for the output! Maybe in a bad dream? 😜
I still do not get the usefulness of the K8s clearml server then?
I read that the hosted clearml server was periodically reset. Does that mean my team would lose all our work?
Task.set_base_docker
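For reference, a minimal sketch of pointing a task at a specific base image via Task.set_base_docker (the image name is just an example, and the exact keyword arguments may vary with the clearml SDK version):

```python
from clearml import Task

task = Task.init(project_name="demo", task_name="base-docker-example")

# Tell the agent which base docker image to use for remote execution.
# The image name here is only an example; kwargs may differ per SDK version.
task.set_base_docker(docker_image="nvidia/cuda:10.2-runtime-ubuntu18.04")
```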
🙂
Thank you for the quick replies!
I might be doing it the wrong way, but the clearml.conf snippet I posted earlier is the additional configuration I add to the AWS autoscaler. Should I add a complete clearml.conf file instead?
That is a good question @<1537605940121964544:profile|EnthusiasticShrimp49>! I am not sure the image has Python 3.9. I tried to check but could not find the answer. I am using the following AMI: AWS Deep Learning AMI (Ubuntu 18.04) with Support by Terracloudx
(Nvidia deep learni...
I guess it makes no sense given the way a clearml-agent works...
I also thought about switching to pip mode, but unfortunately not all packages are detected from our poetry.lock file, so I cannot do that.
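If it helps, a minimal workaround sketch for pip mode: packages that auto-detection misses can be added manually via Task.add_requirements before Task.init (the package name and version below are placeholders):

```python
from clearml import Task

# Placeholder package/version: add whichever poetry.lock entries the agent
# fails to detect. Must be called before Task.init().
Task.add_requirements("missing-package", "1.2.3")

task = Task.init(project_name="demo", task_name="pip-mode-run")
```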
This is extremely hard to debug. I am thinking of creating another repo and iterating on the packages to hopefully find the problem, but it will take ages.
I am literally trying with one package and Python, and it fails. I tried with Python 3.8, 3.9, and 3.9.16, and it always fails, so it is not linked to the Python version. What is the problem then? I am wondering whether there is an intrinsic bug.
I literally connected to it at runtime, ran poetry install -n, and it worked.
It is due to the caching mechanism of ClearML. Is there a Python command to update the venvs-cache?
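In case it's useful, a small sketch for clearing the agent's cached virtualenvs from Python, assuming the default cache location (your clearml.conf agent.venvs_cache.path may point elsewhere):

```python
import shutil
from pathlib import Path

# Default venvs cache location for the clearml-agent; adjust if your
# clearml.conf sets agent.venvs_cache.path to a different folder.
venvs_cache = Path.home() / ".clearml" / "venvs-cache"

if venvs_cache.exists():
    shutil.rmtree(venvs_cache)  # the agent rebuilds the venv on its next run
```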
Hi @<1523701087100473344:profile|SuccessfulKoala55>, the EC2 instance is spun up by the AWS autoscaler provided by ClearML. I use the following docker image: nvidia/cuda:11.8.0-devel-ubuntu20.04
So the EC2 instance runs a docker container
but I still had time to go inside the container, export the PATH variables for my poetry and python versions, and run the poetry install command there
The servingTaskId is linked to the helm chart, which means your solution would require creating multiple Kubernetes clusters according to our requirements, no?
Hey @<1523701205467926528:profile|AgitatedDove14> , thank you for your input
Could you clarify what you mean by clearml-serving session?
Are you referring to the servingTaskId?
I basically would like to know whether we can serve the model without the TensorRT format, which is highly efficient but more complicated to produce.
Thank you! I will try this 🙂
Prerequisites: PyTorch models require Triton engine support; please use docker-compose-triton.yml / docker-compose-triton-gpu.yml, or if running on Kubernetes, the matching helm chart.
I would like to know whether it is possible to run any PyTorch model with the basic docker-compose file, without Triton?
Sorry to come back to this! Regarding the Kubernetes serving helm chart, I can see horizontal scaling of docker containers. What about vertical scaling? Is it implemented? More specifically, where is the SKU of the VMs in use defined?
Great, and can we specify a ClearML environment variable that directly overrides the Azure config in clearml.conf, or something similar? I do not want to ask every engineer on my team to modify their clearml.conf file. @<1523701070390366208:profile|CostlyOstrich36> Thanks
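For what it's worth, a minimal sketch of supplying the Azure credentials through environment variables instead of editing each clearml.conf, assuming the SDK falls back to the standard AZURE_STORAGE_ACCOUNT / AZURE_STORAGE_KEY variables (values below are placeholders):

```python
import os
from clearml import Task

# Assumption: the clearml SDK picks up these standard Azure environment
# variables when no azure.storage credentials are set in clearml.conf.
# Values are placeholders.
os.environ["AZURE_STORAGE_ACCOUNT"] = "<storage-account-name>"
os.environ["AZURE_STORAGE_KEY"] = "<storage-account-key>"

task = Task.init(project_name="demo", task_name="azure-env-example")
```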
In production, we should use the clearml-helm-charts, right? The docker-compose setup in clearml-serving is more for local testing.