
On the helm charts clearml repos, can we use the clearml-serving chart alone?
Hi @<1523701087100473344:profile|SuccessfulKoala55> , the EC2 instance is spun up by the AWS autoscaler provided by ClearML. I use the following docker image: nvidia/cuda:11.8.0-devel-ubuntu20.0
So the EC2 instance runs a docker container
I still do not get the usefulness of the K8s clearml server then?
How do I set that up inside clearml.conf (or somewhere else) so it knows which credentials to load?
Great, and can we specify a ClearML environment variable that directly overrides the azure config from clearml.conf, or do something similar? I do not want to ask every engineer on my team to modify their clearml.conf file. @<1523701070390366208:profile|CostlyOstrich36> Thanks
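For reference, something like the sketch below is what I had in mind, assuming ClearML falls back to the standard AZURE_STORAGE_ACCOUNT / AZURE_STORAGE_KEY environment variables when no azure.storage credentials are set in clearml.conf (the account name and key are placeholders):
import os

# Assumption: ClearML picks up these standard Azure variables as default
# storage credentials when clearml.conf has no azure.storage section.
os.environ["AZURE_STORAGE_ACCOUNT"] = "mystorageaccount"      # placeholder
os.environ["AZURE_STORAGE_KEY"] = "<storage-account-key>"     # placeholder

from clearml import Task

task = Task.init(project_name="my_project", task_name="azure_env_test")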
Thanks, my question is dumb indeed 🙂 Thanks for the reply !
I have my Task.init inside a train() function inside the flask command. We basically have flask commands that allow us to trigger specific behaviors. When running it locally, everything works properly except the repository information. The use case is linked to the way our codebase works. For example, I run flask train {arguments} and it triggers the training of a model (that I want to track).
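To make the setup concrete, here is a minimal sketch of what our code roughly looks like (the project name, option and module layout are illustrative, not our real ones):
import click
from flask import Flask
from clearml import Task

app = Flask(__name__)

def train(epochs):
    # Task.init is called here, inside the function that the CLI command
    # triggers, not at module import time.
    task = Task.init(project_name="my_project", task_name="training")
    # ... actual training code ...

@app.cli.command("train")
@click.option("--epochs", default=10, type=int)
def train_command(epochs):
    train(epochs)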
I stopped the autoscaler and deleted it manually. I did it because I want to test...
I am literally trying with 1 package and python and it fails. I tried with python 3.8, 3.9 and 3.9.16 and it always fails --> so it is not linked to the python version. What is the problem then? I am wondering if there is not an intrinsic bug.
I literally connected to it at runtime, ran poetry install -n, and it worked
I tried too. I do not have more logs inside the ClearML agent 😞
I read that the hosted clearml server was periodically reset. Does it mean my team would lose all our work?
No problem. I guess this might be a small visualisation bug, but I really have the impression that these workers still pick up tasks, which is strange. I should test again to be sure.
@<1523701205467926528:profile|AgitatedDove14> If you have any other insights, pls do not hesitate! Thanks a lot
Using a pyenv virtual env, then exporting the LOCALPYTHON env var
Yes, that should be correct. Inside the bash script of the task.
I tried playing with those, but I cannot get them to have any effect on the source code detection. I can modify the env variables, but nothing happens on the ClearML server unfortunately.
Thank you! I will try this 🙂
It is due to the caching mechanism of ClearML. Is there a python command to update the venvs-cache?
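To be clear, something like this rough sketch is what I mean (it assumes the cache lives at the default agent.venvs_cache.path of ~/.clearml/venvs-cache and simply clears it; not an official API as far as I know):
import shutil
from pathlib import Path

# Assumed default cache location (agent.venvs_cache.path in clearml.conf).
venvs_cache = Path("~/.clearml/venvs-cache").expanduser()

if venvs_cache.exists():
    shutil.rmtree(venvs_cache)  # the next run rebuilds the cached venvs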
The flask command is run inside the git project, which is what makes the behavior strange. It is executed in ~/code/repo/ as flask train ...
Hey @<1523701205467926528:profile|AgitatedDove14> , thank you for your input
Could you clarify what you mean by clearml-serving session?
Are you referring to the servingTaskId?
I also did that in the following way:
- I put a sleep inside the bash script
- I ssh-ed into the fresh container and ran all the commands myself (cloning, installation), and again it worked...
I will check that. Do you think we could bypass it using Task.create and pass all the needed params?
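Roughly what I was imagining (a sketch only; the repo URL, entry point and packages are placeholders):
from clearml import Task

# Bypass automatic repository detection by passing the repo details explicitly.
task = Task.create(
    project_name="my_project",
    task_name="flask_train",
    repo="https://github.com/my-org/my-repo.git",  # placeholder repository
    branch="main",
    script="app/train.py",                         # placeholder entry point
    packages=["flask", "clearml"],                 # or a requirements.txt path
)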
@<1523701070390366208:profile|CostlyOstrich36> @<1523701205467926528:profile|AgitatedDove14> Any ideas on this one?
If I may also ask about another issue in that thread, one that is taking up a lot of my time:
Poetry Enabled: Ignoring requested python packages, using repository poetry lock file!
Creating virtualenv alfred-Rp77Shgw-py3.9 in /root/.cache/pypoetry/virtualenvs
Installing dependencies from lock file
2023-04-17 10:17:57
Package operations: 351 installs, 1 update, 1 removal
failed installing poetry requirements: Command '['poetry', 'install', '-n']' returned non-zero exit status 1.
Ignorin...
Prerequisites, PyTorch models require Triton engine support, please use docker-compose-triton.yml / docker-compose-triton-gpu.yml or if running on Kubernetes, the matching helm chart.
Sure, here is the updated clearml.conf file of the AWS autoscaler instance:
agent {
    # do not cache cloned git repositories
    vcs_cache.enabled: false
    package_manager: {
        # resolve and install packages with poetry
        type: poetry,
        poetry_version: "1.4.2",
    }
}
sdk {
    development {
        # do not compute the code diff against the remote repository
        store_code_diff_from_remote: false,
    }
}
I see uncommitted changes, whereas I would like to see none.
Thanks! So regarding question 2, it means that I can spin up a K8s cluster with triton enabled, and by specifying the type of model while creating the endpoint, it will either use the triton engine or not.
Linked to that, is the triton engine expecting the tensorrt format, or is it just an improvement step compared to other model weights?
Finally, last question (I swear 😛): what is the serving-on-Kubernetes flow supposed to look like? Is it something like this:
- Create en...
@<1523701070390366208:profile|CostlyOstrich36> The base docker image of the AWS autoscaler is nvidia/cuda:10.2-runtime-ubuntu18.04. As far as I can tell, the python version is not set inside the image, but I might be wrong and it could indeed be the problem...?
The servingTaskId is linked to the helm chart, which means that your solution would propose creating multiple Kubernetes clusters according to our requirements, no?