
On the ClearML helm charts repo, can we use the clearml-serving chart alone?
How do I set that up inside clearml.conf (or somewhere else) so it knows which credentials to load?
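For reference, a minimal sketch of the kind of credentials block I mean in clearml.conf (the key values are placeholders, not real credentials):
api {
    web_server: https://app.clear.ml
    api_server: https://api.clear.ml
    files_server: https://files.clear.ml
    credentials {
        access_key: "PLACEHOLDER_ACCESS_KEY"
        secret_key: "PLACEHOLDER_SECRET_KEY"
    }
}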
but I still had time to go inside the container, export the PATH variables for my poetry and python versions, and run the poetry install command there
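Concretely, that manual check inside the container looked roughly like this (a sketch; the container ID and repo directory are placeholders):
# attach to the running agent container
docker exec -it <container_id> bash
# make the right python/poetry binaries visible
export PATH="/root/.local/bin:/usr/local/bin:$PATH"
poetry --version
# re-run the same install step the agent failed on
cd /root/.clearml/venvs-builds/3.9/task_repository/<repo>
poetry install -n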
@<1523701070390366208:profile|CostlyOstrich36> The base docker image of the AWS autoscaler is nvidia/cuda:10.2-runtime-ubuntu18.04. As far as I can tell, the Python version is not set inside the image, but I might be wrong and it could indeed be the problem?
Thank you for the quick replies!
I might be doing it the wrong way, but the above snippet of code is the additional clearml.conf file I add to the AWS autoscaler. Should I add a complete clearml.conf file to it?
That is a good question @<1537605940121964544:profile|EnthusiasticShrimp49>! I am not sure the image has Python 3.9; I tried to check but did not find the answer. I am using the following AMI: AWS Deep Learning AMI (Ubuntu 18.04) with Support by Terracloudx
(Nvidia deep learni...
Related to that, is it possible to do Dataset.add_external_files() with source_url and destination_url being two separate Azure storage containers?
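Something like this sketch is what I have in mind (the account/container names are placeholders; as far as I understand, add_external_files only registers links to the source files, while output_uri controls where the dataset's own content is uploaded):
from clearml import Dataset

# dataset storage points at the destination container (placeholder URL)
ds = Dataset.create(
    dataset_project="my_project",
    dataset_name="my_dataset",
    output_uri="azure://destaccount.blob.core.windows.net/dest-container",
)
# files living in the source container are registered as links, not copied
ds.add_external_files(
    source_url="azure://srcaccount.blob.core.windows.net/src-container/data/",
)
ds.upload()
ds.finalize()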
Thanks! So regarding question 2, it means that I can spin up a K8s cluster with Triton enabled, and by specifying the type of model while creating the endpoint, it will either use the Triton engine or not.
Linked to that, is the Triton engine expecting the tensorrt format, or is it just an improvement step compared to other model weight formats?
Finally, last question (I swear 😛): what is the serving-on-Kubernetes flow supposed to look like? Is it something like this:
- Create en...
I basically would like to know if we can serve the model without the tensorrt format, which is highly efficient but more complicated to obtain.
Thanks, my question was indeed a dumb one 🙂 Thanks for the reply!
I would like to know if it is possible to run any PyTorch model on the basic docker-compose file, without Triton?
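To make it concrete, this is the kind of endpoint registration I mean, sketched from the clearml-serving examples (the service ID, model ID, endpoint name, and tensor shapes are placeholders for my model):
# register a PyTorch model on the Triton engine
clearml-serving --id <service_id> model add \
    --engine triton \
    --endpoint "my_pytorch_model" \
    --model-id <model_id> \
    --input-size 1 28 28 --input-name "INPUT__0" --input-type float32 \
    --output-size -1 10 --output-name "OUTPUT__0" --output-type float32
# presumably the same call with a different --engine value is what
# would serve the model without Triton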
Hi @<1523701087100473344:profile|SuccessfulKoala55>, the EC2 instance is spun up by the AWS autoscaler provided by ClearML. I use the following docker image: nvidia/cuda:11.8.0-devel-ubuntu20.04
So the EC2 instance runs a docker container
If I may also ask about another issue in that thread, one that is taking a big amount of my time:
Poetry Enabled: Ignoring requested python packages, using repository poetry lock file!
Creating virtualenv alfred-Rp77Shgw-py3.9 in /root/.cache/pypoetry/virtualenvs
Installing dependencies from lock file
2023-04-17 10:17:57
Package operations: 351 installs, 1 update, 1 removal
failed installing poetry requirements: Command '['poetry', 'install', '-n']' returned non-zero exit status 1.
Ignorin...
These changes reflect the modifications I have in my working tree (not committed, not put in the staging area with git add). But I would like to remove this uncommitted section from ClearML and not be blocked by it.
One possible solution I could see as well is moving the data storage to an S3 bucket to improve download performance, since it is the same cloud provider: no transfer latency.
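As a sketch of what I mean (the bucket name is a placeholder), pointing the dataset storage at S3 would look something like:
from clearml import Dataset

ds = Dataset.create(
    dataset_project="my_project",
    dataset_name="my_dataset",
    output_uri="s3://my-bucket/clearml-datasets",  # placeholder bucket
)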
Sure, here is the updated clearml.conf file of the AWS autoscaler instance:
agent {
    vcs_cache.enabled: false
    package_manager: {
        type: poetry,
        poetry_version: "1.4.2",
    }
}
sdk {
    development {
        store_code_diff_from_remote: false,
    }
}
I see uncommitted changes, whereas I would like to have nothing.
Ok. I spun up three AWS autoscalers, each with a different conf. I also fixed a submodule issue in my repo (which I believed was the cause of the git diff), and every run now passes this point and fails later (a different problem). So I think store_code_diff_from_remote is of no help to me, but my problem is gone...
Yes indeed, but what about the possibility of doing the clone/poetry installation ourselves in the init bash script of the task?
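Roughly this, as a sketch (the repo URL and paths are placeholders):
#!/bin/bash
# do the clone + poetry install ourselves instead of letting the agent handle it
export PATH="/root/.local/bin:$PATH"
git clone https://github.com/my-org/my-repo.git /root/my-repo
cd /root/my-repo
poetry install -n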
Yes I take the export statements from my bash script of the task
This is really extremely hard to debug. I am thinking of creating another repo and iterating on the packages to hopefully find the problem, but it will take ages.
@<1523701070390366208:profile|CostlyOstrich36> @<1523701087100473344:profile|SuccessfulKoala55> I tried with a dummy repo, using ONLY Python and the stripe package in the pyproject.toml.
Here is my result (still failing):
Poetry Enabled: Ignoring requested python packages, using repository poetry lock file!
Creating virtualenv debug in /root/.clearml/venvs-builds/3.9/task_repository/clearmldebug.git/.venv
Using virtualenv: /root/.clearml/venvs-builds/3.9/task_repository/clearmldebug.git/...
Yes, that should be correct. Inside the bash script of the task.
I am literally trying with one package plus Python, and it fails. I tried with Python 3.8, 3.9, and 3.9.16, and it always fails, so it is not linked to the Python version. What is the problem then? I am wondering if there is not an intrinsic bug.
Hey @<1523701205467926528:profile|AgitatedDove14> , thank you for your input
Could you clarify what you mean by a clearml-serving session?
Are you referring to the servingTaskId?
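For reference, the ID I mean is the one printed when the serving service is created (a sketch; the name is a placeholder):
# create the serving service controller; it prints the serving task ID,
# something like: New Serving Service created: id=<serving_task_id>
clearml-serving create --name "my serving service"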
I tried too. I do not have more logs inside the ClearML agent 😞
The servingTaskId is linked to the helm chart, which means that your solution would amount to creating multiple Kubernetes clusters according to our requirements, no?
Thank you! I will try this 🙂
Sorry to come back to this! Regarding the Kubernetes serving helm chart, I can see horizontal scaling of docker containers. What about vertical scaling? Is it implemented? More specifically, where is the SKU of the VMs in use defined?
In production, we should use the clearml-helm-charts repo, right? The docker-compose in clearml-serving is more for local testing.
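i.e., something along these lines for production (a sketch; the exact values keys should be checked against the chart's values.yaml):
helm repo add clearml https://clearml.github.io/clearml-helm-charts
helm repo update
# the serving task ID below is a placeholder
helm install clearml-serving clearml/clearml-serving \
    --set clearml.servingTaskId=<serving_task_id>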
I will check that. Do you think we could bypass it by using Task.create and passing all the needed params?
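Something like this sketch is what I mean (the repo URL, script path, and packages are placeholders):
from clearml import Task

task = Task.create(
    project_name="my_project",
    task_name="created_manually",
    repo="https://github.com/my-org/my-repo.git",
    branch="main",
    script="train.py",
    packages=["stripe"],  # or requirements_file="..."
    docker="nvidia/cuda:11.8.0-devel-ubuntu20.04",
)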