
I basically would like to know if we can serve the model without the TensorRT format, which is highly efficient but more complicated to get.
These changes reflect the modifications I have locally (not committed, not added to the staging area with git add). But I would like to remove this uncommitted section from ClearML and not be blocked by it.
For now, I am uploading my data to the default ClearML server storage, but I will soon use S3 buckets to store data. So the question applies to both use cases 🙂
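For the S3 part, what I have in mind is roughly this (just a sketch; the bucket path is a placeholder and the credentials would still come from clearml.conf):
```
from clearml import Task

# Point this task's outputs (artifacts, models) at an S3 bucket instead of the
# default ClearML fileserver. The bucket path below is a placeholder.
task = Task.init(
    project_name="examples",
    task_name="s3-storage-test",
    output_uri="s3://my-bucket/clearml",
)
```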
Yes, I take the export statements from the task's bash script.
I am currently trying with a new dummy repo, iterating over the dependencies of the pyproject.toml.
Is it a bug inside the AWS autoscaler??
When the task finally failed, I was kicked out of the container.
And I just tried with Python 3.8 (default version of the image) and it still fails.
Poetry Enabled: Ignoring requested python packages, using repository poetry lock file!
Creating virtualenv debug in /root/.clearml/venvs-builds/3.8/task_repository/clearmldebug.git/.venv
Using virtualenv: /root/.clearml/venvs-builds/3.8/task_repository/clearmldebug.git/.venv
2023-04-18 15:03:52
Installing dependencies from lock file
Finding the necessary packages for the current system
Package operation...
How can I make sure that the Python version is correct?
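For now, the only way I found to be sure which interpreter ends up being used is to print it from inside the task itself, so it shows up in the console log (trivial sketch):
```
import platform
import sys

# Log the interpreter the agent actually resolved for this task;
# this ends up in the task's console output in the ClearML UI.
print("Python executable:", sys.executable)
print("Python version:", platform.python_version())
```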
I guess it makes no sense because of the steps a clearml-agent goes through...
I also thought about switching to pip mode, but unfortunately not all packages are detected from our poetry.lock file, so I cannot do that.
Task.set_base_docker
🙂
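For reference, this is roughly how I would call it (a sketch; the image and arguments are placeholders, and older clearml versions expose a single docker_cmd argument instead of these keywords):
```
from clearml import Task

task = Task.init(project_name="examples", task_name="docker-base-test")

# Sketch: tell the agent which docker image to run this task in.
# Image name and extra arguments are placeholders.
task.set_base_docker(
    docker_image="nvidia/cuda:11.8.0-runtime-ubuntu22.04",
    docker_arguments="--shm-size=8g",
)
```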
Okay, thanks @<1523701205467926528:profile|AgitatedDove14>, and what would be the advantage of using clearml-server on k8s compared to the ClearML hosted one?
@<1523701087100473344:profile|SuccessfulKoala55> Do you think it is possible to ask the AWS autoscaler to run in docker mode, and to do the cloning and installation inside the init bash script of the task?
OK. I spun up three AWS autoscalers, each with a different configuration. I also fixed a submodule issue in my repo (which I believed was the cause of the git diff problem), and every run now gets past this and fails later (a different problem). So I think store_code_diff_from_remote is of no help to me, but my problem is gone...
@<1523701070390366208:profile|CostlyOstrich36> poetry is installed as part of the bash script of the task.
The init script of the AWS autoscaler only contains three export variables I set.
It just gives me access to the poetry and python versions installed on the container.
How do you explain that it works when I SSH-ed into the same AWS container instance from the autoscaler?
My issue has been resolved by going with pip.
Because I was SSH-ing into it before the failure. When poetry fails, it installs everything using pip.
I do not remember, but I was afraid... Thanks for the output! Maybe in a bad dream? 😜
but I still had time to go inside the container, export the PATH variables for my poetry and python versions, and run the poetry install command there
Yes indeed, but what about the possibility of doing the clone/poetry installation ourselves in the init bash script of the task?
I would like to know if it is possible to run any PyTorch model with the basic docker-compose file, without Triton?
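To make the question concrete, what I imagine is something like the custom-engine pattern from the clearml-serving examples, where the preprocess module loads and runs the model itself with plain torch instead of going through Triton (only a sketch; the exact method signatures may differ between clearml-serving versions):
```
from typing import Any, Optional

import torch

# Sketch of a clearml-serving "custom" engine preprocess module that serves a
# TorchScript model with plain torch (no Triton). The class must be named
# "Preprocess"; signatures here follow the examples and may differ by version.
class Preprocess(object):
    def __init__(self):
        self.model = None

    def load(self, local_file_name: str) -> Optional[Any]:
        # local_file_name is the model file clearml-serving downloads for us
        self.model = torch.jit.load(local_file_name, map_location="cpu")
        self.model.eval()

    def preprocess(self, body: dict, state: dict, collect_custom_statistics_fn=None) -> Any:
        # expecting {"input": [[...], ...]} in the request body
        return torch.tensor(body["input"], dtype=torch.float32)

    def process(self, data: Any, state: dict, collect_custom_statistics_fn=None) -> Any:
        with torch.no_grad():
            return self.model(data)

    def postprocess(self, data: Any, state: dict, collect_custom_statistics_fn=None) -> dict:
        return {"output": data.tolist()}
```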
In production, we should use the clearml-helm-charts, right? The docker-compose in clearml-serving is more for local testing.
Related to that, is it possible to do Dataset.add_external_files() with source_url and destination_url being two separate Azure storage containers?
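In case a snippet makes it clearer, this is what I mean (only a sketch; as far as I can tell add_external_files only takes a source_url, so I assume the destination container is the one the dataset itself is uploaded to, and both URLs below are placeholders):
```
from clearml import Dataset

# Sketch: link files that live in one Azure container, while the dataset's own
# storage lives in a second container. Both URLs are placeholders.
dataset = Dataset.create(
    dataset_name="external-azure-test",
    dataset_project="examples",
)

# Files stay in the source container; only links/metadata are registered.
dataset.add_external_files(
    source_url="azure://sourceaccount.blob.core.windows.net/source-container/data/",
)

# Upload the dataset state to the destination container and close it.
dataset.upload(
    output_url="azure://destaccount.blob.core.windows.net/dest-container/datasets/",
)
dataset.finalize()
```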
Thank you for the quick replies!
I might be doing it the wrong way, but the above snippet of code is the additional clearml.conf file I add to the AWS autoscaler. Should I add a complete clearml.conf file to it?
That is a good question @<1537605940121964544:profile|EnthusiasticShrimp49>! I am not sure the image has Python 3.9. I tried to check but did not find the answer. I am using the following AMI: AWS Deep Learning AMI (Ubuntu 18.04) with Support by Terracloudx (Nvidia deep learni...
This is really extremely hard to debug. I am thinking of creating another repo and iterating over the packages to hopefully find the problem, but it will take ages.
Sorry to come back to this! Regarding the Kubernetes serving Helm chart, I can see horizontal scaling of docker containers. What about vertical scaling? Is it implemented? More specifically, where is the SKU of the VMs in use defined?