I tried too. I do not have more logs inside the ClearML agent 😞
I basically would like to know if we can serve the model without the TensorRT format, which is highly efficient but more complicated to obtain.
Because I was SSH-ing into it before the failure. When Poetry fails, it installs everything using pip.
Thanks, my question was indeed dumb 🙂 Thanks for the reply!
Thank you for the quick replies!
I might be doing it the wrong way, but the snippet of code above is the additional clearml.conf content I add to the AWS autoscaler. Should I add a complete clearml.conf file to it?
That is a good question @<1537605940121964544:profile|EnthusiasticShrimp49>! I am not sure the image has Python 3.9. I tried to check but did not find the answer. I am using the following AMI: AWS Deep Learning AMI (Ubuntu 18.04) with Support by Terracloudx (Nvidia deep learni...
I am currently trying with a new dummy repo and iterating over the dependencies in the pyproject.toml.
When the task finally failed, I was kicked out of the container.
I have my Task.init inside a train() function inside the flask command. We basically have flask commands that trigger specific behaviors. When running it locally, everything works properly except the repository information. The use case is linked to the way our codebase works: for example, I run flask train {arguments} and it triggers the training of a model (that I want to track).
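To make it concrete, this is roughly the shape of the code (a minimal sketch; the project/task names and the --epochs option are made up):

from clearml import Task
from flask import Flask
import click

app = Flask(__name__)

@app.cli.command("train")
@click.option("--epochs", default=10)
def train(epochs):
    # Task.init is called here, inside the flask command, not at module level
    task = Task.init(project_name="my_project", task_name="flask_train")
    task.connect({"epochs": epochs})
    # ... actual training code ...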
I stopped the autoscaler and deleted it manually. I did it because I want to test...
I do not remember, but I was afraid... Thanks for the output! Maybe in a bad dream? 😜
@<1523701070390366208:profile|CostlyOstrich36> The base docker image of the AWS autoscaler is nvidia/cuda:10.2-runtime-ubuntu18.04. As far as I can tell, the Python version is not set inside the image, but I might be wrong and it could indeed be the problem...?
@<1523701118159294464:profile|ExasperatedCrab78> do you have any inputs for this one? 🙂
Related to that,
Is it possible to do Dataset.add_external_files() with source_url and destination_url pointing to two separate Azure storage containers?
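Something like this is what I have in mind (just a sketch: the account and container names are placeholders, and I am assuming output_uri on Dataset.create is what controls where the dataset itself is written):

from clearml import Dataset

# Sketch only: both containers below are placeholders.
ds = Dataset.create(
    dataset_project="my_project",
    dataset_name="external_azure_data",
    output_uri="azure://myaccount.blob.core.windows.net/destination-container",  # assumed destination
)
# Files living in a different container, registered as external links (not copied)
ds.add_external_files(source_url="azure://myaccount.blob.core.windows.net/source-container/data/")
ds.upload()
ds.finalize()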
Yes indeed, but what about the possibility of doing the clone/Poetry installation ourselves in the init bash script of the task?
In production, we should use the clearml-helm-charts, right? The docker-compose in clearml-serving is more for local testing.
Is it a bug inside the AWS autoscaler??
It is due to the caching mechanism of ClearML. Is there a Python command to update the venvs-cache?
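For illustration, this is the kind of thing I mean, assuming the default cache location (agent.venvs_cache.path, i.e. ~/.clearml/venvs-cache):

import shutil
from pathlib import Path

# Assumption: the venvs cache lives in the default location from clearml.conf.
venvs_cache = Path.home() / ".clearml" / "venvs-cache"
if venvs_cache.exists():
    shutil.rmtree(venvs_cache)  # the agent would then rebuild the cached virtualenvs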
I would like to know if it is possible to run any PyTorch model with the basic docker-compose file, without Triton?
These changes reflect the modifications I have in my working tree (not committed, not added to the staging area with git add). But I would like to remove this uncommitted section from ClearML and not be blocked by it.
Thanks! So regarding question 2, it means that I can spin up a K8s cluster with Triton enabled, and by specifying the type of model when creating the endpoint, it will use the Triton engine or not.
Linked to that, is the Triton engine expecting the TensorRT format, or is it just an improvement step compared to other model weights?
Finally, last question (I swear 😛): what is the serving-on-Kubernetes flow supposed to look like? Is it something like this:
- Create en...
Great, and can we specify a ClearML environment variable that directly updates the azure config in clearml.conf, or do something similar? I do not want to ask every engineer on my team to modify their clearml.conf file. @<1523701070390366208:profile|CostlyOstrich36> Thanks
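To clarify what I mean, something along these lines (I am assuming ClearML picks up the standard AZURE_STORAGE_ACCOUNT / AZURE_STORAGE_KEY variables; the values and URIs are placeholders):

import os

# Assumption: ClearML's Azure storage driver reads these standard Azure variables,
# so nothing has to be added to each engineer's clearml.conf.
os.environ["AZURE_STORAGE_ACCOUNT"] = "mystorageaccount"  # placeholder
os.environ["AZURE_STORAGE_KEY"] = "<account-key>"         # placeholder

from clearml import Task

task = Task.init(
    project_name="my_project",
    task_name="azure_env_test",
    output_uri="azure://mystorageaccount.blob.core.windows.net/artifacts",
)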
How do you explain that it works when I SSH-ed into the same AWS instance launched by the autoscaler?
I am literally trying with one package and Python, and it fails. I tried with Python 3.8, 3.9, and 3.9.16, and it always fails, so it is not linked to the Python version. What is the problem then? I am wondering if there is an intrinsic bug.
Sure, here is the updated clearml.conf file of the AWS autoscaler instance:
agent {
    vcs_cache.enabled: false
    package_manager: {
        type: poetry,
        poetry_version: "1.4.2",
    }
}
sdk {
    development {
        store_code_diff_from_remote: false,
    }
}
I see uncommitted changes, whereas I would like to see nothing.
Thank you! I will try this 🙂
@<1523701205467926528:profile|AgitatedDove14> If you have any other insights, pls do not hesitate! Thanks a lot
@<1523701070390366208:profile|CostlyOstrich36> @<1523701205467926528:profile|AgitatedDove14> Any ideas on this one?
One possible solution I could also see is moving the data storage to an S3 bucket to improve download performance, since it is the same cloud provider. No transfer latency.
Yes, that should be correct. Inside the bash script of the task.
Sorry to come back to this! Regarding the Kubernetes serving helm chart, I can see horizontal scaling of docker containers. What about vertical scaling? Is it implemented? More specifically, where is the SKU of the VMs in use defined?