Related to that, is it possible to use Dataset.add_external_files() with source_url and destination_url pointing to two separate Azure storage containers?
How do I set that up inside clearml.conf (or somewhere else) so ClearML knows which credentials to load?
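Something like this is what I have in mind (just a sketch: the account, container, and URL values are hypothetical, and I am assuming both containers simply need their own entries under azure.storage.containers in clearml.conf):
```
# Sketch (hypothetical account/container names; assumes both containers
# are registered under azure.storage.containers in clearml.conf):
#
#   azure.storage {
#       containers: [
#           {account_name: "srcaccount", account_key: "...", container_name: "raw-data"},
#           {account_name: "dstaccount", account_key: "...", container_name: "datasets"},
#       ]
#   }
from clearml import Dataset

ds = Dataset.create(
    dataset_name="my-dataset",
    dataset_project="examples",
    # assumption: the destination container is set via the dataset's output_uri
    output_uri="azure://dstaccount.blob.core.windows.net/datasets",
)
# register the files living in the source container (links only, no copy)
ds.add_external_files(
    source_url="azure://srcaccount.blob.core.windows.net/raw-data/train/"
)
ds.upload()
ds.finalize()
```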
Great, and can we specify a ClearML environment variable that directly updates the clearml.conf file regarding the Azure config, or do something similar? I do not want to ask every engineer on my team to modify their clearml.conf file. @<1523701070390366208:profile|CostlyOstrich36> Thanks
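What I was hoping for is something along these lines (a sketch only; I am assuming ClearML's Azure storage helper falls back to the standard AZURE_STORAGE_ACCOUNT / AZURE_STORAGE_KEY environment variables when clearml.conf has no azure section — please correct me if that is wrong):
```
# Sketch: rely on environment variables instead of per-engineer edits to
# clearml.conf (assumption: these are picked up by the Azure storage helper)
import os

os.environ["AZURE_STORAGE_ACCOUNT"] = "mystorageaccount"  # hypothetical
os.environ["AZURE_STORAGE_KEY"] = "<storage-key>"         # hypothetical

from clearml import Task  # imported after the variables are in place

task = Task.init(project_name="examples", task_name="azure-env-test")
```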
Prerequisites: PyTorch models require Triton engine support; please use docker-compose-triton.yml / docker-compose-triton-gpu.yml, or, if running on Kubernetes, the matching Helm chart.
Basically, I would like to know if we can serve the model without the TensorRT format, which is highly efficient but more complicated to produce.
In production we should use the clearml-helm-charts, right? The docker-compose setup in clearml-serving is more for local testing.
I would like to know if it is possible to run any PyTorch model with the basic docker-compose file, without Triton?
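To make the question concrete, this is roughly what I would hope to do with the custom engine of clearml-serving instead of Triton (just a sketch: the request key names are made up, and I am assuming a CPU TorchScript model):
```
# preprocess.py -- sketch of a clearml-serving "custom" engine handler
# (assumption: the model was saved with torch.jit.save and runs on CPU)
from typing import Any

import torch


class Preprocess(object):
    def load(self, local_file_name: str) -> Any:
        # load the TorchScript model directly, no Triton involved
        self._model = torch.jit.load(local_file_name, map_location="cpu")
        self._model.eval()
        return self._model

    def process(self, data: Any, state: dict, collect_custom_statistics_fn=None) -> Any:
        # the "input"/"output" keys are assumptions for this sketch
        tensor = torch.tensor(data["input"], dtype=torch.float32)
        with torch.no_grad():
            result = self._model(tensor)
        return {"output": result.tolist()}
```
and then register it with something like clearml-serving --id <service-id> model add --engine custom --endpoint my_model --preprocess preprocess.py --model-id <model-id> (assuming I understood the custom engine correctly).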
Sorry to come back to this! Regarding the Kubernetes serving Helm chart, I can see horizontal scaling of the docker containers. What about vertical scaling? Is it implemented? More specifically, where is the SKU of the VMs in use defined?
Thanks, my question was indeed a dumb one 🙂 Thanks for the reply!
Thank you! I will try this 🙂
These changes reflect the modifications I have in my working tree (not committed, not put in the staging area with git add). But I would like to remove this uncommitted section from ClearML and not be blocked by it.
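For reference, the workaround I am picturing looks like this (a sketch only; I am assuming Task.set_script accepts an empty diff, and that sdk.development.store_uncommitted_code_diff is the right clearml.conf switch to stop recording it in the first place):
```
# Sketch: drop the stored uncommitted changes from a draft task.
# Alternative (assumption): set
#   sdk.development.store_uncommitted_code_diff: false
# in clearml.conf so the diff is never recorded at all.
from clearml import Task

task = Task.get_task(task_id="<task-id>")  # hypothetical task id
task.set_script(diff="")  # clear the uncommitted-changes section
```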
One possible solution I could also see is moving the data storage to an S3 bucket to improve download performance, since it is the same cloud provider: no transfer latency.
I guess it makes no sense because of the way a clearml-agent works...
I also thought about switching to pip mode, but unfortunately not all packages are detected from our poetry.lock file, so I cannot do that.
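The workaround I considered (a sketch; it assumes poetry export produces a complete requirements file, and that force_requirements_env_freeze must be called before Task.init, as the SDK docs suggest):
```
# Sketch: bypass package auto-detection in pip mode by feeding the task
# a requirements file exported from the lock file beforehand, e.g.
#   poetry export -f requirements.txt -o requirements.txt
from clearml import Task

# assumption: this must run before Task.init()
Task.force_requirements_env_freeze(requirements_file="requirements.txt")
task = Task.init(project_name="examples", task_name="pip-mode-run")
```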
@<1523701070390366208:profile|CostlyOstrich36> poetry is installed as part of the bash script of the task.
The init script of the AWS autoscaler only contains the three export statements I set.
Yes indeed, but what about the possibility of doing the clone/poetry installation ourselves in the init bash script of the task?
I literally connected to it at runtime, ran poetry install -n, and it worked
but I still had time to go inside the container, export the PATH variables for my poetry and python versions, and run the poetry install command there
OK. I spun up three AWS autoscalers, each with a different conf. I also fixed a submodule issue in my repo (which I believed was the cause of the git diff problem), and every run now passes this stage and fails later (not with this problem). So I think store_code_diff_from_remote is of no help to me, but my problem is gone...
It is due to the caching mechanism of ClearML. Is there a Python command to update the venvs-cache?
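In case it helps, the manual refresh I had in mind (a sketch; it assumes the documented default cache location from agent.venvs_cache.path in clearml.conf, so adjust the path to your setup):
```
# Sketch: clear the agent's cached virtualenvs so the next run rebuilds them
import shutil
from pathlib import Path

venvs_cache = Path.home() / ".clearml" / "venvs-cache"  # assumed default path
if venvs_cache.exists():
    shutil.rmtree(venvs_cache)
```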
@<1523701070390366208:profile|CostlyOstrich36> The base docker image of the AWS autoscaler is nvidia/cuda:10.2-runtime-ubuntu18.04. As far as I can tell, the Python version is not set inside the image, but I might be wrong and it could indeed be the problem...?
Hi @<1523701087100473344:profile|SuccessfulKoala55>, the EC2 instance is spun up by the AWS autoscaler provided by ClearML. I use the following docker image: nvidia/cuda:11.8.0-devel-ubuntu20.0
So the EC2 instance runs a docker container
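For completeness, one direction I was considering looks like this (a sketch; the image tag and apt package names are illustrative, and I am assuming docker_setup_bash_script is available in the SDK version used):
```
# Sketch: make sure a Python interpreter exists inside the base image by
# letting the agent run a setup script before the task starts
from clearml import Task

task = Task.init(project_name="examples", task_name="gpu-job")
task.set_base_docker(
    docker_image="nvidia/cuda:11.8.0-devel-ubuntu20.04",  # illustrative tag
    docker_setup_bash_script=[
        "apt-get update",
        "apt-get install -y python3.9 python3-pip",
    ],
)
```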
@<1523701070390366208:profile|CostlyOstrich36> @<1523701205467926528:profile|AgitatedDove14> Any ideas on this one?
Thank you for the quick replies!
I might be doing it the wrong way, but the above snippet of code is the additional clearml.conf file I add to the AWS autoscaler. Should I add a complete clearml.conf file to it instead?
That is a good question @<1537605940121964544:profile|EnthusiasticShrimp49>! I am not sure the image has Python 3.9. I tried to check it but did not find the answer. I am using the following AMI: AWS Deep Learning AMI (Ubuntu 18.04) with Support by Terracloudx (Nvidia deep learni...
My issue has been resolved by going with pip.
Okay, thanks @<1523701205467926528:profile|AgitatedDove14>, and what would be the advantage of using clearml-server on K8s compared to the hosted ClearML one?
I read that the hosted ClearML server is periodically reset. Does that mean my team would lose all our work?