
How to make sure that the Python version is correct?
Okay, thanks @<1523701205467926528:profile|AgitatedDove14>! And what would be the advantage of using clearml-server on k8s compared to the ClearML hosted one?
Prerequisites: PyTorch models require Triton engine support; please use docker-compose-triton.yml / docker-compose-triton-gpu.yml, or, if running on Kubernetes, the matching Helm chart.
If I may, I'd also like to ask about another issue in that thread that is taking me a lot of time:
Poetry Enabled: Ignoring requested python packages, using repository poetry lock file!
Creating virtualenv alfred-Rp77Shgw-py3.9 in /root/.cache/pypoetry/virtualenvs
Installing dependencies from lock file
2023-04-17 10:17:57
Package operations: 351 installs, 1 update, 1 removal
failed installing poetry requirements: Command '['poetry', 'install', '-n']' returned non-zero exit status 1.
Ignorin...
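(For anyone hitting this later: the "Poetry Enabled" behavior in the log above is driven by the agent's package manager mode. A sketch of the clearml.conf setting that switches it back to pip, assuming a recent clearml-agent:)
```
agent {
    package_manager {
        # "poetry" makes the agent install from the repository's lock file;
        # "pip" installs the packages captured on the task instead
        type: pip
    }
}
```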
OK. I spun up three AWS autoscalers, each with a different conf. I also fixed a submodule issue in my repo (which I believed was the cause of the git diff problem), and every run now gets past this point and fails later (on a different problem). So I think store_code_diff_from_remote
was of no help to me, but my problem is gone...
Task.set_base_docker
🙂
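In case it helps, a minimal sketch of what calling Task.set_base_docker looks like (the image name is illustrative, and the exact keyword arguments may differ across clearml versions):
```python
# Hedged sketch: pin the task to a base docker image that ships the
# Python version you need (image name illustrative).
from clearml import Task

task = Task.init(project_name="demo", task_name="docker-base-example")
task.set_base_docker(docker_image="python:3.9-slim")
```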
Using a pyenv virtual env, then exporting the LOCALPYTHON env var
For now, I am uploading to the default ClearML server to store my data, but I will soon use S3 buckets instead. So the question is for both use cases 🙂
In production, we should use the clearml-helm-charts, right? The docker-compose in clearml-serving is more for local testing.
I have my Task.init
inside a train() function inside the flask command. We basically have flask commands that let us trigger specific behaviors. When running it locally, everything works properly except the repository information. The use case is linked to the way our codebase works: for example, I run flask train {arguments}
and it triggers the training of a model (that I want to track).
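A minimal sketch of that setup, assuming Flask's click-based CLI (project, option, and function names are illustrative):
```python
# Sketch: Task.init called from inside a Flask CLI command.
import click
from flask import Flask
from clearml import Task

app = Flask(__name__)

@app.cli.command("train")
@click.option("--epochs", default=10)
def train_command(epochs):
    # Task.init runs inside the function triggered by `flask train`,
    # so ClearML must detect the repository from the running process.
    task = Task.init(project_name="my-project", task_name="training")
    train(task, epochs)

def train(task, epochs):
    ...  # actual training logic goes here
```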
I stopped the autoscaler and deleted it manually. I did it because I want to test...
And I just tried with Python 3.8 (default version of the image) and it still fails.
Poetry Enabled: Ignoring requested python packages, using repository poetry lock file!
Creating virtualenv debug in /root/.clearml/venvs-builds/3.8/task_repository/clearmldebug.git/.venv
Using virtualenv: /root/.clearml/venvs-builds/3.8/task_repository/clearmldebug.git/.venv
2023-04-18 15:03:52
Installing dependencies from lock file
Finding the necessary packages for the current system
Package operation...
@<1523701070390366208:profile|CostlyOstrich36> @<1523701205467926528:profile|AgitatedDove14> Any ideas on this one?
Thank you for the quick replies!
I might be doing it the wrong way, but the above snippet of code is the additional clearml.conf
content I add to the AWS autoscaler. Should I add a complete clearml.conf file to it instead?
That is a good question @<1537605940121964544:profile|EnthusiasticShrimp49>! I am not sure the image has Python 3.9. I tried to check but did not find the answer. I am using the following AMI: AWS Deep Learning AMI (Ubuntu 18.04) with Support by Terracloudx
(Nvidia deep learni...
@<1523701118159294464:profile|ExasperatedCrab78> do you have any inputs for this one? 🙂
My issue has been resolved by going with pip.
These changes reflect the modifications I have in my working tree (not committed, not added to the staging area with git add
). But I would like to remove this uncommitted section from ClearML and not be blocked by it
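If it helps, a sketch of the SDK setting that should control this, assuming your clearml version supports it (the setting name is taken from the SDK config reference; worth verifying):
```
sdk {
    development {
        # don't store the uncommitted git diff with the task
        store_uncommitted_code_diff: false
    }
}
```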
but I still had time to go inside the container, export the PATH variables for my Poetry and Python versions, and run the poetry install command there
How do you explain that it works when I ssh-ed into the same AWS instance launched by the autoscaler?
I do not remember, but I was afraid... Thanks for the output! Maybe in a bad dream? 😜
Thank you! I will try this 🙂
@<1523701205467926528:profile|AgitatedDove14> If you have any other insights, pls do not hesitate! Thanks a lot
I would like to know if it is possible to run any PyTorch model with the basic docker-compose file, without Triton?
Is it a bug inside the AWS autoscaler??
I will check that. Do you think we could bypass it using Task.create
and passing all the needed params?
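Something like this sketch, i.e. creating the task explicitly with Task.create and passing the repository details instead of relying on auto-detection (all values illustrative):
```python
from clearml import Task

task = Task.create(
    project_name="my-project",
    task_name="flask-train",
    repo="https://github.com/org/repo.git",  # explicit repository URL
    branch="main",
    script="train.py",             # entry point inside the repo
    packages=["flask", "torch"],   # or use requirements_file instead
)
```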
Great. And can we specify a ClearML environment variable that directly overrides the Azure config in the clearml.conf file, or do something similar? I do not want to ask every engineer on my team to modify their clearml.conf file. @<1523701070390366208:profile|CostlyOstrich36> Thanks
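If env vars are supported for this (I believe ClearML reads the standard Azure storage variables, but treat the names as an assumption to verify), something like this could avoid per-engineer clearml.conf edits:
```python
# Hedged sketch: provide Azure credentials via environment variables
# instead of editing each engineer's clearml.conf (variable names assumed
# from the ClearML docs; verify against your version). These could equally
# be set in the agent's environment rather than in code.
import os

os.environ["AZURE_STORAGE_ACCOUNT"] = "mystorageaccount"  # assumed var name
os.environ["AZURE_STORAGE_KEY"] = "<account-key>"         # assumed var name
```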
Related to that,
is it possible to do Dataset.add_external_files() with source_url and destination_url being two separate Azure storage containers?
How do you set that up inside clearml.conf (or somewhere else) so it knows which credentials to load?
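For the credentials part, a sketch of the Azure section in clearml.conf, assuming the standard sdk.azure.storage.containers layout (account names and keys illustrative); with one entry per container, ClearML should pick the credentials by matching the storage URL:
```
sdk {
    azure.storage {
        containers: [
            {
                account_name: "sourceaccount"       # illustrative
                account_key: "<key-for-source>"
                container_name: "source-container"
            },
            {
                account_name: "destinationaccount"  # illustrative
                account_key: "<key-for-destination>"
                container_name: "destination-container"
            }
        ]
    }
}
```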
The servingTaskId is linked to the Helm chart, which means that your solution would propose creating multiple Kubernetes clusters according to our requirements, no?
Hey @<1523701205467926528:profile|AgitatedDove14> , thank you for your input
Could you clarify what you mean by clearml-serving session?
Are you referring to the servingTaskId?