How to make sure that the python version is correct?
It just allows me to have access to poetry and python installed on the container
@<1523701070390366208:profile|CostlyOstrich36> @<1523701087100473344:profile|SuccessfulKoala55> I tried with a dummy repo, using only Python and the stripe package in the pyproject.toml.
Here is my result (still failing):
```
Poetry Enabled: Ignoring requested python packages, using repository poetry lock file!
Creating virtualenv debug in /root/.clearml/venvs-builds/3.9/task_repository/clearmldebug.git/.venv
Using virtualenv: /root/.clearml/venvs-builds/3.9/task_repository/clearmldebug.git/...
```
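For context, this is roughly the minimal pyproject.toml I am testing with (a sketch; the name, version, and authors are placeholders, not the exact file):

```toml
[tool.poetry]
name = "clearmldebug"
version = "0.1.0"
description = "Minimal repro for the poetry failure"
authors = ["me <me@example.com>"]

[tool.poetry.dependencies]
python = "^3.9"
stripe = "*"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
```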
@<1523701118159294464:profile|ExasperatedCrab78> do you have any inputs for this one? 🙂
I also did that in the following way:
- I put a sleep inside the bash script
- I ssh-ed to the fresh container and did all commands myself (cloning, installation) and again it worked...
but I still had time to go inside the container, export the PATH variables for my poetry and python versions, and run the poetry install command there
When the task finally failed, I was kicked out of the container
I am currently trying with a new dummy repo, iterating over the dependencies of the pyproject.toml.
The servingtaskid is linked to the Helm chart, which means your solution would be to create multiple Kubernetes clusters according to our requirements, no?
`Task.set_base_docker`
🙂
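For reference, a minimal sketch of how I would set it (assuming a recent clearml SDK; the project and task names are placeholders):

```python
from clearml import Task

task = Task.init(project_name="debug", task_name="poetry-test")
# Pin the container image the agent should run this task in
# (same image as the autoscaler one mentioned in this thread)
task.set_base_docker(docker_image="nvidia/cuda:11.8.0-devel-ubuntu20.04")
```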
This is really extremely hard to debug. I am thinking of creating another repo and iterating over the packages to hopefully find the problem, but it will take ages.
I literally connected to it at runtime and ran `poetry install -n`
and it worked
I am literally trying with one package plus Python and it fails. I tried with Python 3.8, 3.9, and 3.9.16, and it always fails, so it is not linked to the Python version. What is the problem then? I am wondering if there is an intrinsic bug
Related to that, is it possible to do `Dataset.add_external_files()` with source_url and destination_url being two separate Azure storage containers?
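In code, I imagine something like this (a sketch only; the azure:// URL format and whether Dataset.create accepts output_uri in our clearml version are assumptions on my side):

```python
from clearml import Dataset

# Destination container: where the dataset's internal state would be stored
ds = Dataset.create(
    dataset_name="azure-links",
    dataset_project="debug",
    output_uri="azure://myaccount/dest-container/datasets",  # hypothetical URL
)
# Source container: external files stay here and are only registered as links,
# so nothing is actually copied to the destination
ds.add_external_files(source_url="azure://myaccount/source-container/data/")
ds.upload()
ds.finalize()
```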
How do you explain that it works when I ssh-ed into the same AWS container instance from the autoscaler?
Because I was ssh-ing into it before the failure. When poetry fails, it installs everything using pip
Is it a bug inside the AWS autoscaler??
@<1523701087100473344:profile|SuccessfulKoala55> Do you think it is possible to run docker mode in the AWS autoscaler and add the cloning and installation inside the init bash script of the task?
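Something like this is what I have in mind (a sketch; it assumes set_base_docker exposes a docker_setup_bash_script argument in our clearml version, and the commands are illustrative):

```python
from clearml import Task

task = Task.init(project_name="debug", task_name="manual-poetry-setup")
task.set_base_docker(
    docker_image="nvidia/cuda:11.8.0-devel-ubuntu20.04",
    # Commands run inside the container at startup, letting us handle the
    # poetry setup ourselves instead of relying on the agent's handling
    docker_setup_bash_script=[
        "export PATH=$HOME/.local/bin:$PATH",
        "pip install poetry",
    ],
)
```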
Yes, I take the export statements from the bash script of the task
Hi @<1523701087100473344:profile|SuccessfulKoala55> , the EC2 instance is spun up by the AWS autoscaler provided by ClearML. I use the following Docker image: nvidia/cuda:11.8.0-devel-ubuntu20.04
So the EC2 instance runs a docker container
I do not remember, but I was afraid... Thanks for the output! Maybe in a bad dream? 😜
In the ClearML Helm charts repo, can we use the clearml-serving chart alone?
I will check that. Do you think we could bypass it using `Task.create`? And passing all the needed params?
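i.e. something along these lines (a sketch; the repo URL, script path, and package list are placeholders):

```python
from clearml import Task

task = Task.create(
    project_name="debug",
    task_name="created-not-inited",
    repo="https://github.com/me/clearmldebug.git",  # placeholder repo
    branch="main",
    script="train.py",
    packages=["stripe"],  # explicit packages, bypassing the poetry lock file
    docker="nvidia/cuda:11.8.0-devel-ubuntu20.04",
)
Task.enqueue(task, queue_name="default")
```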
I have my `Task.init` inside a `train()` function inside the flask command. We basically have flask commands allowing us to trigger specific behaviors. When running it locally, everything works properly except the repository information. The use case is linked to the way our codebase works. For example, I run `flask train {arguments}` and it will trigger the training of a model (that I want to track).
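Roughly, our setup looks like this (a simplified sketch; the command name, project, and arguments are examples):

```python
import click
from flask import Flask
from clearml import Task

app = Flask(__name__)

@app.cli.command("train")
@click.option("--epochs", default=10)
def train(epochs):
    # Task.init runs inside the flask command body, not at module import;
    # this is where the repository info is not picked up when run locally
    task = Task.init(project_name="my-project", task_name="flask-train")
    print(f"training for {epochs} epochs")  # actual training goes here
```

Invoked as `flask train --epochs 5`.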
I stopped the autoscaler and deleted it manually. I did it because I want to test...
These changes reflect the modifications I have in my working directory (not committed, not put in the staging area with `git add`). But I would like to remove this uncommitted section from ClearML and not be blocked by it
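The direction I am exploring, if it helps anyone (a sketch; it assumes our clearml version exposes Task.set_script, and there is also a store_uncommitted_code_diff switch under sdk.development in clearml.conf, if I read the config correctly):

```python
from clearml import Task

task = Task.init(project_name="debug", task_name="no-diff")
# Hypothetical: blank out the recorded uncommitted changes so the agent
# does not try to re-apply them when it runs the task
task.set_script(diff="")
```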
One possible solution I could see as well is moving the data storage to an S3 bucket to improve download performance, since it is the same cloud provider. No transfer latency.
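i.e. upload the dataset state straight to S3 (a sketch; the bucket name and local path are placeholders):

```python
from clearml import Dataset

ds = Dataset.create(dataset_name="training-data", dataset_project="debug")
ds.add_files("/data/local")  # placeholder local path
ds.upload(output_url="s3://my-bucket/datasets")  # same cloud as the autoscaler
ds.finalize()
```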
Yes, it should be correct. Inside the bash script of the task.
I tried that too. I do not get any more logs from the ClearML agent 😞