@<1556812486840160256:profile|SuccessfulRaven86> , to make things easier to debug, can you try running the agent locally?
And I just tried with Python 3.8 (default version of the image) and it still fails.
Poetry Enabled: Ignoring requested python packages, using repository poetry lock file!
Creating virtualenv debug in /root/.clearml/venvs-builds/3.8/task_repository/clearmldebug.git/.venv
Using virtualenv: /root/.clearml/venvs-builds/3.8/task_repository/clearmldebug.git/.venv
2023-04-18 15:03:52
Installing dependencies from lock file
Finding the necessary packages for the current system
Package operations: 6 installs, 0 updates, 0 removals
failed installing poetry requirements: Command '['poetry', 'install', '-n', '-v']' returned non-zero exit status 1.
When the task finally failed, I was kicked out of the container
Because I was ssh-ing into it before the failure. When poetry fails, it installs everything using pip
Yes, that should be correct. Inside the bash script of the task.
@<1523701087100473344:profile|SuccessfulKoala55> Do you think it is possible to ask to run docker mode in the aws autoscaler, and add the cloning and installation inside the init bash script of the task?
I see it's running inside 3.9, so I assume it's correct
but I still had time to go inside the container, export the PATH variables for my poetry and python versions, and run the poetry install command there
How do you explain that it works when I ssh-ed into the same AWS container instance from the autoscaler?
How is it still up if the task failed?
@<1523701070390366208:profile|CostlyOstrich36> @<1523701087100473344:profile|SuccessfulKoala55> I tried with a dummy repo, using ONLY Python and the stripe package in the pyproject.toml
Here is my result (still failing):
Poetry Enabled: Ignoring requested python packages, using repository poetry lock file!
Creating virtualenv debug in /root/.clearml/venvs-builds/3.9/task_repository/clearmldebug.git/.venv
Using virtualenv: /root/.clearml/venvs-builds/3.9/task_repository/clearmldebug.git/.venv
Installing dependencies from lock file
Finding the necessary packages for the current system
Package operations: 6 installs, 0 updates, 0 removals
failed installing poetry requirements: Command '['poetry', 'install', '-n', '-v']' returned non-zero exit status 1.
Ignoring pip: markers 'python_version >= "3.10"' don't match your environment
The autoscaler just runs it on an AWS instance, inside a docker container - there's no difference from running it yourself inside a docker container - did you try running it inside a docker container as well?
You can theoretically do that in the docker init bash script that will be executed before the task is cloned and run
I tried too. I do not have more logs inside the ClearML agent 😞
I think you should try to manually start such a docker container and try to see what fails in the process. Attaching to an existing one has too many differences already
I literally connected to it at runtime, and ran poetry install -n
and it worked
Yes, the problem is it's still really hidden (the error, I mean)
I am currently trying with a new dummy repo and I iterate over the dependencies of the pyproject.toml.
How to make sure that the python version is correct?
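One way to check this is to print the interpreter from inside the environment that runs the task, rather than from an ssh session. A minimal sketch; `describe_interpreter` is a hypothetical helper name, not part of ClearML or Poetry:

```python
import platform
import sys


def describe_interpreter():
    """Return the path and version of the interpreter running this code."""
    return {
        "executable": sys.executable,
        "version": platform.python_version(),
        "version_info": tuple(sys.version_info[:3]),
    }


if __name__ == "__main__":
    info = describe_interpreter()
    print(f"executable: {info['executable']}")
    print(f"version:    {info['version']}")
```

If the executable path is not the one inside the `.venv` shown in the agent log, the task is probably not running with the interpreter you expect.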
This is really extremely hard to debug. I am thinking of creating another repo and iterating on the packages to hopefully find the problem, but it will take ages.
My issue has been resolved by going with pip.
and are you sure these are the same env vars available when the agent does the same?
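To compare the two environments concretely, one option is to dump the variables once from the ssh session and once from the code the agent runs, then diff them. This is only a sketch; `dump_env` and `diff_env` are hypothetical helpers, and where you call them from is up to you:

```python
import json
import os


def dump_env(path):
    """Write the current environment variables to a JSON file."""
    with open(path, "w") as f:
        json.dump(dict(os.environ), f, indent=2, sort_keys=True)


def diff_env(a, b):
    """Compare two env dicts: keys only in a, keys only in b, keys whose values differ."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    changed = sorted(k for k in set(a) & set(b) if a[k] != b[k])
    return only_a, only_b, changed
```

A difference in PATH, or a variable present in only one of the two dumps, would be the first thing to look at, since you exported PATH manually in the ssh session.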
It just allows me to have access to poetry and python installed on the container
I am literally trying with 1 package and python and it fails. I tried with python 3.8, 3.9 and 3.9.16, and it always fails --> not linked to the python version. What is the problem then? I am wondering if there is an intrinsic bug
Yes indeed, but what about the possibility to do the clone/poetry installation ourselves in the init bash script of the task?
@<1556812486840160256:profile|SuccessfulRaven86> can you try with -vvv
instead of -v
?
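If changing the flag in the agent itself is not convenient, a rough way to get the same debug output is to run the command by hand in the cloned repo and capture everything it prints. A sketch only: `run_verbose` is a hypothetical helper, and the command is the one from the agent log with -vvv swapped in for -v:

```python
import subprocess

# Same command as in the agent log, with -vvv (debug verbosity) instead of -v
POETRY_DEBUG_CMD = ["poetry", "install", "-n", "-vvv"]


def run_verbose(cmd, cwd=None):
    """Run a command and return (exit_code, combined stdout and stderr)."""
    proc = subprocess.run(
        cmd,
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so nothing is lost
        text=True,
    )
    return proc.returncode, proc.stdout
```

Calling `run_verbose(POETRY_DEBUG_CMD, cwd=...)` with the repository directory returns the exit status together with the full output, which should include the error that the agent log hides behind "returned non-zero exit status 1".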
the autoscaler always uses docker mode