@<1556812486840160256:profile|SuccessfulRaven86> , to make things easier to debug, can you try running the agent locally?
I literally connected to it at runtime, ran poetry install -n, and it worked
@<1556812486840160256:profile|SuccessfulRaven86> can you try with -vvv instead of -v ?
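Something like this, so we get poetry's most verbose output (the checkout path is just an example - use wherever the agent actually cloned the repo):
```
cd /path/to/task_repository/your_repo   # example path only
poetry install -n -vvv                  # -n = non-interactive, -vvv = poetry's most verbose output
```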
You can theoretically do that in the docker init bash script that will be executed before the task is cloned and run
Yes, that should be correct. Inside the bash script of the task.
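As a minimal sketch of what I mean (the exports and the poetry version below are illustrative, not your real values):
```
#!/bin/bash
# task init bash script - runs inside the container before the task itself
set -e
export POETRY_VIRTUALENVS_CREATE=false   # example export only
export MY_SETTING=value                  # example export only
pip install "poetry==1.4.2"              # pin to whatever version you actually use
```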
I think you should try to manually start such a docker container and try to see what fails in the process. Attaching to an existing one has too many differences already
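For example, something along these lines (the image and repo URL are placeholders - use whatever the autoscaler/agent is configured with):
```
# start a clean container similar to what the agent would spin up
docker run -it --rm python:3.9 bash
# then, inside the container, repeat the agent's steps by hand:
#   pip install poetry
#   git clone https://github.com/your-org/your-repo.git && cd your-repo
#   poetry install -n -vvv
```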
and are you sure these are the same env vars that are available when the agent runs the same steps?
I also did that in the following way:
- I put a sleep inside the bash script
- I ssh-ed into the fresh container and ran all the commands myself (cloning, installation), and again it worked... roughly like the sketch below
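(instance IP, key and container id are placeholders)
```
# in the task's bash script: keep the container alive long enough to inspect it
sleep 3600

# from my machine:
ssh -i my-key.pem ubuntu@<instance-ip>
docker ps                            # find the running task container
docker exec -it <container-id> bash
# then run the git clone + poetry install commands manually inside it
```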
I guess it makes no sense, given the steps a clearml-agent goes through...
I also thought about switching to pip mode, but unfortunately not all packages are detected from our poetry.lock file, so I cannot do that.
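(For reference, the closest workaround I know of is exporting the lock file to a plain requirements file that pip could install from, e.g.:)
```
poetry export -f requirements.txt --output requirements.txt --without-hashes
```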
Yes I take the export statements from my bash script of the task
into the same docker container running the task?
I am literally trying with 1 package and Python, and it fails. I tried with Python 3.8, 3.9 and 3.9.16, and it always fails --> not linked to the Python version. What is the problem then? I am wondering if there isn't an intrinsic bug
Is it a bug inside the AWS autoscaler??
@<1523701087100473344:profile|SuccessfulKoala55> Do you think it is possible to run docker mode in the AWS autoscaler, and to add the cloning and installation inside the init bash script of the task?
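i.e. something shaped roughly like this in the init bash script (repo URL, branch and target path are placeholders):
```
#!/bin/bash
set -e
# do the clone + poetry install ourselves instead of letting the agent handle it
git clone --branch main https://github.com/our-org/our-repo.git /opt/our-repo
cd /opt/our-repo
poetry install -n
```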
the autoscaler always uses docker mode
The autoscaler just runs it on an AWS instance, inside a docker container - there's no difference from running it yourself inside a docker container - did you try running it inside a docker container as well?
Using a pyenv virtual env, then exporting the LOCALPYTHON env var
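Roughly like this (the Python version is an example, and the value I export is shown here only as an illustration):
```
# requires the pyenv-virtualenv plugin
pyenv install 3.9.16
pyenv virtualenv 3.9.16 task-env
pyenv activate task-env
export LOCALPYTHON="$(pyenv which python)"   # illustrative value for the variable mentioned above
```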
I tried that too. I do not get any more logs from the ClearML agent 😞
How is it still up if the task failed?
I am currently trying with a new dummy repo, iterating over the dependencies of the pyproject.toml.
My issue has been resolved by switching to pip.
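For anyone landing here later: as far as I know, the relevant setting is the agent's package manager type in clearml.conf, e.g. appended like this (or edit the existing agent section in place if it already defines package_manager):
```
cat >> ~/clearml.conf <<'EOF'
agent {
    package_manager {
        type: pip
    }
}
EOF
```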
Yes indeed, but what about the possibility of doing the clone/poetry installation ourselves in the init bash script of the task?
It just allows me to have access to poetry and python installed on the container
Because I was ssh-ing into it before the failure. When poetry fails, it falls back to installing everything using pip
Yes, the problem is it's still really hidden (the error, I mean)