Ah. In the extra_vm_bash_script of the AWS autoscaler.
We're not using the docker setup though. The CLI run by the autoscaler is python -m clearml_agent --config-file /root/clearml.conf daemon --queue aws_small, so no docker.
Is there a way to specify that flag within the config file, SuccessfulKoala55 ?
Sure SuccessfulKoala55 , and thanks for looking into it.
As an alternative (for now, or in general), we could consider reverting to pip. The issue we encounter is that we have a monorepo, so frozen requirements should specify relative paths, but pip freeze does not seem to do that, so ClearML also fails in pip mode.
I also tried adding agent.package_manager.system_site_packages = true to ensure these virtual environments have access btw, still to no avail.
SuccessfulKoala55 no that did not solve the issue 😞
Thanks for the details, UnevenDolphin73 , and sorry for the inconvenience - we'll try to nail this down...
That still seems to crash SuccessfulKoala55 🤔
EDIT: No, wait, the environment still needs updating. One moment still...
Ultimately we're trying to avoid docker in AWS autoscaler (virtualization on top of virtualization seems redundant), and instead we maintain an AMI for a faster boot sequence.
We had no issues when we used pip, but now when trying to work with poetry all these issues came up.
The way I understand poetry to work is that one system-wide installation is expected, which is used for virtual environment creation and manipulation. So at the very least it may be desirable for the poetry installation to be inherited from the system-wide one?
Created this for follow up, SuccessfulKoala55 ; I'm really stumped. Spent the entire day on this 🥹
https://github.com/allegroai/clearml-agent/issues/134
It's possible for the agent, but I'm not sure it's supported by the SDK's cloud driver... If it solves your issue, this might be a good addition
The agent creates a venv in which the script is run, are you sure this venv has access to the python system site packages?
I think it's not there since the main goal was supporting docker mode (and it was missed)
But to be fair, I've also tried with python3.X -m pip install poetry etc. I get the same error.
Still crashing, I think that may not be the correct virtual environment to edit 🤔
It's the one created later down the line
I'll try that in a bit (that requires some access control changes). Any idea how I can modify the dynamically created virtualenv?
Poetry Enabled: Ignoring requested python packages, using repository poetry lock file!
The currently activated Python version 3.10.6 is not supported by the project (~3.8.0).
Trying to find and use a compatible version.
Using python3.8 (3.8.16)
Creating virtualenv ... in /root/.clearml/venvs-builds/3.10/task_repository/...git/.venv
Installing dependencies from lock file
Or to be clear, the environment installed by the autoscaler under /clearml_agent_venv has poetry installed, and it uses that to set up the environment for the executed task, e.g. in /root/.clearml/venvs-builds/3.10/task_repository/.../.venv, but the latter does not have poetry installed, and so it crashes?
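A quick way to test that hypothesis on the instance is to check whether a poetry executable exists inside each environment. This is only a diagnostic sketch: the helper name find_poetry_in_venv is made up for illustration, and the two directories to pass in are the ones mentioned above.

```python
import os
import shutil
from pathlib import Path


def find_poetry_in_venv(venv_dir):
    """Return the path of the poetry executable inside venv_dir, or None.

    Hypothetical helper: it only looks in the venv's own bin/Scripts
    directory, not on the global PATH.
    """
    bin_dir = Path(venv_dir) / ("Scripts" if os.name == "nt" else "bin")
    # shutil.which returns None when no executable named "poetry" is found.
    return shutil.which("poetry", path=str(bin_dir))
```

Calling it once with the agent's environment and once with the task's .venv should show whether only the former has poetry available.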
SuccessfulKoala55 help me out here 🙂
It seems all the changes I make in the AWS autoscaler apply directly to the virtual environment set for the autoscaler, but nothing from that propagates down to the launched instances.
So e.g. the autoscaler environment has poetry installed, but then the instance fails because it does not have it available?
If you ssh into that machine and into the venv, can you see if it inherits the system packages?
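One way to check this from a shell on the instance is to read the venv's pyvenv.cfg, which records whether system site-packages are visible. A minimal sketch, assuming the venv was created by venv or virtualenv (both write an include-system-site-packages key); the function name is hypothetical.

```python
from pathlib import Path


def venv_inherits_system_packages(venv_dir):
    """Return True if pyvenv.cfg in venv_dir enables system site-packages."""
    cfg = Path(venv_dir) / "pyvenv.cfg"
    for line in cfg.read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "include-system-site-packages":
            return value.strip().lower() == "true"
    # Key not present: assume the venv is isolated.
    return False
```

Pointing it at the task's .venv directory (rather than clearml_agent_venv) would show which of the two environments is actually isolated.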
Now my extra_vm_bash_script looks like so:
deactivate
apt-get install -y gfortran libopenblas-dev liblapack-dev libpq-dev python-is-python3 python3-pip python3-dev proj-bin libgraphviz-dev graphviz graphviz-dev libgdal-dev
apt-get install software-properties-common -y
add-apt-repository ppa:deadsnakes/ppa -y
apt update
apt install python3.7 python3.8 python3.9 python3.7-distutils python3.8-distutils python3.9-distutils python3.10-distutils python3.7-dev python3.8-dev python3.9-dev python3.10-dev -y
curl -sSL | python3 -
export PATH=\"/root/.local/bin:$PATH\"
poetry --version
sed -i 's/include-system-site-packages = false/include-system-site-packages = true/g' clearml_agent_venv/pyvenv.cfg
git config --system credential.helper \"store --file /root/.git-credentials\"
python3.7 -m pip install virtualenv
python3.8 -m pip install virtualenv
python3.9 -m pip install virtualenv
python3.10 -m pip install virtualenv
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
source clearml_agent_venv/bin/activate
I think the default command used to create the venv does not specify --system-site-packages
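For reference, this is what the flag changes when an environment is created. The sketch below uses the standard library's venv module, not the command the autoscaler or agent actually runs (that is an assumption-free stand-in only), and the function name is hypothetical.

```python
import venv
from pathlib import Path


def create_venv_with_system_packages(target_dir):
    """Create a venv that can see system site-packages; return its pyvenv.cfg text."""
    # system_site_packages=True is the API equivalent of --system-site-packages.
    # with_pip=False keeps creation fast and avoids needing ensurepip.
    builder = venv.EnvBuilder(system_site_packages=True, with_pip=False)
    builder.create(target_dir)
    return (Path(target_dir) / "pyvenv.cfg").read_text()
```

The returned text contains the line include-system-site-packages = true, which is the same setting the sed workaround flips in an existing pyvenv.cfg.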
I'll try a hacky way around it with sed -i 's/include-system-site-packages = false/include-system-site-packages = true/g' clearml_agent_venv/pyvenv.cfg and report back.
I think the agent runs the script inside the machine in a docker container, I would assume this is missing from inside the docker container (and not really required in the vm machine itself)
I've also tried e.g. setting agent.package_manager.priority_packages = ["poetry"], and/or agent.package_manager.poetry_version = ">1.2.0", and other flags, but these affect only the main /clearml_agent_venv environment, and not the one actually generated by the clearml-agent when executing the task.