Trying To Run AWS Autoscaler With

Answered

Trying to run AWS autoscaler with poetry queue, and I get:
Traceback (most recent call last):
  File "/root/.local/bin/poetry", line 5, in <module>
    from poetry.console.application import main
ModuleNotFoundError: No module named 'poetry'

I know this is not strictly ClearML related, but I wonder if anyone has had any success?
(The command the agent is trying to run is poetry run python -u train.py.)
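One way to narrow down a traceback like the one above is to check which interpreter the poetry launcher script starts and whether that interpreter can import the poetry module. The sketch below is a diagnostic aid only: the helper names are hypothetical, and it assumes (without confirmation from the thread) that a mismatch between the launcher's interpreter and the place where poetry was installed is worth ruling out.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def launcher_command(launcher: Path) -> List[str]:
    """Return the command from the launcher's shebang line, or [] if there is none."""
    with launcher.open("r", encoding="utf-8", errors="replace") as handle:
        first_line = handle.readline().strip()
    if not first_line.startswith("#!"):
        return []
    # "#!/usr/bin/env python3" becomes ["/usr/bin/env", "python3"]
    return first_line[2:].split()


def can_import(command: List[str], module: str) -> bool:
    """Ask the interpreter started by `command` whether it can import `module`."""
    result = subprocess.run([*command, "-c", f"import {module}"], capture_output=True)
    return result.returncode == 0


if __name__ == "__main__":
    found = shutil.which("poetry")
    if found is None:
        print("poetry is not on PATH")
    else:
        command = launcher_command(Path(found))
        print("launcher:", found)
        print("interpreter command:", command)
        if command:
            print("can import poetry:", can_import(command, "poetry"))
```

Running this on the launched instance, in the same shell environment the agent uses, shows whether the launcher and the poetry package belong to the same Python installation.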

  				
Posted 2 years ago
UnevenDolphin73


Answers 30

I think the agent runs the script in a docker container on the machine, so I would assume poetry is missing inside the docker container (it is not really required on the VM itself).

  				
SuccessfulKoala55

SuccessfulKoala55 help me out here 🙂
It seems all the changes I make in the AWS autoscaler apply directly to the virtual environment set for the autoscaler, but nothing from that propagates down to the launched instances.
So e.g. the autoscaler environment has poetry installed, but then the instance fails because it does not have it available?

  				
UnevenDolphin73

But to be fair, I've also tried with python3.X -m pip install poetry etc. I get the same error.

  				
UnevenDolphin73

Ah. In the extra_vm_bash_script of the AWS autoscaler.

  				
UnevenDolphin73

Still crashing, I think that may not be the correct virtual environment to edit 🤔
It's the one created later down the line

  				
UnevenDolphin73

Is there a way to specify that flag within the config file, SuccessfulKoala55 ?

  				
UnevenDolphin73

We're not using the docker setup though. The CLI run by the autoscaler is python -m clearml_agent --config-file /root/clearml.conf daemon --queue aws_small, so no docker.

  				
UnevenDolphin73

I meant where is that done?

  				
SuccessfulKoala55

Created this for follow-up, SuccessfulKoala55; I'm really stumped. Spent the entire day on this 🥹
https://github.com/allegroai/clearml-agent/issues/134

  				
UnevenDolphin73

Let me have a quick look.

  				
UnevenDolphin73

I think the default command used to create the venv does not specify --system-site-packages
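To illustrate what the --system-site-packages flag changes, here is a minimal sketch using Python's standard-library venv module. It is a stand-in for illustration only: the thread does not establish which tool or command the agent actually uses to create its environments, and the create_env helper is hypothetical.

```python
import tempfile
import venv
from pathlib import Path


def create_env(target: Path, inherit_system_packages: bool) -> str:
    """Create a venv at `target` and return the contents of its pyvenv.cfg."""
    builder = venv.EnvBuilder(
        system_site_packages=inherit_system_packages,
        with_pip=False,  # skip the pip bootstrap; only the config file matters here
    )
    builder.create(target)
    return (target / "pyvenv.cfg").read_text()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for flag in (False, True):
            cfg = create_env(Path(tmp) / f"env_{flag}", flag)
            setting = next(
                line for line in cfg.splitlines()
                if line.startswith("include-system-site-packages")
            )
            print(flag, "->", setting)
```

The only difference between the two environments is the include-system-site-packages line in pyvenv.cfg, which controls whether packages installed for the base interpreter are visible inside the venv.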

  				
SuccessfulKoala55

I'll try a hacky way around it with sed -i 's/include-system-site-packages = false/include-system-site-packages = true/g' clearml_agent_venv/pyvenv.cfg and report back.
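For reference, the same edit can be sketched in Python, which makes it easier to tell whether the file was actually changed. The function name is hypothetical, and, as with the sed command, this only touches the one pyvenv.cfg it is pointed at; it does not affect environments created later.

```python
from pathlib import Path

KEY = "include-system-site-packages"


def enable_system_site_packages(cfg_path: Path) -> bool:
    """Set include-system-site-packages to true; return True if the file changed."""
    original = cfg_path.read_text()
    changed = False
    lines = []
    for line in original.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == KEY and value.strip().lower() != "true":
            line = f"{KEY} = true"
            changed = True
        lines.append(line)
    if changed:
        # Preserve the trailing newline if the original file had one
        ending = "\n" if original.endswith("\n") else ""
        cfg_path.write_text("\n".join(lines) + ending)
    return changed


if __name__ == "__main__":
    cfg = Path("clearml_agent_venv/pyvenv.cfg")
    if cfg.is_file():
        print("changed:", enable_system_site_packages(cfg))
    else:
        print("no pyvenv.cfg found at", cfg)
```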

  				
UnevenDolphin73

I think it's not there since the main goal was supporting docker mode (and it was missed)

  				
SuccessfulKoala55

I also tried adding agent.package_manager.system_site_packages = true to ensure these virtual environments have access, by the way, but still to no avail.

  				
UnevenDolphin73

SuccessfulKoala55 it does not

  				
UnevenDolphin73

I'll try that in a bit (that requires some access control changes). Any idea how I can modify the dynamically created virtualenv?

Poetry Enabled: Ignoring requested python packages, using repository poetry lock file!
The currently activated Python version 3.10.6 is not supported by the project (~3.8.0).
Trying to find and use a compatible version.
Using python3.8 (3.8.16)
Creating virtualenv ... in /root/.clearml/venvs-builds/3.10/task_repository/...git/.venv
Installing dependencies from lock file
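The log above shows poetry rejecting Python 3.10.6 and selecting python3.8 because the project declares ~3.8.0. In poetry's tilde notation that constraint means >=3.8.0 and <3.9.0. The following is a simplified sketch of that rule for plain numeric versions; the function names are hypothetical and this is not poetry's own implementation.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    """Parse 'X', 'X.Y' or 'X.Y.Z' into a three-part tuple, padding with zeros."""
    parts = tuple(int(piece) for piece in version.split("."))
    padded = (parts + (0, 0))[:3]
    return (padded[0], padded[1], padded[2])


def tilde_bounds(base: str) -> Tuple[Version, Version]:
    """Return (inclusive lower bound, exclusive upper bound) for '~base'."""
    lower = parse(base)
    if base.count(".") >= 1:
        # ~X.Y and ~X.Y.Z allow only patch-level changes
        upper = (lower[0], lower[1] + 1, 0)
    else:
        # ~X allows minor-level changes
        upper = (lower[0] + 1, 0, 0)
    return lower, upper


def satisfies_tilde(version: str, base: str) -> bool:
    """Check whether `version` falls inside the range described by '~base'."""
    lower, upper = tilde_bounds(base)
    return lower <= parse(version) < upper


if __name__ == "__main__":
    for candidate in ("3.10.6", "3.8.16"):
        print(candidate, satisfies_tilde(candidate, "3.8.0"))
```

This matches the log: 3.10.6 falls outside the range, while 3.8.16 falls inside it, so poetry builds the task's .venv with python3.8.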

  				
UnevenDolphin73

If you ssh into that machine and into the venv, can you see if it inherits the system packages?
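A check along these lines could be run with the venv's own interpreter after connecting to the machine. It is a minimal sketch with hypothetical helper names: it reports whether the interpreter is running inside a venv, what the venv's pyvenv.cfg says about system site packages, and where (if anywhere) the poetry module resolves from.

```python
import importlib.util
import sys
from pathlib import Path
from typing import Dict, Optional


def read_pyvenv_flag(prefix: Path) -> Optional[bool]:
    """Return the include-system-site-packages setting under `prefix`, if present."""
    cfg = prefix / "pyvenv.cfg"
    if not cfg.is_file():
        return None
    for line in cfg.read_text().splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == "include-system-site-packages":
            return value.strip().lower() == "true"
    return None


def describe_environment(module: str = "poetry") -> Dict[str, object]:
    """Summarize the running interpreter and where `module` would be imported from."""
    spec = importlib.util.find_spec(module)
    return {
        "executable": sys.executable,
        "in_venv": sys.prefix != sys.base_prefix,
        "inherits_system_packages": read_pyvenv_flag(Path(sys.prefix)),
        "module_location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

A module_location of None means the interpreter cannot see poetry at all, regardless of what pyvenv.cfg says.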

  				
SuccessfulKoala55

It's possible for the agent, but I'm not sure it's supported by the SDK's cloud driver... If it solves your issue, this might be a good addition

  				
SuccessfulKoala55

I've also tried e.g. setting agent.package_manager.priority_packages = ["poetry"], and/or agent.package_manager.poetry_version = ">1.2.0", and other flags, but these affect only the main /clearml_agent_venv environment, not the one the clearml-agent actually generates when executing the task.

  				
UnevenDolphin73

Ultimately we're trying to avoid docker in the AWS autoscaler (virtualization on top of virtualization seems redundant); instead we maintain an AMI for a faster boot sequence.
We had no issues when we used pip, but now that we're trying to work with poetry all these issues have come up.
As I understand it, poetry expects a single system-wide installation that is used to create and manipulate virtual environments. So it may at least be desirable for the poetry installation to be inherited from the system?

  				
UnevenDolphin73

👍

  				
SuccessfulKoala55

Now my extra_vm_bash_script looks like so:

deactivate
apt-get install -y gfortran libopenblas-dev liblapack-dev libpq-dev python-is-python3 python3-pip python3-dev proj-bin libgraphviz-dev graphviz graphviz-dev libgdal-dev
apt-get install software-properties-common -y
add-apt-repository ppa:deadsnakes/ppa -y
apt update
apt install python3.7 python3.8 python3.9 python3.7-distutils python3.8-distutils python3.9-distutils python3.10-distutils python3.7-dev python3.8-dev python3.9-dev python3.10-dev -y
curl -sSL | python3 -
export PATH=\"/root/.local/bin:$PATH\"
poetry --version
sed -i 's/include-system-site-packages = false/include-system-site-packages = true/g' clearml_agent_venv/pyvenv.cfg
git config --system credential.helper \"store --file /root/.git-credentials\"
python3.7 -m pip install virtualenv
python3.8 -m pip install virtualenv
python3.9 -m pip install virtualenv
python3.10 -m pip install virtualenv
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
source clearml_agent_venv/bin/activate

  				
UnevenDolphin73

Nothing?

  				
SuccessfulKoala55

And?

  				
SuccessfulKoala55

Sure SuccessfulKoala55 , and thanks for looking into it.

As an alternative (for now, or in general), we could consider reverting to pip. The issue we encounter there is that we have a monorepo, so the frozen requirements should specify relative paths, but pip freeze does not seem to do that, so ClearML also fails in pip mode.
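One possible workaround for the pip freeze limitation is to post-process the frozen output and turn absolute file:// entries inside the repository into relative paths. The sketch below assumes the local packages appear in the form "name @ file:///absolute/path", as recent pip versions emit for local directory installs; the function name and the example paths are hypothetical, and whether ClearML accepts the resulting relative entries is untested here.

```python
import posixpath
from typing import Iterable, List
from urllib.parse import unquote, urlparse


def relativize_requirements(lines: Iterable[str], repo_root: str) -> List[str]:
    """Rewrite 'name @ file:///<repo_root>/...' entries as './...' paths."""
    root = posixpath.normpath(repo_root)
    result = []
    for line in lines:
        _name, sep, url = line.partition(" @ ")
        parsed = urlparse(url.strip()) if sep else None
        if parsed is not None and parsed.scheme == "file":
            path = posixpath.normpath(unquote(parsed.path))
            # Only rewrite entries that live inside the repository
            if path == root or path.startswith(root + "/"):
                rel = posixpath.relpath(path, root)
                result.append("." if rel == "." else "./" + rel)
                continue
        result.append(line)
    return result


if __name__ == "__main__":
    frozen = [
        "numpy==1.24.0",
        "mypkg @ file:///repo/libs/mypkg",  # hypothetical monorepo package
    ]
    for entry in relativize_requirements(frozen, "/repo"):
        print(entry)
```

Entries that point outside the repository root, and ordinary pinned requirements, are passed through unchanged.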

  				
UnevenDolphin73

Thanks for the details, UnevenDolphin73 , and sorry for the inconvenience - we'll try to nail this down...

  				
SuccessfulKoala55

The agent creates a venv in which the script is run. Are you sure this venv has access to the Python system site packages?

  				
SuccessfulKoala55

Or to be clear: the environment installed by the autoscaler under /clearml_agent_venv has poetry installed, and it uses that to set up the environment for the executed task, e.g. in /root/.clearml/venvs-builds/3.10/task_repository/.../.venv, but the latter does not have poetry installed, and so it crashes?

  				
UnevenDolphin73

SuccessfulKoala55 no that did not solve the issue 😞

  				
UnevenDolphin73

That still seems to crash SuccessfulKoala55 🤔
EDIT: No, wait, the environment still needs updating. One moment still...

  				
UnevenDolphin73
