Hello Channel, I Am Struggling A Lot On An Issue Linked To

Answered

Hello channel,

I am struggling a lot on an issue linked to ClearMl agent and AWS Autoscaler .
This issue is very problematic and urgent, please help me out! We need the autoscaler to work to be able to pull and execute our tasks correctly.

Context:
I want to install a poetry environment from a poetry.lock file located at the root of my repository.
I have a bash script running allowing to install poetry and install python 3.9.16 using Pyenv (needed by my poetry env).
The installation goes on, my repository is cloned, poetry is detected and then poetry installation fails directly without any logs (even when using the verbose mode).

Here is the clearml.conf of my AWS autoscaler config and the logs of one of the latest runs (you can find the bash script inside it) + snippet of the error :

Poetry Enabled: Ignoring requested python packages, using repository poetry lock file!
Creating virtualenv alfred in /root/.clearml/venvs-builds/3.9/task_repository/alfred.git/.venv
Using virtualenv: /root/.clearml/venvs-builds/3.9/task_repository/alfred.git/.venv
Installing dependencies from lock file
Finding the necessary packages for the current system
Package operations: 351 installs, 1 update, 1 removal, 22 skipped
failed installing poetry requirements: Command '['poetry', 'install', '-n', '-v']' returned non-zero exit status 1.
Ignoring pip: markers 'python_version >= "3.10"' don't match your environment

agent {
    vcs_cache.enabled: false

    package_manager: {
          type: poetry,
          poetry_version: "1.4.2",
          poetry_install_extra_args: ["-v"],
     }  
}

sdk {
    development {
         store_code_diff_from_remote: false,
    }

}

Issue and what has been done:

I tested all clearml.co nf parameters without success
I created an AWS instance, ssh-ed into it and try the installation of poetry and it worked...
Last but not least which leads to my full confusion : I setted up SSH access to the AWS autoscaler instances, SSHed to it during runtime, went into the created docker container, waited until the git clone command made effect and poetry installation failed to be able to try myself the poetry install -n -v command. Again, It worked... I am extremely confused because it works on the instance itself created by the autoscaler, but it does not work when running on ClearML. What am I missing??
Thank you a lot for your time,
CC @<1523701070390366208:profile|CostlyOstrich36> @<1523701205467926528:profile|AgitatedDove14> @<1523701087100473344:profile|SuccessfulKoala55>

  				
Posted 
	one year ago

					More
				  		
  Report
		
					SuccessfulRaven86
				
					0
					 × 1

Votes Newest

Answers 40

but I still had time to go inside the container, export the PATH variables for my poetry and python versions, and run the poetry install command there

  				
Posted 
	one year ago

					More
				  		
  Report
		
					SuccessfulRaven86
				
					0
					 × 1

Yes indeed, but what about the possibility to do the clone/poetry installation ourself in the init bash script of the task?

  				
Posted 
	one year ago

					More
				  		
  Report
		
					SuccessfulRaven86
				
					0
					 × 1

Yes I take the export statements from my bash script of the task

  				
Posted 
	one year ago

					More
				  		
  Report
		
					SuccessfulRaven86
				
					0
					 × 1

This is really extremely hard to debug. I am thinking to create another repo and iterate on the packages to hopefully find the problem, but it will take ages.

  				
Posted 
	one year ago

					More
				  		
  Report
		
					SuccessfulRaven86
				
					0
					 × 1

@<1523701070390366208:profile|CostlyOstrich36> @<1523701087100473344:profile|SuccessfulKoala55> I tried with dummy repo. Using Python and stripe packages ONLY in the pyproject.toml

Here is my result (still failing) :

Poetry Enabled: Ignoring requested python packages, using repository poetry lock file!
Creating virtualenv debug in /root/.clearml/venvs-builds/3.9/task_repository/clearmldebug.git/.venv
Using virtualenv: /root/.clearml/venvs-builds/3.9/task_repository/clearmldebug.git/.venv
Installing dependencies from lock file
Finding the necessary packages for the current system
Package operations: 6 installs, 0 updates, 0 removals
failed installing poetry requirements: Command '['poetry', 'install', '-n', '-v']' returned non-zero exit status 1.
Ignoring pip: markers 'python_version >= "3.10"' don't match your environment

  				
Posted 
	one year ago

					More
				  		
  Report
		
					SuccessfulRaven86
				
					0
					 × 1

Yes should be correct. Inside the bash script of the task.

  				
Posted 
	one year ago

					More
				  		
  Report
		
					SuccessfulRaven86
				
					0
					 × 1

the autoscaler always uses docker mode

  				
Posted 
	one year ago

					More
				  		
  Report
		
					SuccessfulKoala55
				
					0
					 × 1

I am literrally trying with 1 package and python and it fails. I tried with python 3.8 3.9 and 3.9.16. and it always fail --> not linked to python version. What is the problem then? I am wondering if there is not an intrinsic bug

  				
Posted 
	one year ago

					More
				  		
  Report
		
					SuccessfulRaven86
				
					0
					 × 1

I tried too. I do not have more logs inside the ClearML agent 😞

  				
Posted 
	one year ago

					More
				  		
  Report
		
					SuccessfulRaven86
				
					0
					 × 1

So the same?

  				
Posted 
	one year ago

					More
				  		
  Report
		
					SuccessfulKoala55
				
					0
					 × 1

I literrally connected to it at runtime, and ran poetry install -n and it worked

  				
Posted 
	one year ago

					More
				  		
  Report
		
					SuccessfulRaven86
				
					0
					 × 1

Is it a bug inside the AWS autoscaler??

  				
Posted 
	one year ago

					More
				  		
  Report
		
					SuccessfulRaven86
				
					0
					 × 1

I guess it makes no sense because of the steps a clearml-agent works...
I also thought about going to pip mode but not all packages are detected from our poetry.lock file unfortunately so cannot do that.

  				
Posted 
	one year ago

					More
				  		
  Report
		
					SuccessfulRaven86
				
					0
					 × 1

where did you specify it?

  				
Posted 
	one year ago

					More
				  		
  Report
		
					SuccessfulKoala55
				
					0
					 × 1

I am currently trying with a new dummy repo and I iterate over the dependencies of the pyproject.toml.

  				
Posted 
	one year ago

					More
				  		
  Report
		
					SuccessfulRaven86
				
					0
					 × 1

@<1556812486840160256:profile|SuccessfulRaven86> , did you install poetry inside the EC2 instance or inside the docker? Basically, where do you put the poetry installation bash script - in the 'init script' section of the autoscaler or on the task's 'setup shell script' in execution tab (This is basically the script that runs inside the docker)

It sounds like you're installing poetry on the ec2 instance itself but the experiment runs inside a docker container

  				
Posted 
	one year ago

					More
				  		
  Report
		
					CostlyOstrich36
				
					0

Because I was ssh-ing to it before the fail. When poetry fails, it installs everything using PIP

  				
Posted 
	one year ago

					More
				  		
  Report
		
					SuccessfulRaven86
				
					0
					 × 1

When the task finally failed, I was kicked of from the container

  				
Posted 
	one year ago

					More
				  		
  Report
		
					SuccessfulRaven86
				
					0
					 × 1

How to make sure that the python version is correct?

  				
Posted 
	one year ago

					More
				  		
  Report
		
					SuccessfulRaven86
				
					0
					 × 1

Yes same error

  				
Posted 
	one year ago

					More
				  		
  Report
		
					SuccessfulRaven86
				
					0
					 × 1

I think you should try to manually start such a docker container and try to see what fails in the process. Attaching to an existing one has too many differences already

  				
Posted 
	one year ago

					More
				  		
  Report
		
					SuccessfulKoala55
				
					0
					 × 1

How do you explain that it works when I ssh-ed into the same AWS container instance from the autoscaler?

  				
Posted 
	one year ago

					More
				  		
  Report
		
					SuccessfulRaven86
				
					0
					 × 1

Yes, the problem is it's still really hidden (the error, I mean)

  				
Posted 
	one year ago

					More
				  		
  Report
		
					SuccessfulKoala55
				
					0
					 × 1

How is it still up is the task failed?

  				
Posted 
	one year ago

					More
				  		
  Report
		
					SuccessfulKoala55
				
					0
					 × 1

The autoscaler just runs it on an AWS instance, inside a docker container - there's no difference from running it yourself inside a docker container - did you try running it inside a docker container as well?

  				
Posted 
	one year ago

					More
				  		
  Report
		
					SuccessfulKoala55
				
					0
					 × 1

The same container

  				
Posted 
	one year ago

					More
				  		
  Report
		
					SuccessfulRaven86
				
					0
					 × 1

or just to the AWS instance?

  				
Posted 
	one year ago

					More
				  		
  Report
		
					SuccessfulKoala55
				
					0
					 × 1

into the same docker container running the task?

  				
Posted 
	one year ago

					More
				  		
  Report
		
					SuccessfulKoala55
				
					0
					 × 1

@<1556812486840160256:profile|SuccessfulRaven86> , to make things easier to debug, can you try running the agent locally?

  				
Posted 
	one year ago

					More
				  		
  Report
		
					CostlyOstrich36
				
					0

@<1523701070390366208:profile|CostlyOstrich36> poetry is installed as part of the bash script of the task.

The init script of the AWS autoscaler only contains three export variables I set.

  				
Posted 
	one year ago

					More
				  		
  Report
		
					SuccessfulRaven86
				
					0
					 × 1

Show more results

Write your answer

72K Views

40 Answers

one year ago