Another AWS Autoscaler Question

Answered

Another AWS autoscaler question. The docker-compose.yml automatically adds a ClearML agent to the services queue.

When I run python aws_autoscaler.py --remote from my local machine, the autoscaler process does not seem to find the environment variables set in docker-compose.yml under agent-services, so it crashes with this error message:

You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
cp: -r not specified; omitting directory '/tmp/clearml.conf'
Using built-in ClearML default key/secret
clearml_agent: ERROR: Could not find host server definition (missing `~/clearml.conf` or Environment CLEARML_API_HOST)
To get started with ClearML: setup your own `clearml-server`, or create a free account at

 and run `clearml-agent init`

Do I need to do something extra to make the env vars set on that service available to the autoscaler task's setup process?

  				
Posted one year ago by BattyCrocodile47


Answers 6

Sorry, clarifying:

The agent-services entry in the docker-compose file seems to add a single worker to the services queue.

  				
Posted one year ago by BattyCrocodile47

The agent-services container is simply an agent running as part of the server deployment, and is not related to the autoscaler.

  				
Posted one year ago by SuccessfulKoala55

At the time I run python aws_autoscaler.py --remote, that clearml-services worker is the only worker on the services queue, so it is the worker that picks up the autoscaler task.

The task seems to fail on startup because CLEARML_API_HOST is not set, even though it is set for the Docker container that the agent runs in.

Here's the full autoscaler log where the failure happens if that's helpful.

  				
Posted one year ago by BattyCrocodile47

That's because you need to set up a clearml.conf file on the machine where you run the autoscaler.
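As a rough illustration of the check the error message describes (missing `~/clearml.conf` or the CLEARML_API_HOST environment variable), the sketch below reports which of the two is present on the current machine. The function name `clearml_host_source` is hypothetical, and the order in which the two sources are tested is an assumption, not ClearML's documented precedence. It requires a regular file, since the `cp` line in the log above suggests `/tmp/clearml.conf` was a directory.

```python
import os
from pathlib import Path


def clearml_host_source(env=None, home=None):
    """Return 'env', 'file', or None depending on where a host definition exists.

    Hypothetical helper: it only mirrors the two sources named in the
    error message and does not call into ClearML itself.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # Source 1: the environment variable named in the error message.
    if env.get("CLEARML_API_HOST"):
        return "env"

    # Source 2: a config file in the home directory. is_file() rejects a
    # directory that happens to be named clearml.conf.
    if (home / "clearml.conf").is_file():
        return "file"

    return None


if __name__ == "__main__":
    source = clearml_host_source()
    if source is None:
        print("No host definition found: run `clearml-agent init` or set CLEARML_API_HOST")
    else:
        print(f"Host definition found via: {source}")
```

Running this inside the environment where the task actually starts (rather than on the machine that launched it) shows whether either source is visible to that process.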

  				
Posted one year ago by SuccessfulKoala55

Also, if the autoscaler is running from your remote machine, it's basically a client trying to connect to the server, so the server address it uses must be a valid address of the remote server. The agent-services container running as part of the server's docker compose uses the internal docker network (which cannot be accessed from outside the docker compose services).
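One way to see whether an address is usable from a given machine is a plain TCP connection test. The sketch below uses only the Python standard library; the helper names `host_and_port` and `can_reach` are hypothetical, and it only confirms that something accepts connections on that host and port, not that it is a ClearML API server.

```python
import os
import socket
from urllib.parse import urlsplit


def host_and_port(url):
    """Split a URL such as http://host:8008 into (host, port)."""
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError(f"not a full URL (scheme://host[:port]): {url!r}")
    # Fall back to the scheme's default port when none is given.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def can_reach(url, timeout=3.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = host_and_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    url = os.environ.get("CLEARML_API_HOST")
    if url:
        print(f"{url} reachable from this machine: {can_reach(url)}")
    else:
        print("CLEARML_API_HOST is not set")
```

A hostname that only resolves inside the compose network would fail the DNS lookup here when run from outside it, which matches the point above about the internal docker network.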

  				
Posted one year ago by SuccessfulKoala55

Hi BattyCrocodile47, I'm not sure I understand. There's no relation between the docker compose for the server and the autoscaler (which is a script using capabilities of the SDK).

  				
Posted one year ago by SuccessfulKoala55
