Sorry, clarifying: the agent-services entry in the docker-compose file seems to add a single worker to the services queue.
Also, if the autoscaler is running from your remote machine, it's basically a client trying to connect to the server, so the server address it uses must be a valid address of the remote server. The agent-services container running as part of the server's docker compose uses the internal docker network (which cannot be accessed from outside the docker compose services).
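To make the point about addresses concrete, here is a rough sketch (not part of ClearML, standard library only) for checking whether an API server address is reachable from the machine where the autoscaler runs. The helper name `is_reachable` is made up for illustration, and the address in the usage comment is a placeholder you would replace with your own server address.

```python
import socket
from urllib.parse import urlparse


def is_reachable(api_host: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the host/port in api_host succeeds.

    Hypothetical helper: it only checks network reachability, not that a
    ClearML API server is actually answering on that port.
    """
    parsed = urlparse(api_host)
    host = parsed.hostname
    if host is None:
        # No scheme or no host part, e.g. "apiserver:8008" instead of a full URL.
        return False
    try:
        port = parsed.port
    except ValueError:
        # The port part of the URL is not a valid number.
        return False
    if port is None:
        port = 443 if parsed.scheme == "https" else 80
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # DNS failure, connection refused, timeout, and so on.
        return False


# Usage (placeholder address, replace with your server's address):
# print(is_reachable("http://<server-address>:8008"))
```

A hostname that only resolves inside the docker compose network would come back `False` when this is run from outside that network, which matches the failure described above.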
The agent-services is simply an agent running as part of the server deployment, and is not related to the autoscaler
That's because you need to set up a clearml.conf file on your machine (where you run the autoscaler)
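If it helps, this is a small sketch for checking what configuration is visible on the machine where you run the autoscaler. It assumes the usual defaults: a `clearml.conf` in your home directory (typically created by `clearml-init`), optionally overridden by the `CLEARML_CONFIG_FILE` environment variable, plus the `CLEARML_API_HOST`, `CLEARML_API_ACCESS_KEY` and `CLEARML_API_SECRET_KEY` variables. The function name `describe_clearml_config` is made up for illustration and is not part of the SDK.

```python
import os
from pathlib import Path

# Environment variables that can supply the server address and credentials.
ENV_VARS = ("CLEARML_API_HOST", "CLEARML_API_ACCESS_KEY", "CLEARML_API_SECRET_KEY")


def describe_clearml_config(environ=None, home=None):
    """Report which ClearML configuration sources are present.

    Hypothetical helper: `environ` and `home` are injectable so the check
    can be run against something other than the current process and user.
    """
    environ = os.environ if environ is None else environ
    home = Path.home() if home is None else Path(home)
    # Assumed lookup order: CLEARML_CONFIG_FILE if set, else ~/clearml.conf.
    conf_path = Path(environ.get("CLEARML_CONFIG_FILE", home / "clearml.conf")).expanduser()
    return {
        "conf_path": str(conf_path),
        "conf_exists": conf_path.is_file(),
        "missing_env": [name for name in ENV_VARS if not environ.get(name)],
    }


if __name__ == "__main__":
    print(describe_clearml_config())
```

If `conf_exists` is false and `CLEARML_API_HOST` shows up under `missing_env`, the script has nowhere to get the server address from on that machine.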
At the time that I run python aws_autoscaler.py --remote, that clearml-services worker is the only worker on the services queue. So it will be the worker that picks up the autoscaler task.
But the task seems to be failing on startup because CLEARML_API_HOST is not set, even though it is set for the docker container that the agent is running in.
Here's the full autoscaler log where the failure happens if that's helpful.
Hi @<1541954607595393024:profile|BattyCrocodile47> , I'm not sure I understand - there's no relation between the docker compose for the server, and the autoscaler (which is a script using capabilities on the SDK)