UPDATE: setting SHUTDOWN_IF_NO_ACCESS_KEY: 1 allowed me to see the agent-services container, and then a docker inspect clearml-agent-services showed me that the environment variables needed for the agent in the docker-compose.yml were empty. So the problem was in my bootstrap script. Because SHUTDOWN_IF_NO_ACCESS_KEY was set to 0 before, the container would disappear 🙂
Thanks a lot for helping me figure this out!
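The check from this update can be sketched as a small shell helper. It reads KEY=VALUE lines, which is the shape docker inspect can print for a container's environment, and lists the keys whose value is empty. The function name list_empty_env is hypothetical; the container name in the usage comment is the one from the compose block in this thread.

```shell
# Hypothetical helper: print the names of variables whose value is empty.
# Input: KEY=VALUE lines on stdin, one per line.
list_empty_env() {
  while IFS= read -r line; do
    case "$line" in
      *=*)
        key=${line%%=*}    # text before the first '='
        value=${line#*=}   # text after the first '='
        if [ -z "$value" ]; then
          printf '%s\n' "$key"
        fi
        ;;
    esac
  done
}

# Usage (needs a Docker daemon and an existing container):
#   docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' \
#     clearml-agent-services | list_empty_env
```

An empty result means every variable in the container has a value; any name printed is a variable that reached the container empty, which points back at whatever generated the environment for docker-compose.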
Hi @StoutElephant16, it looks like the services agent isn't running for some reason if your task stays in pending. You should check the docker status and logs to see what's going on.
In the web UI, in the queue/worker tab, you should see a service queue and a worker available in that queue. Otherwise the service agent is not running. Refer to John c above
I suggest starting with docker ps and then checking the logs of the relevant container.
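A minimal sketch of that first check, wrapped in a hypothetical function container_status_and_logs. It uses docker ps -a so that containers which have already exited are listed too (plain docker ps shows only running ones), then prints the tail of the container's logs. The default of 100 log lines is an arbitrary assumption.

```shell
# Hypothetical helper: show a container's status line, then its recent logs.
# $1 = container name, $2 = number of log lines (defaults to 100).
container_status_and_logs() {
  name=$1
  lines=${2:-100}
  # -a includes stopped containers; the filter matches on the container name.
  docker ps -a --filter "name=$name" --format '{{.Names}}: {{.Status}}'
  # Merge stderr into stdout so both log streams end up in one place.
  docker logs --tail "$lines" "$name" 2>&1
}

# Usage (needs a Docker daemon):
#   container_status_and_logs clearml-agent-services 50
```

If the first command prints nothing at all, the container was never created or has been removed, and the logs call will report that no such container exists.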
Thanks a lot! Yes, I don't see such a worker in the UI. docker ps returns the containers below. I suppose the clearml-apiserver is the relevant one.
Yes thanks a lot 🙂 This already helped me a lot 😉 I'll investigate!
I have this block in my docker compose:
agent-services:
  networks:
    - backend
  container_name: clearml-agent-services
  image: allegroai/clearml-agent-services:latest
  deploy:
    restart_policy:
      condition: on-failure
  privileged: true
  environment:
    <....>
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock
    - /opt/clearml/agent:/root/.clearml
  depends_on:
    - apiserver
  entrypoint: >
    bash -c "curl --retry 10 --retry-delay 10 --retry-connrefused '
    ' && /usr/agent/entrypoint.sh"
Is the services agent part of the docker compose?
I left the environment variables out to keep things short, but there is one: SHUTDOWN_IF_NO_ACCESS_KEY: 1. Maybe some authentication is failing and the container is stopping.
I think you should investigate what happens during docker-compose up to see why the services agent container isn't running.
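One way to look at what the service printed while coming up is to ask docker-compose for that service's logs only. The sketch below assumes the service is named agent-services as in the compose block above; the function name agent_services_logs, the COMPOSE override variable, and the default of 50 lines are hypothetical.

```shell
# Hypothetical helper: print recent logs of the agent-services compose service.
# $1 = path to the compose file, $2 = number of log lines (defaults to 50).
agent_services_logs() {
  compose_file=$1
  lines=${2:-50}
  # COMPOSE can be overridden, e.g. COMPOSE="docker compose" for Compose v2.
  # It is left unquoted on purpose so a two-word command splits into words.
  ${COMPOSE:-docker-compose} -f "$compose_file" logs --no-color "--tail=$lines" agent-services
}

# Usage (needs Docker and the compose file; the path is a placeholder):
#   agent_services_logs path/to/docker-compose.yml 100
```

Restricting the output to one service keeps the startup messages of the agent from being buried under the logs of the other containers in the stack.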