I left the environment variables out to keep things short, but one of them is SHUTDOWN_IF_NO_ACCESS_KEY: 1. Maybe some authentication is failing and the container is stopping.
Thanks a lot! Yes, I don't see such a worker in the UI. docker ps returns the containers below. I suppose the clearml-apiserver is the relevant one.
Hi @StoutElephant16, if your task stays in pending, it looks like the services agent isn't running for some reason. You should check the docker status and logs to see what's going on.
Is the services agent part of the docker compose?
I think you should investigate what happens during docker-compose up to see why the services agent docker isn't running
I have this block in my docker compose:
  agent-services:
    networks:
      - backend
    container_name: clearml-agent-services
    image: allegroai/clearml-agent-services:latest
    deploy:
      restart_policy:
        condition: on-failure
    privileged: true
    environment:
      <....>
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /opt/clearml/agent:/root/.clearml
    depends_on:
      - apiserver
    entrypoint: >
      bash -c "curl --retry 10 --retry-delay 10 --retry-connrefused '
      ' && /usr/agent/entrypoint.sh"
In the web UI, in the queues/workers tab, you should see a services queue and a worker available in that queue. If you don't, the services agent is not running. Refer to John c above
Yes thanks a lot 🙂 This already helped me a lot 😉 I'll investigate!
UPDATE: setting SHUTDOWN_IF_NO_ACCESS_KEY: 1 allowed me to see the agent-services container, and then running docker inspect clearml-agent-services showed me that the environment variables the agent needs in docker-compose.yml were empty. So the problem was in my bootstrap script.
Because SHUTDOWN_IF_NO_ACCESS_KEY was set to 0 before, the container would disappear 🙂
Thanks a lot for helping me figure this out!
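For anyone hitting the same thing, here is a rough sketch of the check described above: list which environment variables ended up empty inside the container. The helper name empty_env_vars is made up for illustration; the container name comes from the compose block earlier in the thread. It only inspects KEY=VALUE lines, so it assumes the output format of docker inspect shown in the usage comment.

```shell
# Print the names of environment variables whose value is empty.
# Reads KEY=VALUE lines on stdin (hypothetical helper, not part of ClearML or Docker).
empty_env_vars() {
  # Keep only lines of the form "KEY=" with nothing after the "=", then drop the "=".
  grep -E '^[^=]+=$' | sed 's/=$//'
}

# Usage (needs Docker and the container to exist, running or stopped):
#   docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' clearml-agent-services | empty_env_vars
```

If this prints any of the variables you expected your bootstrap script to fill in, the values never made it into docker-compose.yml.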
I suggest starting with docker ps and then checking the logs of the relevant container
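A minimal sketch of that first check, assuming the container name clearml-agent-services from the compose block in this thread. The function name check_container is just for illustration. Note the -a flag: plain docker ps only lists running containers, so a container that already exited would not show up without it.

```shell
# Show the status and the most recent log lines of one container.
# check_container is a hypothetical helper name; pass the container name as $1.
check_container() {
  name="$1"
  # -a also lists stopped/exited containers, which plain `docker ps` hides
  docker ps -a --filter "name=${name}" --format '{{.Names}}: {{.Status}}'
  # Last 50 log lines; this works for exited containers too, as long as they have not been removed
  docker logs --tail 50 "${name}"
}

# Usage:
#   check_container clearml-agent-services
```

If the first command prints nothing at all, the container was never created or has been removed, and the output of docker-compose up is the next place to look.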