Hi All

Hi All 🙂
I am self-hosting my ClearML Server on an EC2 instance on AWS. As far as I understand from the ClearML documentation, "By default, the open source ClearML Server runs a single clearml-agent in services mode that listens to the services queue."

However, when I submit a task to the services queue, the task stays in the Pending state and is never processed.
How can I debug what is happening? Is there any way to check if this default services agent is running?
If the agent is running: how can I find out why it is not processing tasks in the services queue?
If the agent is not running: is there an explicit way to specify in the apiserver.conf file that I want such an agent running on the same machine as the ClearML server?

Thanks in advance!

  
  
Posted 9 months ago

Answers 10


UPDATE: setting SHUTDOWN_IF_NO_ACCESS_KEY: 1 allowed me to see the agent-services container, and running docker inspect clearml-agent-services then showed that the environment variables the agent needs in docker-compose.yml were empty. So the problem was in my bootstrap script.

Because SHUTDOWN_IF_NO_ACCESS_KEY was set to 0 before, the container would disappear 🙂

Thanks a lot for helping me figure this out!
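The docker inspect check from the update can be scripted so that empty variables are listed directly. This is a minimal sketch, assuming the container is named clearml-agent-services (as in the compose block in this thread) and that the variables worth checking are CLEARML_API_HOST, CLEARML_API_ACCESS_KEY and CLEARML_API_SECRET_KEY. That list is an assumption based on the stock ClearML docker-compose.yml, so adjust it to your own file. The helper names empty_env_vars and inspect_env are hypothetical.

```python
import json
import subprocess

# Assumed variable names, taken from the stock ClearML docker-compose.yml;
# replace with whatever your agent-services block actually sets.
REQUIRED_VARS = (
    "CLEARML_API_HOST",
    "CLEARML_API_ACCESS_KEY",
    "CLEARML_API_SECRET_KEY",
)


def empty_env_vars(env_entries, required=REQUIRED_VARS):
    """Return the required variables that are missing or set to an empty string.

    env_entries is a list of "KEY=VALUE" strings, which is the shape
    docker inspect reports under .Config.Env (it may also be null/None).
    """
    values = {}
    for entry in env_entries or []:
        key, _, value = entry.partition("=")
        values[key] = value
    return [name for name in required if not values.get(name)]


def inspect_env(container="clearml-agent-services"):
    """Read the container's environment as a JSON list via docker inspect."""
    out = subprocess.run(
        ["docker", "inspect", "--format", "{{json .Config.Env}}", container],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)


if __name__ == "__main__":
    missing = empty_env_vars(inspect_env())
    print("empty or missing:", missing or "none")
```

An empty value after the = sign (for example CLEARML_API_ACCESS_KEY=) is reported the same way as a variable that is absent, which matches the symptom described in the update.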

  
  
Posted 9 months ago

I think you should investigate what happens during docker-compose up to see why the services agent container isn't running.

  
  
Posted 9 months ago

Yes, thanks a lot 🙂 This has already helped me a lot 😉 I'll investigate!

  
  
Posted 9 months ago

I left the environment variables out to keep things short, but one of them is SHUTDOWN_IF_NO_ACCESS_KEY: 1. Maybe some authentication is failing and the container is stopping.

  
  
Posted 9 months ago

I have this block in my docker compose:

  agent-services:
    networks:
      - backend
    container_name: clearml-agent-services
    image: allegroai/clearml-agent-services:latest
    deploy:
      restart_policy:
        condition: on-failure
    privileged: true
    environment:
      <....>
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /opt/clearml/agent:/root/.clearml
    depends_on:
      - apiserver
    entrypoint: >
      bash -c "curl --retry 10 --retry-delay 10 --retry-connrefused '
' && /usr/agent/entrypoint.sh"
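The entrypoint above waits for the API server before starting the agent: curl retries up to 10 times, 10 seconds apart, treating a refused connection as retryable, and only if curl succeeds does && run /usr/agent/entrypoint.sh. Below is a rough Python equivalent of that wait-then-start logic, for illustration only. wait_for and http_ok are hypothetical helpers, and because the health-check URL is elided in the post, it is passed on the command line rather than filled in. Unlike plain curl (without -f), this sketch also retries on HTTP error statuses.

```python
import sys
import time
import urllib.request


def wait_for(probe, retries=10, delay=10.0, sleep=time.sleep):
    """Call probe() until it returns True, roughly like curl --retry 10 --retry-delay 10.

    As with --retry-connrefused, a refused connection (an OSError) counts as a
    retryable failure instead of aborting. The first call plus `retries`
    retries gives at most retries + 1 attempts.
    """
    for attempt in range(retries + 1):
        try:
            if probe():
                return True
        except OSError:
            pass  # connection refused, unreachable host, HTTP error: try again
        if attempt < retries:
            sleep(delay)
    return False


def http_ok(url, timeout=5.0):
    """Hypothetical probe: True if `url` answers with a 2xx status.

    urlopen raises HTTPError (an OSError subclass) for 4xx/5xx responses,
    so those are retried by wait_for as well.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return 200 <= resp.status < 300


if __name__ == "__main__":
    # Usage: python wait_for_api.py <health-check-url>
    target = sys.argv[1]
    # Exit code 0 plays the role of curl succeeding, so `&& next-command` would run.
    sys.exit(0 if wait_for(lambda: http_ok(target)) else 1)
```

Passing sleep as a parameter keeps the retry loop testable without real delays.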
  
  
Posted 9 months ago

Is the services agent part of the docker compose?

  
  
Posted 9 months ago

Hi @StoutElephant16, since your task stays in Pending, it looks like the services agent isn't running for some reason. You should check the container status and logs to see what's going on.

  
  
Posted 9 months ago

I suggest starting with docker ps and then checking the logs of the relevant container.
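Those two checks can be combined into a small script. One detail matters for this thread: plain docker ps lists only running containers, so an agent container that started and then exited will not appear at all, while docker ps -a also shows stopped ones. This is a minimal sketch, assuming the container name clearml-agent-services from the compose block above; the function names are hypothetical.

```python
import subprocess


def container_status(ps_output, name):
    """Return the STATUS column for `name` from tab-separated docker ps output, or None."""
    for line in ps_output.splitlines():
        container, _, status = line.partition("\t")
        if container == name:
            return status
    return None


def docker_ps_all():
    # -a includes stopped containers, so an agent that exited still shows up.
    return subprocess.run(
        ["docker", "ps", "-a", "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True, text=True, check=True,
    ).stdout


def tail_logs(name, lines=50):
    result = subprocess.run(
        ["docker", "logs", "--tail", str(lines), name],
        capture_output=True, text=True,
    )
    # docker logs replays the container's stderr on stderr, so keep both streams.
    return result.stdout + result.stderr


if __name__ == "__main__":
    name = "clearml-agent-services"
    status = container_status(docker_ps_all(), name)
    if status is None:
        print(f"{name}: no such container (was it ever created?)")
    else:
        print(f"{name}: {status}")
        if not status.startswith("Up"):
            # A status such as "Exited (0) ..." means the agent stopped; its logs say why.
            print(tail_logs(name))
```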

  
  
Posted 9 months ago

Thanks a lot! Yes, I don't see such a worker in the UI. docker ps returns the containers below. I suppose the clearml-apiserver is the relevant one.
[Screenshot: docker ps output]

  
  
Posted 9 months ago

In the web UI, in the Workers & Queues tab, you should see a services queue with a worker available in it. Otherwise, the services agent is not running. Refer to John C.'s answer above.
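The same check can be made from Python instead of the web UI. This is a hedged sketch, assuming the clearml package is installed and clearml.conf points at your server. APIClient and client.workers.get_all() come from the clearml SDK, but the exact shape of each worker's queues field (entries carrying a name attribute) is an assumption here, so the filtering itself runs on plain dicts that you can adapt. The helper names are hypothetical.

```python
def workers_on_queue(workers, queue_name="services"):
    """Return ids of workers whose queue list includes `queue_name`.

    `workers` is a list of dicts like {"id": ..., "queues": [{"name": ...}, ...]}.
    """
    return [
        w["id"]
        for w in workers
        if any(q.get("name") == queue_name for q in (w.get("queues") or []))
    ]


def fetch_workers():
    # Requires the clearml package and a configured clearml.conf.
    from clearml.backend_api.session.client import APIClient

    client = APIClient()
    # Assumption: each worker exposes .queues entries with a .name attribute.
    return [
        {"id": w.id, "queues": [{"name": q.name} for q in (w.queues or [])]}
        for w in client.workers.get_all()
    ]


if __name__ == "__main__":
    listening = workers_on_queue(fetch_workers())
    print("services workers:", listening or "none, so the services agent is not running")
```

An empty result corresponds to the UI showing a services queue with no worker attached.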

  
  
Posted 9 months ago