Answered
Hi All

Hi All 🙂
I am self-hosting my ClearML Server on an EC2 instance on AWS. As far as I understand from the documentation: "By default, the open source ClearML Server runs a single clearml-agent in services mode that listens to the services queue."

However, when I submit a task to the services queue, the task stays in the Pending state and is never processed.
How can I debug what is happening? Is there any way to check if this default services agent is running?
If the agent is running: how can I find the reason it is not processing tasks in the services queue?
If the agent is not running: is there an explicit way to define in the apiserver.conf file that I want such an agent running on the same machine as the ClearML server?

Thanks in advance!

  
  
Posted 10 months ago

Answers 10


UPDATE: setting SHUTDOWN_IF_NO_ACCESS_KEY: 1 allowed me to see the agent-services container, and then a docker inspect clearml-agent-services showed me that the environment variables needed for the agent in the docker-compose.yml were empty. So the problem was in my bootstrap script.

Because SHUTDOWN_IF_NO_ACCESS_KEY was set to 0 before, the container would disappear 🙂
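
For anyone hitting the same issue, here is a minimal sketch of how the empty-variable check could be automated. It assumes the JSON printed by docker inspect <container> is passed in as a string and relies on the Config.Env list of "NAME=value" entries that docker inspect reports. The function name empty_env_vars is only illustrative and is not part of ClearML or Docker.

```python
import json


def empty_env_vars(inspect_json):
    """Return the names of environment variables that are set but empty.

    `inspect_json` is the raw JSON text printed by `docker inspect <container>`,
    which is a JSON array with one object per inspected container.
    """
    containers = json.loads(inspect_json)
    empty = []
    for container in containers:
        # Config.Env is a list of "NAME=value" strings (it may be null).
        env_entries = (container.get("Config") or {}).get("Env") or []
        for entry in env_entries:
            name, _, value = entry.partition("=")
            if value == "":
                empty.append(name)
    return empty


# Possible usage (requires Docker on the host, so it is left as a comment):
#   import subprocess
#   raw = subprocess.check_output(
#       ["docker", "inspect", "clearml-agent-services"], text=True)
#   print(empty_env_vars(raw))
```

Any name it prints is a variable that reached the container empty, which is the symptom described above when the bootstrap script fails to fill in the values.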

Thanks a lot for helping me figure this out!

  
  
Posted 10 months ago

Is the services agent part of the docker compose?

  
  
Posted 10 months ago

Yes thanks a lot 🙂 This already helped me a lot 😉 I'll investigate!

  
  
Posted 10 months ago

I have this block in my docker compose:

  agent-services:
    networks:
      - backend
    container_name: clearml-agent-services
    image: allegroai/clearml-agent-services:latest
    deploy:
      restart_policy:
        condition: on-failure
    privileged: true
    environment:
      <....>
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /opt/clearml/agent:/root/.clearml
    depends_on:
      - apiserver
    entrypoint: >
      bash -c "curl --retry 10 --retry-delay 10 --retry-connrefused '
' && /usr/agent/entrypoint.sh"
  
  
Posted 10 months ago

I think you should investigate what happens during docker-compose up to see why the services agent docker isn't running

  
  
Posted 10 months ago

Thanks a lot! Yes, I don't see such a worker in the UI. docker ps returns the containers below. I suppose the clearml-apiserver is the relevant one.
[Screenshot: docker ps output]

  
  
Posted 10 months ago

Hi @StoutElephant16, it looks like the services agent isn't running for some reason if your task stays in pending. You should check the docker status and logs to see what's going on.

  
  
Posted 10 months ago

In the web UI, in the queue/worker tab, you should see a services queue and a worker available in that queue. Otherwise the services agent is not running. Refer to John C's answer above.

  
  
Posted 10 months ago

I suggest starting with docker ps and then checking on the logs of the relevant docker
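
A rough sketch of that check, assuming Docker is installed on the server host and the container is named clearml-agent-services as in the default docker-compose file. The helper names (parse_ps_output, list_containers, container_logs) are made up for this example. It uses docker ps -a so that containers which have already exited are listed too.

```python
import subprocess


def parse_ps_output(text):
    """Turn lines of 'name<TAB>status' into a {name: status} dict."""
    containers = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, status = line.partition("\t")
        containers[name.strip()] = status.strip()
    return containers


def list_containers():
    """Run `docker ps -a` and return {container name: status}."""
    result = subprocess.run(
        ["docker", "ps", "-a", "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True, text=True, check=True,
    )
    return parse_ps_output(result.stdout)


def container_logs(name, tail=50):
    """Return the last `tail` log lines of a container (stdout and stderr)."""
    result = subprocess.run(
        ["docker", "logs", "--tail", str(tail), name],
        capture_output=True, text=True,
    )
    return result.stdout + result.stderr


# Possible usage on the server host:
#   status = list_containers().get("clearml-agent-services")
#   print(status or "container not found")
#   print(container_logs("clearml-agent-services"))
```

If the container is missing from the list or shows an Exited status, its logs are the first place to look for the reason.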

  
  
Posted 10 months ago

I left the environment variables out to keep things short, but there is one SHUTDOWN_IF_NO_ACCESS_KEY: 1 . Maybe some authentication is failing and the container is stopping.

  
  
Posted 10 months ago
503 Views