Answered

Hello
Periodically under high load, we are facing too long (>1 sec) processing times for requests such as: workers.status_report, events.add_batch, queues.get_next_task.
Also there are warnings "Connection pool is full, discarding connection: elasticsearch-service".
Can you confirm that this is an Elasticsearch performance issue?
Perhaps you have faced such issues before and can recommend something.

  
  
Posted 2 years ago

Answers 10


Seems the apiserver is out of connections; this is odd...
SuccessfulKoala55, do you have an idea?

  
  
Posted 2 years ago

AgitatedDove14 are you sure? The apiserver has low CPU load (<10%). Moreover, only requests related to ES are affected; other requests (like tasks.get_all or queues.get_all) take <10 ms.

  
  
Posted 2 years ago

Hi ItchyJellyfish73
This seems aligned with the scenario you are describing; it looks like the api server is overloaded with simultaneous connections.
Add an additional apiserver instance to the docker-compose and an nginx instance as load balancer:
https://github.com/allegroai/clearml-server/blob/09ab2af34cbf9a38f317e15d17454a2eb4c7efd0/docker/docker-compose.yml#L4
`
apiserver:
  command:
    - apiserver
  container_name: clearml-apiserver
  image: allegroai/clearml:latest
  restart: unless-stopped
  <...>
  networks:
    - backend
    - frontend
  # no host port mapping here: nginx below is the single entry point on 8008

apiserver_second:
  command:
    - apiserver
  container_name: clearml-apiserver-second
  image: allegroai/clearml:latest
  restart: unless-stopped
  <...>
  networks:
    - backend
    - frontend

nginx-server:
  image: nginx:1.13
  ports:
    - "8008:8008"
  volumes:
    - './nginx.conf:/etc/nginx/nginx.conf'
  networks:
    - backend
  depends_on:
    - apiserver
    - apiserver_second
`
Then in the local `nginx.conf`:

`
events { worker_connections 1024; }

http {
  upstream api {
    # both apiserver containers listen on 8008 inside the docker network
    server apiserver:8008;
    server apiserver_second:8008;
  }
  server {
    listen 8008;
    location / {
      proxy_pass http://api;
    }
  }
}
`
Notice I might have made a typo above, but generally speaking it should work.
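To apply it, recreating the stack should be enough (assuming an otherwise standard clearml-server docker-compose setup): `docker-compose down && docker-compose up -d`, and clients keep using port 8008 as before.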

  
  
Posted 2 years ago

Well, it might simply be the elasticsearch driver reusing connections. Regarding the apiserver, the CPU load is not indicative - how many requests per second, approximately?
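For context, a minimal sketch (not the apiserver's actual code) of where that warning comes from, assuming the urllib3-based elasticsearch Python client; the host name is taken from the warning and the maxsize value is purely illustrative:
`
# Hedged sketch: the urllib3-based elasticsearch client keeps a fixed-size
# connection pool per host; when concurrent requests exceed the pool size,
# urllib3 logs "Connection pool is full, discarding connection: <host>".
from elasticsearch import Elasticsearch

es = Elasticsearch(
    hosts=["http://elasticsearch-service:9200"],  # host name as it appears in the warning
    maxsize=25,  # per-host connection pool size; illustrative value, not a recommendation
)
`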

  
  
Posted 2 years ago

Thanks for the report ItchyJellyfish73 , as far as I know such protections and QoS are supported in the ClearML paid version

  
  
Posted 2 years ago

~30rps

  
  
Posted 2 years ago

As I discovered, this was ES overload due to incorrect ClearML usage: report_scalar was called 100 times per second (the developer reported each sample from a wav file). This didn't affect the apiserver, because events were batched. Probably there should be some protection against overload on the clearml package or apiserver level, as developers could do any crazy stuff in their code 🙃
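For anyone hitting the same thing, a minimal sketch of downsampling on the client side (project/task names and the REPORT_EVERY value are made up; report_scalar is the actual ClearML Logger call that was being hammered):
`
# Hedged sketch: report one aggregate value every N samples instead of
# calling report_scalar once per audio sample (~100 calls/sec in our case).
from clearml import Task

task = Task.init(project_name="audio", task_name="wav-analysis")  # hypothetical names
logger = task.get_logger()

REPORT_EVERY = 1000  # downsample: one scalar per 1000 samples
samples = [0.0] * 10_000  # stand-in for the wav data

for i, sample in enumerate(samples):
    if i % REPORT_EVERY == 0:
        logger.report_scalar(
            title="waveform", series="amplitude",
            value=float(sample), iteration=i,
        )
`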

  
  
Posted 2 years ago

Hmm, are you getting the warning on the client side, or in the clearml-server?

  
  
Posted 2 years ago

🙂

  
  
Posted 2 years ago

It's in the apiserver logs.

  
  
Posted 2 years ago