Hello
Periodically, under high load, we see overly long (>1 sec) processing times for requests such as workers.status_report, events.add_batch, and queues.get_next_task.
There are also warnings: "Connection pool is full, discarding connection: Elasticsearch-S
As I discovered, this was an ES overload caused by incorrect ClearML usage: report_scalar was being called 100 times per second (a developer was reporting every sample from a WAV file). This didn't affect the apiserver, because events were batched. There should probably be some protection against overload at the clearml package or apiserver level, since developers can do all sorts of crazy things in their code 🙃
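Until something like that exists, one possible client-side workaround is to aggregate samples before reporting them. This is only a rough sketch: `AveragingReporter` and its `window` parameter are hypothetical names, not part of clearml. The `report` callback is where a real call such as `Logger.current_logger().report_scalar(...)` would go; here it just appends to a list so the example runs without a server.

```python
from typing import Callable, List


class AveragingReporter:
    """Hypothetical helper: forwards one averaged value per `window` samples."""

    def __init__(self, report: Callable[[int, float], None], window: int = 100) -> None:
        if window < 1:
            raise ValueError("window must be >= 1")
        self._report = report
        self._window = window
        self._buffer: List[float] = []
        self._iteration = 0

    def add(self, value: float) -> None:
        """Buffer a sample; emit a single report once the window is full."""
        self._buffer.append(value)
        if len(self._buffer) >= self._window:
            self.flush()

    def flush(self) -> None:
        """Report the mean of any buffered samples and reset the buffer."""
        if not self._buffer:
            return
        mean = sum(self._buffer) / len(self._buffer)
        self._report(self._iteration, mean)
        self._iteration += 1
        self._buffer.clear()


# Stand-in for the real reporting call. With ClearML this could be, for example:
#   lambda it, v: Logger.current_logger().report_scalar(
#       "audio", "amplitude", value=v, iteration=it)
# (the title and series names above are made up for illustration).
sent = []
reporter = AveragingReporter(lambda it, v: sent.append((it, v)), window=100)

for i in range(250):      # 250 raw samples, e.g. read from a WAV file
    reporter.add(float(i))
reporter.flush()          # send the final partial window

print(len(sent))          # 3 reports instead of 250
```

Averaging is just one choice here; keeping the max or every Nth sample would reduce the load on ES in the same way.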
3 years ago
one year ago