Hey All,
We have a self-hosted ClearML server. Today we launched ~40 workers to run training jobs on our queue, but we started getting errors saying the Elasticsearch connection pool was full, followed by timeouts and failed tasks.
Each training task
Additionally, I’d like to understand what is stored in Elasticsearch versus MongoDB, Redis, etc. From my understanding, the metrics and console logs are stored in Elasticsearch. Is that right?
I’m thinking the solution may be to reduce the number of metric reports by averaging values locally and only reporting them once every 60 s or so?
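A minimal sketch of the local-averaging idea, assuming metrics are plain scalars keyed by (title, series). The class name `ScalarAggregator` and its methods are hypothetical; it only assumes a reporting callback with the argument order `(title, series, value, iteration)`, which matches the ClearML SDK's `Logger.report_scalar`. The clock is injectable so the interval logic can be tested without waiting.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical callback type: (title, series, value, iteration) -> None
ReportFn = Callable[[str, str, float, int], None]


class ScalarAggregator:
    """Buffers scalar values locally and reports their mean at most once per interval."""

    def __init__(self, report_fn: ReportFn, interval_sec: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._report_fn = report_fn
        self._interval = interval_sec
        self._clock = clock
        self._last_flush = clock()
        self._last_iteration = 0
        # (title, series) -> [running sum, count] since the last flush
        self._buffer: Dict[Tuple[str, str], List[float]] = defaultdict(lambda: [0.0, 0])

    def add(self, title: str, series: str, value: float, iteration: int) -> None:
        """Accumulate one value; flush if the interval has elapsed."""
        entry = self._buffer[(title, series)]
        entry[0] += value
        entry[1] += 1
        self._last_iteration = iteration
        if self._clock() - self._last_flush >= self._interval:
            self.flush()

    def flush(self) -> None:
        """Report the mean of each buffered series at the latest iteration, then reset."""
        for (title, series), (total, count) in self._buffer.items():
            if count:
                self._report_fn(title, series, total / count, self._last_iteration)
        self._buffer.clear()
        self._last_flush = self._clock()
```

In a training script you would pass something like `Task.current_task().get_logger().report_scalar` as `report_fn` and call `flush()` once after the last iteration so the final partial window is not lost. This cuts the number of scalar events sent to the server roughly in proportion to how many values fall into each window; it does not reduce console log volume.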
Or is there a way to tune the Elasticsearch configuration so it can handle the high volume of requests?