Hey all, we have a self-hosted ClearML server. Today we launched ~40 workers to run training jobs on our queue; however, we started getting errors about the Elasticsearch connection pool being full, followed by timeouts and failed tasks. Each training task

Unanswered

Additionally, I’d like to understand what is stored in Elasticsearch versus MongoDB, Redis, etc. From my understanding, it is the metrics and console logs that are stored in Elasticsearch?

I’m thinking the solution may be to reduce the amount of metrics logged by averaging them locally and only reporting them once every 60s or so?
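To make the local-averaging idea concrete, here is a rough sketch of what I have in mind. The class name `AveragingReporter`, the `report_fn` callback, and the injectable `clock` are all made up for illustration and are not part of ClearML; the callback is only assumed to accept `(title, series, value, iteration)`.

```python
import time
from collections import defaultdict
from typing import Callable, DefaultDict, List, Tuple


class AveragingReporter:
    """Buffer scalar values locally and report their mean at most once per interval.

    Hypothetical helper, not a ClearML API. `report_fn` is whatever actually
    sends the value to the server.
    """

    def __init__(
        self,
        report_fn: Callable[[str, str, float, int], None],
        interval_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._report_fn = report_fn
        self._interval_s = interval_s
        self._clock = clock
        # One list of pending values per (title, series) pair.
        self._buffer: DefaultDict[Tuple[str, str], List[float]] = defaultdict(list)
        self._last_flush = clock()

    def add(self, title: str, series: str, value: float, iteration: int) -> None:
        """Record a value; flush everything if the interval has elapsed."""
        self._buffer[(title, series)].append(float(value))
        if self._clock() - self._last_flush >= self._interval_s:
            self.flush(iteration)

    def flush(self, iteration: int) -> None:
        """Send one averaged point per (title, series) and reset the buffer."""
        for (title, series), values in self._buffer.items():
            if values:
                self._report_fn(title, series, sum(values) / len(values), iteration)
        self._buffer.clear()
        self._last_flush = self._clock()
```

In a training loop this would be called as `reporter.add("loss", "train", loss_value, step)`, with a final `reporter.flush(step)` at the end so the last partial window is not lost. With ClearML, `report_fn` could presumably be a thin wrapper around `Logger.current_logger().report_scalar`, though I have not checked whether this alone would reduce the load enough.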

Or is there a way to tune the Elasticsearch config so that it can handle the high volume of requests?

  				
Posted one year ago

					StaleLeopard22
				

183 Views

0 Answers
