Unanswered
Hey all,
We have a self-hosted ClearML server. Today we launched ~40 workers to run training jobs on our queue, but we started getting errors about the Elasticsearch connection pool being full, followed by timeouts and failed tasks.
Each training task
I'd like to understand this as well. I moved my data and model versioning to AWS S3, so can I get rid of the fileserver? And can I use CloudWatch for logs instead of Elasticsearch (which I assume is what handles them now)?
149 Views
0 Answers
one year ago