It can be changed with this env var for the apiserver:
CLEARML__hosts__elastic__events__args__timeout=<new number>
That said, a better fix may be to either increase the Elasticsearch capacity (memory and CPU) or reduce the load (send events in smaller batches).
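To illustrate the "smaller batches" idea, here is a minimal Python sketch that splits a large list of events into fixed-size chunks before sending. It is not the ClearML SDK API: `chunked`, `send_in_batches`, and the `send` callable are hypothetical names used only for illustration, and the batch size of 100 is an arbitrary assumption you would tune for your own setup.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists holding at most batch_size items each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


def send_in_batches(
    events: Iterable[T],
    send: Callable[[List[T]], None],
    batch_size: int = 100,  # assumed value, not a ClearML default
) -> int:
    """Call send(batch) once per chunk and return the number of calls made.

    `send` is a hypothetical stand-in for whatever actually reports
    events to the server.
    """
    calls = 0
    for batch in chunked(events, batch_size):
        send(batch)
        calls += 1
    return calls
```

Each request then carries less data, so each write to Elasticsearch is more likely to finish within the read timeout, at the cost of more requests overall.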
Hello,
We are getting the following timeout errors during the task run:
2023-08-10 13:53:36,361 - clearml.Metrics - ERROR - Action failed <500/100: events.add_batch/v1.0 (General data error (ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host='elasticsearch', port='9200'): Read timed out. (read timeout=60))))>
In the API container logs we see:
[2023-08-18 14:58:53,255] [8] [ERROR] [clearml.service_repo] Returned 500 for events.add_batch in 241121ms, msg=General data error (ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host='elasticsearch', port='9200'): Read timed out. (read timeout=60)))
Is there a way to increase the timeout? If I understand correctly, this is a read timeout from ClearML to Elasticsearch, so it should be configurable in the ClearML client or server.
WebApp: 1.3.0-165 • Server: 1.3.0-165 • API: 2.17