Answered
Sporadic Failure To Retrieve Scalars And Console Logs

Sporadic failure to retrieve Scalars and Console logs.

Context: self-hosted in Azure, with two separate Azure Container Apps for the UI and the API server.
Elasticsearch and MongoDB are Azure service subscriptions.

Symptom: for long-running tasks, we sometimes get an error when fetching Scalars and/or Console logs in the WebUI. After refreshing the page enough times, the Scalars/Console logs are retrieved and displayed as normal. The issue happens more often with big tasks (e.g. 12k iterations).

We managed to reproduce the issue with a curl API call, so we don't think the problem is related to the WebUI:

curl -v -X POST \
-d '{"task":"4c9224c6ec82425bbd66256de45c0e23","key":"iter"}' \
--output clearml_out.gz \
-H 'Accept: application/json' \
-H 'Accept-Encoding: gzip, deflate, br, zstd' \
-H 'Accept-Language: en-US,en;q=0.9' \
-H 'Connection: keep-alive' \
-H 'Content-Length: 56' \
-H 'Content-Type: application/json' \
-H 'Cookie: _ga=GA1.2.9817188[REDACTED]' \
-H 'Host: REDACTED.azurecontainerapps.io' \
-H 'Sec-Fetch-Dest: empty' \
-H 'Sec-Fetch-Mode: cors' \
-H 'Sec-Fetch-Site: same-origin' \
-H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36 Edg/133.0.0.0' \
-H 'X-Allegro-Client: Webapp-2.0.0-613' \
-H 'sec-ch-ua: "Not(A:Brand";v="99", "Microsoft Edge";v="133", "Chromium";v="133"' \
-H 'sec-ch-ua-mobile: ?0' \
-H 'sec-ch-ua-platform: "Windows"'

which yields this error:

} [56 bytes data]
100    56    0     0  100    56      0     46  0:00:01  0:00:01 --:--:--    46
< HTTP/1.1 200 OK
< server: nginx/1.22.1
< date: Sun, 16 Feb 2025 19:04:39 GMT
< content-type: application/json
< content-length: 79288
< vary: Accept-Encoding
< content-encoding: zstd
<
{ [15180 bytes data]
* transfer closed with 13947 bytes remaining to read
 82 79344   82 65341  100    56  35765     30  0:00:02  0:00:01  0:00:01 35814
* Closing connection
} [5 bytes data]
* TLSv1.3 (OUT), TLS alert, close notify (256):
} [2 bytes data]
curl: (18) transfer closed with 13947 bytes remaining to read

We did not find any relevant messages in the ES log.

Do you have any tips or hints on how we can diagnose this issue further? SuccessfulKoala55 CostlyOstrich36 Thanks in advance 😉

  
  
Posted one month ago

Answers


Hi ManiacalLizard2, this looks like something related to the server's resources or to networking, with the server having a hard time retrieving the data from ES. What resources have you allocated to the API server and to ES?
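
To help tell a networking problem apart from an Elasticsearch one, it may be worth measuring how often the response body is actually cut short. The following is only a minimal Python sketch, not an official ClearML tool: `fetch_once` and `probe` are hypothetical helper names, and the URL and any auth header (for example the Cookie from the curl call above) are placeholders to fill in from your own deployment. It deliberately sends no Accept-Encoding header, so, assuming the server honours that, the reply should come back uncompressed, which would show whether the truncation also happens without zstd.

```python
import http.client
import json
import urllib.error
import urllib.request


def fetch_once(url, payload, extra_headers=None, timeout=30.0):
    """POST `payload` as JSON and read the whole body.

    Returns (ok, detail). ok is False when the connection closes before
    Content-Length bytes arrive (the curl error 18 case) or on any other
    request error.
    """
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    if extra_headers:
        # Hypothetical: pass the Cookie or auth header your deployment needs.
        headers.update(extra_headers)
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            data = response.read()
            return True, f"{len(data)} bytes"
    except http.client.IncompleteRead as exc:
        # exc.partial is what was received, exc.expected is what is still missing.
        return False, f"truncated: got {len(exc.partial)} bytes, {exc.expected} missing"
    except (urllib.error.URLError, OSError) as exc:
        return False, f"error: {exc}"


def probe(url, payload, attempts=20, extra_headers=None):
    """Repeat the request and return a list of (attempt_number, detail) failures."""
    failures = []
    for attempt in range(1, attempts + 1):
        ok, detail = fetch_once(url, payload, extra_headers=extra_headers)
        if not ok:
            failures.append((attempt, detail))
    return failures
```

Calling `probe` with the same endpoint and JSON body as the curl call returns the attempts that failed and why. Running it once through the public Container App URL and once from inside the environment, directly against the API server, could indicate whether the ingress or the API server is the one closing the connection.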

  
  
Posted one month ago