Hi, we are currently seeing the following error in the logs of our ClearML apiserver pod:
[2024-03-20 15:33:32,089] [8] [WARNING] [elasticsearch] POST None [status:429 request:0.001s]
[2024-03-20 15:33:32,089] [8] [ERROR] [clearml.__init__] Failed processing worker status report
Traceback (most recent call last):
File "/opt/clearml/apiserver/bll/workers/__init__.py", line 153, in status_report
self.log_stats_to_es(
File "/opt/clearml/apiserver/bll/workers/__init__.py", line 557, in log_stats_to_es
es_res = elasticsearch.helpers.bulk(
self.es_client, actions)
File "/usr/local/lib/python3.9/site-packages/elasticsearch/helpers/actions.py", line 410, in bulk
for ok, item in streaming_bulk(
File "/usr/local/lib/python3.9/site-packages/elasticsearch/helpers/actions.py", line 329, in streaming_bulk
for data, (ok, info) in zip(
File "/usr/local/lib/python3.9/site-packages/elasticsearch/helpers/actions.py", line 256, in _process_bulk_chunk
for item in gen:
File "/usr/local/lib/python3.9/site-packages/elasticsearch/helpers/actions.py", line 195, in _process_bulk_chunk_error
raise error
File "/usr/local/lib/python3.9/site-packages/elasticsearch/helpers/actions.py", line 240, in _process_bulk_chunk
resp = client.bulk(*args, body="\n".join(bulk_actions) + "\n", **kwargs)
File "/usr/local/lib/python3.9/site-packages/elasticsearch/client/utils.py", line 347, in _wrapped
return func(*args, params=params, headers=headers, **kwargs)
File "/usr/local/lib/python3.9/site-packages/elasticsearch/client/__init__.py", line 472, in bulk
return self.transport.perform_request(
File "/usr/local/lib/python3.9/site-packages/elasticsearch/transport.py", line 466, in perform_request
raise e
File "/usr/local/lib/python3.9/site-packages/elasticsearch/transport.py", line 427, in perform_request
status, headers_response, data = connection.perform_request(
File "/usr/local/lib/python3.9/site-packages/elasticsearch/connection/http_urllib3.py", line 291, in perform_request
self._raise_error(response.status, raw_data)
File "/usr/local/lib/python3.9/site-packages/elasticsearch/connection/base.py", line 328, in _raise_error
raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
elasticsearch.exceptions.TransportError: TransportError(429, 'circuit_breaking_exception', '[parent] Data too large, data for [<http_request>] would be [1057463944/1008.4mb], which is larger than the limit of [1020054732/972.7mb], real usage: [1057460904/1008.4mb], new bytes reserved: [3040/2.9kb], usages [inflight_requests=3040/2.9kb, request=0/0b, fielddata=9261/9kb, eql_sequence=0/0b, model_inference=0/0b]')
[2024-03-20 15:33:32,090] [8] [ERROR] [clearml.service_repo] Returned 500 for workers.status_report in 5ms, msg=General data error (Failed processing worker status report): err=429
I am not sure what to make of this message: is ClearML actually attempting an HTTP request with nearly a GB of data?
I suspect it has something to do with an agent machine we recently added as a worker to the ClearML server, but I do not understand where such a large amount of data would come from: we currently have no tasks in the queue and only ever had one task queued (which was processed successfully), with around 1 MB of data.
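As a sanity check I tried to break down the numbers in the breaker message myself (rough sketch; I am assuming the 972.7mb limit is Elasticsearch's default parent breaker setting of 95% of the JVM heap, that is my guess and not something I verified in our config):

# Sanity check on the figures from the circuit_breaking_exception above.
# Assumption (mine, not verified against our deployment): the limit is the
# default parent breaker limit of 95% of the JVM heap.

real_usage = 1_057_460_904   # "real usage" in bytes
new_bytes = 3_040            # "new bytes reserved" for this bulk request
limit = 1_020_054_732        # parent breaker limit in bytes

print(real_usage + new_bytes)   # 1057463944 -> matches the "would be" figure
print(limit / 0.95 / 2**30)     # ~1.0 -> implies a ~1 GiB JVM heap
print(new_bytes / 1024)         # ~3 -> the bulk request itself is only ~3 KiB
print(real_usage / 2**20)       # ~1008.5 MiB -> the heap is already almost full

If that reading is correct, the request ClearML sends is tiny (the worker status report, ~3 KB), and the 1008.4mb figure is the heap Elasticsearch is already using, so essentially any request would trip the parent breaker. Does that interpretation sound right, and would the fix then be to give the Elasticsearch container more heap?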