Running clearml-serving on a Kube cluster on AWS: occasional 502 errors we can't trace back


Hi guys,
We are running clearml-serving on a Kubernetes cluster on AWS, and we have noticed that we occasionally get 502 errors that we can't trace back.

We are using siege to load-test some deployed models, and even with very low concurrency (e.g. 2) we get around 3 failed transactions over 1 hour (20,558 hits). The command looks like this:
siege -c6 -t2M -H "accept: application/json" -H "Content-Type: application/json" 'https://....
We have an ALB in front of our cluster.
Our gunicorn config is basically 5 workers on a t3a.medium. We have tried many variations around this and could not get rid of these occasional 502s.
We have also tried much bigger machines (8 cores) with both 16 workers and 5 workers, with the same results.
The gunicorn logs show nothing: no error and no trace of the 502s. Only siege and the ALB report them.
Note that we set gunicorn worker cycling to 0 so that it is turned off (this was one of our suspicions, but no luck).
Our models are ridiculously simple: we average 0.4 seconds per call (11K requests in 1 hour with concurrency 1), and at concurrency 1 the CPU peaks at 20%.
We tried running both 3 serving-instance pods on 3 EC2 instances and 1 pod on 1 EC2 instance to simplify things, with the same results.

  				
Posted by JumpyRaven4, one year ago
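The Gunicorn setup described in the question (5 workers on a t3a.medium, worker cycling turned off) would look roughly like the following gunicorn.conf.py. This is a sketch under assumptions: the thread does not show how these settings were passed to clearml-serving, and the bind address and the Uvicorn worker class (used here to host the ASGI app) are placeholders rather than confirmed values.

```python
# gunicorn.conf.py: a sketch of the setup described in the question.
# Assumptions (not from the thread): the bind address and the Uvicorn
# worker class used to host the ASGI app.
bind = "0.0.0.0:8080"  # placeholder address/port
workers = 5  # 5 workers on a t3a.medium (2 vCPUs)
worker_class = "uvicorn.workers.UvicornWorker"

# Worker cycling turned off: with max_requests = 0, Gunicorn never restarts
# a worker after a fixed number of requests (0 is also Gunicorn's default).
max_requests = 0
max_requests_jitter = 0
```

Such a file would be loaded with `gunicorn -c gunicorn.conf.py <module>:app`, where `<module>:app` stands for the serving app's import path.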

Answers 15

Hi @JumpyRaven4

The gunicorn logs do not show anything including any error or trace of the 502 only siege reports the 502 as well as the ALB.

Is this an ALB or an ELB?
What timeout is it configured with?
Do you have GPU instances as well? What's the clearml-serving-inference docker version?

  				
Posted by AgitatedDove14, one year ago

That's a fair point. Actually, we have switched away from siege, because we believe it is causing the issues, and are now using Locust instead. We have been running for days at the same rate and don't see any errors reported...

  				
Posted by JumpyRaven4, one year ago

Hmm, reading this: [link not preserved]
How are you checking the health of the serving pod?

  				
Posted by AgitatedDove14, one year ago

Alright, so we actually noticed that the problem disappears if we use only sync requests. Meaning: if I create a sleep endpoint that is async, we get the 502, but if it's sync, we don't.

  				
Posted by JumpyRaven4, one year ago

Actually, the requests are never registered by the gunicorn app, and the ALB logs show that there is no response from the target ("-").

  				
Posted by JumpyRaven4, one year ago
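In ALB access logs, a 502 elb_status_code together with a target_status_code of "-" means the load balancer never got a response from the target, which matches the observation above. A minimal sketch for pulling those entries out, assuming standard ALB access-log lines that have already been downloaded from S3 and decompressed; the field positions follow the documented ALB access-log format, and the function name is hypothetical.

```python
import shlex
from typing import Dict, Iterable, Iterator


def dropped_502s(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield ALB access-log entries where the ALB answered 502 and the
    target never sent a response (target_status_code is "-")."""
    for line in lines:
        # shlex keeps quoted fields such as "request" as single tokens.
        fields = shlex.split(line)
        if len(fields) < 13:
            continue  # skip blank or truncated lines
        elb_status_code, target_status_code = fields[8], fields[9]
        if elb_status_code == "502" and target_status_code == "-":
            yield {
                "time": fields[1],
                "target": fields[4],
                "target_processing_time": fields[6],
                "request": fields[12],
            }


if __name__ == "__main__":
    # Synthetic sample line (not from the thread); only the leading fields are shown.
    sample = (
        'https 2024-01-01T00:00:00.000000Z app/my-alb/abc 10.0.0.5:5000 '
        '10.0.1.7:8080 0.000 -1 -1 502 - 512 0 '
        '"POST https://example.com:443/serve/my_model HTTP/1.1" "siege"'
    )
    healthy = sample.replace(" 502 - ", " 200 200 ")
    for entry in dropped_502s([sample, healthy]):
        print(entry)
```

The timestamps and target addresses it yields can then be lined up against the pod and gunicorn logs for the same moments.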

Hi Martin,

Actually, we are using an ALB with a 30-second timeout.
We do not have GPU instances.
The docker version is 1.3.0.

  				
Posted by JumpyRaven4, one year ago

I must say I'm not sure what to do with that info, since serve_model is async, presumably for good reasons.

  				
Posted by JumpyRaven4, one year ago

Yeah, I tend to agree... keep me posted when you find the root cause 🤞

  				
Posted by AgitatedDove14, one year ago

Meaning if I create a sleep endpoint that is async

Hmm, are you calling "time.sleep" or "asyncio.sleep"?
Also, are you running the serving service with Gunicorn or Uvicorn?
See here:
[link not preserved]

  				
Posted by AgitatedDove14, one year ago

Yeah, I don't know. I think we are probably just trying to fit too high a throughput on that box, but it's weird that the packets just get dropped; I would have assumed that response times would degrade and requests would be queued.

  				
Posted by JumpyRaven4, one year ago

time.sleep(time_sleep)

You should not call time.sleep in async functions; use asyncio.sleep instead:
[link not preserved]

See if that makes a difference.

  				
Posted by AgitatedDove14, one year ago
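To illustrate why time.sleep inside an async def matters: it blocks the worker's whole event loop, so every other in-flight request on that worker stalls until it returns, whereas asyncio.sleep suspends only the current coroutine. A minimal, framework-free sketch using only asyncio; the handler names and the serve_concurrently helper are hypothetical stand-ins for endpoints sharing one worker's event loop, and whether this is related to the ALB 502s is not established in this thread.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def blocking_handler(delay: float) -> None:
    # Anti-pattern: time.sleep blocks the whole event loop, so no other
    # request on this worker can make progress until it returns.
    time.sleep(delay)


async def non_blocking_handler(delay: float) -> None:
    # asyncio.sleep suspends only this coroutine and yields to the loop.
    await asyncio.sleep(delay)


async def serve_concurrently(
    handler: Callable[[float], Awaitable[None]], n: int, delay: float
) -> float:
    """Run n simulated requests concurrently on one event loop; return wall time."""
    start = time.perf_counter()
    await asyncio.gather(*(handler(delay) for _ in range(n)))
    return time.perf_counter() - start


if __name__ == "__main__":
    blocking = asyncio.run(serve_concurrently(blocking_handler, 5, 0.2))
    non_blocking = asyncio.run(serve_concurrently(non_blocking_handler, 5, 0.2))
    print(f"time.sleep:    {blocking:.2f}s")  # requests run one after another (about 1 s)
    print(f"asyncio.sleep: {non_blocking:.2f}s")  # requests overlap (about 0.2 s)
```

A plain def endpoint avoids the problem differently: FastAPI runs sync path functions in a threadpool, so a blocking call there does not stall the event loop.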

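We have tried both (Gunicorn and Uvicorn) and got the same issue.
No, I meant creating a sync endpoint like this:

```python
import time

from fastapi import APIRouter, status
from pydantic import BaseModel

router = APIRouter()


class TestResponse(BaseModel):  # assumed minimal model; the real one isn't shown
    status: str


@router.post(
    "/sleep",
    tags=["temp"],
    response_description="Return HTTP Status Code 200 (OK)",
    status_code=status.HTTP_200_OK,
    response_model=TestResponse,
)
# `def` here instead of `async def`: FastAPI runs sync endpoints in a
# threadpool, so time.sleep() does not block the event loop.
def post_sleep(time_sleep: float) -> TestResponse:
    time.sleep(time_sleep)
    return TestResponse(status="OK")
```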

  				
Posted by JumpyRaven4, one year ago

[link not preserved]
The thing is, the server itself will not return a 502 error, only a 500 error:
[link not preserved]
Could it be the k8s ingress, maybe?

  				
Posted by AgitatedDove14, one year ago

I have tested with an endpoint that basically adds two numbers and never managed to trigger the 502. I'm starting to wonder whether we are simply running too many workers. I assumed that 2 vCPUs meant 5 workers should be fine, but I think it should probably be closer to 2, though I'm not sure why that would lead to requests being dropped.

  				
Posted by JumpyRaven4, one year ago
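The "2 vCPUs means 5 workers" figure matches Gunicorn's documented rule of thumb of (2 × cores) + 1 workers, which the Gunicorn docs present as a starting point to tune under load rather than a fixed rule. A minimal sketch of that arithmetic; the function name is hypothetical.

```python
import os
from typing import Optional


def gunicorn_workers(num_cores: Optional[int] = None) -> int:
    """Gunicorn's suggested starting point: (2 x num_cores) + 1 workers."""
    if num_cores is None:
        # os.cpu_count() reports the node's logical CPUs, not a
        # Kubernetes CPU limit, so pass num_cores explicitly in a pod.
        num_cores = os.cpu_count() or 1
    return 2 * num_cores + 1


if __name__ == "__main__":
    print(gunicorn_workers(2))  # t3a.medium, 2 vCPUs -> 5
    print(gunicorn_workers(8))  # 8 cores -> 17
```

Whether running fewer workers than this removes the 502s is not established in the thread.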

The main question I have is why the ALB is not passing the request on. I think you are correct that it never reaches the serving server at all, which leads me to think the ALB "thinks" the service is down or not responding. WDYT?

  				
Posted by AgitatedDove14, one year ago
