Hi, I have a small question regarding k8s clearml-serving behavior. I have in my cluster one GPU of 16GB RAM, and another one of 24GB RAM. I have an LLM model fitting the 24GB but not the 16GB GPU. When I call the endpoint, how will I know to which GPU I

Answered

Hi,

I have a small question regarding k8s clearml-serving behavior. My cluster has one GPU with 16GB of RAM and another with 24GB of RAM. I have an LLM that fits on the 24GB GPU but not on the 16GB one. When I call the endpoint, how will I know which GPU instance the model will be loaded on? Are there parameters to assign specific models to specific GPU instances?

Thank you

  				
Posted one year ago by SuccessfulRaven86


Answers 4

Hi SuccessfulRaven86
Every clearml-serving session (you can have multiple different "sessions") is assumed to be homogeneous. This means a session serves the same set of models on as many nodes as possible, with support for multiple models per pod.
In your example, I think the easiest option is to create two serving sessions: one with a node selector for the 24GB node and another for the 16GB node. What do you think?
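As a rough sketch of the two-session idea, the Python snippet below builds one set of Helm value overrides per session, each pinned to a node class through a `nodeSelector`. Everything specific in it is an assumption to check against your own setup: the node label `gpu-mem`, the release names, the placeholder task IDs, and the value keys `clearml.servingTaskId`, `clearml_serving_inference.nodeSelector` and `clearml_serving_triton.nodeSelector`, which should be verified against your chart's `values.yaml`.

```python
import json

# Hypothetical node label; use whatever label your nodes actually carry.
GPU_LABEL = "gpu-mem"


def session_values(serving_task_id: str, gpu_mem: str) -> dict:
    """Build Helm value overrides for one serving session pinned to one GPU class."""
    selector = {GPU_LABEL: gpu_mem}
    return {
        # Assumed key names; verify them against the chart's values.yaml.
        "clearml": {"servingTaskId": serving_task_id},
        "clearml_serving_inference": {"nodeSelector": selector},
        "clearml_serving_triton": {"nodeSelector": selector},
    }


# One entry per session: hypothetical release name -> value overrides.
# The task IDs are placeholders for the IDs of your two serving sessions.
sessions = {
    "serving-24gb": session_values("<task-id-of-24gb-session>", "24gb"),
    "serving-16gb": session_values("<task-id-of-16gb-session>", "16gb"),
}

if __name__ == "__main__":
    for release, values in sessions.items():
        print(f"# values for release {release}")
        print(json.dumps(values, indent=2))
```

Each printed block could be saved as a values file for a separate Helm release of the serving chart. Helm allows the same chart to be installed more than once in a cluster under different release names, so in principle this does not need a second cluster, though whether the chart's resources coexist cleanly is worth testing. The LLM would then be registered only on the session whose pods are pinned to the 24GB node.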

  				
Posted one year ago by AgitatedDove14

Correct, the serving Task ID is the clearml-serving session. It is the instance that holds all the information about this specific setup and its models.

  				
Posted one year ago by AgitatedDove14

Hey AgitatedDove14, thank you for your input.
Could you clarify what you mean by a clearml-serving session?

Are you referring to the servingTaskId?

  				
Posted one year ago by SuccessfulRaven86

The servingTaskId is linked to the Helm chart, which means your solution would require creating multiple Kubernetes clusters to match our requirements, no?

  				
Posted one year ago by SuccessfulRaven86

