Hi, I Have A Small Question Regarding K8S Clearml-Serving Behavior. I Have In My Cluster One Gpu Of 16Gb Ram, And Another One Of 24 Gb Ram. I Have A Llm Model Fitting The 24Gb But Not The 16Gb Gpu. When I Call The Endpoint, How Will I Know To Which Gpu I

Unanswered

The servingtaskid is linked to the helm chart, which means that your solution would propose to create multiple kubernetes cluster according to our requirements, no?

  				
Posted 
	one year ago

					More
				  		
  Report
		
					SuccessfulRaven86
				
					0
					 × 1

295 Views

0 Answers

one year ago