Answered
Hi, I have a small question regarding k8s clearml-serving behavior. I have in my cluster one GPU of 16GB RAM, and another one of 24GB RAM. I have an LLM model fitting the 24GB but not the 16GB GPU. When I call the endpoint, how will I know to which GPU I

Hi,

I have a small question regarding k8s clearml-serving behavior. My cluster has one GPU with 16GB of RAM and another with 24GB of RAM. I have an LLM that fits on the 24GB GPU but not on the 16GB one. When I call the endpoint, how will I know which GPU instance the model will be loaded on? Are there parameters for assigning specific models to specific GPU instances?

Thank you

  
  
Posted 3 months ago

Answers 4


Hey @AgitatedDove14, thank you for your input.
Could you clarify what you mean by clearml-serving session?

Are you referring to the servingTaskId?

  
  
Posted 3 months ago

The servingTaskId is linked to the Helm chart, which means your solution would require creating multiple Kubernetes clusters to match our requirements, no?

  
  
Posted 3 months ago

Correct, the serving Task ID is the clearml-serving session. It is the instance that holds all the information about this specific setup and its models.

  
  
Posted 3 months ago

Hi @SuccessfulRaven86
Every clearml-serving session (you can have multiple different "sessions") is assumed to be homogeneous. This means it will serve the same models on as many nodes as possible, with support for multiple models per pod.
In your example, I think the easiest option is to create two serving sessions, one with a node selector for the 24GB node and another for the 16GB node, wdyt?
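
To make the two-session idea concrete, here is a minimal sketch that builds one Kubernetes pod spec per serving session, each pinned to a GPU class through the standard `nodeSelector` field. Everything specific in it is an assumption for illustration: the node label `example.com/gpu-memory`, the placeholder image name, the session names, and the placeholder Task IDs are all hypothetical, and passing the session through a `CLEARML_SERVING_TASK_ID` environment variable is assumed rather than taken from this thread. How these fields map onto the clearml-serving Helm chart values is not shown here, so check the chart's own values file.

```python
import json


def inference_pod_spec(serving_task_id: str, node_labels: dict, image: str) -> dict:
    """Build a minimal pod spec that only schedules on nodes with the given labels."""
    return {
        # Standard Kubernetes field: the pod runs only on nodes matching all labels.
        "nodeSelector": dict(node_labels),
        "containers": [
            {
                "name": "serving-inference",  # hypothetical container name
                "image": image,
                # Assumed: the session (serving Task ID) is passed via this env var.
                "env": [{"name": "CLEARML_SERVING_TASK_ID", "value": serving_task_id}],
            }
        ],
    }


# Hypothetical sessions: one per GPU class, each with its own serving Task ID.
SESSIONS = {
    "llm-24gb": {
        "task_id": "<serving-task-id-for-24gb>",
        "labels": {"example.com/gpu-memory": "24gb"},
    },
    "small-16gb": {
        "task_id": "<serving-task-id-for-16gb>",
        "labels": {"example.com/gpu-memory": "16gb"},
    },
}

IMAGE = "<inference-image>"  # placeholder, not a real image reference

specs = {
    name: inference_pod_spec(cfg["task_id"], cfg["labels"], IMAGE)
    for name, cfg in SESSIONS.items()
}

print(json.dumps(specs, indent=2))
```

With a layout like this, the LLM would be registered only on the session whose pods land on the 24GB node, so both sessions can live in the same cluster and each has its own serving Task ID.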

  
  
Posted 3 months ago
213 Views