Hi, I noted that clearml-serving does not support spaCy models out of the box and that clearml-serving only supports the following:


Besides that, what are your impressions of these serving engines? Are they much better than just creating my own API + ONNX, or even my own API + normal PyTorch inference?
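For reference, by "my own API + ONNX" I mean roughly this kind of hand-rolled inference path (a minimal sketch only; the model, shapes, and file path are placeholders):

```python
import numpy as np
import torch
import onnxruntime as ort

# Tiny stand-in model (the real one would be whatever was trained).
model = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.ReLU(), torch.nn.Linear(8, 2))
model.eval()

# Export to ONNX once, at deployment time.
dummy = torch.randn(1, 16)
torch.onnx.export(model, dummy, "model.onnx", input_names=["input"], output_names=["output"])

# This is what the hand-rolled API endpoint would do per request.
session = ort.InferenceSession("model.onnx")
features = np.random.rand(1, 16).astype(np.float32)
(logits,) = session.run(["output"], {"input": features})
print(logits.shape)  # (1, 2)
```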

I would separate ML frameworks from DL frameworks.
With ML frameworks, the main advantage is multi-model serving on a single container, which is more cost effective when serving multiple models. You also get the ability to quickly update models from the ClearML model repository (just tag + publish, and the serving endpoints will auto-refresh themselves). There is no actual per-model inference performance gain, but globally it is more efficient.
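The tag + publish flow looks roughly like this, as a sketch (project/tag names are placeholders, and it assumes the clearml SDK's Model.query_models() / publish() as documented):

```python
from clearml import Model

# Grab the newest model in the project that carries the "production" tag
# (project name and tag are placeholders for whatever convention you use).
models = Model.query_models(project_name="my_project", tags=["production"])
latest = models[0]

# Publishing is the signal clearml-serving watches for; the serving endpoint
# picks up the new weights on its next refresh.
latest.publish()
print(f"published {latest.name} ({latest.id})")
```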
With DL, obviously all the ML advantages hold, but the main value is that we separate the preprocessing onto a CPU instance and the DL model onto a GPU instance, and this is a huge performance boost. On top of that, the GPU instance can serve multiple models at the same time (again, cost effective). The actual DL model inference boost comes from using Triton as the engine; Nvidia works hard to keep it super optimized for inference, and they did a great job with it.
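To make the CPU/GPU split concrete: the preprocessing side just ships ready-made tensors to the Triton instance over HTTP/gRPC, something like the sketch below (illustrative only; the URL, model name, input/output names, and the toy tokenizer are assumptions, not what clearml-serving generates for you):

```python
import numpy as np
import tritonclient.http as httpclient

# Preprocessing runs on the CPU instance; only the ready-made tensor is shipped to Triton.
def preprocess(raw_text: str) -> np.ndarray:
    # Placeholder tokenization: a real deployment would run the actual tokenizer here.
    return np.array([[hash(tok) % 30522 for tok in raw_text.split()]], dtype=np.int64)

client = httpclient.InferenceServerClient(url="triton-gpu-instance:8000")

tokens = preprocess("some input text")
infer_input = httpclient.InferInput("input_ids", list(tokens.shape), "INT64")
infer_input.set_data_from_numpy(tokens)

# The GPU instance can host many models; we address the one we need by name.
result = client.infer(model_name="my_bert_model", inputs=[infer_input])
print(result.as_numpy("logits").shape)
```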

  
  
Posted one year ago