
Hi, I noted that clearml-serving does not support spaCy models out of the box and that clearml-serving only supports the following:
Machine Learning models (Scikit-learn, XGBoost, LightGBM) and Deep Learning models (TensorFlow, PyTorch, ONNX).
I believe in such scenarios a custom engine of sorts would be required. I would like to know: how do I create a custom engine for clearml-serving? For example, in this case spaCy for its many use cases?

  
  
Posted 2 years ago

Answers 15


Hi PerplexedCow66
I'm assuming this is an extension of this issue:
https://github.com/allegroai/clearml-serving/issues/32

Basically JWT can be used as a general allow/block for all endpoints, which is most efficiently handled by the k8s load balancer (nginx/envoy).
But if you want a per-endpoint check (or maybe to do something based on the JWT values),
see adding JWT to FastAPI here:
https://fastapi.tiangolo.com/tutorial/security/oauth2-jwt/?h=jwt#oauth2-with-password-and-hashing-bearer-with-jwt-tokens
Then you can basically quickly change this section to add the JWT check (or use parts of the decoded JWT dict for routing):
https://github.com/allegroai/clearml-serving/blob/826f503cf4a9b069b89eb053696d218d1ce26f47/clearml_serving/serving/main.py#L90
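For illustration, a minimal sketch of what that per-endpoint JWT check could look like. PyJWT, the HS256 shared secret, and the tenant claim are assumptions for the sake of the example, not part of clearml-serving itself:

```python
# Minimal sketch: guard a FastAPI route with a JWT dependency.
# JWT_SECRET and the "tenant" claim are hypothetical placeholders.
import jwt  # PyJWT
from fastapi import Depends, FastAPI, HTTPException
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

app = FastAPI()
bearer = HTTPBearer()
JWT_SECRET = "change-me"  # assumed shared secret


def verify_jwt(credentials: HTTPAuthorizationCredentials = Depends(bearer)) -> dict:
    try:
        # returns the decoded claims; these could also be used for routing decisions
        return jwt.decode(credentials.credentials, JWT_SECRET, algorithms=["HS256"])
    except jwt.PyJWTError:
        raise HTTPException(status_code=401, detail="Invalid or missing token")


@app.post("/serve/{model_id}")
async def serve_model(model_id: str, body: dict, claims: dict = Depends(verify_jwt)):
    # claims.get("tenant") (hypothetical claim) could be checked against model_id here
    return {"model": model_id, "ok": True}
```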

wdyt?

  
  
Posted 2 years ago

  1. Correct. Basically the order is: RestAPI body dictionary -> preprocess -> process -> postprocess -> RestAPI dictionary return.
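If it helps, here is a toy driver that only illustrates that call order; this is hypothetical code for illustration, not the actual clearml-serving internals:

```python
# Toy illustration of the request flow through a Preprocess instance.
def handle_request(preprocessor, request_body: dict) -> dict:
    state = {}  # per-request state shared between the three calls
    model_input = preprocessor.preprocess(request_body, state, None)
    model_output = preprocessor.process(model_input, state, None)
    return preprocessor.postprocess(model_output, state, None)
```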
  
  
Posted 2 years ago

Agreed! I was trying to avoid this, because I wanted each tenant to access the serving endpoint directly, to maximize performance. But I guess I will lose just a few ms by separating the auth layer from the execution layer.

Besides that, what are your impressions of these serving engines? Are they much better than just creating my own API + ONNX, or even my own API + normal PyTorch inference?

For example, if I decide to use clearml-serving --engine custom, what would be the drawbacks?

  
  
Posted 2 years ago

AgitatedDove14 The pipeline example is a bit unclear; I would like to know how to serve a model even if I do not use any serving engine. Can you please put together a more complete example for that? Besides that, how can I manage authorization between multiple endpoints? How can I manage API keys to control who can access my endpoints?

Many thanks!

  
  
Posted 2 years ago

1- Thanks, I need to experiment with that code. I guess I should use the AI model in the "process" function, correct?

2- Regarding the RestAPI authentication, I have a multi-tenant application, so I need different auth keys for each endpoint. Any ideas?

3- I admit I don't know the best practices for authorizing users with serving engines. If I am creating my own RestAPI, that is easy and flexible. But how do people solve this for TorchServe, Triton, etc.? Using load-balancer auth as you suggested?

  
  
Posted 2 years ago

AgitatedDove14 is it possible for every endpoint to control its own JWT token in clearml-serving? In my multi-tenant application, the tenants can only access their own endpoints, which is why I need to authenticate them before they use it.

Regarding the load-balancer authorizing users with JWT, do you have links/resources where I can take a deeper look? Thanks

  
  
Posted 2 years ago

Hi SubstantialElk6

noted that clearml-serving does not support spaCy models out of the box and

So this is a good point.

To add any missing package to the preprocessing docker container, you can just add it in the following environment variable here: https://github.com/allegroai/clearml-serving/blob/d15bfcade54c7bdd8f3765408adc480d5ceb4b45/docker/docker-compose.yml#L83
EXTRA_PYTHON_PACKAGES="spacy>1"
Regarding a custom engine, basically this is supported with --engine custom.
You can see the model pipeline example uses that same workflow:
https://github.com/allegroai/clearml-serving/blob/main/examples/pipeline/preprocess.py
Instructions on setup here:
https://github.com/allegroai/clearml-serving/tree/main/examples/pipeline

We should add a custom example (any chance you can add a git issue, so we do not forget?).
Specifically for spaCy: how could we easily separate the pre/post-processing from the actual model inference? I would love to add it as a custom engine or example.
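As a rough starting point, here is a minimal sketch of what such a spaCy custom-engine Preprocess could look like with --engine custom. The request schema ({"text": ...}), the assumption that the model artifact is a saved spaCy pipeline directory, and the returned fields are all illustrative, not an official example:

```python
# Sketch of a custom-engine Preprocess wrapping a spaCy pipeline.
from typing import Any, Optional

import spacy


class Preprocess(object):
    def __init__(self):
        self._model = None

    def load(self, local_file_name: str) -> Optional[Any]:
        # assuming the model artifact is a saved spaCy pipeline directory (hypothetical)
        self._model = spacy.load(local_file_name)

    def preprocess(self, body: dict, state: dict, collect_custom_statistics_fn=None) -> Any:
        # assumed request schema: {"text": "..."}
        return body["text"]

    def process(self, data: Any, state: dict, collect_custom_statistics_fn=None) -> Any:
        # with --engine custom, this call replaces the external inference engine
        return self._model(data)

    def postprocess(self, data: Any, state: dict, collect_custom_statistics_fn=None) -> dict:
        # return named entities as JSON-serializable data
        return {"entities": [(ent.text, ent.label_) for ent in data.ents]}
```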

  
  
Posted 2 years ago

Hello AgitatedDove14, I will summarize the responses:

1- Exactly, custom code
2 and 3 - I want to manage access control over the RestAPI

  
  
Posted 2 years ago

These are maybe good features to include in ClearML: https://docs.bentoml.org/en/latest/guides/securing_endpoints.html or https://docs.bentoml.org/en/latest/guides/server.html .

Sure, we should probably add a section into the doc explaining how to do that

Another approach is creating my own API on top of the clearml-serving endpoints, where I control each tenant's authentication.

I have to admit that to me this is a much better solution (than my/bento's integrated JWT option). Generally speaking, I think this is the best approach: it separates the authentication layer from the execution layer (i.e. JWT parsing/auth in your code vs running models in clearml-serving), allows maximum flexibility, and does not require changing any 3rd party (i.e. clearml) code.
wdyt?
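For illustration, a minimal sketch of that gateway approach, assuming a per-tenant API key table and one clearml-serving endpoint per tenant. The key table, header name, and serving URL are hypothetical placeholders:

```python
# Thin gateway in front of clearml-serving: the gateway owns tenant
# authentication, then forwards the request to that tenant's serving endpoint.
import httpx
from fastapi import FastAPI, Header, HTTPException

app = FastAPI()
SERVING_BASE_URL = "http://clearml-serving-inference:8080/serve"  # hypothetical
TENANT_KEYS = {"tenant-a-key": "endpoint_a", "tenant-b-key": "endpoint_b"}  # hypothetical


@app.post("/predict")
async def predict(body: dict, x_api_key: str = Header(...)):
    endpoint = TENANT_KEYS.get(x_api_key)
    if endpoint is None:
        raise HTTPException(status_code=403, detail="Unknown API key")
    # forward the request body to the tenant's own serving endpoint
    async with httpx.AsyncClient() as client:
        resp = await client.post(f"{SERVING_BASE_URL}/{endpoint}", json=body)
    return resp.json()
```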

  
  
Posted 2 years ago

Hi PerplexedCow66

I would like to know how to serve a model, even if I do not use any serving engine

What do you mean no serving engine, i.e. custom code?

Besides that, how can I manage authorization between multiple endpoints?

Are you referring to limiting access to all the endpoints?

How can I manage API keys to control who can access my endpoints?

Just to be clear, accessing the endpoints has nothing to do with the clearml-server credentials, so are you asking how to add access control over RestAPI ?

  
  
Posted 2 years ago

yes, every tenant has their own serving endpoint

  
  
Posted 2 years ago

2 and 3 - I want to manage access control over the RestAPI

Long story short, put a load-balancer in front of the entire thing (see the k8s setup), and have the load-balancer verify the JWT token as authentication (this is usually the easiest).

1- Exactly, custom code

Yes, we need to add a custom example there (somehow forgotten)
Could you open an Issue for that?
in the meantime:
```python
# Preprocess class Must be named "Preprocess"
# No need to inherit or to implement all methods
from typing import Any, Callable, Optional


class Preprocess(object):
    """
    Preprocess class Must be named "Preprocess"
    Otherwise there are No limitations, No need to inherit or to implement all methods
    Notice! This is not thread safe! the same instance may be accessed from multiple threads simultaneously
    """

    def __init__(self):
        # set internal state, this will be called only once. (i.e. not per request)
        # it will also set the internal model_endpoint to reference the specific model endpoint object being served
        self.model_endpoint = None  # type: clearml_serving.serving.endpoints.ModelEndpoint
        self._model = None

    def load(self, local_file_name: str) -> Optional[Any]:  # noqa
        """
        Optional: provide loading method for the model
        useful if we need to load a model in a specific way for the prediction engine to work
        :param local_file_name: file name / path to read and load the model from
        :return: Object is stored on self._model
        """
        pass

    def preprocess(
            self,
            body: dict,
            state: dict,
            collect_custom_statistics_fn: Optional[Callable[[dict], None]],
    ) -> Any:  # noqa
        """
        Optional: do something with the request data, return any type of object.
        The returned object will be passed as is to the inference engine
        :param body: dictionary as received from the RestAPI
        :param state: Use state dict to store data passed to the post-processing function call.
            This is a per-request state dict (meaning a new empty dict will be passed per request)
            Usage example:
            >>> def preprocess(..., state):
                    state['preprocess_aux_data'] = [1,2,3]
            >>> def postprocess(..., state):
                    print(state['preprocess_aux_data'])
        :param collect_custom_statistics_fn: Optional, if provided allows to send a custom set of key/values
            to the statistics collector service.
            None is passed if the statistics collector is not configured, or if the current request should not be collected
            Usage example:
            >>> print(body)
            {"x0": 1, "x1": 2}
            >>> if collect_custom_statistics_fn:
            >>>     collect_custom_statistics_fn({"x0": 1, "x1": 2})
        :return: Object to be passed directly to the model inference
        """
        return body

    def postprocess(
            self,
            data: Any,
            state: dict,
            collect_custom_statistics_fn: Optional[Callable[[dict], None]],
    ) -> dict:  # noqa
        """
        Optional: post process the data returned from the model inference engine
        returned dict will be passed back as the request result as is.
        :param data: object as received from the inference model function
        :param state: Use state dict to store data passed to the post-processing function call.
            This is a per-request state dict (meaning a dict instance per request)
            Usage example:
            >>> def preprocess(..., state):
                    state['preprocess_aux_data'] = [1,2,3]
            >>> def postprocess(..., state):
                    print(state['preprocess_aux_data'])
        :param collect_custom_statistics_fn: Optional, if provided allows to send a custom set of key/values
            to the statistics collector service.
            None is passed if the statistics collector is not configured, or if the current request should not be collected
            Usage example:
            >>> if collect_custom_statistics_fn:
            >>>     collect_custom_statistics_fn({"y": 1})
        :return: Dictionary passed directly as the returned result of the RestAPI
        """
        return data

    def process(
            self,
            data: Any,
            state: dict,
            collect_custom_statistics_fn: Optional[Callable[[dict], None]],
    ) -> Any:  # noqa
        """
        Optional: do something with the actual data, return any type of object.
        The returned object will be passed as is to the postprocess function
        :param data: object as received from the preprocessing function
        :param state: Use state dict to store data passed to the post-processing function call.
            This is a per-request state dict (meaning a dict instance per request)
            Usage example:
            >>> def preprocess(..., state):
                    state['preprocess_aux_data'] = [1,2,3]
            >>> def postprocess(..., state):
                    print(state['preprocess_aux_data'])
        :param collect_custom_statistics_fn: Optional, if provided allows to send a custom set of key/values
            to the statistics collector service.
            None is passed if the statistics collector is not configured, or if the current request should not be collected
            Usage example:
            >>> if collect_custom_statistics_fn:
            >>>     collect_custom_statistics_fn({"type": "classification"})
        :return: Object to be passed to the post-processing function
        """
        return data
```

Does that help?

  
  
Posted 2 years ago

Besides that, what are your impressions on these serving engines? Are they much better than just creating my own API + ONNX or even my own API + normal Pytorch inference?

I would separate ML frameworks from DL frameworks.
With ML frameworks, the main advantage is multi-model serving on a single container, which is more cost effective when serving multiple models, as well as the ability to quickly update models from the ClearML model repository (just tag + publish and the serving endpoints will auto-refresh themselves). There is no real inference performance gain per model, but globally it is more efficient.
With DL, obviously all the ML advantages hold, but the main value is the fact that we separate the preprocessing onto a CPU instance and the DL model onto a GPU instance, which is a huge performance boost. On top of that, the GPU instance can serve multiple models at the same time (again, cost effective). The actual DL model inference boost comes from using Triton as an engine; Nvidia works hard for it to be super optimized for inference, and they did a great job with it.

  
  
Posted 2 years ago

2, 3) The question is whether the serving changes from one tenant to another, does it?

  
  
Posted 2 years ago

Thanks, these are good tips, I will think about that. Another approach is creating my own API on top of the clearml-serving endpoints, where I control each tenant's authentication.

These are maybe good features to include in ClearML: https://docs.bentoml.org/en/latest/guides/securing_endpoints.html or https://docs.bentoml.org/en/latest/guides/server.html .

Thank you for your attention and the short discussion!

  
  
Posted 2 years ago