
Hello everyone!
I am relatively new to ClearML and to *-Ops concepts in general, as I am just a regular Python dev.

I am currently trying to introduce MLOps into our existing local infrastructure, so that we can use automated data preprocessing, model training/fine-tuning, streaming inference, concept drift detection and mitigation, and so on.

However, I am currently stuck on a streaming inference problem:

Our current setup is:

  • CPU server with plenty of storage and RAM (x.x.x.69) - this is where the clearml-server Docker containers run on their default ports (8080 web UI, 8008 API, 8081 fileserver).
  • GPU worker PC (x.x.x.68) running the clearml-agent Docker container for model training/inference.
  • Plans are to expand to 6 more GPU workstations with the same layout.

The GPU worker and the CPU server can see each other on the LAN, they have working mutual SSH, and the ClearML web UI shows the GPU clearml-agent as a valid, running worker, so there seem to be no issues there.
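For reference, the agent on .68 points at the server with roughly this api section in its clearml.conf (credentials redacted; this is just the gist, not the exact file):

```
api {
    web_server: http://x.x.x.69:8080
    api_server: http://x.x.x.69:8008
    files_server: http://x.x.x.69:8081
    credentials {"access_key": "...", "secret_key": "..."}
}
```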

Now I tried to upload and register our model's existing .pth file, which lives on the GPU worker (.68), to the ClearML server (.69).
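As far as I understand, the SDK route for this looks roughly like the sketch below; the project name and path are placeholders, and this is only the gist of what I pieced together:

```python
from clearml import Task, OutputModel

# Runs on the GPU worker (.68); the SDK talks to the server on .69 via clearml.conf
task = Task.init(project_name="my_project", task_name="register existing model")

# Wrap the existing weights file and register it in the ClearML model registry
model = OutputModel(task=task, framework="PyTorch")
model.update_weights(weights_filename="/path/to/model.pth")
```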

The ClearML documentation and ChatGPT pointed me to the clearml-serving package, and suggested that clearml-serving should be installed on the GPU worker, not on the CPU server, to avoid port and logic conflicts.

So I:

  • Installed clearml-serving on .68 and, by trial and error, somehow registered the model file from the GPU worker host; it then appeared in the list of models for my project in the ClearML web UI.
  • Launched the streaming inference with the Triton inference Docker container (it is a PyTorch model and we need CUDA).
  • Registered the endpoint via clearml-serving, and it appeared in the description of the serving task in the web UI.
  • Now the tricky part: to test model predictions, I sent some input data to the endpoint URL I derived from the ClearML tutorials, http://x.x.x.69:8080/serve/<my_endpoint_name>, both via a curl POST and via Python requests (see the sketch right after this list), and it gave me HTTP 405 - Method Not Allowed.
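For completeness, this is roughly the Python side of the test request (endpoint name and payload are placeholders; the curl call was the equivalent POST):

```python
import requests

# Endpoint URL taken from the ClearML tutorials; the endpoint name is a placeholder
url = "http://x.x.x.69:8080/serve/my_endpoint_name"

# Dummy payload - the real input shape depends on the model
payload = {"data": [[0.1, 0.2, 0.3]]}

resp = requests.post(url, json=payload)
print(resp.status_code, resp.text)  # currently prints 405 Method Not Allowed
```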

What could be the issue here? Or rather, what is the easiest correct way to upload an existing model file and serve it for streaming inference?

Thank you in advance!

If any other info might be necessary or helpful, please let me know and I will provide it. :)
(screenshots attached)

  
  
Posted 9 hours ago · 0 Answers · 6 Views