
Hello everyone!
I am relatively new to ClearML and to the *-Ops concepts in general, as I am but a regular Python dev.

I am currently trying to implement MLOps in our existing local infrastructure, so that we would be able to utilize automated data preprocessing, model training/fine-tuning, streaming inference, concept drift detection and mitigation, etc.

However, I am currently stuck on a streaming inference problem:

Our current setup is:

  • CPU server with lots of disk space and RAM (x.x.x.69), where we placed the clearml-server Docker containers on their default ports (8080 web, 8008 API, 8081 fileserver).
  • GPU worker PC (x.x.x.68) with a clearml-agent Docker container for model training/inference.
  • We plan to expand to 6 more GPU workstations with the same logic.
    The GPU worker and the CPU server can see each other on the LAN, mutual SSH works between them, and the ClearML server web UI registers the GPU clearml-agent as a valid, running worker, so there seem to be no issues there (the worker's clearml.conf points at the server roughly as sketched below).
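
For reference, the api section of clearml.conf on the worker is set to the server's default ports, roughly like this (a sketch from memory, credentials omitted):

api {
    web_server: http://x.x.x.69:8080
    api_server: http://x.x.x.69:8008
    files_server: http://x.x.x.69:8081
    # credentials { access_key = "..." secret_key = "..." }
}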

Now, I tried to upload and register our model's existing .pth file, which was located on the GPU worker (.68), to the ClearML server (.69).

The ClearML documentation and ChatGPT told me to use the clearml-serve package, and that clearml-serve should be installed on the GPU worker, not on the CPU server, to avoid port and logic conflicts.

So I:

  • Installed clearml-serve on .68 and, by trial and error, somehow registered the model file from the GPU worker host; it appeared in the list of models for my project in the ClearML server web UI.
  • Then I launched the streaming inference using the Triton inference Docker container (because it is a PyTorch model and it needs CUDA).
  • After that, I registered the endpoint via clearml-serve, and it appeared in the description of the serving task in the web UI.
  • Now the tricky part: to test model predictions, I tried to send some input data via a curl POST and via Python requests to the endpoint URL I derived from the ClearML tutorials: http://x.x.x.69:8080/serve/<my_endpoint_name> and it gave me an HTTP 405 error - Method Not Allowed. Roughly what I sent is sketched below.
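
A sketch of the Python request that returned the 405 (the endpoint name and the payload contents are placeholders for our actual values):

import requests

# Sketch of the test request that returned HTTP 405; the endpoint name
# and the payload below are placeholders for our actual values.
url = "http://x.x.x.69:8080/serve/<my_endpoint_name>"
payload = {"<input_name>": [[0]]}  # placeholder input data

response = requests.post(url, json=payload)
print(response.status_code, response.reason)  # -> 405 Method Not Allowed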

What could be the issue here? Or rather, what is the easiest and correct way to upload an existing model file and run streaming inference on it?

Thank you in advance!

If any other info might be necessary or helpful, please let me know and I will provide it. :)
[screenshots attached]

  
  
Posted one month ago

Answers 4


@<1523701087100473344:profile|SuccessfulKoala55> Thank you once again! I extracted the scripts and commands that were seemingly responsible for model registration and its inference on the GPU worker server:

register_model.py

from clearml import Task, OutputModel

task = Task.init(project_name="LogSentinel", task_name="Model Registration")
model_path = "~/<full_local_path_to_model>/deeplog_bestloss.pth"

# Register the model
output_model = OutputModel(task=task)
output_model.update_weights(model_path)
output_model.publish()
print(f"Model ID: {output_model.id}")

Commands:

docker compose --env-file .env -f docker-compose-triton-gpu.yml up -d

clearml-serving create --project "LogSentinel" --name "deeplog-serving"

clearml-serving model add \
  --engine triton \
  --endpoint "deeplog" \
  --model-id 0c6a1c24067a49a0ac09c7e42c215b05 \
  --input-name "log_sequence" --input-type "int64" --input-size 1 10 \
  --output-name "predictions" --output-type "float32" --output-size 1 28
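
For completeness, the kind of test call I am trying to get working against this endpoint looks roughly like this (a sketch: the serving host is a placeholder, the example sequence values are made up, and I am assuming the response is JSON keyed by the declared output name):

import requests
import numpy as np

# Sketch of a test call matching the input/output declared in the
# "model add" command above; host and sequence values are placeholders.
url = "http://<serving_host>:8080/serve/deeplog"
payload = {"log_sequence": [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]}  # 1x10 int64

resp = requests.post(url, json=payload)
resp.raise_for_status()

# Assuming the response is keyed by the declared output name:
scores = np.array(resp.json()["predictions"])  # expected 1x28 float32
print("highest-scoring class index:", int(scores.argmax(axis=-1)[0]))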
  
  
Posted one month ago

Here's the simplified diagram of the architecture:
[architecture diagram image]

  
  
Posted one month ago

Hi @<1773158043551272960:profile|PungentRobin32>,
I'm a bit confused, do you mean clearml-serving? How did you install it?

  
  
Posted one month ago

Hi @<1523701087100473344:profile|SuccessfulKoala55> , thank you for the reply!

Yes, I am talking about clearml-serving.

I will be near my PC in the next couple of hours and will send the list of commands as well as a visual diagram of the architecture. :)

  
  
Posted one month ago