This can be any type of preprocessing data (image, audio, bytearray)
So basically it would check the hash and say there's no need to upload?
Thanks for answering. Yes, this is exactly what I wanted.
@<1523701205467926528:profile|AgitatedDove14> Actually, what we meant is something like the following example from the Triton client examples:
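For reference, a minimal sketch along the lines of Triton's simple_http_shm_client example; the model name yolo8, the input name images, the tensor shape, and the /input_simple shm key are placeholders, not values from this thread:

import numpy as np
import tritonclient.http as httpclient
import tritonclient.utils.shared_memory as shm

triton_client = httpclient.InferenceServerClient(url="localhost:8000")

# Put the input tensor into a system shared-memory region instead of the request body
input_data = np.ones((1, 3, 640, 640), dtype=np.float32)  # placeholder image tensor
byte_size = input_data.size * input_data.itemsize
shm_handle = shm.create_shared_memory_region("input_data", "/input_simple", byte_size)
shm.set_shared_memory_region(shm_handle, [input_data])

# Register the region with the server, then reference it by name in the request
triton_client.register_system_shared_memory("input_data", "/input_simple", byte_size)
infer_input = httpclient.InferInput("images", list(input_data.shape), "FP32")
infer_input.set_shared_memory("input_data", byte_size)

# Only the region name and offset travel over HTTP; the pixels stay in shared memory
results = triton_client.infer(model_name="yolo8", inputs=[infer_input])

# Clean up
triton_client.unregister_system_shared_memory("input_data")
shm.destroy_shared_memory_region(shm_handle)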
Does ClearML have any example of using shared memory, or is that out of scope for ClearML?
@<1523701205467926528:profile|AgitatedDove14> Thanks
Thanks for sharing that, but if I'm not mistaken, I couldn't get my exact issue across here. Shared memory still uses the same HTTP/gRPC communication; however, instead of transferring the entire image, for example, to the Triton server, it binds the image's address to a shared-memory region and then sends only that address over HTTP to the Triton server. By doing this, we save the cost of transferring the data. Please correct me if I'm wrong about this. I want to know if ClearML can support su...
Thanks @<1523701435869433856:profile|SmugDolphin23>
@<1523701205467926528:profile|AgitatedDove14> Thanks. Suppose I have a few serving instances whose list I can see using clearml-serving list.
One of them was named incorrectly and isn't running anywhere, so is there any way to remove it?
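One possible workaround, assuming the misnamed serving instance is backed by a ClearML controller Task whose ID shows up in clearml-serving list; the task ID below is a placeholder:

from clearml import Task

# ID of the misnamed serving instance's controller task, as shown by clearml-serving list
stale_service_id = "aabbccddeeff00112233445566778899"  # placeholder

task = Task.get_task(task_id=stale_service_id)
task.delete()  # removes the task and its artifacts; archiving it from the web UI is a gentler alternative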
Hi @<1523701205467926528:profile|AgitatedDove14> , thanks for the answer. I just wanted to understand the bigger picture. I'm more of an ML engineer, so for a self-hosted server I wanted to know the best way to create and register the SSL keys. This might be out of context or a noob question, so I apologize for it.
Sorry, just a quick question: we don't need to do much on our end, right? I mean, ClearML will handle sharing memory between the process.py and the Triton server?
Thanks, @<1523701205467926528:profile|AgitatedDove14> , for your feedback. Actually, I've been working with TRT-LLM since day zero of its launch. It is very good for LLMs. However, I haven't had the chance to check the trtllm-backend, as I'm waiting for some features there. I'm planning to use it and examine it, and I will try to provide any feedback I have on that. But before doing so, I need to become more familiar with the internals of ClearML, I guess.
By the way, thanks for the fe...
@<1523701205467926528:profile|AgitatedDove14> No, actually I can upload a directory for the model thanks to ClearML, but what I really want to achieve is to share this code (see the sketch after the layout below):
├── common
│   ├── common.py
between these two preprocess.py files:
├── yolo8
│   └── preprocess.py
└── yolo7
    └── preprocess.py
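To make the intent concrete, a minimal sketch of what yolo8/preprocess.py could look like if common/ were deployed next to it; the letterbox helper and the relative path are assumptions, and the Preprocess class follows the usual clearml-serving preprocessing layout:

# yolo8/preprocess.py -- sketch only
import os
import sys

# assume common/ ends up deployed as a sibling of the yolo8/ directory
sys.path.insert(0, os.path.join(os.path.dirname(os.path.abspath(__file__)), "..", "common"))

from common import letterbox  # hypothetical shared helper defined in common/common.py


class Preprocess(object):
    def preprocess(self, body, state, collect_custom_statistics_fn=None):
        # both yolo7 and yolo8 endpoints reuse the same shared image handling
        return letterbox(body["image"])

The same would apply to yolo7/preprocess.py, which is exactly the duplication in question.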
Hi @<1523701205467926528:profile|AgitatedDove14> , thanks for answering, but it's not what I meant. Suppose I have three models and they can't be loaded into GPU memory simultaneously (there isn't enough GPU RAM for all of them at once). What I have in mind is this: is there an automatic way to unload a model (for example, if a model hasn't been run in the last 10 minutes, or something similar)? Or, if we don't have such an automatic method, can we manually unload the ...
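For reference, the underlying Triton capability this question points at: when the Triton server runs with --model-control-mode=explicit, models can be loaded and unloaded on demand through the client API (whether clearml-serving exposes this is exactly the open question; the model name below is a placeholder):

import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Free GPU memory for a model that has been idle for a while
client.unload_model("yolo7")  # placeholder model name

# Bring it back when it is needed again
client.load_model("yolo7")
print(client.is_model_ready("yolo7"))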
@<1523701205467926528:profile|AgitatedDove14> Thanks for the response. Yeah, each endpoint will have its own modules/files; I just wanted to know if there is a way to share such common code between different endpoints so that it stays synced the way the preprocessing code does.
Just one question, please: suppose we have 1000 VM instances running, and suppose I create a package from the common code and install it alongside the containe...
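If the package route were taken, a minimal sketch of what the packaging could look like; the package name and dependency list are assumptions:

# setup.py for the shared preprocessing helpers -- sketch only
from setuptools import setup

setup(
    name="serving-common",        # hypothetical package name
    version="0.1.0",
    packages=["common"],          # assumes common/ gains an __init__.py
    install_requires=["numpy"],   # whatever common/common.py actually needs
)

Each preprocess.py could then simply import the shared helpers, and updating the shared logic would mean releasing a new version and rebuilding/redeploying the serving containers, which is presumably the concern behind the 1000-instance question.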
@<1523701205467926528:profile|AgitatedDove14> Thanks for the prompt response
@<1523701205467926528:profile|AgitatedDove14> About the proposed ways of fixing this issue: I've got my hands a little dirty with the code, and I think adding another option to the clearml-serving model add command to include some other files would be beneficial here. Suppose I have the following directory layout for now:
├── common
│   ├── common.py
└── yolo8
    ├── 1
    │   ├── model_NVIDIA_GeForce_RTX_3080.plan
    │   └── model_Tesla_T4.plan
    ├── config.pbtxt
    ...
@<1523701205467926528:profile|AgitatedDove14> That is awesome. Could you please point me to the branch or the specific commit you are working on, so I can see how you are implementing it? Honestly, I want to get familiar with it and, if possible, contribute to the project.
@<1523701205467926528:profile|AgitatedDove14> No, I didn't do that, but if I'm not mistaken, about a month ago I saw some users on Reddit comparing it. They observed that TRT-LLM outperforms all kinds of leading backends, including VLLM. I will try to find it and paste it here.