
Hello, community. I hope you are all doing well. I'm seeking information regarding a specific problem in the field of computer vision. Typically, a computer vision app will have multiple models, each with its own preprocessing, inference, and post-processing steps. Given the similarities in the preprocessing step across different computer vision models, I'm curious how we can avoid boilerplate code in the preprocessing step. Consider the following structure for illustration:

├── models
│   ├── yolov8
│   │   ├── config.pbtxt
│   │   └── preprocess.py
│   └── yolov7
│       ├── config.pbtxt
│       └── preprocess.py

In the preprocess.py files we will end up with many similar lines, which is not good. What are the best practices for sharing common functions across these files? Is there a way to use artifacts or another method to achieve this?
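
(For illustration only, a minimal sketch of the factoring in question: the shared steps move into common/common.py and each per-model preprocess.py shrinks to a thin wrapper. The helper names are hypothetical, and the clearml-serving Preprocess class interface is simplified away here.)

# common/common.py -- hypothetical shared helpers used by every model
import cv2
import numpy as np

def letterbox_resize(image: np.ndarray, size: int = 640) -> np.ndarray:
    """Resize with padding to a square input, a typical YOLO-style step."""
    h, w = image.shape[:2]
    scale = size / max(h, w)
    resized = cv2.resize(image, (int(w * scale), int(h * scale)))
    canvas = np.full((size, size, 3), 114, dtype=resized.dtype)
    canvas[:resized.shape[0], :resized.shape[1]] = resized
    return canvas

def to_chw_float(image: np.ndarray) -> np.ndarray:
    """HWC uint8 -> CHW float32 in [0, 1], identical across endpoints."""
    return image.transpose(2, 0, 1).astype(np.float32) / 255.0

# models/yolov8/preprocess.py -- the per-model file becomes a thin wrapper
import numpy as np
from common.common import letterbox_resize, to_chw_float

def preprocess(image: np.ndarray) -> np.ndarray:
    return to_chw_float(letterbox_resize(image, size=640))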

  
  
Posted 7 months ago

Answers 8


@<1523701205467926528:profile|AgitatedDove14> Thanks for the prompt response

  
  
Posted 7 months ago

@<1523701205467926528:profile|AgitatedDove14> Thanks for the response. Yeah, each endpoint will have its own modules/files; I just wanted to know if there is a way to share common code between different endpoints so that it gets synced the same way the preprocessing code does.

I do have one question, though: suppose we have 1,000 VM instances running, and suppose I create a package from the common code and install it in the container. What is the best approach to updating the package if this common code is updated frequently?

  
  
Posted 7 months ago

What is the best approach to updating the package if this common code is updated frequently?

Since this package has an indirect effect on the model endpoint, I would package it with the endpoint's preprocess code.
Each server updates its own local copy, making sure it can take it and deploy it hand over hand without breaking its ability to serve these endpoints.
The "wastefulness" of holding multiple copies is negligible compared to a situation where everyone shares the same exact copy and upgrading means everyone freezes their ability to serve.

  
  
Posted 7 months ago

@<1523701205467926528:profile|AgitatedDove14> No, actually I can upload a directory for the model thanks to ClearML, but what I really want to achieve is to share this code:

├── common
│   └── common.py

between these two preprocess.py files:

├── yolo8
│   └── preprocess.py
└── yolo7
    └── preprocess.py
  
  
Posted 7 months ago

Hi @<1657918706052763648:profile|SillyRobin38>

In the preprocess.py files we will end up with many similar lines, which is not good.

Actually, clearml-serving also supports directories, i.e. you can package an entire module as part of the preprocess code, which would be easier for your code.
Another option is to package your code as a Python package and have it installed on the container (there is a special env var that lets you add packages to the serving container).
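
(A minimal sketch of such a package, under an assumed name cv_common; the exact environment variable for pre-installing pip packages on the serving container, e.g. CLEARML_EXTRA_PYTHON_PACKAGES, is worth verifying against your clearml-serving version.)

# setup.py -- minimal packaging of the shared helpers; all names are assumptions
from setuptools import setup, find_packages

setup(
    name="cv_common",
    version="0.1.0",
    packages=find_packages(),  # picks up the common/ package directory
    install_requires=["numpy", "opencv-python-headless"],  # assumed dependencies
)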

  
  
Posted 7 months ago

but what I really want to achieve is to share this code:

You mean to share the code between them? Unless this is a preinstalled package in the container, each endpoint has its own separate set of modules/files.
(This is on purpose, so you can actually change them; just imagine different versions of the same common.py file.)

  
  
Posted 7 months ago

You mean to add these two to the model when deploying?

    │   ├── model_NVIDIA_GeForce_RTX_3080.plan
    │   └── model_Tesla_T4.plan

Notice that preprocess.py is not running on the GPU instance; it is running on a CPU instance (technically not the same machine).

  
  
Posted 7 months ago

@<1523701205467926528:profile|AgitatedDove14> About the proposed ways of fixing this issue: I've gotten my hands a little dirty with the code, and I think adding another option to the clearml-serving model add command to include some extra files would be beneficial here. Suppose I have the following directory structure for now:

├── common
│   └── common.py
├── yolo8
│   ├── 1
│   │   ├── model_NVIDIA_GeForce_RTX_3080.plan
│   │   └── model_Tesla_T4.plan
│   ├── config.pbtxt
│   └── preprocess.py
└── yolo7
    ├── 1
    │   ├── model_NVIDIA_GeForce_RTX_3080.plan
    │   └── model_Tesla_T4.plan
    ├── config.pbtxt
    └── preprocess.py

And now I want to share the same code across these two models. If I add the entire directory here, as you can see, it gets complicated, or, if it is possible at all, it might have some flaws. Regarding the second option, adding the preprocessing code as a Python package and installing it alongside everything else at build time, I think it might have a syncing issue, because that code will change a lot and it would be a burden to reinstall the package every time.

If the preprocessing code could be managed in such a way, it would be much cleaner, right? Also, I've come up with an approach of uploading the preprocessing code as an artifact, then getting all of the tasks, finding the most up-to-date common code, and fetching a local copy of it, and it is working for me. But I'm thinking that adding such a capability to ClearML itself would be great, and I was wondering what your thoughts are on this. Is it okay to fork the repo and implement another option here that takes a list of source files, uploads them to ClearML storage, and then fetches them again so that the preprocessing is handled? Should I do that and create a PR, or is it not necessary?
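
(For reference, a hedged sketch of that artifact-based workaround built on existing ClearML APIs — Task.get_tasks, upload_artifact, and artifacts[...].get_local_copy(); the project, task, and artifact names are assumptions, and the extract-folder behavior of get_local_copy() should be double-checked.)

# Publisher side: run whenever the common code changes.
from clearml import Task

task = Task.init(project_name="serving-common", task_name="common-code")
task.upload_artifact(name="common", artifact_object="common/")  # folder is zipped
task.close()

# Endpoint side (inside preprocess.py): fetch the newest copy at startup.
import sys
from clearml import Task

tasks = Task.get_tasks(project_name="serving-common", task_name="common-code")
latest = max(tasks, key=lambda t: t.data.last_update)  # most recently updated task
local_dir = latest.artifacts["common"].get_local_copy()  # extracted local folder
sys.path.insert(0, local_dir)  # assumption: makes `import common` resolvable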

  
  
Posted 7 months ago