Having human-readable ids always helps communication, but programmatically we are definitely going to use the "real" id. I think we are too early into this, though; I will report back on how it goes.
I would think having a unique slug is a good idea, so the team can communicate purely by that single identifier. Maybe we will name tasks slug_yyyymmdd
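As a purely illustrative sketch of that slug_yyyymmdd naming idea (the function name and exact format are my own assumptions, not anything from ClearML):

```python
from datetime import date

def task_slug(name):
    # Hypothetical helper: append today's date as yyyymmdd, per the
    # slug_yyyymmdd convention discussed above.
    return f"{name}_{date.today():%Y%m%d}"
```

Something like `task_slug('churn_model')` would then give a name that sorts chronologically and stays human-readable.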
ahh, because task_id is the "real" id of a task
this is a bit WIP, but we save it with the design of the model:

```python
parameters = dict(self.parameters, model_type='XGBClassifier')
...
output_model.update_design(config_dict=parameters)
```
Yes, this is exactly how I solved it at the end
and then have a wrapper that gets the model data and selects how to construct and deserialise the model class.
```python
def get_model(task_id, model_name):
    task = Task.get_task(task_id)
    try:
        model_data = next(model for model in task.models['output'] if model.name == model_name)
    except StopIteration:
        raise ValueError(f'Model {model_name} not found in: {[model.name for model in task.models["output"]]}')
    filename = model_data.get_local_copy()
    model_type = ...
```
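The continuation is elided above; as a hedged sketch of the if/else dispatch described in these messages (the function name and branches here are my assumptions, not the actual code):

```python
def construct_model(model_type, filename):
    # Hypothetical sketch: map the class name stored in the model design
    # to the right deserialisation routine. Only the XGBClassifier branch
    # mirrors the messages above.
    if model_type == 'XGBClassifier':
        from xgboost import XGBClassifier  # assumed installed at serving time
        model = XGBClassifier()
        model.load_model(filename)
        return model
    # Unknown model types fail loudly instead of silently mis-deserialising.
    raise ValueError(f'Unknown model_type: {model_type}')
```

The explicit `ValueError` matches the "rather just fail on an unknown model" stance later in the thread.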
Because we already had these get_artifact() / get_model() functions that the DSes use to pull data into notebooks for further analysis, I might as well reuse them with a custom preprocess and call predict myself.
And then get_model is what I wrote above: it uses the ClearML API to pick up the right model from the task_id and model_name, and since the model config contains the class name, get_model has an if/else structure in it to create the right class.
TBH the main reason I went with our API is that because of the custom model loading, we need to use the "custom" framework anyway.
I passed an env variable to the docker container, so that is how I solve this
yeah, so in docker run:

```shell
-e TASK_ID='b5f339077b994a8ab97b8e0b4c5724e1' \
-e MODEL_NAME='best_model' \
```

and then in Preprocess:

```python
self.model = get_model(task_id=os.environ['TASK_ID'], model_name=os.environ['MODEL_NAME'])
```
I pass the IDs to the docker container as environment variables, so changing them does require restarting the container, but I guess we can live with that for now
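A minimal sketch of that wiring, assuming the TASK_ID / MODEL_NAME variable names from the docker run line above, and failing fast at container start rather than at the first request:

```python
import os

def resolve_model_ref():
    # Sketch only: read the serving model reference from the container
    # environment; the variable names follow the `docker run -e ...` example.
    try:
        return os.environ['TASK_ID'], os.environ['MODEL_NAME']
    except KeyError as ex:
        # A missing variable should abort startup, not surface mid-request.
        raise RuntimeError(f'Required environment variable not set: {ex}')
```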
I absolutely need to pin the packages (including the main DS packages) I use.
I know there is an aux cfg with key-value pairs, but how can I use it in the Python code?

```json
"auxiliary_cfg": {
    "TASK_ID": "b5f339077b994a8ab97b8e0b4c5724e1",
    "V": 132
}
```
The DSes would expect the same interface as they used in the code that saved the model (me too TBH)
I'd rather just fail if they try to use a model that is unknown.
```json
{"detail":"Error processing request: ('Expecting data to be a DMatrix object, got: ', <class 'pandas.core.frame.DataFrame'>)"}
```
while in our own code:

```python
if model_type == 'XGBClassifier':
    model = XGBClassifier()
    model.load_model(filename)
```
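That error matches the difference between the two loading paths: a raw `xgboost.Booster` expects a `DMatrix`, while the sklearn-style `XGBClassifier` accepts a pandas DataFrame directly. A hedged sketch of the conversion (assuming xgboost and pandas are available at serving time; the function name is mine):

```python
def predict_frame(booster, frame):
    # Sketch: wrap the incoming DataFrame in the DMatrix that
    # xgboost.Booster.predict() expects, unlike XGBClassifier.predict().
    import xgboost
    return booster.predict(xgboost.DMatrix(frame))
```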
Ok, but I need to think with the head of a DS: this way they only need to remember (and I only need to teach them where to find) one id.
I expect the task to be the main entry point for all their work, and the above interface is easy to remember, check, etc. Also it is the same as getting artifacts, so less friction.
```python
def get_task(task_id):
    return Task.get_task(task_id)

def get_artifact(task_id, artifact_name):
    task = Task.get_task(task_id)
    return task.artifacts[artifact...
```
```python
import xgboost  # noqa
self._model = xgboost.Booster()
self._model.load_model(self._get_local_model_file())
```
now on to the next pain point:
git status gives correct information
I think this is because of the xgboost version that serving installs. How can I control these versions?
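If I remember the clearml-serving setup correctly (worth double-checking against its docs, as the variable name here is an assumption on my part), the inference container installs extra pip packages from an environment variable in the serving docker-compose, which is one place to pin versions:

```shell
# Hypothetical pinning via clearml-serving's extra-packages hook; verify
# the variable name and pick versions matching your training environment.
CLEARML_EXTRA_PYTHON_PACKAGES="xgboost==1.5.2 pandas==1.3.5"
```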
I was just looking at the model example. How does OutputModel store the binary, for example for an xgboost model?
I think I figured this out but now I have a problem:
```python
auto_connect_frameworks={'xgboost': False, 'scikitlearn': False}
```
BTW you are not exporting Framework in `__init__`, so you need to import it like `from clearml.model import Framework`