When you spin the model you can tell it any additional packages you might need
What does spin mean in this context?
clearml-serving ...
?
and it immediately complained about a missing package, which apparently I can't specify when I establish the model endpoint; instead I need to re-compose the docker container by passing an env variable to it????
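For the record, what seems to work for me, assuming the stock clearml-serving docker-compose setup (the variable name is the one I remember from the repo's example.env, so double-check it there):

```
# docker/example.env -- picked up when (re)composing the serving containers
CLEARML_EXTRA_PYTHON_PACKAGES="xgboost==1.7.6 pandas==1.5.3"
```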
ahh, because task_id is the "real" id of a task
I would think having a unique slug is a good idea so the team can communicate purely by that single identifier. Maybe we will name tasks as slug_yyyymmdd
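A tiny illustration of that naming idea (nothing clearml-specific about it; the slug and project name here are made up):

```python
from datetime import date

from clearml import Task

slug = "churn_xgb"  # hypothetical slug agreed on by the team
task = Task.init(project_name="experiments",
                 task_name=f"{slug}_{date.today():%Y%m%d}")  # e.g. churn_xgb_20240131
```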
```python
import xgboost  # noqa

self._model = xgboost.Booster()
self._model.load_model(self._get_local_model_file())
```
TBH the main reason I went with our API is that, because of the custom model loading, we need to use the "custom" framework anyway.
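Roughly what that looks like on my side: a sketch based on the custom-engine example in the clearml-serving repo, with method names from memory (verify against the repo); the "features" request key is made up:

```python
from typing import Any

import xgboost  # noqa


class Preprocess(object):
    def __init__(self):
        # called once when the serving instance spins up the endpoint
        self._model = None

    def load(self, local_file_name: str) -> Any:
        # the serving container hands us the downloaded model file
        self._model = xgboost.Booster()
        self._model.load_model(local_file_name)

    def preprocess(self, body: dict, state: dict, collect_custom_statistics_fn=None) -> Any:
        # shape the request body into something the Booster can consume
        return xgboost.DMatrix([body["features"]])

    def process(self, data: Any, state: dict, collect_custom_statistics_fn=None) -> Any:
        # run the actual prediction
        return self._model.predict(data).tolist()
```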
these are the service instances (basically increased visibility into what's going on inside the serving containers)
But these have: different task ids, same endpoints (from looking through the tabs)
So I am not sure why they are here and why not somewhere else
Because we already had these get_artifact() / get_model() functions that the DSes use to get the data into notebooks to further analyse their stuff, I might as well just use those with a custom preprocess and call predict myself.
And then get_model
is what I wrote above: it just uses the CML API to pick up the right model from the task_id and model_name; the model config contains the class name, so get_model has an if/else structure in it to create the right class.
I know there is an aux cfg with key/value pairs, but how can I use it in the python code?

```
"auxiliary_cfg": { "TASK_ID": "b5f339077b994a8ab97b8e0b4c5724e1", "V": 132 }
```
while in our own code:

```python
if model_type == 'XGBClassifier':
    model = XGBClassifier()
    model.load_model(filename)
```
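For context, a sketch of what that get_model() looks like in full (the config key holding the class name is an assumption; the task/model lookup uses standard ClearML calls):

```python
from clearml import Task
from xgboost import XGBClassifier


def get_model(task_id, model_name):
    task = Task.get_task(task_id=task_id)
    # pick the registered output model by name and pull the weights file locally
    model = next(m for m in task.models["output"] if m.name == model_name)
    filename = model.get_local_copy()
    model_type = (model.config_dict or {}).get("class_name")  # assumed config key

    if model_type == "XGBClassifier":
        clf = XGBClassifier()
        clf.load_model(filename)
        return clf
    # elif ...: other model classes go here
    raise ValueError(f"Unknown model type: {model_type}")
```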
I absolutely need to pin the packages (incl main DS packages) I use.
{"detail":"Error processing request: ('Expecting data to be a DMatrix object, got: ', <class 'pandas.core.frame.DataFrame'>)"}
I think this is because of the version of xgboost that serving installs. How can I control these?
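The version question aside (the env var mentioned earlier should cover pinning), the error itself is just the raw Booster API being strict about its input; the DataFrame has to be wrapped before predict. A minimal sketch:

```python
import pandas as pd
import xgboost


def predict_df(booster: xgboost.Booster, df: pd.DataFrame):
    dmatrix = xgboost.DMatrix(df)  # DataFrame -> DMatrix, which is what Booster.predict expects
    return booster.predict(dmatrix)
```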
Ok, but I need to think with the head of the DS: this way they only need to remember (and I only need to teach them where to find) one id.
I expect the task to be the main entry point for all their work, and the above interface is easy to remember, check, etc. Also it is the same as getting artifacts, so there is less friction.
```python
from clearml import Task


def get_task(task_id):
    return Task.get_task(task_id)


def get_artifact(task_id, artifact_name):
    task = Task.get_task(task_id)
    return task.artifacts[artifact_name].get()
```
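And the notebook side then just looks something like this (the task id is the one from above, the artifact/model names are illustrative):

```python
task_id = "b5f339077b994a8ab97b8e0b4c5724e1"
X_train = get_artifact(task_id, "X_train")
model = get_model(task_id, "model")
preds = model.predict(X_train)
```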
I solve the artifact / dataset / table / scalar / anything case by simply making foo.run() return a dictionary like
{ 'artifact_X_train': X_train, 'table_confusion_matrix': cf, 'dataset_x': _, ... }
And then I call the appropriate logger, artifact uploader or dataset uploader. (In the case of the dataset uploader I use foo.output_file, which every foo has.)
```python
task = Task.init(...)
for foo in foos:
    data = foo.run()
    for key, value in data.items():
        if key in ('auc', 'f1'):  # etc.
            logger.log(key, value)
        elif key.startswith('model'):
            ...  # save model, etc.
```
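If it helps, here is the same loop sketched with the concrete ClearML calls I would expect it to map onto (report_scalar / upload_artifact; project, task and key names are illustrative):

```python
from clearml import Task

task = Task.init(project_name="experiments", task_name="pipeline_run")
logger = task.get_logger()

for foo in foos:
    data = foo.run()
    for key, value in data.items():
        if key in ("auc", "f1"):
            # scalar metrics go to the task's scalars
            logger.report_scalar(title=key, series=key, value=value, iteration=0)
        elif key.startswith("artifact_"):
            task.upload_artifact(name=key, artifact_object=value)
        elif key.startswith("model"):
            pass  # register the model binary, e.g. via OutputModel
```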
TBH ClearML doesn't seem to be picking the model up so I need to do it manually
I was just looking at the model example. How does OutputModel store the binary? For example, for an xgboost model.
Is there an explicit OutputModel + xgboost example somewhere?
Why do I need an output_uri for the model saving? The dataset API can figure this out on its own
I want the model to be stored in a way that clearml-serving can recognise it as a model
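I have not found a dedicated example; this is just a sketch of how I would wire it up (the output_uri value and the training stand-in are made up; update_weights is the standard OutputModel call for uploading a weights file):

```python
from clearml import Task, OutputModel
import numpy as np
import xgboost

task = Task.init(project_name="experiments", task_name="train_xgb",
                 output_uri="s3://my-bucket/models")  # where the weights get uploaded

# minimal stand-in training data just to make the sketch runnable
dtrain = xgboost.DMatrix(np.random.rand(10, 3), label=np.random.randint(0, 2, 10))
booster = xgboost.train({"objective": "binary:logistic"}, dtrain, num_boost_round=5)

booster.save_model("model.json")

output_model = OutputModel(task=task, framework="xgboost")
output_model.update_weights(weights_filename="model.json")  # uploads and registers the binary
```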
No, pickling is the only thing that will Not trigger clearml (it is just too generic to automagically log)
So what is the mechanism by which you "automagically" pick things up? (For information, I don't think this is relevant to our use case.)
Having human-readable ids always helps communication, but programmatically we are definitely going to use the "real" id. But I think we are too early into this and I will report back on how it is going.
What I am trying to do is give the DSes some lightweight baseclass that is independent of clearml, while a framework has all the clearml-specific code. This will allow them to experiment outside of clearml and only switch to it when they are in an OK state. This will also help to not pollute clearml spaces with half-baked ideas.
```python
with open(self.output_filename, 'wb') as f:
    pickle.dump({
        'model': model,
        'X_train': X_train,
        'Y_train': Y_train,
        'X_test': X_test,
        'Y_test': Y_test,
        'impute_values': impute_values,
    }, f)
```
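The base class itself can stay completely clearml-free, something along these lines (the class and method names are hypothetical, apart from run() / output_filename, which are the ones mentioned above):

```python
import pickle


class BaseStep:  # hypothetical name for the lightweight base class
    output_filename = "step_output.pkl"

    def run(self) -> dict:
        # subclasses return e.g. {'artifact_X_train': ..., 'table_confusion_matrix': ..., 'auc': ...}
        raise NotImplementedError

    def save(self, payload: dict) -> None:
        # plain pickle, exactly like the dump above -- no clearml involved
        with open(self.output_filename, "wb") as f:
            pickle.dump(payload, f)
```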