That won't help, since the problem is that I don't know the model args (they're hidden inside the GridSearchCV implementation and I can't access them).
Related to that, I was able to unpickle the file that you upload as a model (MODEL URL in the UI model list). It turns out to be a joblib-pickled file, but the content seems strange: a NumPy array of the form [0,1,2,3,...] (so each cell contains its own offset).
Is that normal or a possible bug?
So all models are part of the same experiment and have the experiment name in their names.
Oh, that explains it. (1) You can use the model filename to control the model name in ClearML. (2) You can disable the autologging and upload the model manually; then you control the model name.
wdyt?
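For option (1), a minimal sketch of deriving a deterministic filename from the hyperparameters so the auto-logged model carries them. The helper `model_filename`, the `xgb` prefix and the `.pkl` extension are illustrative assumptions, not ClearML conventions.

```python
def model_filename(params, prefix="xgb", ext=".pkl"):
    """Build a deterministic, filesystem-safe filename from a hyperparameter dict."""
    # Sort keys so the same parameter set always yields the same name.
    parts = [f"{key}={params[key]}" for key in sorted(params)]
    name = "_".join([prefix] + parts)
    # Replace characters that are awkward in file names.
    for bad in ("/", "\\", " ", ":"):
        name = name.replace(bad, "-")
    return name + ext
```

For example, `model_filename({"max_depth": 3, "learning_rate": 0.1})` gives `xgb_learning_rate=0.1_max_depth=3.pkl`, which you would pass to your own save call. Whether ClearML uses the filename verbatim as the model name is an assumption worth checking in the UI.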
In my case it's the xgboost model object, but yes.
mmm... Since they are saved "automatically" without my intervention, I'm not sure I can tell which "training" (hyperparameter and training-set combination) each one belongs to.
The problem is that I currently don't have a way to get them "from outside".
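Maybe as a hack (until we add the model object):
```
from clearml.binding.frameworks import WeightsFileHandler

class MyModelCB:
    current_args = dict()  # args of the fit currently running

    @classmethod
    def callback(cls, load_save, model_info):
        # load_save may arrive as a plain string or as an enum member
        if str(getattr(load_save, "value", load_save)) != "save":
            return model_info  # only rename on save
        model_info.name = "my new name" + str(cls.current_args)  # make a name from args
        return model_info

WeightsFileHandler.add_pre_callback(MyModelCB.callback)
MyModelCB.current_args = {"args": "value"}
```
wdyt?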
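A variation on the hack that scopes the current args to a single fit with a context manager, so a model saved outside any fit is not mislabeled with stale args. `CurrentArgs` and `args_for_fit` are hypothetical names; like the original, this assumes one fit at a time in a single process, and you would still register `callback` with `WeightsFileHandler.add_pre_callback`.

```python
from contextlib import contextmanager


class CurrentArgs:
    # Args of the fit in progress; None when no fit is running.
    value = None


@contextmanager
def args_for_fit(args):
    """Expose `args` to the save callback for the duration of one fit."""
    previous = CurrentArgs.value
    CurrentArgs.value = dict(args)
    try:
        yield
    finally:
        # Restore whatever was active before, even if the fit raised.
        CurrentArgs.value = previous


def callback(load_save, model_info):
    # load_save may be a plain string or an enum member with a .value
    if str(getattr(load_save, "value", load_save)) != "save":
        return model_info
    if CurrentArgs.value is not None:
        parts = [f"{k}={CurrentArgs.value[k]}" for k in sorted(CurrentArgs.value)]
        model_info.name = "model " + " ".join(parts)
    return model_info
```

Usage would look like `with args_for_fit(params): model.fit(X, y)`, with the save happening inside the `with` block.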
There were two types of model upload. The first is ClearML's automatic upload while GridSearchCV was running.
The second is our manual upload after GridSearchCV finished and we had a final model. We "manually" uploaded this model and had control over its name.
My question was about the automatically uploaded models, the ones uploaded by the ClearML client.
DistressedGoat23 you can now access the weights model object: `pip install clearml==1.8.1rc0`
then:
```
from clearml.binding.frameworks import WeightsFileHandler

def callback(_, model_info):
    model_info.weights_object  # this is your xgboost object
    model_info.name = "my new name"
    return model_info

WeightsFileHandler.add_pre_callback(callback)
```
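With `model_info.weights_object` available, a minimal sketch of turning it into a name that carries the hyperparameters. `describe_weights` is hypothetical; it assumes a scikit-learn-style estimator that exposes `get_params()` (as the xgboost sklearn wrappers do) and falls back to the class name otherwise. A raw xgboost `Booster` would need a different lookup, which is not handled here.

```python
def describe_weights(weights_object, keys=None):
    """Build a model name from a weights object's hyperparameters (sketch)."""
    cls_name = type(weights_object).__name__
    get_params = getattr(weights_object, "get_params", None)
    if not callable(get_params):
        # No sklearn-style params available: fall back to the class name only.
        return cls_name
    params = get_params()
    if keys is not None:
        # Keep only the hyperparameters you actually searched over.
        params = {k: params[k] for k in keys if k in params}
    parts = [f"{k}={params[k]}" for k in sorted(params)]
    return cls_name + "(" + ", ".join(parts) + ")"


def callback(_, model_info):
    # Only rename when an in-memory object was handed to the callback.
    obj = getattr(model_info, "weights_object", None)
    if obj is not None:
        model_info.name = describe_weights(obj, keys=("max_depth", "learning_rate"))
    return model_info
```

The `keys` tuple is an example grid; register the callback with `WeightsFileHandler.add_pre_callback(callback)` as above.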
My question was about the automatically uploaded models, the ones uploaded by the ClearML client.
So there is a way to add a callback, would that work?
https://github.com/allegroai/clearml/blob/cf7361e134554f4effd939ca67e8ecb2345bebff/clearml/binding/frameworks/__init__.py#L137
```
def callback(_, model_info):
    model_info.name = "my new name"
    return model_info
```
How can I obtain the actual trained model class inside the callback function? Basically, I need to know its hyperparameters.
Is there a way to force ClearML not to upload these models?
DistressedGoat23, is it uploading models or registering them? To disable both, set `auto_connect_frameworks`: https://clear.ml/docs/latest/docs/clearml_sdk/task_sdk#automatic-logging
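A minimal sketch of passing a dict to `auto_connect_frameworks` so only the model-related bindings are switched off. Disabling `joblib` in addition to `xgboost` and `scikit` is our assumption (the uploaded file turned out to be a joblib pickle), and the project/task names are placeholders.

```python
# Frameworks whose automatic model logging we want off. "xgboost", "scikit"
# and "joblib" are documented auto_connect_frameworks keys; which of them
# actually produces your uploads is an assumption to verify.
DISABLED_FRAMEWORKS = ("xgboost", "scikit", "joblib")

AUTO_CONNECT = {name: False for name in DISABLED_FRAMEWORKS}


def init_task(project_name, task_name):
    # Imported lazily so this module can be loaded without ClearML installed.
    from clearml import Task

    return Task.init(
        project_name=project_name,
        task_name=task_name,
        auto_connect_frameworks=AUTO_CONNECT,
    )
```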
Their names only contain the task name and some unique ID, so how can I know which exact training each one came from?
You mean the models, or the experiments being created?
We do upload the final model manually.
I was just wondering if I can make the autologging usable. Right now, since I don't know (at least in the web UI) which hyperparameter set each model was trained with and on which data (the full train set or one of the CV splits), these uploaded models are of no use to me.
trained model class...
You mean the pytorch model object?
We do upload the final model manually.
If that's the case, just name it based on the parameters, no? Am I missing something?
https://github.com/allegroai/clearml/blob/cf7361e134554f4effd939ca67e8ecb2345bebff/clearml/model.py#L1229
I was just wondering if I can make the autologging usable.
It kind of assumes these are different "checkpoints" of the same experiment, and stores them based on the file name.
You can, however, change the model names later: `Task.current_task().models["output"][-1].name = "my new name"`
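Building on that one-liner, a minimal sketch that renames every output model in one pass. `rename_output_models` is hypothetical, and it assumes the output models are listed in the order they were saved, so they can be paired with names you recorded in that same order (with parallel fits that ordering may not hold).

```python
def rename_output_models(task, names):
    """Rename a task's output models in saved order (sketch).

    `task` is expected to expose `task.models["output"]`, a sequence of
    model objects with a writable `name` attribute, as in
    Task.current_task().models["output"].
    """
    models = list(task.models["output"])
    if len(models) != len(names):
        raise ValueError(f"{len(models)} output models but {len(names)} names")
    for model, name in zip(models, names):
        model.name = name
    return [m.name for m in models]
```

In a ClearML run this would be called as `rename_output_models(Task.current_task(), names)`.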
The object would be enough. The problem is that I currently don't have a way to get them "from outside".
I actually just tried to use `model_info.local_model_path`, assuming it's the pickled model file path (debug prints showed it's a single file, not a directory), but `pickle.load` failed on it.
About 1: it uploads the models as artifacts, and I also see them in the web UI in the model list.
The documentation is not clear enough, but if I understand your answer correctly, to disable only the model upload and registration I should pass something like `'xgboost': False` or `'xgboost': False, 'scikit': False`?
About 2: I'm referring to the names of the models.
Thanks!
Hmm, apparently it is not passed, but it could be.
Would the object itself be enough to get the values? Wouldn't it make sense to get them from outside somehow? (I'm assuming only one set of args is in use at any given moment?)
I guess I need to do something like the following after the task is created:
...
Yes!
Why use the "post" callback and not the "pre" callback?
The post callback gets back the Model object. The pre callback lets you decide whether you actually want to log it in the first place (come to think of it, maybe you want that as well 🙂)
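To act on the "decide if you want to log at all" point, a minimal sketch of a pre callback that drops intermediate models and keeps only the final one. It relies on the pre-callback contract as we understand it (returning `None` means the model is not logged); `FinalModelOnly` and its `log_enabled` flag are hypothetical, a switch you would flip yourself before the final refit.

```python
class FinalModelOnly:
    # Hypothetical switch: leave False during the grid search, set True
    # right before fitting/saving the final model.
    log_enabled = False

    @classmethod
    def callback(cls, load_save, model_info):
        # load_save may be a plain string or an enum member with a .value
        if str(getattr(load_save, "value", load_save)) != "save":
            return model_info  # never interfere with loads
        if not cls.log_enabled:
            return None  # pre callback returning None: skip logging this model
        return model_info
```

It would be registered with `WeightsFileHandler.add_pre_callback(FinalModelOnly.callback)`.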
Is that normal or a possible bug?
This sounds like the xgboost internal format; it makes sense to me that it's joblib (which is like pickle, only more efficient for large NumPy arrays).
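Since the file is a joblib dump, a minimal sketch of reading `model_info.local_model_path` with `joblib.load` rather than `pickle.load`, which may explain the failed `pickle.load` above. `load_model_file` is a hypothetical helper; it falls back to plain `pickle` only when joblib is not installed.

```python
import pickle


def load_model_file(path):
    """Load a model file saved by joblib.dump, falling back to plain pickle."""
    try:
        import joblib  # joblib.load understands joblib's compressed/NumPy-aware format
    except ImportError:
        joblib = None
    if joblib is not None:
        return joblib.load(path)
    # Without joblib only plain pickle files can be read.
    with open(path, "rb") as f:
        return pickle.load(f)
```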
Let me see if we can also add the model object to the callback...
We do upload the final model manually.
Wait, you said you upload manually, and now you're saying "saved automatically"; I'm confused.
Interesting proposal. Why use the "post" callback and not the "pre" callback?
I guess I need to do something like the following after the task is created:
```
from clearml.binding.frameworks import WeightsFileHandler

def callback(_, model_info):
    model_info.name = "my new name"
    return model_info

WeightsFileHandler.add_pre_callback(callback)
```
I use sklearn's GridSearchCV (not ClearML HPO),
so all models are part of the same experiment and have the experiment name in their names.
I don't see any hyperparameters in the model name.
model upload and registration I should pass something like `'xgboost': False` or `'xgboost': False, 'scikit': False`?
Exactly! Which framework are you using?
About 2: I'm referring to the names of the models.
Hmm, that is a good point to test. Usually this is based on the Task name (I think), so if the Task name contains the HPO params, the model name should contain them too. Do you see the HPO params in the Task name? Should we open a GitHub issue?