Answered
Hi. Inside A Notebook When I Create A New Clearml Task And Then Run Sklearn Gridsearchcv , Clearml Uploads A Lot Of Model. Is There A Way To Force Clearml Not To Upload These Models? Related Question Is What Are These Models Anyway? Their Name Only Contai

Hi. Inside a notebook, when I create a new clearml task and then run sklearn GridSearchCV, clearml uploads a lot of models.
Is there a way to force clearml not to upload these models?
A related question: what are these models anyway? Their names only contain the task name and some unique id, so how can I know which exact training run (i.e. hyperparameter set or cv loop) each one belongs to?

  
  
Posted 2 years ago

Answers 23


Interesting proposal. Why use the "post" callback and not the "pre" callback?

I guess i need to do something like the following after the task was created:
` from clearml.binding.frameworks import WeightsFileHandler

def callback(_, model_info):
    model_info.name = "my new name"
    return model_info

WeightsFileHandler.add_pre_callback(callback) `

  
  
Posted 2 years ago

I use sklearn's GridSearchCV (not clearml HPO)
so all models are part of the same experiment and have the experiment name in their name.
I don't see any hyper parameter in the model name.

  
  
Posted 2 years ago

Oh it makes sense now 🙏

  
  
Posted 2 years ago

I guess i need to do something like the following after the task was created:
...

Yes!

Why use the "post" callback and not the "pre" callback?

The post callback gets back the Model object. The pre callback allows you to decide if you actually want to log in the first place (come to think about it, maybe you want that as well 🙂 )
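
To illustrate the "decide if you want to log" part, here is a minimal sketch of a pre callback that drops the automatic logging of saved models. It assumes, based on my reading of the `add_pre_callback` docstring, that returning `None` from a pre callback skips tracking for that save call (the file is still written to disk); please verify this against your installed clearml version. The function names are hypothetical.

```python
def skip_auto_saved_models(operation_type, model_info):
    """Pre callback: skip tracking of automatically saved models.

    operation_type is expected to be "save" or "load".
    Assumption: returning None tells clearml not to log/upload this model.
    """
    if operation_type == "save":
        return None  # do not track this save call
    return model_info  # leave load events untouched


def register_skip_callback():
    # Imported lazily so the decision logic above can be tried without clearml installed.
    from clearml.binding.frameworks import WeightsFileHandler

    WeightsFileHandler.add_pre_callback(skip_auto_saved_models)
```

Calling `register_skip_callback()` right after the task is created should be enough; models you upload manually are not affected by this filter as long as they do not go through the automatic framework hooks.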

  
  
Posted 2 years ago

The object would be enough. The problem is that I currently don't have a way to get them "from outside".
I actually just tried to use model_info.local_model_path, assuming it's the pickled model file path (debug prints showed it's a single file and not a directory), but pickle.load failed on it.

  
  
Posted 2 years ago

model upload and registration i should pass something like 'xgboost': False or 'xgboost': False, 'scikit': False ?

Exactly! which framework are you using ?

about 2, I refer to the names of the models.

Hmm that is a good point to test, usually this is based on the Task name (I think), so if the Task name contains the HPO params in the name it should be the same on the model name. Do you see the HPO params on the Task name ? Should we open a GitHub issue?

  
  
Posted 2 years ago

mmm... Since they are saved "automatically" without my intervention i am not sure i can know to which "training" (hyperparams and training set combination) each one belongs to.

  
  
Posted 2 years ago

How can i obtain the actual trained model class inside the callback function ? basically i need to know what are its hyperparameters.

  
  
Posted 2 years ago

There were two types of model upload. The first one is clearml automatic upload when GridSearchCV was running.
The second one is manual by us when GridSearchCV finished and we got a final model. We "manually" uploaded this model and had control over its name.

My question was about the automatically uploaded models. Those that were uploaded by clearml client.

  
  
Posted 2 years ago

trained model class...

You mean the pytorch model object?

  
  
Posted 2 years ago

The problem is that I currently don't have a way to get them "from outside".

Maybe as a hack (until we add the model object)
` from clearml.binding.frameworks import WeightsFileHandler

class MyModelCB:
    current_args = dict()

    @classmethod
    def callback(cls, load_save, model_info):
        # only rename on save, leave load events untouched
        if load_save != "save":
            return model_info
        model_info.name = "my new name " + str(cls.current_args)  # make a name from args
        return model_info

WeightsFileHandler.add_pre_callback(MyModelCB.callback)
MyModelCB.current_args = {"args": "value"} `
wdyt?

  
  
Posted 2 years ago

Is there a way to force clearml not to upload these models?

DistressedGoat23 is it uploading models or registering them? to disable both set auto_connect_frameworks https://clear.ml/docs/latest/docs/clearml_sdk/task_sdk#automatic-logging

Their name only contain the task name and some unique id so how can i know to which exact training

You mean the models or the experiments being created ?
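
For the first point, a minimal sketch of passing `auto_connect_frameworks` as a dict so that only some frameworks are switched off. The keys `xgboost`, `scikit` and `joblib` are the ones I would expect to matter for GridSearchCV with xgboost; which ones you actually need is an assumption to check against the linked docs. As far as I know, keys missing from the dict keep their default (enabled). The helper names and the project/task names are hypothetical.

```python
def build_auto_connect(disabled=("xgboost", "scikit", "joblib")):
    """Return an auto_connect_frameworks dict that turns off the listed frameworks."""
    return {name: False for name in disabled}


def init_task_without_model_autolog(project_name, task_name):
    # Imported lazily so the dict helper can be inspected without clearml installed.
    from clearml import Task

    return Task.init(
        project_name=project_name,
        task_name=task_name,
        auto_connect_frameworks=build_auto_connect(),
    )
```

Usage would be something like `task = init_task_without_model_autolog("my project", "grid search")` at the top of the notebook, before GridSearchCV runs.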

  
  
Posted 2 years ago

about 1, It uploads the models as artifacts and I also see them in the web UI in the model list.
The documentation is not clear enough, but if I understand your answer, to disable only the model upload and registration I should pass something like
'xgboost': False or 'xgboost': False, 'scikit': False ?

about 2, I refer to the names of the models.

Thanks!

  
  
Posted 2 years ago

We do upload the final model manually.

wait you said upload manually, and now you are saying "saved automatically", I'm confused.

  
  
Posted 2 years ago

DistressedGoat23
you can now access the weights model object
pip install clearml==1.8.1rc0
then:
` from clearml.binding.frameworks import WeightsFileHandler

def callback(_, model_info):
    model_info.weights_object  # this is your xgboost object
    model_info.name = "my new name"
    return model_info

WeightsFileHandler.add_pre_callback(callback) `
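
Building on that, a hedged sketch of how the model name could be derived from the hyperparameters. It assumes `model_info.weights_object` is an estimator exposing scikit-learn's `get_params()` (for example `XGBClassifier`); a raw xgboost `Booster` has no such method, in which case the name is left unchanged. The helper names, the `"gridsearch"` prefix and the chosen keys are hypothetical, and the last lines are an offline demo with stand-in objects rather than real clearml ones.

```python
from types import SimpleNamespace


def name_from_params(prefix, params, keys):
    """Build a readable model name from selected hyperparameters."""
    parts = ["{}={}".format(key, params[key]) for key in keys if key in params]
    if not parts:
        return prefix
    return prefix + " " + ",".join(parts)


def rename_with_hyperparams(operation_type, model_info):
    """Pre callback: rename saved models after the estimator's hyperparameters."""
    if operation_type != "save":
        return model_info
    model = getattr(model_info, "weights_object", None)
    get_params = getattr(model, "get_params", None)
    if callable(get_params):
        model_info.name = name_from_params(
            "gridsearch", get_params(), keys=("max_depth", "learning_rate")
        )
    return model_info


def register_callback():
    # Imported lazily so the demo below runs without clearml installed.
    from clearml.binding.frameworks import WeightsFileHandler

    WeightsFileHandler.add_pre_callback(rename_with_hyperparams)


# Offline demo with stand-in objects (not real clearml or xgboost objects)
fake_model = SimpleNamespace(
    get_params=lambda: {"max_depth": 3, "learning_rate": 0.1, "n_estimators": 50}
)
fake_info = SimpleNamespace(name="task name", weights_object=fake_model)
print(rename_with_hyperparams("save", fake_info).name)
# prints: gridsearch max_depth=3,learning_rate=0.1
```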

  
  
Posted 2 years ago

In my case it's the xgboost model object, but yes.

  
  
Posted 2 years ago

Won't help, since the problem is that I don't know the model args (they're hidden inside the GridSearchCV implementation and I can't access them).

Related to that, I was able to unpickle the file that you upload as a model (MODEL URL in the UI model list). It turns out to be a joblib pickled file, but the content seems strange: a numpy array of the form [0,1,2,3,...] (so each cell contains its offset).
Is that normal or a possible bug?

  
  
Posted 2 years ago

Is that normal or a possible bug?

This sounds like xgboost internal format, it makes sense to me to be joblib (which is built on pickle and optimized for large numpy arrays)
Let me see if we can also add the model object to the callback...

  
  
Posted 2 years ago

so all models are part of the same experiment and has the experiment name in their name.

Oh that explains it, (1) you can use the model filename to control the model name in clearml (2) you can disable the autologging and manually upload the model, then you can control the model name
wdyt?
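
A rough sketch of option (2), manually uploading a model with a name built from its parameters. It assumes `OutputModel(task=..., name=...)` and `update_weights(weights_filename=...)` behave as in the clearml SDK docs; the helper names, the naming scheme and the example file name are hypothetical.

```python
def model_name_from_params(base_name, params):
    """Append sorted 'key-value' pairs to the base name so models are distinguishable."""
    suffix = "_".join("{}-{}".format(key, params[key]) for key in sorted(params))
    if not suffix:
        return base_name
    return "{}_{}".format(base_name, suffix)


def upload_named_model(task, weights_path, base_name, params):
    # Imported lazily so the naming helper can be tried without clearml installed.
    from clearml import OutputModel

    output_model = OutputModel(task=task, name=model_name_from_params(base_name, params))
    output_model.update_weights(weights_filename=weights_path)
    return output_model
```

After `GridSearchCV.fit`, something like `upload_named_model(task, "best_model.json", "xgb", search.best_params_)` would register the final model under a name that includes its hyperparameters.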

  
  
Posted 2 years ago

We do upload the final model manually.

If this is the case just name it based on the parameters, no? am I missing something?
https://github.com/allegroai/clearml/blob/cf7361e134554f4effd939ca67e8ecb2345bebff/clearml/model.py#L1229

I was just wondering if i can make the autologging usable.

It kind of assumes these are different "checkpoints" on the same experiment, and then stores them based on the file name
You can however change the model names later:
Task.current_task().models["output"][-1].name = "my new name"

  
  
Posted 2 years ago

My question was about the automatically uploaded models. Those that were uploaded by clearml client.

So there is a way to add a callback, would that work?
https://github.com/allegroai/clearml/blob/cf7361e134554f4effd939ca67e8ecb2345bebff/clearml/binding/frameworks/init.py#L137
` def callback(_, model_info):
    model_info.name = "my new name"
    return model_info `

  
  
Posted 2 years ago

Hmm apparently it is not passed, but it could be.
Would the object itself be enough to get the values? wouldn't it make sense to get them from outside somehow? (I'm assuming there is one set of args used at any certain moment?)

  
  
Posted 2 years ago

We do upload the final model manually.
I was just wondering if I can make the autologging usable. Right now, since I don't know (at least in the web UI) which hyperparameter set the model was trained on and on which data (full train set, one of the cv combinations), I have no use for these uploaded models.

  
  
Posted 2 years ago