Interesting proposal. Why use the "post" callback and not the "pre" callback?
I guess I need to do something like the following after the task was created:
` from clearml.binding.frameworks import WeightsFileHandler

def callback(_, model_info):
    model_info.name = "my new name"
    return model_info

WeightsFileHandler.add_pre_callback(callback) `
I use sklearn's GridSearchCV (not clearml HPO)
so all models are part of the same experiment and have the experiment name in their name.
I don't see any hyper parameter in the model name.
I guess I need to do something like the following after the task was created:
...
Yes!
Why use the "post" callback and not the "pre" callback?
The post gets back the Model object. The pre allows you to decide whether you actually want to log it in the first place (come to think about it, maybe you want that as well 🙂 )
The object would be enough. The problem is that I currently don't have a way to get them "from outside".
I actually just tried to use model_info.local_model_path,
assuming it's the pickled model file path (debug prints showed it's a single file and not a directory), but pickle.load failed on it.
to disable only the model upload and registration I should pass something like
'xgboost': False
or
'xgboost': False, 'scikit': False
?
Exactly! Which framework are you using?
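For reference, a minimal sketch of what that Task.init call could look like (the project and task names here are just placeholders):
` from clearml import Task

# assumption: you only want to silence the xgboost / scikit-learn bindings,
# everything else keeps its automatic logging
task = Task.init(
    project_name="examples",      # placeholder project name
    task_name="grid search",      # placeholder task name
    auto_connect_frameworks={"xgboost": False, "scikit": False},
) `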
About 2, I refer to the names of the models.
Hmm, that is a good point to test. Usually this is based on the Task name (I think), so if the Task name contains the HPO params, the model name should contain them as well. Do you see the HPO params in the Task name? Should we open a GitHub issue?
Mmm... since they are saved "automatically" without my intervention, I am not sure I can know which "training" (hyperparams and training-set combination) each one belongs to.
How can I obtain the actual trained model class inside the callback function? Basically I need to know what its hyperparameters are.
There were two types of model upload. The first one is clearml automatic upload when GridSearchCV was running.
The second one is manual by us when GridSearchCV finished and we got a final model. We "manually" uploaded this model and had control over its name.
My question was about the automatically uploaded models. Those that were uploaded by clearml client.
trained model class...
You mean the pytorch model object?
The problem is that I currently don't have a way to get them "from outside".
Maybe as a hack (until we add the model object)
` from clearml.binding.frameworks import WeightsFileHandler

class MyModelCB:
    current_args = dict()

    @classmethod
    def callback(cls, load_save, model_info):
        if load_save != "save":
            return model_info
        model_info.name = "my new name " + str(cls.current_args)  # build the name from the current args
        return model_info

WeightsFileHandler.add_pre_callback(MyModelCB.callback)
MyModelCB.current_args = {"args": "value"} `
wdyt?
Is there a way to force clearml not to upload these models?
DistressedGoat23 is it uploading models or registering them? To disable both, set auto_connect_frameworks: https://clear.ml/docs/latest/docs/clearml_sdk/task_sdk#automatic-logging
Their names only contain the task name and some unique id, so how can I know to which exact training each one belongs?
You mean the models or the experiments being created?
About 1, it uploads the models as artifacts and I also see them in the web UI in the model list.
The document is not clear enough, but if I understand your answer, to disable only the model upload and registration I should pass something like 'xgboost': False
or 'xgboost': False, 'scikit': False
?
About 2, I refer to the names of the models.
Thanks!
We do upload the final model manually.
Wait, you said upload manually, and now you are saying "saved automatically", I'm confused.
DistressedGoat23
you can now access the weights model object:
` pip install clearml==1.8.1rc0 `
then:
` from clearml.binding.frameworks import WeightsFileHandler

def callback(_, model_info):
    model_info.weights_object  # this is your xgboost object
    model_info.name = "my new name"
    return model_info

WeightsFileHandler.add_pre_callback(callback) `
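Just to illustrate what one could try with it (a sketch, assuming the saved object is either a scikit-learn-style estimator exposing get_params() or a raw xgboost Booster exposing save_config(); the naming scheme is made up):
` import json
from clearml.binding.frameworks import WeightsFileHandler

def name_from_params(_, model_info):
    obj = model_info.weights_object
    params = None
    if hasattr(obj, "get_params"):       # sklearn-style estimator / xgboost sklearn wrapper
        params = obj.get_params()
    elif hasattr(obj, "save_config"):    # raw xgboost Booster
        params = json.loads(obj.save_config())
    if params:
        model_info.name = "model " + str(params)
    return model_info

WeightsFileHandler.add_pre_callback(name_from_params) `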
In my case it's the xgboost model object, but yes.
Won't help, since the problem is that I don't know the model args (they're hidden inside the GridSearchCV implementation and I can't access them).
Related to that, I was able to unpickle the file that you upload as a model (MODEL URL in the UI model list). It turns out to be a joblib-pickled file, but the content seems strange: a numpy array of the form [0,1,2,3,...] (so each cell contains its offset).
Is that normal or a possible bug?
Is that normal or a possible bug?
This sounds like xgboost internal format, it makes sense to me to be joblib (which is like pickle only faster and safer)
Let me see if we can also add the model object to the callback...
so all models are part of the same experiment and have the experiment name in their name.
Oh, that explains it. (1) You can use the model filename to control the model name in clearml; (2) you can disable the autologging and manually upload the model, then you control the model name.
wdyt?
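A rough sketch of option (2), assuming xgboost trained through sklearn's GridSearchCV (here gs stands for your fitted GridSearchCV, and the project/task names and file path are placeholders):
` import joblib
from clearml import Task, OutputModel

# assumption: only the xgboost / scikit-learn bindings need silencing
task = Task.init(
    project_name="examples",
    task_name="grid search",
    auto_connect_frameworks={"xgboost": False, "scikit": False},
)

# ... fit your search here, e.g. gs = GridSearchCV(...); gs.fit(X, y) ...

# dump the final estimator yourself and register it with a name you control
joblib.dump(gs.best_estimator_, "final_model.joblib")
model = OutputModel(task=task, name="xgb " + str(gs.best_params_), framework="xgboost")
model.update_weights(weights_filename="final_model.joblib") `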
My question was about the automatically uploaded models. Those that were uploaded by clearml client.
So there is a way to add a callback, would that work?
https://github.com/allegroai/clearml/blob/cf7361e134554f4effd939ca67e8ecb2345bebff/clearml/binding/frameworks/__init__.py#L137
` def callback(_, model_info):
    model_info.name = "my new name"
    return model_info `
Hmm apparently it is not passed, but it could be.
Would the object itself be enough to get the values? Wouldn't it make sense to get them from outside somehow? (I'm assuming there is one set of args used at any given moment?)
We do upload the final model manually.
If this is the case, just name it based on the parameters, no? Am I missing something?
https://github.com/allegroai/clearml/blob/cf7361e134554f4effd939ca67e8ecb2345bebff/clearml/model.py#L1229
I was just wondering if I can make the autologging usable.
It kind of assumes these are different "checkpoints" on the same experiment, and then stores them based on the file name
You can however change the model names later:
` Task.current_task().models["output"][-1].name = "my new name" `
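If the model you care about is the one left after the search finishes, a small sketch building on that line (assuming gs is your fitted GridSearchCV):
` from clearml import Task

# assumption: gs is the fitted GridSearchCV; rename the last auto-logged output model
Task.current_task().models["output"][-1].name = "best " + str(gs.best_params_) `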
We do upload the final model manually.
I was just wondering if I can make the autologging usable. Right now, when I don't know (at least in the web UI) which hyperparameter set the model was trained on and on which data (full train set, or one of the CV combinations), I have no use for these uploaded models.