Hi. I'm encountering a problem with model.name

Answered

Hi. I'm encountering a problem with model.name,
at least for models that were auto-magically uploaded.

I see it in my own code, but you can also see it if you run the example from the clearml repo here:
https://github.com/allegroai/clearml/blob/master/examples/datasets/data_ingestion.py
The code saves the model to ./cifar_net.pth as seen here:
https://github.com/allegroai/clearml/blob/5b385907562ff33ed84939053bd3a4cb2839adc9/examples/datasets/data_ingestion.py#L207-L208
The correct model name is listed in the UI under the task's artifacts (see attached screenshot). However, if I try to find the right model in task.models["output"] (this time there is just one, but in my code there may be several), it appears with the task name defined here: https://github.com/allegroai/clearml/blob/5b385907562ff33ed84939053bd3a4cb2839adc9/examples/datasets/data_ingestion.py#L23 (see other attached screenshot).

Looking at my own code, it seems like the first saved model takes on the task name, whereas additional models take the saved file name.
Is there a way to fix this (or a workaround)?

  				
Posted 2 years ago by PanickyMoth78
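To see the mismatch described in the question, a small diagnostic can list each output model's key next to its name. This is a sketch assuming, as later replies in this thread observe, that `task.models["output"]` supports `.keys()` and lookup by key; `list_output_model_names` is a hypothetical helper, and the stand-in task only mimics what a real `Task.get_task(task_id=...)` result would hold.

```python
from collections import OrderedDict
from types import SimpleNamespace


def list_output_model_names(task):
    """Return (key, model.name) pairs for every output model of a task.

    Assumes task.models["output"] supports .keys() and [key] lookup,
    as observed later in this thread.
    """
    outputs = task.models["output"]
    return [(key, outputs[key].name) for key in outputs.keys()]


# Hypothetical stand-in for a real task, shaped like the thread's observations:
# the key follows the file name while the name follows the task name.
fake_task = SimpleNamespace(models={"output": OrderedDict([
    ("cifar_net", SimpleNamespace(name="my_task")),
])})

pairs = list_output_model_names(fake_task)
print(pairs)
```

With a real task, comparing the two columns shows directly which models were named after the task rather than after their file.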


Answers 22

> I think that the first model saved gets the task name as its name and the following models take f"{task_name} - {file_name}"

Hmm, I'm not sure what would be a good way to make it consistent. Would it make sense to always use the model file name?

> I guess it takes some time before the correct names are assigned?

Hmm, that is odd. I have a feeling it has to do with calling Task.close().
I just tried it with the latest clearml version and it seemed to work as expected.

  				
Posted 2 years ago by AgitatedDove14

https://clear.ml/docs/latest/docs/references/sdk/task#update_output_model

  				
Posted 2 years ago by ScantMoth28
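A minimal sketch of the call behind that link. It assumes a clearml version in which `Task.update_output_model` is an instance method taking the weights path (`model_path`) as a required argument, plus `name` (defaulting to the file name without extension) and `model_name` (the name shown in the model repository, defaulting to "Task.name - name"); check the linked reference for the version you run. `rename_output_model` is a hypothetical helper.

```python
def rename_output_model(task, weights_path, key, display_name):
    """Upload weights_path as an output model of `task` with explicit names.

    Hypothetical helper. `task` is expected to be a clearml.Task; the
    name/model_name split follows the Task.update_output_model reference
    and may differ between clearml versions.
    """
    return task.update_output_model(
        model_path=weights_path,  # local weights file to upload
        name=key,                 # per-task name (default: file name without extension)
        model_name=display_name,  # name shown in the model repository
    )


# Usage with a real task (requires clearml and a configured server):
#   from clearml import Task
#   task = Task.current_task()
#   rename_output_model(task, "./cifar_net.pth", "cifar_net", "cifar_net")
```

Setting both names explicitly is what makes the key and the repository name line up, instead of relying on the defaults discussed in this thread.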

> Disable automatic model uploads

To disable the auto upload:
task = Task.init(..., auto_connect_frameworks={'pytorch': False})

  				
Posted 2 years ago by AgitatedDove14
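The same call written out with keyword arguments. This minimal sketch builds the `Task.init` arguments as a plain dict so they can be inspected without a ClearML server; `task_init_kwargs` is a hypothetical helper, and the project and task names in the usage comment are placeholders. Passing `False` turns PyTorch model auto-logging off, while a list of wildcard patterns (as suggested in the next reply) limits it to matching files.

```python
def task_init_kwargs(project_name, task_name, pytorch_models=False):
    """Build keyword arguments for clearml.Task.init (hypothetical helper).

    pytorch_models=False disables PyTorch model auto-logging entirely;
    a list of wildcard patterns such as ["*.pt"] limits it to matching files.
    """
    return {
        "project_name": project_name,
        "task_name": task_name,
        "auto_connect_frameworks": {"pytorch": pytorch_models},
    }


# Usage (requires clearml and a configured server):
#   from clearml import Task
#   task = Task.init(**task_init_kwargs("examples", "cifar training"))
```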

PanickyMoth78 ScantMoth28

> With several models saved by the training process (whose code is not task-aware)

You can actually specify which models should be saved:
task = Task.init(..., auto_connect_frameworks={'pytorch': ['*.pt']})
https://clear.ml/docs/latest/docs/references/sdk/task#taskinit

This way you can upload only the model you need.

  				
Posted 2 years ago by AgitatedDove14
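Which files a pattern list lets through can be checked offline. This sketch assumes ClearML applies shell-style wildcards to the saved file's base name, the way Python's `fnmatch` does; that is an assumption about its internals, not something stated in this thread, and `would_autolog` is a hypothetical helper. Note that the example script in the question saves ./cifar_net.pth, which "*.pt" would not match.

```python
import fnmatch
import os


def would_autolog(path, patterns):
    """Return True if the file's base name matches any wildcard pattern.

    Assumes fnmatch-style, case-sensitive matching on the base name;
    ClearML's own matching may differ in detail (e.g. full-path matching).
    """
    name = os.path.basename(path)
    return any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns)


saved = ["./cifar_net.pth", "checkpoints/last.pt", "checkpoints/best_model.pt"]
print([p for p in saved if would_autolog(p, ["*.pt"])])
# ['checkpoints/last.pt', 'checkpoints/best_model.pt']
# "*.pt" does not match "cifar_net.pth"; add "*.pth" if those files are wanted.
```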

It requires you to set the weights and the framework name.

  				
Posted 2 years ago by ScantMoth28

Not sure if it will work, but it's worth a try.

  				
Posted 2 years ago by ScantMoth28

Sort of, though it seems like the rules for model.name can be a bit non-obvious.
I think that the first model saved gets the task name as its name, and the following models take f"{task_name} - {file_name}"

  				
Posted 2 years ago by PanickyMoth78
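The naming pattern described above can be written down as a small function. This only restates the behavior observed in this thread; it is not ClearML's implementation. `observed_model_name` and its `models_saved_before` counter are hypothetical, and taking file_name as the base name without extension is an assumption based on the keys reported later in the thread.

```python
import os


def observed_model_name(task_name, file_path, models_saved_before):
    """Reproduce the model-name pattern reported in this thread.

    The first auto-logged model is named after the task; later ones are
    named f"{task_name} - {file_name}", where file_name is assumed to be
    the base name without extension.
    """
    if models_saved_before == 0:
        return task_name
    file_name = os.path.splitext(os.path.basename(file_path))[0]
    return f"{task_name} - {file_name}"
```

For a task named "train" that saves ckpt/best_model.pt and then ckpt/last.pt, this yields "train" and then "train - last", which is why looking models up by name is unreliable.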

Anyhow, it looks like the keys are simple enough to use (so I can just ignore the model names).

  				
Posted 2 years ago by PanickyMoth78

Ahh, I see, so one Task has multiple models that are trained.

  				
Posted 2 years ago by ScantMoth28

model*

  				
Posted 2 years ago by ScantMoth28

Yes. Several checkpoints, plus the one that did best on validation data.

  				
Posted 2 years ago by PanickyMoth78


> What would make sense here? (I have to be honest, I'm not sure.)

If the model was saved with a file name (is that the trigger for auto-upload?), I think it makes sense for the model name to match the file name (not the task name), especially when there may be several models per task.

  				
Posted 2 years ago by PanickyMoth78

> To be specific, there is the "model name", which is not unique, and there is the model key, which is unique to the Task

Not sure why the two fields don't simply match. I guess there may be situations where the same file name (without the full path) is used several times.

  				
Posted 2 years ago by PanickyMoth78

BTW:

> If I try to find the right model in the task.models["output"] (this time there is just one but in my code there may be several) it appears with the task name (see other attached screenshot).

What would make sense here? (I have to be honest, I'm not sure.)
To be specific, there is the "model name", which is not unique, and there is the model key, which is unique to the Task (i.e. task.models["output"]["model-key"]).

  				
Posted 2 years ago by AgitatedDove14
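Since the key is unique within the task while the name is not, going through the key is the more reliable lookup when a task has several output models. A minimal sketch, assuming the mapping-style access described in the post above; `get_output_model` is a hypothetical helper, and the stand-in data only mimics the keys and names reported in this thread.

```python
from collections import OrderedDict
from types import SimpleNamespace


def get_output_model(task, key):
    """Fetch an output model by its per-task key rather than by model.name.

    Raises KeyError listing the available keys, which helps when the keys
    are still placeholders such as "Output Model #0".
    """
    outputs = task.models["output"]
    available = list(outputs.keys())
    if key not in available:
        raise KeyError(f"no output model {key!r}; available keys: {available}")
    return outputs[key]


# Stand-in task: names embed the task name, keys follow the file names.
fake_task = SimpleNamespace(models={"output": OrderedDict([
    ("last", SimpleNamespace(name="my_task - last")),
    ("best_model_scripted", SimpleNamespace(name="my_task")),
])})

best = get_output_model(fake_task, "best_model_scripted")
print(best.name)  # my_task
```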

i.e.
task.update_output_model(model_path="./cifar_net.pth", name="custom_model")  # model_path (the weights file to upload) is required

  				
Posted 2 years ago by ScantMoth28

Another weird thing:
Before my training task is done,
print(task.models['output'].keys())
outputs
odict_keys(['Output Model #0', 'Output Model #1', 'Output Model #2'])
After task.close(), I can do:
task = Task.get_task(task_id)
for i in range(100):
    print(task.models["output"].keys())
which prints
odict_keys(['Output Model #0', 'Output Model #1', 'Output Model #2'])
in the first iteration, and prints the file names in later iterations:
odict_keys(['best_model_scripted', 'last', 'last_scripted'])
I guess it takes some time before the correct names are assigned?

  				
Posted 2 years ago by PanickyMoth78
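One way to cope with the delay described in the post above is to poll until the placeholder keys disappear. This sketch assumes placeholders always start with "Output Model #", as in the output shown; `wait_for_model_keys`, its timeout and its interval are hypothetical choices.

```python
import time

# Placeholder keys observed in this thread before the real names appear.
PLACEHOLDER_PREFIX = "Output Model #"


def wait_for_model_keys(fetch_keys, timeout=30.0, interval=1.0):
    """Poll fetch_keys() until no key is a placeholder or the timeout expires.

    fetch_keys is any zero-argument callable returning the current keys.
    Returns the last keys seen, which may still contain placeholders
    if the timeout was reached.
    """
    deadline = time.monotonic() + timeout
    keys = list(fetch_keys())
    while (any(k.startswith(PLACEHOLDER_PREFIX) for k in keys)
           and time.monotonic() < deadline):
        time.sleep(interval)
        keys = list(fetch_keys())
    return keys


# Offline demo: the first fetch returns placeholders, later ones file names.
_responses = iter([["Output Model #0", "Output Model #1", "Output Model #2"]])


def _fake_fetch():
    return next(_responses, ["best_model_scripted", "last", "last_scripted"])


result = wait_for_model_keys(_fake_fetch, timeout=5, interval=0.01)
print(result)  # ['best_model_scripted', 'last', 'last_scripted']
```

With ClearML, following the loop in the post above, `fetch_keys` could be `lambda: task.models["output"].keys()` for a task obtained with `Task.get_task(task_id)`; that this property reflects server-side updates on each access is inferred from the output shown, not documented here.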

Ooh nice.
I wasn't aware task.models["output"] also acts like a dict.
I can get the one I care about in my code with something like task.models["output"]["best_model"].
However, can you see the inconsistency between the key and the name there?

  				
Posted 2 years ago by PanickyMoth78

> However, can you see the inconsistency between the key and the name there?

Yes, that was my point about "uniqueness"... 😞
The model key must be unique, and it is based on the file name itself (the context is known, since it is inside the Task), but the model name belongs to a standalone entity, so it should have the Task name as part of the entity name. Does that make sense?

  				
Posted 2 years ago by AgitatedDove14

I imagine that one workaround is to:
1. Disable automatic model uploads
2. Perform a manual model upload (with the correct name)
Can you point me to how to do these?

  				
Posted 2 years ago by PanickyMoth78

Alternatively, you could create a new OutputModel like here: https://github.com/allegroai/clearml/blob/master/examples/reporting/model_reporting.py, though I'm not sure if there is a way to stop the automatic uploading.

  				
Posted 2 years ago by ScantMoth28

You can use Task.update_output_model() to update the name of the output model.

  				
Posted 2 years ago by ScantMoth28

Right. Thanks.
With several models saved by the training process (whose code is not task-aware), I suspect that doing the update call after training completes will only update the last of the uploaded models.
I'm currently looking at a workaround where I:
1. Disable auto saving, following https://clear.ml/docs/latest/docs/clearml_sdk/task_sdk/#automatic-logging
2. Manually upload the models
3. Manually register the models, following https://github.com/allegroai/clearml/blob/cf7361e134554f4effd939ca67e8ecb2345bebff/examples/reporting/model_reporting.py

  				
Posted 2 years ago by PanickyMoth78
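A sketch of the manual upload and registration steps of the workaround above, loosely modeled on the linked model_reporting.py example. It assumes PyTorch auto-logging was already turned off in Task.init, and that clearml's `OutputModel(task=..., name=..., framework=...)` and `OutputModel.update_weights(weights_filename=..., auto_delete_file=...)` behave as in recent releases; `register_checkpoints` and the checkpoint names are hypothetical. Because each checkpoint gets its own OutputModel with an explicit name, this should avoid the concern that a single update call after training only renames the last model.

```python
def register_checkpoints(task, checkpoints, framework="PyTorch"):
    """Register each saved weights file as a separately named output model.

    `checkpoints` maps the desired model name to a local weights file,
    e.g. {"best_model": "best_model.pt", "last": "last.pt"} (hypothetical).
    Returns the created OutputModel objects keyed by name.
    """
    # Imported lazily: calling this requires clearml and a configured server.
    from clearml import OutputModel

    registered = {}
    for name, weights_path in checkpoints.items():
        model = OutputModel(task=task, name=name, framework=framework)
        # Upload the file and attach it to this model; keep the local copy
        # (auto_delete_file is assumed to default to deleting it).
        model.update_weights(weights_filename=weights_path, auto_delete_file=False)
        registered[name] = model
    return registered


# Usage after training (names and paths are placeholders):
#   register_checkpoints(task, {"best_model": "best_model.pt", "last": "last.pt"})
```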


2K Views
