I imagine that one workaround is to:
1. Disable automatic model uploads
2. Perform manual model upload (with the correct name).
Can you point me to how to do these?
BTW:
If I try to find the right model in the
task.models["output"]
(this time there is just one but in my code there may be several) it appears with the
(see other attached screenshot).
What would make sense here? (To be honest, I'm not sure.)
To be specific, there is the "model name", which is not unique, and there is the model key, which is unique to the Task (i.e. task.models["output"]["model-key"]).
you can use Task.update_output_model()
to update the name of the output model
Right. Thanks.
With several models saved by the training process (whose code is not task-aware) I suspect that doing the update call after training completed will only update the last of the uploaded models.
I'm currently looking at a workaround where:
1. I disable auto saving via https://clear.ml/docs/latest/docs/clearml_sdk/task_sdk/#automatic-logging
2. Manually upload the models
3. Manually register the models with https://github.com/allegroai/clearml/blob/cf7361e134554f4effd939ca67e8ecb2345bebff/examples/reporting/model_reporting.py
yes. several checkpoints + the one that did best on validation data.
it requires you set the weights and the framework name
Ooh nice.
I wasn't aware task.models["output"]
also acts like a dict.
I can get the one I care about in my code with something like task.models["output"]["best_model"]
however can you see the inconsistency between the key and the name there:
not sure if it will work but it's worth a try
alternatively you could create a new OutputModel like here: https://github.com/allegroai/clearml/blob/master/examples/reporting/model_reporting.py , not sure if there is a way to stop the automatic uploading
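A minimal sketch of the manual route, assuming the weights files are already on disk. `OutputModel` and `update_weights` are the calls used in the linked example, and the model gets its weights and a framework name as mentioned above. The helper name `register_models`, the `model_cls` parameter and the choice to name each model after its file stem are illustrative assumptions, not ClearML conventions.

```python
from pathlib import Path


def register_models(task, weight_paths, framework="PyTorch", model_cls=None):
    """Register each weights file as its own output model, named after the file.

    `model_cls` is a hypothetical injection point (handy for dry runs);
    when omitted, clearml.OutputModel is imported and used.
    """
    if model_cls is None:
        # Imported lazily so the helper can be loaded without clearml installed.
        from clearml import OutputModel
        model_cls = OutputModel

    registered = {}
    for path in map(Path, weight_paths):
        # One model entry per file, with an explicit name and framework.
        model = model_cls(task=task, name=path.stem, framework=framework)
        # Attach the weights file to this model entry.
        model.update_weights(weights_filename=str(path))
        registered[path.stem] = model
    return registered
```

With automatic logging turned off, a call such as `register_models(task, ["checkpoints/last.pt", "checkpoints/best_model.pt"])` after training would be the intended use; the paths here are placeholders.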
another weird thing:
Before my training task is done:
print(task.models['output'].keys())
outputs
odict_keys(['Output Model #0', 'Output Model #1', 'Output Model #2'])
after task.close()
I can do:
task = Task.get_task(task_id)
for i in range(100):
    print(task.models["output"].keys())
which prints
odict_keys(['Output Model #0', 'Output Model #1', 'Output Model #2'])
in the first iteration
and prints the file names in the later iterations:
odict_keys(['best_model_scripted', 'last', 'last_scripted'])
I guess it takes some time before the correct names are assigned?
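If the placeholder keys really do resolve after a short delay, one way to cope is to poll until they are gone. This is only a sketch of that idea: the helper name `wait_for_named_keys` and the timeout values are made up for illustration, and it assumes placeholder keys always look like `Output Model #N`, as in the output above.

```python
import re
import time

# Placeholder keys as seen in the thread, e.g. "Output Model #0".
_PLACEHOLDER = re.compile(r"^Output Model #\d+$")


def wait_for_named_keys(get_keys, timeout=30.0, interval=1.0):
    """Poll `get_keys()` until no key looks like a placeholder, or time runs out.

    Returns the last list of keys seen, whether or not they were resolved.
    """
    deadline = time.monotonic() + timeout
    while True:
        keys = list(get_keys())
        if keys and not any(_PLACEHOLDER.match(k) for k in keys):
            return keys  # all keys carry real names
        if time.monotonic() >= deadline:
            return keys  # give up and hand back whatever we have
        time.sleep(interval)
```

It could be called as `wait_for_named_keys(lambda: task.models["output"].keys())`, mirroring the loop above. Whether re-reading `task.models` is what triggers the refresh is an observation from this thread, not documented behaviour.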
PanickyMoth78 ScantMoth28
With several models saved by the training process (whose code is not task-aware)
You can actually specify which models are saved:
task = Task.init(..., auto_connect_frameworks={'pytorch': ['*.pt']})
https://clear.ml/docs/latest/docs/references/sdk/task#taskinit
This way you can upload only the model you need.
Disable automatic model uploads
Disable the auto upload:
task = Task.init(..., auto_connect_frameworks={'pytorch': False})
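The two variants above differ only in the value passed for `'pytorch'`, so they can be wrapped in one small helper. A hedged sketch: `framework_logging_kwargs` and `init_task` are hypothetical names, and only the `auto_connect_frameworks` argument itself comes from the messages above.

```python
def framework_logging_kwargs(patterns=None):
    """Build the `auto_connect_frameworks` argument for Task.init.

    patterns=None  -> turn off automatic PyTorch model logging
    patterns=[...] -> only auto-log saved files matching these wildcards
    """
    value = list(patterns) if patterns else False
    return {"auto_connect_frameworks": {"pytorch": value}}


def init_task(project_name, task_name, patterns=None):
    """Create the task with the chosen auto-logging behaviour."""
    # Lazy import: this part needs clearml installed and configured.
    from clearml import Task

    return Task.init(
        project_name=project_name,
        task_name=task_name,
        **framework_logging_kwargs(patterns),
    )
```

For example, `init_task("my project", "training", patterns=["*.pt"])` would keep auto-upload for `.pt` files only, while omitting `patterns` disables it; the project and task names are placeholders.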
ahh i see so one Task has multiple models that are trained
i.e. Task.update_output_model(name="custom_model")
BTW:
If I try to find the right model in the
task.models["output"]
(this time there is just one but in my code there may be several) it appears with the
(see other attached screenshot).
What would make sense here? (To be honest, I'm not sure.)
If the model was saved with a file name (is that the trigger for auto-upload?), I think it makes sense for the model name to match the file name (not the task name), especially when there may be several models per task
To be specific, there is the "model name", which is not unique, and there is the model key, which is unique to the Task
not sure why the two fields don't simply match. I guess that there may be situations where file name (without the full path) may be used several times.
however can you see the inconsistency between the key and the name there:
Yes that was my point on "uniqueness" ... 😞
the model-key must be unique, and it is based on the filename itself (the context is known: it is inside the Task). The Model Name, on the other hand, is an entity, so it should have the Task Name as part of the entity name. Does that make sense?
sort of. Though it seems like the rules for model.name can be a bit non-obvious.
I think that the first model saved gets the task name as its name and the following models take f"{task_name} - {file_name}"
anyhow - looks like the keys are simple enough to use (so I can just ignore the model names)
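To make the observation above concrete, here is a small sketch of the naming rule as guessed in this thread, next to a key-based lookup that ignores names altogether. Both function names are hypothetical, and the naming rule is only the pattern observed here, not documented ClearML behaviour.

```python
def guessed_model_names(task_name, file_names):
    """Names as guessed in the thread: the first model gets the task name,
    later ones get "<task name> - <file name>"."""
    return [
        task_name if i == 0 else f"{task_name} - {file_name}"
        for i, file_name in enumerate(file_names)
    ]


def pick_by_key(output_models, key):
    """Look a model up by its key, listing the available keys on a miss."""
    try:
        return output_models[key]
    except KeyError:
        available = list(output_models.keys())
        raise KeyError(f"no output model {key!r}; available: {available}") from None
```

Since the thread reports that `task.models["output"]` acts like a dict, `pick_by_key(task.models["output"], "best_model")` would be the intended call.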
I think that the first model saved gets the task name as its name and the following models take
f"{task_name} - {file_name}"
Hmm, I'm not sure what would be a good way to make it consistent, would it make sense to always have the model file name?
I guess it takes some time before the correct names are assigned?
Hmm that is odd, I have a feeling it has to do with calling Task.close()?!
I just tried with the latest clearml version and it seemed to work as expected