And one last question on top of that (sorry!), about OUTPUT MODELS and MODEL NAMES. In this example, I used only one saver, keeping the 2 most recent checkpoints. When a model is uploaded for the first time, its MODEL NAME in the UI is full and correct (see the first screenshot). When it is overwritten in later epochs, however, the MODEL NAME shows only the experiment name. As a result, all the information stored in the filename (epoch number, score value, etc.) is lost. There is no clear way to recover it other than checking manually how many epochs there were or, for example, at which epoch the target metric reached its lowest value.

So, actually two questions. First, is it specific to ClearMLSaver() that OUTPUT MODELS in the UI are named {filename_prefix}_checkpoint_{n}.pt (where n runs from 0 to n_saved-1) instead of {filename_prefix}_checkpoint_{epoch_number}.pt? Second, would it be possible to keep the full MODEL NAME during training and have it updated every time the saver overwrites the model?
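To make the naming behaviour concrete, here is a toy, self-contained sketch of the slot-based scheme described above. It is not ClearML or Ignite code: SlotRotatingStore and simulate are hypothetical names used only for illustration. Each upload reuses a name of the form {prefix}_checkpoint_{slot}.pt, so the epoch and score survive only if they are recorded separately. Here that record is a mapping from the slot name back to the original filename.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SlotRotatingStore:
    """Hypothetical toy saver: uploads each checkpoint under a slot-based
    name ({prefix}_checkpoint_{slot}.pt) and reuses freed slots.

    It does not call ClearML; it only models the naming scheme and keeps
    a record of the informative original filename for every slot.
    """

    prefix: str
    # slots[i] holds the original filename currently stored in slot i
    slots: List[Optional[str]] = field(default_factory=list)
    # slot-based upload name -> original filename (epoch, score, ...)
    original_names: Dict[str, str] = field(default_factory=dict)

    def save(self, original_filename: str) -> str:
        # Reuse the first free slot if there is one, otherwise open a new slot.
        if None in self.slots:
            slot = self.slots.index(None)
            self.slots[slot] = original_filename
        else:
            self.slots.append(original_filename)
            slot = len(self.slots) - 1
        upload_name = f"{self.prefix}_checkpoint_{slot}.pt"
        # Overwrites the previous entry for this slot, mirroring the overwrite.
        self.original_names[upload_name] = original_filename
        return upload_name

    def remove(self, original_filename: str) -> None:
        # Free the slot held by a checkpoint that has been rotated out.
        slot = self.slots.index(original_filename)
        self.slots[slot] = None


def simulate(prefix: str, filenames: List[str], n_saved: int = 2) -> SlotRotatingStore:
    """Keep only the n_saved most recent checkpoints (hypothetical helper)."""
    store = SlotRotatingStore(prefix=prefix)
    kept: List[str] = []
    for name in filenames:
        if len(kept) == n_saved:
            # Drop the oldest checkpoint before saving the new one.
            store.remove(kept.pop(0))
        store.save(name)
        kept.append(name)
    return store


if __name__ == "__main__":
    store = simulate(
        "model",
        [
            "model_checkpoint_1_loss=0.52.pt",
            "model_checkpoint_2_loss=0.47.pt",
            "model_checkpoint_3_loss=0.41.pt",
        ],
    )
    for upload_name, original in sorted(store.original_names.items()):
        print(upload_name, "->", original)
```

With three epochs and n_saved=2, the third checkpoint reuses slot 0. The upload name model_checkpoint_0.pt therefore no longer says anything about the epoch, and only the side mapping ties it back to the epoch-3 filename. The sketch makes no claim about how ClearMLSaver itself records this information.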