Strictly speaking, there is only one training task, but I want to keep the top-3 best checkpoints for it at all times
If I use plain boto3 to sync weights to/from S3, I just check how many files are stored in the location and clear out the old ones
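(For reference, a rough boto3 sketch of that pruning step; the bucket/prefix names and the keep-count are placeholders, and pagination is ignored for brevity:)

```python
import boto3

KEEP = 3  # how many of the newest checkpoints to retain
s3 = boto3.client("s3")

# List everything under the checkpoint prefix and sort newest-first
resp = s3.list_objects_v2(Bucket="my-bucket", Prefix="checkpoints/")
objects = sorted(resp.get("Contents", []), key=lambda o: o["LastModified"], reverse=True)

# Delete whatever falls outside the newest KEEP objects
for obj in objects[KEEP:]:
    s3.delete_object(Bucket="my-bucket", Key=obj["Key"])
```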
Is there some sort of OutputModel.remove method? The docs say there isn't
If I'm not mistaken, models reflect the file names, so if you recycle the file names you recycle the models. If you call torch.save("top1.pt"), later torch.save("top2.pt"), and even later torch.save("top1.pt") again, you will only have 2 OutputModels, not three. This way you can keep recycling the best models 🙂
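A minimal sketch of that recycling behaviour, assuming ClearML's automatic PyTorch logging is enabled; the project/task names are placeholders:

```python
import torch
from clearml import Task

# ClearML's PyTorch integration binds one OutputModel per saved file name,
# so re-saving to an existing name updates that model instead of adding a new one.
task = Task.init(project_name="demo", task_name="recycle-checkpoints")
model = torch.nn.Linear(4, 2)

torch.save(model.state_dict(), "top1.pt")  # creates OutputModel "top1.pt"
torch.save(model.state_dict(), "top2.pt")  # creates OutputModel "top2.pt"
torch.save(model.state_dict(), "top1.pt")  # re-uses (updates) the existing "top1.pt" model
```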
Is there a way to simplify it with ClearML, not make it more complicated?
This is how I implemented it myself. ClearML's functionality looks quite opinionated and requires some tweaks every time I try to replace my own code with it:
```python
import os
from clearml import OutputModel, Task

# `save_path` and `params` come from the surrounding training code
task = Task.current_task()  # the already-initialized ClearML task

# Register the checkpoint as a named output model on the task
clearml_name = os.path.basename(save_path)
output_model_best = OutputModel(
    task=task,
    name=clearml_name,
    tags=['running-best'],
)
# Upload the local weights to the configured storage target (e.g. S3)
output_model_best.update_weights(
    save_path,
    upload_uri=params.clearml_aws_checkpoints,
    target_filename=clearml_name,
)
```
CostlyOstrich36 thank you for the answer! Maybe I can just delete old models along with their corresponding tasks; that seems easier
Hm, it's not quite clear how that is implemented. For example, this is how I do it now (explicitly): say I want to store only the top-3 running-best checkpoints; if the loss is lower than the best stored loss so far, I add the new checkpoint and remove the now top-4th one.
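Roughly, the bookkeeping looks like this minimal sketch; delete_checkpoint here just removes a local file and is a stand-in for whatever actually drops the displaced checkpoint (os.remove, a boto3 delete, ...):

```python
import heapq
import os

TOP_K = 3
best = []  # heap of (-loss, path): the K smallest losses stay, the worst kept is at best[0]

def delete_checkpoint(path):
    # Stand-in: remove the dropped checkpoint from local disk
    if os.path.exists(path):
        os.remove(path)

def maybe_keep(loss, path):
    """Keep the checkpoint if it enters the top-K; drop the one it displaces."""
    if len(best) < TOP_K:
        heapq.heappush(best, (-loss, path))
        return True
    worst_neg_loss, worst_path = best[0]
    if loss < -worst_neg_loss:
        heapq.heapreplace(best, (-loss, path))
        delete_checkpoint(worst_path)  # the previous top-3 entry is now top-4th
        return True
    return False
```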
How are you saving your models? torch.save("<MODEL_NAME>")?
This way I would want to keep track of 3 OutputModels and call update_weights 3 times every update, and probably do 2 redundant uploads
You mean you would like to delete an output model of a task if other models in the task surpass it?
If I keep track of 3 OutputModels simultaneously, the weights would need to shift between them every epoch (e.g. updated weights go to top-1, the old top-1 becomes top-2, top-2 becomes top-3, etc.)
Well, you can simply do the following:
1. Start with top 3 models named top1, top2, top3
2. Keep all 3 in a disk cache during the run
3. Build logic to rate a new model during the run depending on its standing compared to the top 3
4. Decide on the new standing of the top 3
5. Perform update_weights_package on the relevant "new" top-3 models, once per model

This is only from the top of my head. I'm sure you could create something better without even the need to cache 3 models during the run
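A rough sketch of that flow, using update_weights on three fixed-name slots (update_weights_package works analogously for multi-file packages); the project/task names and the s3:// URI are placeholders:

```python
import os
from clearml import OutputModel, Task

task = Task.init(project_name="demo", task_name="top3-slots")
UPLOAD_URI = "s3://my-bucket/checkpoints"

# Three fixed slots, kept sorted best-first; each holds an OutputModel
# plus the loss and local path of the checkpoint it currently represents.
slots = [
    {"model": OutputModel(task=task, name=f"top{i + 1}", tags=["running-best"]),
     "loss": float("inf"), "path": None}
    for i in range(3)
]

def register_checkpoint(loss, local_path):
    """Insert a new checkpoint into the top-3 ranking and re-upload only the slots that changed."""
    if loss >= slots[-1]["loss"]:
        return  # not in the top 3, nothing to do
    ranked = sorted([(s["loss"], s["path"]) for s in slots] + [(loss, local_path)],
                    key=lambda t: t[0])[:3]
    for slot, (new_loss, new_path) in zip(slots, ranked):
        if slot["path"] == new_path:
            continue  # this slot's weights did not change, skip the upload
        slot["loss"], slot["path"] = new_loss, new_path
        slot["model"].update_weights(
            weights_filename=new_path,
            upload_uri=UPLOAD_URI,
            target_filename=os.path.basename(new_path),
            auto_delete_file=False,  # keep the local copy cached on disk
        )
```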