e.g. if I want to store only top-3 running best checkpoints
This is how I implemented it myself. ClearML's functionality seems quite opinionated and needs some tweaking every time I try to replace my own code with it
Strictly speaking, there is only one training task, but I want to keep top-3 best checkpoints for it all the time
You mean you would like to delete an output model of a task if other models in the task surpass it?
if the loss is lower than the best stored loss so far, add the new checkpoint and remove the top-4th
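A rough sketch of that rule in plain Python, with no ClearML involved. `RunningBest`, `offer` and `keep` are made-up names for illustration, and it assumes checkpoints are local files that can simply be deleted once they drop out of the top 3:

```python
import os
from collections import deque


class RunningBest:
    """Keep the last `keep` running-best checkpoints, delete older ones."""

    def __init__(self, keep=3):
        self.keep = keep
        # (loss, path) pairs; the newest, and therefore best, is on the left.
        self.kept = deque()

    def offer(self, loss, path):
        """Call before saving. Returns True if `path` should be written."""
        if self.kept and loss >= self.kept[0][0]:
            return False  # not lower than the best stored loss so far
        self.kept.appendleft((loss, path))
        if len(self.kept) > self.keep:
            # Drop the 4th-best entry and remove its file if it exists.
            _, old_path = self.kept.pop()
            if os.path.exists(old_path):
                os.remove(old_path)
        return True
```

In a training loop this would be something like `if tracker.offer(val_loss, save_path): torch.save(state, save_path)`. Because a checkpoint is only admitted when it beats the current best, the kept entries are always sorted best-first.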
CostlyOstrich36 thank you for the answer! Maybe I just can delete old models along with corresponding tasks, seems to be easier
Is there some sort of OutputModel.remove method? The docs say there isn't
Is there a way to simplify it with ClearML, not make it more complicated?
Well, you can simply do the following:
1. Start with top 3 models named top1, top2, top3
2. Keep all 3 in disk cache during the run
3. Build logic to rate a new model during the run depending on its standing compared to the top 3
4. Decide on the new standing of the top 3
5. Perform update_weights_package on the relevant "new" top 3 models, once per model

This is only off the top of my head. I'm sure you could create something better without even needing to cache 3 models during the run
If I'm not mistaken, models reflect the file names. So if you recycle the file names, you recycle the models. So if you save torch.save("top1.pt"), then later torch.save("top2.pt"), and even later do torch.save("top1.pt") again, you will only have 2 OutputModels, not three. This way you can keep recycling the best models 🙂
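One way to use the file-name recycling idea without shifting weights between slots: treat top1.pt / top2.pt / top3.pt as plain slot labels and always overwrite whichever slot currently holds the worst loss. This is only a sketch of the bookkeeping. `SlotRecycler`, `pick_slot` and `ranking` are hypothetical names, and it assumes the slot name no longer has to match the actual rank:

```python
class SlotRecycler:
    """Map checkpoints onto a fixed set of file names, overwriting the worst."""

    def __init__(self, names=("top1.pt", "top2.pt", "top3.pt")):
        # Loss recorded per slot; None means the slot has not been used yet.
        self.losses = {name: None for name in names}

    def pick_slot(self, loss):
        """Return the file name to overwrite for this loss, or None to skip."""
        for name, stored in self.losses.items():
            if stored is None:
                self.losses[name] = loss
                return name
        worst = max(self.losses, key=self.losses.get)
        if loss >= self.losses[worst]:
            return None  # not better than any stored checkpoint
        self.losses[worst] = loss
        return worst

    def ranking(self):
        """Slot names ordered from best (lowest loss) to worst."""
        filled = [(l, n) for n, l in self.losses.items() if l is not None]
        return [name for _, name in sorted(filled)]
```

With this, each update writes at most one file (`name = recycler.pick_slot(loss)`, then save to `name` if it is not None), so if the file-name recycling behaves as described above, there would be one save and one upload per update instead of three.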
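` import os
from clearml import OutputModel

# Name the ClearML model after the checkpoint file.
clearml_name = os.path.basename(save_path)
output_model_best = OutputModel(
    task=task,
    name=clearml_name,
    tags=['running-best'],
)
# Upload the weights file to the configured checkpoint location.
output_model_best.update_weights(
    save_path,
    upload_uri=params.clearml_aws_checkpoints,
    target_filename=clearml_name,
) `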
This way I would need to keep track of 3 OutputModels and call update_weights 3 times on every update, probably doing 2 redundant uploads
hm, not quite clear how it is implemented. For example, this is how I do it now (explicitly)
If I keep track of 3 OutputModels simultaneously, the weights would need to shift between them every epoch (the updated weights go to top-1, then the old top-1 becomes top-2, top-2 becomes top-3, etc.)
if I just use plain boto3 to sync weights to/from S3, I just check how many files are stored in the location, and clear up the old ones
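For reference, the plain-boto3 cleanup could look roughly like the sketch below. `prune_old_checkpoints` and its parameters are made-up names; `s3_client` is assumed to be something like `boto3.client("s3")`, and "old" is assumed to mean "least recently modified":

```python
def prune_old_checkpoints(s3_client, bucket, prefix, keep=3):
    """Delete all but the `keep` most recently modified objects under `prefix`.

    Note: a single list_objects_v2 call returns at most 1000 keys, which is
    assumed to be enough for a checkpoint folder; pagination is not handled.
    """
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    objects = response.get("Contents", [])
    # Newest first, so everything past index `keep` is stale.
    objects.sort(key=lambda obj: obj["LastModified"], reverse=True)
    stale = [{"Key": obj["Key"]} for obj in objects[keep:]]
    if stale:
        s3_client.delete_objects(Bucket=bucket, Delete={"Objects": stale})
    return [entry["Key"] for entry in stale]
```

Calling this right after each upload keeps the location at `keep` files. It returns the deleted keys, which is handy for logging.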
How are you saving your models? torch.save("<MODEL_NAME>")?