If I keep track of 3 OutputModels simultaneously, the weights would need to shift between them every epoch (like, updated weights for top-1, then top-1 becomes top-2, top-2 becomes top-3, etc.)
if I just use plain boto3 to sync weights to/from S3, I just check how many files are stored in the location, and clear up the old ones
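For reference, the plain boto3 cleanup mentioned above could look roughly like this. It is only a sketch: the function name, bucket and prefix are made up for illustration, and it assumes "old" means "least recently modified" and that fewer than 1000 objects live under the prefix (a single `list_objects_v2` page).

```python
def prune_old_checkpoints(s3_client, bucket, prefix, keep=3):
    """Delete all but the `keep` most recently modified objects under `prefix`.

    `s3_client` is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    Returns the list of deleted keys.
    """
    # list_objects_v2 returns at most 1000 keys per call; fine for a sketch.
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    objects = response.get("Contents", [])
    # Newest first, so everything past index `keep` is stale.
    objects.sort(key=lambda obj: obj["LastModified"], reverse=True)
    stale = [{"Key": obj["Key"]} for obj in objects[keep:]]
    if stale:
        s3_client.delete_objects(Bucket=bucket, Delete={"Objects": stale})
    return [item["Key"] for item in stale]
```

It would be called after each upload, e.g. `prune_old_checkpoints(boto3.client("s3"), "my-bucket", "checkpoints/")`, where the bucket and prefix are placeholders.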
CostlyOstrich36 thank you for the answer! Maybe I can just delete old models along with the corresponding tasks, that seems easier
is there some sort of OutputModel.remove method? Docs say there isn't
How are you saving your models? torch.save("<MODEL_NAME>")?
hm, not quite clear how it is implemented. For example, this is how I do it now (explicitly)
Strictly speaking, there is only one training task, but I want to keep top-3 best checkpoints for it all the time
if the loss is lower than the best stored loss so far, add the new checkpoint and remove the top-4th
this is how I implemented it myself. Looks like ClearML functionality is quite opinionated and requires some tweaks every time I try to replace my own stuff with it
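The top-3 bookkeeping described above can be sketched independently of any storage backend. This is not the code from the thread, just a minimal illustration; the class and method names are invented. Note one assumption: this variant accepts any checkpoint that beats the current third-best loss, which is a bit more permissive than only accepting a new overall best.

```python
import bisect


class TopKCheckpoints:
    """Track the k lowest-loss checkpoints seen so far."""

    def __init__(self, k=3):
        self.k = k
        self.entries = []  # sorted list of (loss, path), lowest loss first

    def offer(self, loss, path):
        """Return (accepted, evicted_path); evicted_path is None if nothing was dropped."""
        if len(self.entries) >= self.k and loss >= self.entries[-1][0]:
            return False, None  # does not beat the current k-th best
        bisect.insort(self.entries, (loss, path))
        evicted = None
        if len(self.entries) > self.k:
            # The former k-th best is now (k+1)-th and gets dropped.
            evicted = self.entries.pop()[1]
        return True, evicted
```

The caller uploads the new checkpoint when `accepted` is true and deletes whatever `evicted_path` points to, wherever it is stored.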
If I'm not mistaken, models reflect the file names. So if you recycle the file names you recycle the models. So if you save torch.save("top1.pt") then later torch.save("top2.pt") and even later do torch.save("top1.pt") again, you will only have 2 OutputModels, not three. This way you can keep recycling the best models 🙂
You mean you would like to delete an output model of a task if other models in the task surpass it?
e.g. if I want to store only top-3 running best checkpoints
This way I would want to keep track of 3 OutputModels and call update_weights 3 times every update, and probably do 2 redundant uploads
Well, you can simply do the following:
- Start with top 3 models named top1, top2, top3
- Keep all 3 in disk cache during run
- Build logic to rate new model during run depending on its standing compared to top 3
- Decide on new standing of top 3
- Perform update_weights_package on the relevant "new" top 3 models once per model

This is only from the top of my head. I'm sure you could create something better without even the need to cache 3 models during the run
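A rough sketch of the local part of those steps, assuming fixed slot files `top1.pt`..`top3.pt` in a cache directory. The function name and arguments are hypothetical, and the upload itself is left out; the function only reports which slot files changed so that each one can be uploaded once.

```python
import os
import shutil


def place_checkpoint(losses, new_loss, new_file, cache_dir, k=3):
    """Insert new_file into fixed slots top1.pt..topk.pt; return the changed slot files.

    `losses` is the current best-first list of slot losses and is updated in place.
    """
    def slot(i):
        return os.path.join(cache_dir, f"top{i + 1}.pt")

    # Rank of the new checkpoint: number of stored losses it does not beat.
    rank = sum(1 for loss in losses if loss <= new_loss)
    if rank >= k:
        return []  # not in the top k, nothing to update
    losses.insert(rank, new_loss)
    del losses[k:]
    # Shift lower-ranked slots down by one, starting from the bottom.
    for i in range(len(losses) - 1, rank, -1):
        os.replace(slot(i - 1), slot(i))
    shutil.copyfile(new_file, slot(rank))
    return [slot(i) for i in range(rank, len(losses))]
```

Each returned path could then be passed to update_weights on the OutputModel for that slot. A new overall best still changes all three slots, so this only skips uploads for slots ranked above the new checkpoint.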
```
import os
from clearml import OutputModel

# `task`, `save_path` and `params` come from the surrounding training script
clearml_name = os.path.basename(save_path)
output_model_best = OutputModel(
    task=task,
    name=clearml_name,
    tags=['running-best'])
output_model_best.update_weights(
    save_path,
    upload_uri=params.clearml_aws_checkpoints,
    target_filename=clearml_name
)
```
Is there a way to simplify it with ClearML, not make it more complicated?