Answered

Hey there, I am a new user of ClearML and really enjoying it so far!
I noticed that my model checkpoints are saved after each epoch. Instead, I would like to save only the best and the last model checkpoint. Is that possible? I could not find anything about this in the docs.

  
  
Posted one year ago

Answers 4


Hi @NuttyKoala57! You can use wildcards in the auto_connect_frameworks argument to filter which models get captured. Check the docs under Task.init: None. You might also want to check out this GH thread for another way to do this: None
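A minimal sketch of the wildcard approach, in case it helps. The project and task names, the `tensorflow` framework key, and the `*best*` / `*last*` patterns are assumptions for illustration. The `would_be_captured` helper only approximates the filter locally with shell-style globbing, so check the Task.init docs for how ClearML actually matches file names.

```python
from fnmatch import fnmatch

# Hypothetical patterns: only checkpoints whose names contain "best" or "last"
CHECKPOINT_PATTERNS = ["*best*", "*last*"]


def build_auto_connect_config(patterns):
    """Build the dict passed as Task.init(auto_connect_frameworks=...).

    The "tensorflow" key is an assumption (Keras/TensorFlow training);
    use the key of your own framework, e.g. "pytorch".
    """
    return {"tensorflow": list(patterns)}


def would_be_captured(filename, patterns):
    # Local approximation of the wildcard filter, not ClearML's own code
    return any(fnmatch(filename, pattern) for pattern in patterns)


def start_task():
    # Imported here so the helpers above can be tried without ClearML installed
    from clearml import Task

    return Task.init(
        project_name="examples",          # hypothetical project name
        task_name="best-and-last-only",   # hypothetical task name
        auto_connect_frameworks=build_auto_connect_config(CHECKPOINT_PATTERNS),
    )
```

With these patterns, a file such as `best_model.keras` passes the local check while `epoch_03.keras` does not. Frameworks that are not listed in the dict keep their default behaviour.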

  
  
Posted one year ago

Thanks, I think I could identify the issue. I opened a bug here: None

The problem is with the Keras BackupAndRestore callback: ClearML replaces its local backup storage with storage on the ClearML server. In this case, however, local storage is sufficient, since the backup is only used to resume training after an interruption.

  
  
Posted one year ago

Yeah, it's because it's just hooking into the save operation and capturing the output, regardless of the parent call.

  
  
Posted one year ago

Depending on the framework you're using, it'll just hook into the save model operation, so a model is captured every time you save one, which will probably happen every epoch for at least part of the training. If you want to do it with the existing framework, you could change the checkpointing so that it only keeps a copy of the best model in memory and defers the write operation to the end of training. The risk with this is that if the training crashes, you'll lose your best model.
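Roughly what I mean by keeping the best model in memory, as a framework-agnostic sketch. The class name, the `save_fn` callback, and the toy "weights" dict are all made up for illustration; in a real run `save_fn` would be your framework's save call.

```python
import copy


class BestInMemoryCheckpoint:
    """Keep the best weights in memory; write to disk only once, at the end."""

    def __init__(self):
        self.best_loss = None
        self.best_weights = None

    def update(self, loss, weights):
        # Clone the weights only when the monitored loss improves
        if self.best_loss is None or loss < self.best_loss:
            self.best_loss = loss
            self.best_weights = copy.deepcopy(weights)

    def finalize(self, last_weights, save_fn):
        # The only two writes of the whole run: best and last
        save_fn("best", self.best_weights)
        save_fn("last", last_weights)


# Toy run: "weights" is a plain dict and save_fn just records what was written
saved = {}
checkpoint = BestInMemoryCheckpoint()
weights = {"w": 0}
for epoch, loss in enumerate([0.9, 0.5, 0.7]):
    weights["w"] = epoch  # stand-in for a training step
    checkpoint.update(loss, weights)
checkpoint.finalize(weights, lambda name, w: saved.update({name: copy.deepcopy(w)}))
```

Since the save operation only runs twice, the hook only sees two models. The deep copy matters: without it the "best" entry would keep changing as training continues.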

Alternatively, you could disable the ClearML integration with your framework and manually specify when to sync everything to the server.
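A sketch of the manual route, under a few assumptions: the `tensorflow` key, the project and task names, and the `best.keras` / `last.keras` file names are placeholders, and `existing_checkpoints` is a hypothetical helper. It turns off automatic model capture for one framework and registers the chosen files through `OutputModel.update_weights`.

```python
from pathlib import Path


def existing_checkpoints(checkpoint_dir, names=("best.keras", "last.keras")):
    """Return the checkpoint files that actually exist, in the given order."""
    candidates = (Path(checkpoint_dir) / name for name in names)
    return [path for path in candidates if path.is_file()]


def train_and_upload(checkpoint_dir):
    # Imported here so the helper above can be tried without ClearML installed
    from clearml import OutputModel, Task

    task = Task.init(
        project_name="examples",                       # hypothetical project name
        task_name="manual-model-upload",               # hypothetical task name
        auto_connect_frameworks={"tensorflow": False}, # no automatic model capture
    )

    # ... training goes here and writes best.keras / last.keras locally ...

    for path in existing_checkpoints(checkpoint_dir):
        model = OutputModel(task=task, name=path.stem)
        # auto_delete_file=False keeps the local copy after the upload
        model.update_weights(weights_filename=str(path), auto_delete_file=False)
```

Calling `train_and_upload("checkpoints")` after filling in the training part would upload at most two models per run.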

I'm still a bit new to the platform, I'd love to hear from others if there's another solution.

  
  
Posted one year ago