Unanswered
Hi all,
I'm trying to save my model checkpoints during runtime but am running into a confusing snag.
I'm using the Hugging Face architecture for a transformer, with their training module to control training. In the training args, I have the
We have managed to get complete checkpoints to save by removing save_total_limit
from the training arguments; this seems to save checkpoint folders with all files in them. However, the server's disk usage is now ballooning, since old checkpoints are never cleaned up.
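As a stopgap for the disk usage, old checkpoint folders could be pruned by hand while save_total_limit stays unset. The following is only a minimal sketch, assuming the checkpoints are written as `checkpoint-<step>` folders directly under the output directory; `prune_checkpoints` and `keep_last` are hypothetical names for illustration, not part of the library.

```python
import re
import shutil
from pathlib import Path

# Assumption: checkpoint folders are named "checkpoint-<global step>".
_CHECKPOINT_NAME = re.compile(r"^checkpoint-(\d+)$")


def prune_checkpoints(output_dir, keep_last=2):
    """Delete all but the `keep_last` highest-step checkpoint folders.

    Returns the names of the deleted folders, oldest first.
    """
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")

    # Collect (step, path) pairs for folders that match the naming scheme.
    found = []
    for path in Path(output_dir).iterdir():
        match = _CHECKPOINT_NAME.match(path.name)
        if match and path.is_dir():
            found.append((int(match.group(1)), path))

    # Sort numerically by step so that checkpoint-1000 ranks after checkpoint-200.
    found.sort(key=lambda pair: pair[0])

    removed = []
    for _, path in found[:-keep_last]:
        shutil.rmtree(path)
        removed.append(path.name)
    return removed
```

This could be run periodically from a separate script, or possibly from a callback that fires after each save. It does not explain why the checkpoints were incomplete with save_total_limit set; it only keeps the disk from filling up in the meantime.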
We did discover this [link missing], and are wondering if there's some nuance in autotracking that needs to be circumvented.
4 months ago