Hi @<1523701842515595264:profile|PleasantOwl46>, I think that is what's happening. If the server is down, the code continues running as if nothing happened, and ClearML simply caches all results and flushes them once the server is back up.
@<1523701070390366208:profile|CostlyOstrich36> unfortunately, this is not the behavior we are seeing
The same exact issue happened tonight.
At epoch 53, ClearML was shut down. The job did not continue to epoch 54 and was eventually killed by the watchdog timer.