Reputation
Badges 1
119 × Eureka!I set the number to a crazy value and it fails around the same iteration
Last question CostlyOstrich36 sorry to poke you! Seems even though if I set an extremely long time it will still fail when the first plots are reported. The first plots are generated automatically by pytorch lightning and track the cpu and gpu usage. Do you think this could be the cause? or should it also detect the iteration.
AgitatedDove14 task.set_archived(True)
+ the cleanup service should do it π If we run in debug mode the experiment goes directly to the archive and gets cleaned and we donβt pollute the main experiment page.
This works:filepath = self.log_dir + os.sep + "checkpoint" self.callbacks.append( ModelCheckpoint( filepath, monitor="val_loss", mode="min", save_best_only=True, save_weights_only=True, ) )
And this doesnβt:
` filepath = self.log_dir + os.sep + "checkpoint.hdf5"
self.callbacks.append(
ModelCheckpoint(
filepath,
...
Sure! For torch I have:
torch==2.0.1
# via
# monai
# pytorch-lightning
# torchio
# torchmetrics