Hey All, I Cannot Use ClearML With Accelerate For Uploading Checkpoints.

  • Accelerate handles the folder structure, so checkpoints usually look like <checkpoint_folder>/iteration_4000/pytorch.bin (example).
  • I initialise my ClearML task with Task.init(..., output_uri=" None ")
  • ClearML creates a nested structure under the S3 key, but keeps overwriting pytorch.bin for every checkpoint.
  • Is there any way to tell it to keep the folder structure, e.g. iteration_4000/pytorch.bin?
Posted 5 months ago
Hi @PerplexedRaccoon19, currently the folder structure is fixed, and other ClearML components rely on that fact.
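
One possible workaround, as a rough sketch rather than an official recipe: since the remote layout cannot be changed, each checkpoint could be uploaded under a unique name so that a later pytorch.bin does not replace an earlier one. The helper names `unique_checkpoint_name` and `upload_checkpoint` are hypothetical, and the sketch assumes `task` is the object returned by Task.init and that uploading through `task.upload_artifact` with a distinct artifact name per iteration is acceptable for your setup.

```python
from pathlib import Path


def unique_checkpoint_name(checkpoint_file, checkpoint_root):
    """Flatten <root>/iteration_4000/pytorch.bin into iteration_4000_pytorch.bin.

    Hypothetical helper: it only builds a name, it does not touch ClearML.
    """
    relative = Path(checkpoint_file).relative_to(checkpoint_root)
    return "_".join(relative.parts)


def upload_checkpoint(task, checkpoint_file, checkpoint_root):
    """Upload one checkpoint file under a per-iteration artifact name.

    Assumption: `task` is a clearml Task (e.g. the return value of Task.init).
    """
    name = unique_checkpoint_name(checkpoint_file, checkpoint_root)
    # A distinct artifact name per checkpoint keeps uploads from colliding.
    task.upload_artifact(name=name, artifact_object=str(checkpoint_file))
    return name
```

You would call `upload_checkpoint(task, path, checkpoint_folder)` right after Accelerate writes each checkpoint. This does not reproduce the iteration_4000/pytorch.bin layout on S3; it only makes every uploaded file distinguishable.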

Posted 5 months ago