Which version? Is this reproducible in this example?
(can you try with the latest clearml version 1.13.2?)
Oh, I misunderstood the docs/examples then, sorry. I'm using pytorch-ignite.
Thanks for the tip!
Hi @<1631102016807768064:profile|ZanySealion18>
ClearML doesn't pick up model checkpoints automatically.
What's the framework you are using?
BTW:
Task.add_requirements("requirements.txt")
If you want to specify just your requirements.txt, do not use add_requirements; use:
Task.force_requirements_env_freeze(requirements_file="requirements.txt")
(add_requirements with a filename does the same thing, but this is more readable)
@<1523701087100473344:profile|SuccessfulKoala55> Kind reminder again, thanks and sorry!
@<1523701087100473344:profile|SuccessfulKoala55> kind reminder not to miss this when you catch time, thanks!
No worries, sorry for pinging, I was just making sure you (or anyone else who might help) don't miss it 🙂
I use Task.add_requirements("requirements.txt") right before the Task.init.
In main, I parse the command-line arguments, call add_requirements, initialize the Task, and call execute_remotely. After that it's pretty much the usual workflow: initialize the model, set up the dataloaders and optimizer, and run the training. I'm using pytorch-ignite, and the model checkpoint is saved on the validation evaluator's COMPLETED event.
Sorry for the delay 🙏 - how do you import your packages and where do you initialize ClearML relative to the rest of the code?
clearml-1.13.1
from clearml import Task

Task.add_requirements("requirements.txt")
task = Task.init(project_name="My project", task_name="My task")
task.execute_remotely(queue_name="default")
...
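For readers following along, the workflow described in this thread (parse arguments, register requirements, init the Task, hand off to a queue) could be sketched roughly as below. This is only an illustration: the `--queue` and `--epochs` flags and the `parse_args`/`main` helpers are hypothetical and not taken from the original script, and the training part is left as a comment.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line arguments (flag names are hypothetical)."""
    parser = argparse.ArgumentParser(description="Training entry point (sketch)")
    parser.add_argument("--queue", default="default", help="ClearML queue name")
    parser.add_argument("--epochs", type=int, default=10, help="Training epochs")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)

    # Imported lazily so parse_args() can be exercised without ClearML installed.
    from clearml import Task

    # Register requirements.txt before Task.init, as in the snippet above.
    Task.add_requirements("requirements.txt")
    task = Task.init(project_name="My project", task_name="My task")

    # With the default settings the local process exits here once the task is
    # enqueued; when an agent runs the script, this call is skipped.
    task.execute_remotely(queue_name=args.queue)

    # Model, dataloaders, optimizer and the ignite training loop would follow,
    # using args.epochs (assumption: not shown in the thread).
```

Calling `main()` from the script's entry point reproduces the order of calls discussed here; everything after `execute_remotely` only runs on the agent side.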
Hi @<1631102016807768064:profile|ZanySealion18> , can you provide more info on which framework you're using, which ClearML SDK version, and how you're initializing the ClearML task?
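from ignite.engine import Events
from ignite.handlers import ModelCheckpoint, global_step_from_engine

# score_function, trainer, val_evaluator and model are assumed to be defined elsewhere in the script
model_checkpoint = ModelCheckpoint(
    "checkpoint",
    n_saved=2,
    filename_prefix="best",
    score_function=score_function,
    score_name="accuracy",
    global_step_transform=global_step_from_engine(trainer),
)
# Save the model each time a val_evaluator run completes
val_evaluator.add_event_handler(
    Events.COMPLETED, model_checkpoint, {"model": model}
)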
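To make the `n_saved=2` / `score_function` settings above easier to reason about, here is a toy, pure-Python sketch of "keep the best N by score" retention. It is not ignite's implementation; `BestNTracker` and `offer` are hypothetical names used only for illustration, and the assumption is that a new checkpoint replaces the worst kept one only when its score is strictly higher.

```python
from dataclasses import dataclass, field


@dataclass
class BestNTracker:
    """Toy stand-in for score-based checkpoint retention (illustrative only)."""

    n_saved: int = 2
    kept: list = field(default_factory=list)  # (score, step) pairs currently kept

    def offer(self, step, score):
        """Return True if a checkpoint with this score would be kept."""
        if len(self.kept) < self.n_saved:
            self.kept.append((score, step))
            return True
        worst = min(self.kept)  # lowest score among the kept checkpoints
        if score > worst[0]:
            # Evict the worst checkpoint to make room for the better one.
            self.kept.remove(worst)
            self.kept.append((score, step))
            return True
        return False


tracker = BestNTracker(n_saved=2)
decisions = [
    tracker.offer(step, acc)
    for step, acc in enumerate([0.70, 0.80, 0.75, 0.60], start=1)
]
```

With the accuracies above, the first three evaluations are kept at the time they occur and the last one (0.60) is rejected, leaving the checkpoints from steps 2 and 3.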