ClearML doesn't pick up model checkpoints automatically. Any idea what might be wrong? (code attached in the thread). Thanks

Answered

ClearML doesn't pick up model checkpoints automatically. Any idea what might be wrong? (code attached in the thread). Thanks

  				
Posted one year ago by ZanySealion18


Answers 11

Which version are you using? Is this reproducible with this example?
None
(Can you try the latest clearml version, 1.13.2?)

  				
Posted one year ago by AgitatedDove14

Oh, I misunderstood the docs/examples then, sorry. I'm using pytorch-ignite.

Thanks for the tip!

  				
Posted one year ago by ZanySealion18

Hi @ZanySealion18

ClearML doesn't pick up model checkpoints automatically.

What's the framework you are using?
BTW:

Task.add_requirements("requirements.txt")

If you want to specify just your requirements.txt, do not use add_requirements; use:

Task.force_requirements_env_freeze(requirements_file="requirements.txt")

(add_requirements with a filename does the same thing, but this is more readable)
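A minimal sketch of the suggestion above, assuming `requirements.txt` sits next to the script. The helper name `pin_requirements` is hypothetical; it takes the `Task` class as an argument only so it can be tried without a ClearML server. Either call needs to happen before `Task.init()`.

```python
from typing import Any


def pin_requirements(task_cls: Any, requirements_file: str = "requirements.txt") -> None:
    """Ask ClearML to take the task's packages from `requirements_file`.

    `task_cls` is expected to be `clearml.Task`. It is passed in (rather than
    imported here) so the helper can be exercised without ClearML installed.
    Call this before `Task.init()`.
    """
    task_cls.force_requirements_env_freeze(requirements_file=requirements_file)


def main() -> None:
    # Imported lazily so that merely loading this file does not require clearml.
    from clearml import Task

    pin_requirements(Task, "requirements.txt")
    # Project and task names are the ones used elsewhere in this thread.
    task = Task.init(project_name="My project", task_name="My task")
    print(task.id)
```

Calling `main()` needs a configured ClearML environment; `pin_requirements` on its own does not.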

  				
Posted one year ago by AgitatedDove14

@SuccessfulKoala55 Kind reminder again, thanks and sorry!

  				
Posted one year ago by ZanySealion18

@SuccessfulKoala55 Kind reminder not to miss this when you find the time, thanks!

  				
Posted one year ago by ZanySealion18

No worries, sorry for pinging, I was just making sure you (or anyone else who might help) don't miss it 🙂
I call Task.add_requirements("requirements.txt") right before Task.init.
In main, I parse the command-line arguments, call add_requirements, initialize the Task and call execute_remotely. After that it's pretty much the usual workflow: initialize the model, set up the dataloaders and optimizer, and run the training. I'm using pytorch-ignite and save a model checkpoint on the validation evaluator's COMPLETED event.
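The order described above can be sketched roughly as follows. This is only an illustration of the layout, not the actual script: `parse_args`, `run_training`, and the `--queue` / `--epochs` flags are hypothetical names, while the ClearML calls are the ones quoted in this thread.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse command-line arguments (flag names are illustrative)."""
    parser = argparse.ArgumentParser(description="Training entry point (sketch)")
    parser.add_argument("--queue", default="default", help="ClearML queue name")
    parser.add_argument("--epochs", type=int, default=10, help="number of epochs")
    return parser.parse_args(argv)


def run_training(args: argparse.Namespace) -> None:
    """Placeholder for model, dataloader, optimizer setup and the training loop."""
    raise NotImplementedError("training code goes here")


def main(argv: Optional[List[str]] = None) -> None:
    args = parse_args(argv)

    # Imported lazily so that loading this file does not require clearml.
    from clearml import Task

    Task.add_requirements("requirements.txt")  # before Task.init
    task = Task.init(project_name="My project", task_name="My task")
    task.execute_remotely(queue_name=args.queue)  # enqueue the task for an agent

    # Everything below is meant to run on the agent.
    run_training(args)
```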

  				
Posted one year ago by ZanySealion18

Sorry for the delay 🙏 - how do you import your packages and where do you initialize ClearML relative to the rest of the code?

  				
Posted one year ago by SuccessfulKoala55

Kind ping on this thread, thanks! 🙂

  				
Posted one year ago by ZanySealion18

clearml-1.13.1

from clearml import Task

Task.add_requirements("requirements.txt")
task = Task.init(project_name="My project", task_name="My task")
task.execute_remotely(queue_name="default")
...

  				
Posted one year ago by ZanySealion18

Hi @ZanySealion18, can you provide more info on which framework you're using, which ClearML SDK version, and how you're initializing the ClearML task?

  				
Posted one year ago by SuccessfulKoala55

    from ignite.engine import Events
    from ignite.handlers import ModelCheckpoint, global_step_from_engine

    model_checkpoint = ModelCheckpoint(
        "checkpoint",
        n_saved=2,
        filename_prefix="best",
        score_function=score_function,
        score_name="accuracy",
        global_step_transform=global_step_from_engine(trainer),
    )

    # Save the model each time a val_evaluator run completes
    val_evaluator.add_event_handler(
        Events.COMPLETED, model_checkpoint, {"model": model}
    )
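For comparison, pytorch-ignite ships a `ClearMLSaver` handler that can be passed to its generic `Checkpoint` handler. The sketch below mirrors the settings above under that assumption; whether this is the tip referred to elsewhere in the thread is not confirmed. `attach_clearml_checkpoint` is a hypothetical helper name, `score_function` assumes an "accuracy" metric is attached to the evaluator, and `ClearMLSaver()` expects `Task.init()` to have been called already.

```python
def score_function(engine):
    """Return validation accuracy (assumes an 'accuracy' metric is attached)."""
    return engine.state.metrics["accuracy"]


def attach_clearml_checkpoint(model, trainer, val_evaluator):
    """Save the best checkpoints through ClearMLSaver instead of plain disk saving.

    Needs pytorch-ignite and clearml installed, and an initialized ClearML Task.
    """
    from ignite.engine import Events
    from ignite.handlers import Checkpoint, global_step_from_engine

    try:  # location in recent ignite releases
        from ignite.handlers.clearml_logger import ClearMLSaver
    except ImportError:  # older releases keep it under contrib
        from ignite.contrib.handlers.clearml_logger import ClearMLSaver

    handler = Checkpoint(
        {"model": model},
        ClearMLSaver(),  # raises if no ClearML Task has been initialized
        n_saved=2,
        filename_prefix="best",
        score_function=score_function,
        score_name="accuracy",
        global_step_transform=global_step_from_engine(trainer),
    )
    # Same trigger as above: run when the validation evaluator completes.
    val_evaluator.add_event_handler(Events.COMPLETED, handler)
    return handler
```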

  				
Posted one year ago by ZanySealion18
