Hello, everyone! I have a question regarding ClearML features. We run into situations where some of the agents working on an HPO die for various reasons. Some workers go offline, or resources need to be temporarily detached for other needs. Thu


should reload the reported scalars

Exactly. Notice that it also understands when the scalars were last reported, so it should automatically increase the iteration count (i.e. you will not accidentally overwrite previously reported scalars).

and the task only needs to reload the last checkpoint, right?

Correct 🙂
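To make the resume flow above concrete, here is a minimal sketch in plain Python of a training loop that reloads its last checkpoint and continues from the next iteration, so earlier iterations are never repeated or overwritten. It makes no ClearML calls; the helper names (`save_checkpoint`, `load_last_checkpoint`, `train`) and the `ckpt_*.json` file naming are assumptions made for illustration only.

```python
import json
import tempfile
from pathlib import Path


def save_checkpoint(ckpt_dir: Path, iteration: int, state: dict) -> Path:
    """Write one checkpoint file per iteration (hypothetical naming scheme)."""
    path = ckpt_dir / f"ckpt_{iteration:08d}.json"
    path.write_text(json.dumps({"iteration": iteration, "state": state}))
    return path


def load_last_checkpoint(ckpt_dir: Path):
    """Return (iteration, state) of the newest checkpoint, or (-1, {}) if none."""
    files = sorted(ckpt_dir.glob("ckpt_*.json"))  # zero-padded names sort by iteration
    if not files:
        return -1, {}
    data = json.loads(files[-1].read_text())
    return data["iteration"], data["state"]


def train(ckpt_dir: Path, total_iterations: int, stop_after=None):
    """Run (or resume) a toy training loop; returns the iterations it executed."""
    last_iteration, state = load_last_checkpoint(ckpt_dir)
    executed = []
    # Resume right after the last checkpointed iteration.
    for iteration in range(last_iteration + 1, total_iterations):
        state["loss"] = 1.0 / (iteration + 1)  # stand-in for a real training step
        save_checkpoint(ckpt_dir, iteration, state)
        executed.append(iteration)
        if stop_after is not None and len(executed) >= stop_after:
            break  # simulate the agent dying mid-run
    return executed


with tempfile.TemporaryDirectory() as tmp:
    ckpt_dir = Path(tmp)
    first_run = train(ckpt_dir, total_iterations=6, stop_after=3)  # interrupted run
    second_run = train(ckpt_dir, total_iterations=6)  # relaunched run
```

In this sketch the relaunched run picks up exactly where the interrupted one stopped. In a real task the checkpoint would hold model and optimizer state rather than a toy dictionary.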

We haven't figured out the best way to continue the HPO for either grid search or Optuna. Can you suggest something?

That is a good point. I'm not sure if we have a GitHub issue for that, but it is worth checking and, if not, opening one. It should not be difficult to serialize/deserialize the internal step of the HPO process.
Once this is implemented, you could use the same "clearml-agent execute" to relaunch the HPO process as well.
wdyt?
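As a rough illustration of what serializing/deserializing the internal step of an HPO process could look like, here is a minimal sketch of a grid search that saves its position after every trial and restores it on restart. This is not ClearML's optimizer and does not use its API; the class `ResumableGridSearch`, the state file layout, and the toy objective are all hypothetical.

```python
import itertools
import json
import tempfile
from pathlib import Path


class ResumableGridSearch:
    """Grid search whose position can be saved and restored (hypothetical class)."""

    def __init__(self, grid: dict, state_file: Path):
        self.keys = sorted(grid)  # fixed key order keeps the trial order stable
        self.combos = list(itertools.product(*(grid[k] for k in self.keys)))
        self.state_file = state_file
        self.next_index = 0
        self.results = []
        if state_file.exists():  # deserialize the saved step, if any
            saved = json.loads(state_file.read_text())
            self.next_index = saved["next_index"]
            self.results = saved["results"]

    def _save(self):
        """Serialize the current step so a relaunched process can continue."""
        payload = {"next_index": self.next_index, "results": self.results}
        self.state_file.write_text(json.dumps(payload))

    def run(self, objective, max_trials=None):
        """Evaluate the remaining combinations, at most max_trials of them."""
        done = 0
        while self.next_index < len(self.combos):
            if max_trials is not None and done >= max_trials:
                break
            params = dict(zip(self.keys, self.combos[self.next_index]))
            self.results.append({"params": params, "score": objective(params)})
            self.next_index += 1
            self._save()
            done += 1
        return self.results


calls = []


def objective(params):
    """Toy objective that records each call so repeated trials would be visible."""
    calls.append(params)
    return params["lr"] * params["batch"]


with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "hpo_state.json"
    grid = {"lr": [0.1, 0.01], "batch": [16, 32]}
    ResumableGridSearch(grid, state_file).run(objective, max_trials=3)  # HPO process dies here
    calls_before_restart = len(calls)
    results = ResumableGridSearch(grid, state_file).run(objective)  # relaunched HPO process
```

The relaunched instance only evaluates the combinations that were not yet completed. A sampler-based optimizer would need to persist more than an index, for example its trial history.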

  				
Posted 
	2 years ago

					AgitatedDove14
				
