Ok, that looks good. It would be good to have an easier restart functionality, as from the looks of things it's a couple of layers deep. I'll let you know if I manage it, might be useful.
https://github.com/allegroai/clearml/blob/fcad50b6266f445424a1f1fb361f5a4bc5c7f6a3/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py#L86
you can just pass the instance of the OptunaOptimizer you created, and continue the study
Hi UnevenBee3
the Optuna study is stored on the OptunaOptimizer class
https://github.com/allegroai/clearml/blob/fcad50b6266f445424a1f1fb361f5a4bc5c7f6a3/clearml/automation/optuna/optuna.py#L186
And actually you could store and restore it
https://github.com/allegroai/clearml/blob/fcad50b6266f445424a1f1fb361f5a4bc5c7f6a3/clearml/automation/optuna/optuna.py#L104
I think we should improve the interface though, maybe also add get_study(), wdyt?
As a follow-up to this: it seems that the study data must be fetched from a remote SQL server (the "storage" arg). It would be amazing to be able to store the study as an artifact in the ClearML task. AgitatedDove14
yes, that makes sense to me.
What is your specific use case, meaning when/how do you stop / launch the hpo?
Would it make sense to continue from a previous execution and just provide the Task ID? Wdyt?
If you could provide the specific task ID then it could fetch the training data and study from the previous task and continue with the specified number of trainings.
Yes exactly, and also all the definitions for the HPO process (variable space, study, etc.)
The reason that being able to continue from a past study would be useful is that the study provides a base for pruning and optimization of the task. The task would be stopped by aborting when the gpu-rig that it is using is needed or the study crashes.
that makes sense
My current use case is just research for training pytorch models.
👍
UnevenBee3 just to make sure we do not forget to add it, maybe you could open a GitHub issue with the feature request?