Hello, I've been reading the HyperParameterOptimizer docs and various questions in the channel, but couldn't find an answer. I have a working HPO run, but experiments often fail for reasons outside my control. Is there a way to tell the optimizer to re-run these failed experiments? Right now it just continues and reports only the successful ones.

Posted one year ago
NervousFrog58 it seems to me this failure will repeat. Wouldn't it make more sense to fix your code so that such errors don't happen, rather than restarting a failing experiment?

Posted one year ago

Hi NervousFrog58
Can you share some more details with us, please?
Do you mean that when an experiment fails, you would like a snippet that resets and relaunches it, the way you do through the UI?
Your ClearML package versions and your logs would be very useful too 🙂

Posted one year ago

The code is fine; these failures happen because of external circumstances that cannot be controlled.

Posted one year ago

I see... If you intercept them in your code, you can actually re-enqueue your code at that point...
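
For illustration, here is a minimal sketch of the "intercept them in your code" part. Note that it swaps in a simpler technique: it retries the flaky step in place inside the same run instead of re-enqueuing the task. The helper name `run_with_retries`, the default exception types, and the retry count are all hypothetical choices, not anything from ClearML.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def run_with_retries(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    max_attempts: int = 3,
    delay_sec: float = 0.0,
) -> T:
    """Call fn(); if it raises one of `retry_on`, try again.

    Hypothetical helper: the exception types and attempt count are
    assumptions and should match whatever the external failure raises.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: let the experiment fail as before
            time.sleep(delay_sec)  # wait before the next attempt
    raise ValueError("max_attempts must be at least 1")
```

Wrapping only the step that depends on the external resource (for example `run_with_retries(load_remote_data)`, where `load_remote_data` is your own function) keeps real bugs visible, since any exception type not listed in `retry_on` still fails the run immediately.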

Posted one year ago

Yes, either a code snippet or a built-in flag.
I'm using the clearml==1.6.2 package, and the server is running version: 1.1.1-135 • 1.1.1 • 2.14.
In terms of logs, I'm getting:
2022-07-07 16:33:59 [W 2022-07-07 16:33:59,801] Trial 8 failed, because the value None could not be cast to float.
2022-07-07 16:33:59 OptunaObjective result metric=None, iteration None
2022-07-07 16:33:59 [W 2022-07-07 16:33:59,920] Trial 11 failed, because the value None could not be cast to float.
2022-07-07 16:34:00 OptunaObjective result metric=None, iteration None
which is fine, the trials should have failed. I'm just looking for a way to restart them.
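
As a rough sketch of the "reset and relaunch, the way you do through the UI" idea, something like the following could be run alongside the optimizer. It is not a built-in optimizer flag. The names `requeue_failed`, `clearml_status`, and `clearml_requeue` are hypothetical, the queue name `"default"` is an assumption, and how you collect the experiment task IDs is left to you. The ClearML calls are kept in small separate functions so the retry bookkeeping can be checked without a server. Also note that re-running a task this way may not feed the new result back into an Optuna trial that was already marked as failed.

```python
from typing import Callable, Dict, Iterable, List


def requeue_failed(
    task_ids: Iterable[str],
    get_status: Callable[[str], str],
    requeue: Callable[[str], None],
    attempts: Dict[str, int],
    max_retries: int = 2,
) -> List[str]:
    """Requeue each task whose status is "failed", at most max_retries times.

    `attempts` maps task ID -> number of requeues so far and is updated
    in place, so the same dict can be passed on every polling round.
    """
    requeued = []
    for task_id in task_ids:
        if get_status(task_id) != "failed":
            continue  # still running, completed, etc.
        if attempts.get(task_id, 0) >= max_retries:
            continue  # retried enough times, give up on this one
        requeue(task_id)
        attempts[task_id] = attempts.get(task_id, 0) + 1
        requeued.append(task_id)
    return requeued


def clearml_status(task_id: str) -> str:
    # Assumes the clearml package is installed and configured.
    from clearml import Task

    return Task.get_task(task_id=task_id).get_status()


def clearml_requeue(task_id: str, queue_name: str = "default") -> None:
    # Assumption: "default" is the queue your agents listen on.
    from clearml import Task

    task = Task.get_task(task_id=task_id)
    task.reset(force=True)  # roughly the "Reset" action in the UI
    Task.enqueue(task, queue_name=queue_name)  # roughly "Enqueue" in the UI
```

A polling loop would then call `requeue_failed(ids, clearml_status, clearml_requeue, attempts)` every so often while the HPO run is active, reusing the same `attempts` dict each time.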

Posted one year ago
