The code is fine; these failures happen because of external circumstances that cannot be controlled.
hi NervousFrog58
Can you share some more details with us, please?
Do you mean that when an experiment fails, you would like a snippet that resets and relaunches it, the way you do through the UI?
Your ClearML package versions and your logs would be very useful too 🙂
Yes, either a code snippet or a built-in flag.
I'm using the clearml==1.6.2 package, and the server is running version 1.1.1-135 • 1.1.1 • 2.14.
In terms of logs, I'm getting:
2022-07-07 16:33:59 [W 2022-07-07 16:33:59,801] Trial 8 failed, because the value None could not be cast to float.
2022-07-07 16:33:59 OptunaObjective result metric=None, iteration None
2022-07-07 16:33:59 [W 2022-07-07 16:33:59,920] Trial 11 failed, because the value None could not be cast to float.
2022-07-07 16:34:00 OptunaObjective result metric=None, iteration None
That is fine, the trials should have failed. I'm just looking for a way to restart them.
I see... If you intercept them in your code, you can actually re-enqueue your code at that time...
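A minimal sketch of that re-enqueue idea, assuming the failed experiment is available as a ClearML `Task` object and that a queue named "default" exists. The helper name `relaunch_failed_task` and its `enqueue` parameter are hypothetical and only there for illustration; `Task.get_status()`, `Task.reset()` and `Task.enqueue()` are the SDK calls it relies on, and their exact behaviour may differ between clearml versions.

```python
from typing import Any, Callable, Optional


def relaunch_failed_task(
    task: Any,
    queue_name: str = "default",  # assumed queue name, change to your own
    enqueue: Optional[Callable[..., Any]] = None,
) -> bool:
    """Reset and re-enqueue `task` if its status is "failed".

    Returns True if the task was relaunched, False otherwise.
    `enqueue` is a hypothetical hook (useful for dry runs); when it is
    None, clearml's Task.enqueue is used.
    """
    # Only touch tasks that actually failed.
    if str(task.get_status()) != "failed":
        return False

    if enqueue is None:
        # Imported lazily so the helper can be exercised without clearml.
        from clearml import Task
        enqueue = Task.enqueue

    # Clear the previous run's state, roughly what "Reset" does in the UI.
    task.reset(set_started_on_success=False, force=True)
    # Put the task back in an execution queue for an agent to pick up.
    enqueue(task, queue_name=queue_name)
    return True
```

With a real server this could be called as `relaunch_failed_task(Task.get_task(task_id="<your task id>"))`. It does not cap the number of retries, so a task that fails every time would be relaunched every time the helper is called on it.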
NervousFrog58 it seems to me this failure will repeat. Wouldn't it make more sense to fix your code so that such errors do not happen, rather than restarting a failing experiment?