NervousFrog58, it seems this failure will keep repeating. Wouldn't it make more sense to fix your code so that such errors don't happen, rather than restart a failing experiment?
Yes, either a code snippet or a built-in flag.
I'm using the clearml==1.6.2 package, and the server is running version 1.1.1-135 • 1.1.1 • 2.14.
In terms of logs, I'm getting:
2022-07-07 16:33:59 [W 2022-07-07 16:33:59,801] Trial 8 failed, because the value None could not be cast to float.
2022-07-07 16:33:59 OptunaObjective result metric=None, iteration None
2022-07-07 16:33:59 [W 2022-07-07 16:33:59,920] Trial 11 failed, because the value None could not be cast to float.
2022-07-07 16:34:00 OptunaObjective result metric=None, iteration None
Which is fine; those trials were expected to fail. I'm just looking for a way to restart them.
I see... If you intercept these failures in your code, you can re-enqueue your task at that point.
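One way to intercept such failures in code is a minimal sketch like the one below. Note that it retries the failing step inside the running task rather than re-enqueuing the task. `ExternalServiceError`, `run_with_retries`, and the backoff values are hypothetical; substitute whatever exception your external dependency actually raises.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class ExternalServiceError(Exception):
    """Hypothetical error raised when an outside dependency is unavailable."""


def run_with_retries(
    step: Callable[[], T],
    max_attempts: int = 3,
    delay_s: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ExternalServiceError,),
) -> T:
    """Call step(), retrying only on the listed exception types, with linear backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of retries: let the trial fail as it did before
            time.sleep(delay_s * attempt)  # wait longer after each failed attempt
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

Only the listed exception types are retried. Any other error still fails the trial immediately, so genuine bugs are not hidden by the retries.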
The code is fine; these failures happen because of external circumstances that cannot be controlled.
Hi NervousFrog58,
Can you please share some more details with us?
Do you mean that when an experiment fails, you would like a snippet that resets and relaunches it, the way you can through the UI?
Your ClearML package versions and your logs would be very useful too 🙂
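A snippet that resets and relaunches failed experiments, the way the UI does, could look like the minimal sketch below. It is not a built-in ClearML flag. The retry logic takes the status lookup and the relaunch action as plain callables, so it can be tested without a server. `relaunch_failed`, `clearml_relaunch`, and the `"default"` queue name are hypothetical. The ClearML calls used (`Task.get_task`, `Task.reset`, `Task.enqueue`) are, as far as I know, part of the public `clearml.Task` API, but check them against your SDK version. A task's status can be read with `Task.get_task(task_id=tid).get_status()`.

```python
from typing import Callable, Dict, Iterable, List, Optional


def relaunch_failed(
    task_ids: Iterable[str],
    get_status: Callable[[str], Optional[str]],
    relaunch: Callable[[str], None],
    attempts: Dict[str, int],
    max_retries: int = 3,
) -> List[str]:
    """Relaunch every task whose status is 'failed', at most max_retries times each.

    `attempts` is updated in place so repeated passes respect the cap.
    Returns the ids relaunched in this pass.
    """
    relaunched = []
    for task_id in task_ids:
        if get_status(task_id) != "failed":
            continue
        if attempts.get(task_id, 0) >= max_retries:
            # Give up: a task that keeps failing likely needs a code fix, not a retry.
            continue
        relaunch(task_id)
        attempts[task_id] = attempts.get(task_id, 0) + 1
        relaunched.append(task_id)
    return relaunched


def clearml_relaunch(task_id: str, queue_name: str = "default") -> None:
    """Reset a task and push it back to an execution queue (needs clearml and a server)."""
    from clearml import Task  # imported lazily so relaunch_failed works without clearml

    task = Task.get_task(task_id=task_id)
    task.reset()  # clear the previous run so the task returns to draft
    Task.enqueue(task, queue_name=queue_name)
```

The `max_retries` cap addresses the concern that the same failure may simply repeat. Once a task has been relaunched that many times, it is left in the failed state for a human to look at.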