Hello, I've Been Reading The Docs Of HyperParameterOptimizer, And Various Questions In The Channel, But Couldn't Find An Answer. I Have A Working HPO Run, But Many Times Experiments Fail, For Uncontrollable Reasons. Is There A Way To Tell The Optimizer To

Answered

Hello, I've been reading the docs of HyperParameterOptimizer and various questions in the channel, but couldn't find an answer. I have a working HPO run, but experiments often fail for uncontrollable reasons. Is there a way to tell the optimizer to re-run these failed experiments? Right now it just continues and reports only the successful ones.

  				
Posted 2 years ago by NervousFrog58

Answers 5

Yes, either a code snippet or a built-in flag.
I'm using the clearml==1.6.2 package, and the server is running version 1.1.1-135 • 1.1.1 • 2.14.
In terms of logs, I'm getting:
2022-07-07 16:33:59 [W 2022-07-07 16:33:59,801] Trial 8 failed, because the value None could not be cast to float.
2022-07-07 16:33:59 OptunaObjective result metric=None, iteration None
2022-07-07 16:33:59 [W 2022-07-07 16:33:59,920] Trial 11 failed, because the value None could not be cast to float.
2022-07-07 16:34:00 OptunaObjective result metric=None, iteration None
which is fine, the trials should have failed. I'm just looking for a way to restart them.

  				
Posted 2 years ago by NervousFrog58

Hi NervousFrog58,
Can you share some more details with us, please?
Do you mean that when an experiment fails, you would like a snippet that resets and relaunches it, the way you do through the UI?
Your ClearML package versions and your logs would be very useful too 🙂

  				
Posted 2 years ago by SweetBadger76

I see... If you intercept these failures in your code, you can actually re-enqueue your code at that point...
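A minimal sketch of that reset-and-re-enqueue idea, not a built-in optimizer feature. It assumes the failed trials are child tasks of the optimizer task (so they can be found with a `parent` filter), that a queue named "default" exists, and that resetting a failed task before enqueuing it is acceptable in your setup. The helper names `find_failed_children` and `relaunch` and the `max_relaunches` cap are hypothetical. Whether a running optimizer picks up the metric from a relaunched trial is not confirmed in this thread.

```python
def find_failed_children(optimizer_task_id):
    """Return tasks whose parent is the optimizer task and whose status is 'failed'.

    Assumption: HPO trial tasks are created with the optimizer task as parent.
    """
    # Imported lazily so that relaunch() below can be exercised without clearml.
    from clearml import Task
    return Task.get_tasks(
        task_filter={"parent": optimizer_task_id, "status": ["failed"]}
    )


def relaunch(tasks, queue_name, enqueue=None, max_relaunches=10):
    """Reset each failed task and put it back on a queue.

    Returns the ids of the tasks that were relaunched. The cap guards
    against endlessly relaunching a trial that keeps failing.
    """
    if enqueue is None:
        from clearml import Task
        enqueue = Task.enqueue

    relaunched = []
    for task in tasks:
        if len(relaunched) >= max_relaunches:
            break
        if task.get_status() != "failed":
            continue  # leave running or completed trials untouched
        # Clear the previous run's state without marking the task as started,
        # so an agent can pick it up from the queue.
        task.reset(set_started_on_success=False, force=True)
        enqueue(task, queue_name=queue_name)
        relaunched.append(task.id)
    return relaunched
```

Usage would look like `relaunch(find_failed_children("<optimizer task id>"), queue_name="default")`, run periodically or after the optimizer finishes. Cloning the failed task instead of resetting it would keep the failed run's logs, at the cost of creating a new task id.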

  				
Posted 2 years ago by SuccessfulKoala55

NervousFrog58 it seems to me this failure will repeat. Wouldn't it make more sense to fix your code so that such errors do not happen, rather than restarting a failing experiment?

  				
Posted 2 years ago by SuccessfulKoala55

The code is fine; these failures happen because of external circumstances that cannot be controlled.

  				
Posted 2 years ago by NervousFrog58
