And here is how the error appears: it is trying to get a metric that was not logged.
Hi @<1790190274475986944:profile|UpsetPanda50>, are you running them on the same machine/agent? Can you please provide a full log of one run that worked and one that didn't on the same machine?
Here, for instance, we had only two cases of TypeError: 'NoneType' object is not subscriptable; one is on line 9846. But as you can see in the pic, the workers are going down.
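In case it helps narrow this down: that NoneType error usually means the optimizer looked up a scalar the task never reported. A minimal pre-flight check, assuming the base_task_id from the setup below and the title/series strings from the config (get_last_scalar_metrics is the ClearML accessor for a task's reported scalars):
from clearml import Task

base_task = Task.get_task(task_id=base_task_id)  # base_task_id assumed from the HPO setup below
scalars = base_task.get_last_scalar_metrics()  # {title: {series: {"last": ..., "min": ..., "max": ...}}}

title = "HBT-KPI --- 2024-12-26 to 2025-01-12"
series = "SR"
if series not in scalars.get(title, {}):
    raise RuntimeError(f"Base task never logged '{title}/{series}'; the optimizer has nothing to read")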
@<1523701070390366208:profile|CostlyOstrich36> Here is the HyperParameterOptimizer setup
from clearml.automation import HyperParameterOptimizer
from clearml.automation.optuna import OptimizerOptuna

hpo = HyperParameterOptimizer(
    # Base experiment to optimize
    base_task_id=base_task_id,
    # Hyperparameters to tune
    hyper_parameters=param_ranges,
    # Objective metric(s)
    objective_metric_title=list(opt_conf.hpo_params.objective_metric_title),
    objective_metric_series=list(opt_conf.hpo_params.objective_metric_series),
    objective_metric_sign=list(opt_conf.hpo_params.objective_metric_sign),
    # Optimization strategy
    optimizer_class=OptimizerOptuna,
    # Execution configuration
    execution_queue=opt_conf.hpo_params.execution_queue,
    save_top_k_tasks_only=-1,
    spawn_project=f"{opt_conf.task_params.project_name}/opt",
    min_iteration_per_job=opt_conf.hpo_params.min_iteration_per_job,
    max_iteration_per_job=opt_conf.hpo_params.max_iteration_per_job,
    # pool_period_min=40,
    # time_limit_per_job=120,
    # Limit the number of concurrent experiments; this in turn makes sure we
    # don't bombard the scheduler with experiments. If an auto-scaler is
    # connected, this, by proxy, will also limit the number of machines.
    max_number_of_concurrent_tasks=opt_conf.hpo_params.max_number_of_concurrent_tasks,
    # Maximum number of jobs to launch for the optimization, default (None) unlimited.
    # If OptimizerBOHB is used, it defines the maximum budget in terms of full jobs,
    # i.e. the cumulative number of iterations will not exceed
    # total_max_jobs * max_iteration_per_job.
    total_max_jobs=opt_conf.hpo_params.total_max_jobs,
    # optuna_pruner=pruner_dict.get(
    #     opt_conf.hpo_params.pruner
    # ),  # e.g. HyperbandPruner(min_resource=5, max_resource=80)
    # optuna_sampler=sampler_dict.get(opt_conf.hpo_params.sampler),
)
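For reference, this is roughly how it is then launched and monitored (a sketch; set_report_period, start, wait, get_top_experiments, and stop are standard HyperParameterOptimizer methods, while the callback body and the top_k value are illustrative):
def on_job_complete(job_id, objective_value, objective_iteration, job_parameters, top_performance_job_id):
    # Called once per finished trial; handy for spotting trials that ended without a metric
    print(f"trial {job_id} finished, objective={objective_value}")

hpo.set_report_period(10)  # status report every 10 minutes (illustrative value)
hpo.start(job_complete_callback=on_job_complete)
hpo.wait()  # blocks until the optimizer decides it is done (or total_max_jobs is exhausted)
top_tasks = hpo.get_top_experiments(top_k=5)
hpo.stop()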
and here are the hpo_params used
hpo_params:
objective-metric-title: ["HBT-KPI --- 2024-12-26 to 2025-01-12"]
objective-metric-series: ["SR"]
objective-metric-sign: ["max"]
time-limit: 72000.0
execution-queue: hpo_mmd
min-iteration-per-job: 50
max-iteration-per-job: 10000
max-number-of-concurrent-tasks: 100
total-max-jobs: 2000
pruner: none
sampler: none #random
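On the training side, each trial has to report a scalar whose title and series match these strings exactly, otherwise the optimizer finds nothing to read. A minimal sketch, where sr_value and iteration are hypothetical stand-ins for whatever the simulation computes:
from clearml import Logger

# Title/series must match objective-metric-title / objective-metric-series above
Logger.current_logger().report_scalar(
    title="HBT-KPI --- 2024-12-26 to 2025-01-12",
    series="SR",
    value=sr_value,       # hypothetical: the KPI computed by the simulation
    iteration=iteration,  # hypothetical: the current iteration counter
)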
Do you see any reason for the optimization to finish before total-max-jobs is reached?
I am not even sure the issue is only the missing metric. If that happens, I would expect the HPO to inject new parameters in the next iteration (since the max was not reached), but instead it stops running and completes the optimization...
Hi Nathan. The error was caused by an internal bug in our simulation; after that fix everything was OK. But we still have the problem that when any trial fails, it brings down the whole simulation: the pipeline stops creating new trials and the simulation stops. The same happens when even a single worker crashes (maybe from running out of free space or a network problem): the main pipeline receives a failed trial, after which it does not create any more trials, and all the simulations start to die because there are no new trials.
I've recently run into this error myself. Did you find any resolution?