Hello guys, I am using ClearML Server to run hyperparameter optimization. When running it, sometimes this error happens, but when running again the same code runs smoothly. Sometimes it works and sometimes not. It seems that some of the base tasks of the


@<1523701070390366208:profile|CostlyOstrich36> here is the HyperParameterOptimizer call:

from clearml.automation import HyperParameterOptimizer
from clearml.automation.optuna import OptimizerOptuna

hpo = HyperParameterOptimizer(
    # Base experiment to optimize
    base_task_id=base_task_id,
    # Hyperparameters to tune
    hyper_parameters=param_ranges,
    # Objective metric
    objective_metric_title=list(opt_conf.hpo_params.objective_metric_title),
    objective_metric_series=list(opt_conf.hpo_params.objective_metric_series),
    objective_metric_sign=list(opt_conf.hpo_params.objective_metric_sign),
    # Optimization strategy
    optimizer_class=OptimizerOptuna,
    # Execution configuration
    execution_queue=opt_conf.hpo_params.execution_queue,
    save_top_k_tasks_only=-1,
    spawn_project=f"{opt_conf.task_params.project_name}/opt",
    min_iteration_per_job=opt_conf.hpo_params.min_iteration_per_job,
    max_iteration_per_job=opt_conf.hpo_params.max_iteration_per_job,
    # pool_period_min=40,
    # time_limit_per_job=120,
    # Limit the number of concurrent experiments so we don't bombard the
    # scheduler; with an auto-scaler connected this also, by proxy, limits
    # the number of machines spun up.
    max_number_of_concurrent_tasks=opt_conf.hpo_params.max_number_of_concurrent_tasks,
    # Maximum number of jobs to launch for the optimization; default (None) is unlimited.
    # If OptimizerBOHB is used, this defines the maximum budget in terms of full jobs,
    # i.e. the cumulative number of iterations will not exceed
    # total_max_jobs * max_iteration_per_job.
    total_max_jobs=opt_conf.hpo_params.total_max_jobs,
    # optuna_pruner=pruner_dict.get(
    #     opt_conf.hpo_params.pruner
    # ),  # e.g. HyperbandPruner(min_resource=5, max_resource=80)
    # optuna_sampler=sampler_dict.get(opt_conf.hpo_params.sampler),
)
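
For context, the commented-out optuna_pruner and optuna_sampler arguments reference pruner_dict and sampler_dict, which are not shown in the snippet. A minimal sketch of what they might look like, assuming standard Optuna pruners and samplers (the keys and entries here are illustrative assumptions, not taken from the actual code):

import optuna

# Hypothetical lookup tables mapping the config strings ("none", "random",
# "hyperband", ...) to Optuna objects; these names are assumptions.
pruner_dict = {
    "none": None,
    "hyperband": optuna.pruners.HyperbandPruner(min_resource=5, max_resource=80),
}
sampler_dict = {
    "none": None,
    "random": optuna.samplers.RandomSampler(),
}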

and here are the hpo_params used:

hpo_params:
  objective-metric-title: ["HBT-KPI --- 2024-12-26 to 2025-01-12"]
  objective-metric-series: ["SR"]
  objective-metric-sign: ["max"]
  time-limit: 72000.0
  execution-queue: hpo_mmd
  min-iteration-per-job: 50
  max-iteration-per_job: 10000
  max-number-of-concurrent-tasks: 100
  total-max-jobs: 2000
  pruner: none
  sampler: none #random
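
For completeness, a minimal sketch of how such a config might be loaded, assuming OmegaConf (the actual loader is not shown in the post). One thing worth double-checking: hyphenated YAML keys are not valid Python attribute names, so attribute access like opt_conf.hpo_params.total_max_jobs only works if the keys are normalized to underscores at load time:

from omegaconf import OmegaConf

# "opt_conf.yaml" is a placeholder path; the real loading code is not shown.
opt_conf = OmegaConf.load("opt_conf.yaml")

# With hyphenated keys, bracket access is required unless the keys are
# renamed to use underscores in the YAML itself:
total_max_jobs = opt_conf.hpo_params["total-max-jobs"]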

Do you see any reason for the optimization to finish before total-max-jobs is reached?

I am not even sure the issue is only about not getting the metric. If that happens, I suppose the HPO should inject new parameters in the next iteration (since the maximum was not reached), but instead it stops running and completes the optimization...
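
One way to see why the optimizer declares itself done is to log every finished job through the documented job_complete_callback hook and bound the run explicitly. A minimal sketch using the public HyperParameterOptimizer API (the callback body, and the assumption that time-limit in the config is in seconds, are mine):

def job_complete_callback(job_id, objective_value, objective_iteration,
                          job_parameters, top_performance_job_id):
    # A None objective_value here would suggest the child task never
    # reported the metric back to the optimizer.
    print(f"job {job_id} finished: objective={objective_value} "
          f"at iteration {objective_iteration}")

# set_time_limit expects minutes; assuming time-limit in the config is seconds.
hpo.set_time_limit(in_minutes=opt_conf.hpo_params.time_limit / 60.0)
hpo.start(job_complete_callback=job_complete_callback)
hpo.wait()   # blocks until the optimization process is done
hpo.stop()   # make sure background optimization stops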
