Answered
HPO with Optuna via webapp (Pro version) is not working as expected

HPO with Optuna via the webapp (Pro version) is not working as expected.
I generated a dummy task that logs a random value into a metric and closes. The code:



import hydra
import torch
from omegaconf import DictConfig, OmegaConf

# initialize_clearml_task is the helper shared in the answer below
MAIN_CONFIG_FILE = "config"  # hypothetical value; the actual config name isn't shown in the post


@hydra.main(config_path=".", config_name=MAIN_CONFIG_FILE, version_base=None)
def main(cfg: DictConfig):
    """
    Main script to set up and execute the training pipeline.

    Parameters
    ----------
    cfg: DictConfig
        A dictionary containing the configurations from main config and sub-configs from configs directory.
    """
    master_config = OmegaConf.to_container(cfg, resolve=True)
    task, clearml_logger = initialize_clearml_task(**master_config.pop("clearml"))  # returns the task object and its logger object
    for i in range(100):
        clearml_logger.report_scalar("Val/Metrics", "AUC", torch.randint(0, 100, (1,)).item(), i)
    task.flush(wait_for_uploads=True)
    task.close()

if __name__ == "__main__":
    main()

I then run HPO using Optuna via the webapp and get these errors (in multiple threads):

Exception in thread Thread-2 (_daemon):
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.11/threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.11/site-packages/clearml/automation/optimization.py", line 1923, in _daemon
    self.optimizer.start()
  File "/usr/local/lib/python3.11/site-packages/clearml/automation/optuna/optuna.py", line 198, in start
    self._study.optimize(
  File "/usr/local/lib/python3.11/site-packages/optuna/study/study.py", line 451, in optimize
    _optimize(
  File "/usr/local/lib/python3.11/site-packages/optuna/study/_optimize.py", line 99, in _optimize
    f.result()
  File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/optuna/study/_optimize.py", line 159, in _optimize_sequential
    frozen_trial = _run_trial(study, func, catch)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/optuna/study/_optimize.py", line 247, in _run_trial
    raise func_err
  File "/usr/local/lib/python3.11/site-packages/optuna/study/_optimize.py", line 196, in _run_trial
    value_or_values = func(trial)
                      ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/clearml/automation/optuna/optuna.py", line 93, in objective
    iteration_value = iteration_value[0]
                      ~~~~~~~~~~~~~~~^^^
TypeError: 'NoneType' object is not subscriptable
`Study.stop` is supposed to be invoked inside an objective function or a callback.

This makes the HPO task fail/abort, but for some reason it is shown as completed successfully (I would expect a failed, or at least aborted, status here...).
The HPO manages to run a few experiments before it stops sending new ones due to the raised error, and metric values are reported and visible in the app UI.
The full HPO task log and template task log are attached. Some of the generated tasks are aborted (despite completing all reporting iterations) and some are completed (as expected).

I would love to get some help here. I noticed that many users encounter this issue, but I didn't find any solutions in this channel.
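
For reference, here is roughly what the equivalent code-driven setup looks like (a sketch only; the queue name, parameter range, and base task id are placeholders). If I read the traceback right, the objective read back None for a trial, so it seems worth double-checking that the optimizer's objective metric title/series exactly match the scalar the task reports:

# Sketch of launching the same HPO from code instead of the webapp.
# Queue name, parameter range, and the base task id below are assumptions.
from clearml import Task
from clearml.automation import HyperParameterOptimizer, UniformParameterRange
from clearml.automation.optuna import OptimizerOptuna

task = Task.init(project_name="hpo_debug", task_name="optuna_controller",
                 task_type=Task.TaskTypes.optimizer)

optimizer = HyperParameterOptimizer(
    base_task_id="<template_task_id>",  # the dummy task above
    hyper_parameters=[
        UniformParameterRange("General/lr", min_value=1e-4, max_value=1e-1),
    ],
    # Must match the reported scalar exactly: report_scalar("Val/Metrics", "AUC", ...)
    objective_metric_title="Val/Metrics",
    objective_metric_series="AUC",
    objective_metric_sign="max",
    optimizer_class=OptimizerOptuna,
    execution_queue="default",
    max_number_of_concurrent_tasks=2,
    total_max_jobs=10,
)
optimizer.start()
optimizer.wait()
optimizer.stop()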

  
  
Posted 4 months ago

Answers 2


Hi, sure, there is nothing special there, even some redundancy.

from clearml import Task, Logger

def initialize_clearml_task(
        project_name: str | None = None,
        task_name: str | None = None,
        task_type: str | None = None,
        tags: list[str] | None = None,
) -> tuple[Task, Logger]:
    """
    Initialize and configure a ClearML task.

    Parameters
    ----------
    project_name : str
        Name of the ClearML project.
    task_name : str
        Name of the ClearML task.
    task_type : str
        Type of the ClearML task.
    tags : list[str]
        List of tags to be assigned to the task.

    Returns
    -------
    tuple[Task, Logger]
        A tuple containing the ClearML task, and the logger.
    """
    task = Task.current_task()
    if task is None:
        task = CADLUtils.init_clearml_task(
            project_name=project_name,
            task_name=task_name,
            task_type=task_type,
            tags=tags
        )
    logger = task.get_logger()

    return task, logger

class CADLUtils:
    @staticmethod
    def init_clearml_task(project_name: str, task_name: str, task_type: str, tags: list[str] | None = None) -> Task:
        """
        Initializes a ClearML task for the current project.

        Parameters:
        -----------
        project_name : str
            The name of the project. For nested projects, use the format 'parent_project/child_project'.
        task_name : str
            The name of the task.
        task_type : str
            The type of the task. Choose from clearml.Task.TaskTypes.
        tags : list[str], optional
            List of tags to assign to the task.

        Returns:
        --------
        Task
            The initialized ClearML task.

        Example:
        --------
        task = CADLUtils.init_clearml_task(project_name='my_project', task_name='my_task', task_type='training')
        """
        task = Task.init(
            project_name=project_name,
            task_name=task_name,
            task_type=task_type,
            tags=tags,
            auto_connect_frameworks={"pytorch": False},
            reuse_last_task_id=False,
            # output_uri="...",
        )
        return task
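
For illustration, a hypothetical call matching the clearml section popped from the Hydra config in main() (all argument values below are assumptions):

# Hypothetical usage; project/task names and tags are assumptions
task, clearml_logger = initialize_clearml_task(
    project_name="my_project/hpo_debug",
    task_name="dummy_metric_task",
    task_type="training",
    tags=["hpo", "debug"],
)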

Will try with the newest version as well.

  
  
Posted 4 months ago

Hi @DangerousBee35, can you try with the latest clearml version? Can you share the initialize_clearml_task function?
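
A quick way to check which clearml version the environment is actually running (standard Python, no assumptions beyond having clearml installed):

# Print the installed clearml version to compare against the latest release
import clearml
print(clearml.__version__)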

  
  
Posted 4 months ago