Hey AgitatedDove14, another question if I can! I'm trying to access this information from the API so I can store it as an artifact as well. Currently this takes quite a few lines of code using get_top_experiments and get_last_scalar_metrics()["evaluate"]["mae"]["last"]. Again, I feel like I'm missing something, as I assume there's a far simpler way of getting data that's displayed so easily in the UI 🙂
I'm trying to do it at the end of the optimisation, in the same place where the example prints the IDs to the log. I'm just hoping there's a simple way to get that table rather than constructing it myself through a bunch of API calls.
I see what you mean.
from clearml.automation import HyperParameterOptimizer

an_optimizer = HyperParameterOptimizer(
    base_task_id='39d2c27baa8145929b2e21f686a17046',
    hyper_parameters=[...],  # the parameter ranges were left out of the original snippet
    objective_metric_title='epoch_accuracy',
    objective_metric_series='epoch_accuracy',
    objective_metric_sign='max',
    optimizer_class=aSearchStrategy,
    max_iteration_per_job=0,
    total_max_jobs=0,
    auto_connect_task=False,
)
print(an_optimizer.get_top_experiments(top_k=5))
The top models in the example aren't saved out in a useful way, just printed. I'm trying to figure out the best way of saving these IDs so I can get the tasks/models.
Yup, that's how I've been doing it now. I'll happily switch to a simpler method call whenever one gets added. Making use of HPO is a big thing I'm trying to sell the team on, as it's what sets ClearML apart from MLflow or Neptune: useful task orchestration and cloning 🙂
Bad news, there isn't a nice interface to get the table from the Optimizer object (I will make sure we add it, no reason not to).
But you can very easily get all the information you need and more:
all_the_tasks = an_optimizer.get_top_experiments(top_k=100)
Then for every task in the list you can get all the information:
for task in all_the_tasks:
    task_params_as_dict = task.get_parameters()
    task_scalars = task.get_last_scalar_metrics()
Basically, the Task object lets you query any Task in the system. We just get the list of Tasks from the optimizer (sorted by the optimization objective), and then we can do whatever we need with it.
Thanks Martin, this is super useful. Using get_top_experiments would be great, but do I actually have access to the controller (an_optimizer) from the Task object itself? I don't see anything like
an_optimizer = task.connect(an_optimizer), which seems to be the normal way of connecting things up.
The easiest would be as an artifact (I think).
Let's assume you put it into a CSV file (with pandas or manually).
To upload (from the pipeline Task itself):
task.upload_artifact(name='summary', artifact_object='~/my/summary.csv')
Then if you want to grab it from anywhere else:
from clearml import Task
task = Task.get_task(task_id='HPO controller Task id here')
my_csv = task.artifacts['summary'].get_local_copy()
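For the CSV route, a minimal sketch using only the standard library csv module (pandas' DataFrame.to_csv would work just as well). write_summary_csv and upload_summary_csv are hypothetical helpers; task is assumed to be the controller's own Task object (e.g. from Task.current_task()), and only its upload_artifact(name=..., artifact_object=...) method is used.

```python
import csv
from typing import Any, Dict, List


def write_summary_csv(rows: List[Dict[str, Any]], path: str) -> str:
    """Write row dicts to CSV; the columns are the union of all row keys."""
    fieldnames: List[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(path, "w", newline="") as f:
        # Keys missing from a row are written as empty cells (restval="")
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path


def upload_summary_csv(task: Any, rows: List[Dict[str, Any]], path: str = "summary.csv") -> None:
    """Write the CSV locally, then attach it to `task` as the 'summary' artifact."""
    write_summary_csv(rows, path)
    task.upload_artifact(name="summary", artifact_object=path)
```

On the controller this would be called as upload_summary_csv(Task.current_task(), rows), where rows is a list of dicts such as one per top experiment.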
If you want to store as dict it might be even easier:
task.upload_artifact(name='summary', artifact_object=a_dict_here)
Then you can:
from clearml import Task
task = Task.get_task(task_id='HPO controller Task id here')
my_dict = task.artifacts['summary'].get()
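For the dict route, the artifact only needs to be JSON-friendly. A sketch, assuming rows is a list of dicts sorted best-first (the order get_top_experiments() returns its tasks in), each holding a 'task_id' and the objective value under objective_key; build_summary_dict and the 'best_task_id' key are hypothetical names.

```python
import json
from typing import Any, Dict, List


def build_summary_dict(rows: List[Dict[str, Any]], objective_key: str) -> Dict[str, Any]:
    """JSON-friendly summary: the best task id plus the ordered (id, value) list.

    Assumes `rows` is already sorted best-first by the optimization objective.
    """
    return {
        "objective": objective_key,
        "best_task_id": rows[0]["task_id"] if rows else None,
        "top": [{"task_id": r["task_id"], "value": r.get(objective_key)} for r in rows],
    }


example_summary = build_summary_dict(
    [{"task_id": "a1", "evaluate/mae": 0.12}, {"task_id": "b2", "evaluate/mae": 0.2}],
    objective_key="evaluate/mae",
)
# Round-trips through JSON, so it is safe to store as a dict artifact
print(json.dumps(example_summary))
```

On the controller you would then call task.upload_artifact(name='summary', artifact_object=example_summary), and anywhere else read task.artifacts['summary'].get()['best_task_id'].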
A bit of background:
A Task is a job executed in the system (sometimes it is a training experiment, sometimes a controller like the pipeline). Basically, any process can be a Task.
Specifically, the pipeline controller itself (i.e. the process running the Bayesian optimization) is a Task in the system (i.e. a running job). What it does (using the HyperParameterOptimizer) is clone previously executed Tasks (e.g. training experiments), change their parameters, and monitor their results. All the Tasks in the system are monitored and can be queried from anywhere.
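The clone, modify and launch cycle described above can be sketched by hand as follows. This is an illustration, not the optimizer's actual code: clone_with_overrides and the "default" queue name are hypothetical, and the sketch relies on Task.clone(source_task=..., name=...), set_parameter(name, value) and Task.enqueue(task, queue_name=...). task_cls defaults to clearml.Task and is injectable only so the snippet can be exercised without a ClearML server.

```python
from typing import Any, Dict, Optional


def clone_with_overrides(
    base_task_id: str,
    overrides: Dict[str, Any],
    queue_name: str,
    task_cls: Optional[Any] = None,
) -> Any:
    """Clone an executed Task, override some parameters, and enqueue the clone.

    Hypothetical helper mirroring what the optimizer does for each trial.
    """
    if task_cls is None:
        from clearml import Task as task_cls  # imported lazily
    cloned = task_cls.clone(source_task=base_task_id, name=f"clone of {base_task_id}")
    for name, value in overrides.items():
        cloned.set_parameter(name, value)  # e.g. "Args/lr" -> 0.001
    # An agent listening on this queue will pick up and run the clone
    task_cls.enqueue(cloned, queue_name=queue_name)
    return cloned
```

With ClearML installed, clone_with_overrides('base_task_id_here', {'Args/lr': 0.001}, 'default') would launch one manual trial.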
You can see how to clone and launch Tasks manually here:
The end goal of these questions is to go programmatically from the task name of the latest run of the HP optimisation controller task to the best experiment underneath it, access its model, and then serve it using some external tools.
If you have the optimizer object you can do:
best_task_objects = an_optimizer.get_top_experiments(top_k=3)
If you have the specific Task ID:
task = Task.get_task(task_id='task_id_here')
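Putting the pieces together for the end goal above (best trial, then its model, then external serving), a sketch that assumes the controller uploaded a dict artifact named 'summary' containing a 'best_task_id' key (a hypothetical layout) and that the best trial registered at least one output model. fetch_best_model_path is a hypothetical helper; it uses Task.get_task, task.artifacts[...].get(), task.models['output'] and Model.get_local_copy(). If you only know the controller's name, Task.get_task also accepts project_name and task_name, but check its documentation for which run it returns when several share a name.

```python
from typing import Any, Optional


def fetch_best_model_path(controller_task_id: str, task_cls: Optional[Any] = None) -> Optional[str]:
    """Controller Task -> 'summary' artifact -> best trial Task -> local model file.

    Assumes a hypothetical 'summary' dict artifact with a 'best_task_id' key.
    `task_cls` defaults to clearml.Task; it is injectable only for offline use.
    """
    if task_cls is None:
        from clearml import Task as task_cls  # imported lazily
    controller = task_cls.get_task(task_id=controller_task_id)
    summary = controller.artifacts["summary"].get()
    best_task = task_cls.get_task(task_id=summary["best_task_id"])
    output_models = best_task.models["output"]
    if not output_models:
        return None  # the best trial registered no output model
    # Assumption: the last registered output model is the one to serve
    return output_models[-1].get_local_copy()
```

The returned local path can then be handed to whatever external serving tool you use.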
Ah I see, it does print out the top experiments; you just have to make sure the metric and whatnot agree. If I wanted to attach some basic information to the task (after it's been rerun, instead of printing it to the log), would the best option be to use the Logger, set parameters, or set a comment? Or is there a general way to set metadata that is intended for that purpose?
Fantastic. Essentially the provided example just prints IDs to the log file, and I'm experimenting with better options so that the top models and similar are saved in a way I can access without manually reading a log file. Maybe reporting a scalar that's a string containing the task ID of the top model? I'm unsure of the best way, which is why I was trying to access the optimiser itself, since it would naturally contain that info.