Hi LudicrousParrot69
A bit of background:
A Task is a job executed in the system (sometimes it is an experiment training run, sometimes a controller like the pipeline). Basically every process can be a Task.
Specifically, the pipeline controller itself (i.e. the process running the Bayesian optimization) is a Task in the system (i.e. a running job). What it does (using the HyperParameterOptimizer) is clone previously executed Tasks (e.g. training experiments), change their parameters and monitor their results. All the Tasks in the system are monitored and can be queried from anywhere.
You can see how to clone and launch Tasks manually here:
https://github.com/allegroai/clearml/blob/master/examples/automation/task_piping_example.py
The end goal of these questions is how to programmatically go from the task name of the latest run of the HP optimisation controller task, to the task for the best experiment underneath it, then access its model and serve it using some external tools.
If you have the optimizer object you can do:
best_task_objects = an_optimizer.get_top_experiments(top_k=3)
If you have the specific Task ID:
task = Task.get_task(task_id='task_id_here')
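A rough sketch of the last step of that goal: given a Task object (for example one returned by get_top_experiments or Task.get_task), fetch a local copy of its most recent output model. This assumes a clearml version in which a Task exposes a `models` mapping with an 'output' list and model objects provide get_local_copy(). The helper name latest_output_model_path is made up for illustration and is not part of ClearML.

```python
from typing import Any, Optional


def latest_output_model_path(task: Any) -> Optional[str]:
    """Return a local path to the newest output model of `task`, or None if it has none."""
    # Assumption: task.models["output"] is a list of model objects, oldest first.
    output_models = task.models["output"]
    if not output_models:
        return None
    # Assumption: each model object can download itself via get_local_copy().
    return output_models[-1].get_local_copy()


# Hypothetical usage against a real server (not executed here):
#   from clearml import Task
#   task = Task.get_task(task_id='task_id_here')
#   model_path = latest_output_model_path(task)
```

The helper only relies on duck typing, so it can be tried against stand-in objects before pointing it at a real Task.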
Thanks Martin, this is super useful. Using get_top_experiments would be great, but do I actually have access to the controller (an_optimizer) from the Task object itself? I don't see anything like an_optimizer = task.connect(an_optimizer), which seems to be the normal way of connecting things up?
LudicrousParrot69 do you mean post execution, or while you are executing the hyperparameter optimizer?
Yeah, post execution in a separate script / task
I see, give me a minute to check what would be the easiest
Fantastic. Essentially the example provided just prints out IDs to the log file, and I'm trying to play around with better things to do so that the top models and similar are saved out in some way I can access without manually reading a log file. Maybe reporting a scalar that's a string which has the task ID for the top model? Unsure of the best way, hence why I was trying to access the optimiser itself, which would naturally contain that info.
Essentially the example provided just prints out IDs to the log file,
What do you mean?
The top models in the example aren't saved out in a useful way, just printed out. I'm trying to figure out the best way of saving these IDs so I can get the tasks/models.
I see what you mean.
from clearml.automation import HyperParameterOptimizer

an_optimizer = HyperParameterOptimizer(
    base_task_id='39d2c27baa8145929b2e21f686a17046',
    hyper_parameters=[],
    objective_metric_title='epoch_accuracy',
    objective_metric_series='epoch_accuracy',
    objective_metric_sign='max',
    optimizer_class=aSearchStrategy,  # placeholder: your search strategy class
    max_iteration_per_job=0,
    total_max_jobs=0,
    auto_connect_task=False,
)
print(an_optimizer.get_top_experiments(top_k=5))
You can run this code from anywhere. The 'base_task_id' is actually the pipeline controller Task ID.
BTW: Next version will have a nicer interface to query it, but this code will work on the current version
Ah I see, it does print out the top experiments, you just have to make sure the metric and so on agree. If I was looking to just attach some basic information to the task (after it's been rerun, instead of printing it to the log), would the best option be to use the Logger to attach it, set parameters, set a comment, or is there a general way to set some metadata that is intended to be used in that capacity?
The easiest would be as an artifact (I think).
Let's assume you put it into a csv file (with pandas or manually).
To upload (from the pipeline Task itself):
task.upload_artifact(name='summary', artifact_object='~/my/summary.csv')
Then if you want to grab it from anywhere else:
task = Task.get_task(task_id='HPO controller Task id here')
my_csv = task.artifacts['summary'].get_local_copy()
If you want to store it as a dict it might be even easier:
task.upload_artifact(name='summary', artifact_object=a_dict_here)
Then you can:
task = Task.get_task(task_id='HPO controller Task id here')
my_dict = task.artifacts['summary'].get()
Ah okay. Probably better than the Logger.report_text I was going to use to dump some json into, but I see a dict gets stored as json in upload_artifact as well. Perfect!
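A minimal sketch of what such a dict could look like, assuming the list of Task objects comes from get_top_experiments and that each Task exposes `id` and `name`. The function name top_ids_summary and the layout of the dict are illustrative choices, not anything ClearML prescribes.

```python
import json
from typing import Any, Dict, Iterable, List


def top_ids_summary(tasks: Iterable[Any]) -> Dict[str, List[dict]]:
    """Build a JSON-friendly summary of tasks, keeping the order they arrive in."""
    return {
        "top_experiments": [
            {"rank": rank, "task_id": task.id, "name": task.name}
            for rank, task in enumerate(tasks, start=1)
        ]
    }


def to_json(summary: Dict[str, List[dict]]) -> str:
    """Serialize the summary, which also confirms it holds only plain JSON types."""
    return json.dumps(summary, indent=2)


# Hypothetical usage inside the HPO controller script (not executed here):
#   summary = top_ids_summary(an_optimizer.get_top_experiments(top_k=3))
#   task.upload_artifact(name='summary', artifact_object=summary)
```

Keeping the rank explicit means whoever reads the artifact later does not have to rely on list order surviving the round trip.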
Hey AgitatedDove14, another question if I can! I'm trying to access this information from the API so I can put it as an artifact as well. Currently this is quite a few lines of code using get_top_experiments and get_last_scalar_metrics()["evaluate"]["mae"]["last"]. Again, I feel like I'm missing something, as I assume there's a far simpler way of getting data that is displayed so easily in the UI 🙂
LudicrousParrot69 ,
Are you trying to parse the attached table post execution, then put it into a CSV on the HPO Task?
I'm trying to do it at the end of the optimisation. The same place in the example where you print the IDs to the log. I'm just hoping there's a way to get said table simply, rather than going through a bunch of API calls to construct it myself.
But ideally yes, the HPO should have a df artifact summarising the HPO itself so I can try and make use of the information properly 🙂
Bad news, there isn't a nice interface to get the table from the Optimizer object (I will make sure we add it, no reason not to).
But you can very easily get all the information you need and more:
all_the_tasks = an_optimizer.get_top_experiments(top_k=100)
Then for every task in the list you can get all the information:
for task in all_the_tasks:
    task_params_as_dict = task.get_parameters()
    task_scalars = task.get_last_scalar_metrics()
Basically the Task object enables you to query any Task in the system. We just get the list of Tasks from the optimizer (sorted by the optimization objective), then we can do whatever we need with it.
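One way to turn that loop into the summary table discussed above, as a sketch only. It assumes get_parameters() returns a flat dict and get_last_scalar_metrics() returns a nested {title: {series: {'last': ...}}} dict, as the ["evaluate"]["mae"]["last"] lookup earlier suggests. The helper names (flatten_scalars, tasks_to_rows, write_rows_csv) are made up for this example, and it uses the standard csv module rather than pandas to stay dependency-free.

```python
import csv
from typing import Any, Dict, Iterable, List


def flatten_scalars(metrics: Dict[str, Dict[str, Dict[str, Any]]]) -> Dict[str, Any]:
    """Map {title: {series: {'last': v}}} to {'title/series': v}."""
    flat = {}
    for title, series_map in metrics.items():
        for series, values in series_map.items():
            flat[f"{title}/{series}"] = values.get("last")
    return flat


def tasks_to_rows(tasks: Iterable[Any]) -> List[Dict[str, Any]]:
    """One row per task: id, name, its parameters, and its last scalar values."""
    rows = []
    for task in tasks:
        row = {"task_id": task.id, "name": task.name}
        row.update(task.get_parameters())
        row.update(flatten_scalars(task.get_last_scalar_metrics()))
        rows.append(row)
    return rows


def write_rows_csv(rows: List[Dict[str, Any]], path: str) -> None:
    """Write rows to CSV; the header is the union of keys in first-seen order."""
    fieldnames: List[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)  # keys missing from a row are left empty


# Hypothetical usage at the end of the optimisation (not executed here):
#   rows = tasks_to_rows(an_optimizer.get_top_experiments(top_k=100))
#   write_rows_csv(rows, 'summary.csv')
#   task.upload_artifact(name='summary', artifact_object='summary.csv')
```

The rows are plain dicts, so they could equally be handed to pandas.DataFrame(rows) if a dataframe artifact is preferred.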
Yup, that's how I've been doing it now. Will happily update to a simpler method call whenever one gets made. Making use of the HPO is a big thing I'm trying to sell the team on, as it's what sets ClearML apart from MLflow or Neptune: useful task orchestration and cloning 🙂
Working on it as we speak 🙂 Hopefully in the next release (probably next week)
Awesome. If I end up convincing the team to use ClearML, I'll probably have a ton of small requests to streamline the automation side of orchestration. Is the team open to PRs from external people?
Yes please do! PRs are welcome! I thought we fixed the GitHub readme to reflect it, anyhow I'll make sure we do 🙂