So is this an improvement to the optimizer._get_child_tasks_ids(...) interface?
e.g. return a structure like:
    [
        {
            'id': task_id,
            'hp1': value,
            'hp2': value,
            'hp3': value,
            'objective': dict(title='title', series='series', value=42),
        },
    ]
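A minimal sketch of how one such record could be assembled on the client side today. The helper name `build_trial_record`, the `General/...` parameter names, and the sample values are all hypothetical and only illustrate the shape; nothing here is part of the clearml API.

```python
from typing import Any, Dict, Iterable


def build_trial_record(
    task_id: str,
    all_params: Dict[str, Any],
    hp_keys: Iterable[str],
    objective: Dict[str, Any],
) -> Dict[str, Any]:
    """Combine the sampled HP point and the objective of one trial into a flat record."""
    record: Dict[str, Any] = {"id": task_id}
    # Keep only the keys that define the HP space; every other parameter is dropped.
    for key in hp_keys:
        if key in all_params:
            record[key] = all_params[key]
    record["objective"] = dict(objective)
    return record


# Hypothetical data standing in for what was pulled from one child task.
params = {"General/lr": 0.01, "General/batch_size": 32, "General/seed": 7}
row = build_trial_record(
    task_id="abc123",
    all_params=params,
    hp_keys=["General/lr", "General/batch_size"],
    objective={"title": "title", "series": "series", "value": 42},
)
```

The HP keys still have to be supplied by hand here, which is exactly the step the requested interface would remove.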
You can try pulling just the "metric" section of the Task, but I cannot imagine that network bandwidth is the issue.
Could it be load on the clearml-server (i.e. it needs to handle lots of requests)?
Hmm, check if this one works:
    optimizer._get_child_tasks_ids(
        parent_task_id=optimizer._job_parent_id or optimizer._base_task_id,
        order_by=optimizer._objective_metric._get_last_metrics_encode_field(),
        additional_filters={'page_size': int(top_k), 'page': 0},
    )
If it does, let's PR it as a dedicated function
I have a small question about the response structure. Each of the metrics has this structure:
    metric_id: {
        ...
        "value": 0.0006447011,
        "min_value": 8.6326945e-06,
        "max_value": 0.001049518,
        ...
    }
what does value refer to? the last reported?
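For reference, a small sketch of reading the three fields out of a response shaped like the one quoted here. It assumes a flat mapping of metric id to entry, as shown in the message (the real response may be nested further), and the helper name `summarize_metrics` is made up. It does not settle what `value` means; it only pulls the fields apart.

```python
from typing import Any, Dict


def summarize_metrics(metrics: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, float]]:
    """Extract value / min / max per metric id from a response-like mapping."""
    summary = {}
    for metric_id, entry in metrics.items():
        summary[metric_id] = {
            "value": entry["value"],
            "min": entry["min_value"],
            "max": entry["max_value"],
        }
    return summary


# Numbers copied from the message; "metric_id" is a placeholder key.
response = {
    "metric_id": {
        "value": 0.0006447011,
        "min_value": 8.6326945e-06,
        "max_value": 0.001049518,
    }
}
summary = summarize_metrics(response)
```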
AgitatedDove14 , I am referring to a generic HPO scenario where you define some HP space, let's say:
    param1 = np.linspace(lower_bound, upper_bound, n)
    param2 = np.linspace(lower_bound, upper_bound, n)
then you run an optimization that samples this HP space.
For each trial a sample is pulled from the space, some experiment is performed and you get a score. Then to analyze the behavior of your objective you want to understand the relation between the params and objective score.
Then if you pull the trial metrics, you most likely want to know which HP point they belong to.
So the bottom line is that when pulling results you are interested in the metrics values + HP point (param1=values, param2=values, ...) of the trial
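To make the scenario concrete, here is a toy sketch in plain Python with no clearml involved. `linspace` is a stand-in for `np.linspace`, and `toy_objective`, the bounds, and the number of trials are invented for illustration. Each trial is stored as HP point plus score, which is the shape needed for any later sensitivity analysis.

```python
import random
from typing import Dict, List


def linspace(lower: float, upper: float, n: int) -> List[float]:
    """Pure-Python stand-in for np.linspace, for n >= 2 points."""
    step = (upper - lower) / (n - 1)
    return [lower + i * step for i in range(n)]


def toy_objective(point: Dict[str, float]) -> float:
    """Hypothetical score; a real trial would run an experiment instead."""
    return -((point["param1"] - 0.5) ** 2) - (point["param2"] - 2.0) ** 2


space = {"param1": linspace(0.0, 1.0, 5), "param2": linspace(1.0, 3.0, 5)}
rng = random.Random(0)  # fixed seed so the sampling is repeatable

trials = []
for trial_id in range(10):
    # Pull one sample from the space and keep it next to the resulting score.
    point = {name: rng.choice(values) for name, values in space.items()}
    trials.append({"id": trial_id, **point, "score": toy_objective(point)})

best = max(trials, key=lambda t: t["score"])
```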
AgitatedDove14 , what I meant by manually filtering: at the moment, to combine the metric values with the HP point, I pull all the parameters and then manually filter on the HP keys (manually = I have to plug them in, they are not part of the optimizer object)
it seems to be orders of magnitude faster!
AgitatedDove14 , the issue you mention does not relate to this discussion
Could you expand ?
Does this relate to:
https://github.com/allegroai/clearml/issues/430
what do you mean?
AgitatedDove14 thanks, I actually experimented with a similar parallel pool approach, but the overhead seems to even out the benefit..
Is there something you can think of for the first part though? Pulling all the experiments with get_top_experiments()
AgitatedDove14 , definitely so, this is very generic and very useful
In many cases the objective is just one of multiple metrics of interest, so for me almost always I would want to combine it with the rest of the scalar metrics
You can try a direct API call for all the Tasks together:
    Task._query_tasks(task_ids=[IDS here], only_fields=['last_metrics'])
for me at the moment it means "manually" filtering the keys I've put in for the HP space. I find it a bit strange that they are not saved as part of the optimizer object..
the optimizer_task seems to have an attribute called hyper_parameters, but it's empty in my case..
Sounds good to me. DepressedChimpanzee34 any chance you can add a github feature request, so we do not forget to add it?
AgitatedDove14 , done
https://github.com/allegroai/clearml/issues/473
thanks, I'll try this. Is there an efficient way to get the IDs first?
AgitatedDove14 , when creating a dedicated function, I would suggest also including the actual sampled point in the HP space. This would be the most common use case, and essentially the reason for running the HPO: understanding the sensitivity of the metrics with respect to the hyper-parameters
DepressedChimpanzee34 something along the lines of:
    from multiprocessing.pool import ThreadPool

    p = ThreadPool()

    def get_last_metric(t):
        return t.get_last_scalar_metrics()

    task_scalars_list = p.map(get_last_metric, top_tasks)
    p.close()
This parallelizes the network connections, as I'm assuming the delay is in fetching
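A self-contained version of the thread-pool idea that runs without a server. `FakeTask` is a hypothetical stand-in whose `get_last_scalar_metrics()` just sleeps to simulate a network round trip; the nested dict it returns is an assumed shape, not a guarantee about the real method.

```python
import time
from multiprocessing.pool import ThreadPool


class FakeTask:
    """Hypothetical stand-in for a task object; the sleep simulates network latency."""

    def __init__(self, task_id, score):
        self.id = task_id
        self._score = score

    def get_last_scalar_metrics(self):
        time.sleep(0.05)
        return {"title": {"series": {"last": self._score}}}


def get_last_metric(task):
    return task.get_last_scalar_metrics()


top_tasks = [FakeTask(f"task-{i}", float(i)) for i in range(8)]

# The context manager tears the pool down on exit; map() keeps the input order.
with ThreadPool(processes=8) as pool:
    task_scalars_list = pool.map(get_last_metric, top_tasks)
```

Threads only help while the time is spent waiting on I/O. If the server itself is the bottleneck, more concurrent requests may not make things faster.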
DepressedChimpanzee34 , Hi!
The part you want to do faster is the code snippet you provided? Also, I'll check regarding the verbosity 🙂
Kind of on the same topic, it would be very useful if some kind of verbosity could be enabled, e.g. a progress bar for get_top_experiments()
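Until something like that exists, a rough workaround is to wrap your own fetch loop. The sketch below uses a made-up helper, `with_progress`, which prints a counter to stderr while yielding items unchanged. It does not hook into get_top_experiments() itself, and the IDs are placeholders.

```python
import sys
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def with_progress(items: Iterable[T], total: int, label: str = "fetching") -> Iterator[T]:
    """Yield items unchanged while printing a one-line counter to stderr."""
    for done, item in enumerate(items, start=1):
        sys.stderr.write(f"\r{label}: {done}/{total}")
        sys.stderr.flush()
        yield item
    sys.stderr.write("\n")


task_ids = ["a", "b", "c"]  # hypothetical task IDs
# The upper() call stands in for whatever per-task fetch is being done.
fetched = [task_id.upper() for task_id in with_progress(task_ids, total=len(task_ids))]
```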