Answered

I have a question regarding reducing execution time of pulling results from the server with the python API.
As part of a pipeline, after running HPO I pull all the results from my optimizer task, along with all the scalars associated with them, and it is very slow: ~30 min for 1000 experiments.
I am running something like:
top_tasks = an_optimizer.get_top_experiments(n_exp)
task_scalars = dict()
for task in top_tasks:
    task_scalars[task.id] = task.get_last_scalar_metrics()
Is there a way to make it faster?

  
  
Posted 2 years ago

Answers 24


DepressedChimpanzee34 , Hi!

The part you want to do faster is the code snippet you provided? Also, I'll check regarding the verbosity 🙂

  
  
Posted 2 years ago

Kind of on the same topic: it would be very useful to have some kind of verbosity option, e.g. a progress bar for get_top_experiments()

  
  
Posted 2 years ago

DepressedChimpanzee34 something along the lines of:
from multiprocessing.pool import ThreadPool

def get_last_metric(t):
    return t.get_last_scalar_metrics()

p = ThreadPool()
task_scalars_list = p.map(get_last_metric, top_tasks)
p.close()

This parallelizes the network connections, as I'm assuming the delay is in the fetching
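A slightly extended sketch of the same thread-pool idea, in case it helps: it keeps the results keyed by task ID and prints a simple progress line. `fetch_all` and `FakeTask` are hypothetical names used only for illustration (they are not part of clearml), and `FakeTask` merely stands in for a real Task so the snippet runs without a server.

```python
from multiprocessing.pool import ThreadPool


def fetch_all(tasks, fetch_one, n_threads=8):
    """Call fetch_one(task) for every task on a thread pool.

    Returns {task.id: result}. `fetch_all` is a hypothetical helper,
    not a clearml function.
    """
    results = {}
    with ThreadPool(n_threads) as pool:
        # imap yields results in input order, so zip pairs each task
        # with its own result while the other requests are in flight.
        pairs = zip(tasks, pool.imap(fetch_one, tasks))
        for done, (task, value) in enumerate(pairs, start=1):
            results[task.id] = value
            if done % 100 == 0 or done == len(tasks):
                print(f"fetched {done}/{len(tasks)}")
    return results


class FakeTask:
    """Stand-in for a clearml Task, used only to make the demo runnable."""

    def __init__(self, task_id):
        self.id = task_id

    def get_last_scalar_metrics(self):
        return {"loss": {"val": {"last": 0.1, "min": 0.1, "max": 0.9}}}


demo_tasks = [FakeTask(f"t{i}") for i in range(3)]
demo = fetch_all(demo_tasks, lambda t: t.get_last_scalar_metrics(), n_threads=2)
```

With real tasks the call would look like fetch_all(top_tasks, lambda t: t.get_last_scalar_metrics()). Whether this actually helps depends on where the time goes; if the server is the bottleneck, more threads may not gain much.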

  
  
Posted 2 years ago

this?
ids = [t.id for t in top_tasks]

  
  
Posted 2 years ago

For me, at the moment it means "manually" filtering the keys I've put in for the HP space. I find it a bit strange that they are not saved as part of the optimizer object.
The optimizer_task seems to have an attribute called hyper_parameters, but it's empty in my case.

  
  
Posted 2 years ago

it seems to be orders of magnitude faster!

  
  
Posted 2 years ago

optimizer.get_top_experiments(n)

  
  
Posted 2 years ago

I pull all the parameters, and then manually filter on the HP keys (manually=I have to plug them in, they are not part of optimizer object)

So is this an improvement to the optimizer._get_child_tasks_ids(...) interface?
e.g. return a structure like:
[
    {
        'id': task_id,
        'hp1': value,
        'hp2': value,
        'hp3': value,
        'objective': dict(title='title', series='series', value=42),
    },
]

  
  
Posted 2 years ago

I mean to get top_tasks

  
  
Posted 2 years ago

AgitatedDove14 , I am referring to a generic HPO scenario where you define some HP space, let's say:
param1 = np.linspace(lower_bound, upper_bound, n)
param2 = np.linspace(lower_bound, upper_bound, n)
then you run an optimization that samples this HP space.
For each trial a sample is pulled from the space, some experiment is performed and you get a score. Then to analyze the behavior of your objective you want to understand the relation between the params and objective score.
Then if you pull the trials metrics, you most likely want to know to which HP they belong.
So the bottom line is that when pulling results you are interested in the metrics values + HP point (param1=values, param2=values, ...) of the trial
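To make this concrete, here is a minimal sketch of combining the two once they have been pulled. It assumes the metrics are shaped like the output of get_last_scalar_metrics(), i.e. {title: {series: {'last': ..., 'min': ..., 'max': ...}}}, and that the parameters are a flat {name: value} dict per task. `build_rows` and the key names in the demo (such as 'General/param1') are hypothetical, and the HP keys still have to be passed in by hand, as described above.

```python
def build_rows(metrics_by_id, params_by_id, hp_keys):
    """Merge per-task metrics and hyper-parameters into flat rows.

    metrics_by_id: {task_id: {title: {series: {'last': v, ...}}}}
    params_by_id:  {task_id: {param_name: value}}
    hp_keys:       names of the parameters that define the HP space
    """
    rows = []
    for task_id, metrics in metrics_by_id.items():
        row = {"id": task_id}
        params = params_by_id.get(task_id, {})
        # Keep only the parameters that belong to the HP space.
        for key in hp_keys:
            row[key] = params.get(key)
        # Flatten title/series into one column per scalar, last value only.
        for title, series_dict in metrics.items():
            for series, values in series_dict.items():
                row[f"{title}/{series}"] = values.get("last")
        rows.append(row)
    return rows


# Made-up sample data, only to show the shapes involved.
sample_metrics = {
    "abc": {"accuracy": {"val": {"last": 0.91, "min": 0.40, "max": 0.93}}},
}
sample_params = {
    "abc": {"General/param1": "0.5", "General/param2": "3", "General/seed": "1"},
}
rows = build_rows(sample_metrics, sample_params, ["General/param1", "General/param2"])
```

Each row then holds the task ID, the sampled HP point, and the last value of every scalar, which is the table you would want for a sensitivity analysis.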

  
  
Posted 2 years ago

I have a small question about the response structure, each of the metrics has this structure:
metric_id: {
    ...
    "value": 0.0006447011,
    "min_value": 8.6326945e-06,
    "max_value": 0.001049518,
    ...
}
What does value refer to? The last reported value?

  
  
Posted 2 years ago

AgitatedDove14 , for creating a dedicated function I would suggest also including the actual sampled point in the HP space. This would be the most common use case, and essentially the reason for running the HPO: understanding the sensitivity of the metrics with respect to the hyper-parameters

  
  
Posted 2 years ago

thanks, I'll try this. Is there an efficient way to get the IDs first?

  
  
Posted 2 years ago

You can try pulling just the "metric" section of the Task, but I cannot imagine the network bandwidth is the issue?
Could it be load on the clearml-server (i.e. it needs to handle lots of requests)?

  
  
Posted 2 years ago

for creating a dedicated function I would suggest also including the actual sampled point in the HP space.

Could you expand?

This would be the most common use case, and essentially the reason for running the HPO: understanding the sensitivity of the metrics with respect to the hyper-parameters

Does this relate to:
https://github.com/allegroai/clearml/issues/430

"manually" filtering the keys I've put in for the HP space. I find it a bit strange that they are not saved as part of the optimizer object..

What do you mean?

  
  
Posted 2 years ago

AgitatedDove14 , what I meant by manually filtering: at the moment, to combine the metric values + HP point, I pull all the parameters and then manually filter on the HP keys (manually = I have to plug them in; they are not part of the optimizer object)

  
  
Posted 2 years ago

AgitatedDove14 , definitely so, this is very generic and very useful
In many cases the objective is just one of multiple metrics of interest, so for me almost always I would want to combine it with the rest of the scalar metrics

  
  
Posted 2 years ago

that is the heaviest part for me

  
  
Posted 2 years ago

Sounds good to me. DepressedChimpanzee34 any chance you can add a github feature request, so we do not forget to add it?

  
  
Posted 2 years ago

AgitatedDove14 , the issue you mention does not relate to this discussion

  
  
Posted 2 years ago

Hmm, check if this one works:
optimizer._get_child_tasks_ids(
    parent_task_id=optimizer._job_parent_id or optimizer._base_task_id,
    order_by=optimizer._objective_metric._get_last_metrics_encode_field(),
    additional_filters={'page_size': int(top_k), 'page': 0})
If it does, let's PR it as a dedicated function

  
  
Posted 2 years ago

You can try a direct API call for all the Tasks together:
Task._query_tasks(task_ids=[IDS here], only_fields=['last_metrics'])
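If the ID list gets very long, one option is to send it in a few batches rather than in a single request. This is only a sketch: `query_in_chunks` is a hypothetical helper, the chunk size is arbitrary, and whether batching is needed at all is an assumption, not something confirmed in this thread. The query itself is passed in as a function, so the snippet runs without a server.

```python
def query_in_chunks(task_ids, query_fn, chunk_size=500):
    """Run query_fn over task_ids in fixed-size batches and join the results.

    query_fn takes a list of IDs and returns a list of results; with clearml
    it could wrap the Task._query_tasks(...) call suggested above.
    """
    results = []
    for start in range(0, len(task_ids), chunk_size):
        chunk = task_ids[start:start + chunk_size]
        results.extend(query_fn(chunk))
    return results


# Demo with a fake query function that just echoes the IDs back.
seen_chunks = []


def fake_query(ids):
    seen_chunks.append(list(ids))
    return [{"id": i} for i in ids]


demo_results = query_in_chunks([f"t{i}" for i in range(5)], fake_query, chunk_size=2)
```

With the real API this would be something like query_in_chunks(ids, lambda chunk: Task._query_tasks(task_ids=chunk, only_fields=['last_metrics'])).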

  
  
Posted 2 years ago

AgitatedDove14 thanks, I actually experimented with a similar parallel pool approach, but the overhead seems to even out the benefit.
Is there something you can think of for the first part though, i.e. pulling all the experiments with get_top_experiments()?

  
  
Posted 2 years ago