RoundMosquito25 actually you can 🙂# check the state every minute while an_optimizer.wait(timeout=1.0): running_tasks = an_optimizer.get_active_experiments() for task in running_tasks: task.get_last_scalar_metrics() # do something here
base line reference
https://github.com/allegroai/clearml/blob/f5700728837188d7d6005726c581c9d74fd91164/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py#L127
In fact, as I assume, we need to write our custom HyperParameterOptimizer, am I right?
Yes exactly! it should be very easy
Just Inherit from RandomSearch and change create_job
https://github.com/allegroai/clearml/blob/d45ec5d3e2caf1af477b37fcb36a81595fb9759f/clearml/automation/optimization.py#L1043
AgitatedDove14 shouldn't it bewhile not an_optimizer.wait(timeout=1.0):
instead ofwhile an_optimizer.wait(timeout=1.0):
in the first code block?
See here on how to set up an agent: https://clear.ml/docs/latest/docs/getting_started/mlops/mlops_first_steps
RoundMosquito25 you are absolutely correct !
SuccessfulKoala55 Thank you for the response! Let me elaborate a bit to check if I understand this correctly.
We have a time-consuming task T based on optimization for parameters. We want to run hyperparameter optimization for T, suppose that we want to run it for 100 sets of parameters.
We want to leverage the fact that we have n machines to make the work parallel.
So for that we use https://clear.ml/docs/latest/docs/references/sdk/hpo_optimization_hyperparameteroptimizer/ , we run Agents on n machines and we put those tasks to queues and tasks are run on n machines.
Do I understand that correctly? Also another question - suppose, that we want to stop the search when some metric is satisfied (for example some loss value is smaller than THRESHOLD). Is there such option in ClearML?
AgitatedDove14 in fact in our case we want to use simple strategies, RandomSearch is enough, but the problem is that we need to change the ranges dynamically
Wow, that looks insteresting 🙂 Thank you, AgitatedDove14
You just repeat the process on every machine you'd like
Regarding this last question - I know that there is possibility to set up some budget - for example seconds of running after which optimization stops. But is there a possibility to specify a boolean condition when work should stop?
SuccessfulKoala55 thank you for the response; what about the second part of question (stopping)?
AgitatedDove14 one more question regarding this issue
Is it possible to change parameter space dynamically.
(dummy) example:
Our optimization is a task when we sample from [1,2,3] twice. At the situation when 3 is chosen twice, eliminate 3 from one sampling range, so make the sampling x1 from [1,2,3] and x2 from [1,2]
RoundMosquito25 this is a good point, I mean in theory it could be done, the question is the actual Bayesian optimization you are using.
Is it optuna (OptimizerOptuna) or OptimizerBOHB?
AgitatedDove14 suppose that we are doing some optimization task (parameter search). This is a task where generally we want to minimize some metric m
, but it will be enough to have, say 3 occurences when m<THRESHOLD
and when it will happen, we stop the search (and free the resources, that can be needed for some further step)
I know that there is possibility to set up some budget - for example seconds of running after which optimization stops. But is there a possibility to specify a boolean condition when work should stop?
RoundMosquito25 you mean when you reach a limit of loss<Threshold
or something similar ?
So for that we use
, we run Agents on n machines and we put those tasks to queues and tasks are run on n machines.
Do I understand that correctly?
Yes, indeed. You can have all of these agents listen to a single queue where the HPO will place all experiments
In fact, as I assume, we need to write our custom HyperParameterOptimizer, am I right?
I assume, that even this is a thing that we would need:
https://clear.ml/docs/latest/docs/references/sdk/hpo_parameters_discreteparameterrange
But I would need to re-init this class when set of parameters, changes, right?