Answered
Hi, Can I Run An

Hi, can I run a ClearML Agent on multiple computers (on-premise)? Is there any example in the documentation on how to do that?

  
  
Posted one year ago

Answers 19


AgitatedDove14 suppose that we are doing some optimization task (parameter search). This is a task where we generally want to minimize some metric m, but it is enough to have, say, 3 occurrences of m<THRESHOLD; once that happens, we stop the search (and free the resources, which may be needed for a further step).

  
  
Posted one year ago

So for that we use https://clear.ml/docs/latest/docs/references/sdk/hpo_optimization_hyperparameteroptimizer/ , we run Agents on n machines and we put those tasks to queues and tasks are run on n machines.
Do I understand that correctly?

Yes, indeed. You can have all of these agents listen to a single queue where the HPO will place all experiments

  
  
Posted one year ago

I know that there is possibility to set up some budget - for example seconds of running after which optimization stops. But is there a possibility to specify a boolean condition when work should stop?

RoundMosquito25 you mean when you reach a limit of loss<Threshold or something similar ?

  
  
Posted one year ago

AgitatedDove14 one more question regarding this issue
Is it possible to change the parameter space dynamically?
(dummy) example:
Our optimization is a task when we sample from [1,2,3] twice. At the situation when 3 is chosen twice, eliminate 3 from one sampling range, so make the sampling x1 from [1,2,3] and x2 from [1,2]

  
  
Posted one year ago

RoundMosquito25 this is a good point, I mean in theory it could be done, the question is the actual Bayesian optimization you are using.
Is it optuna (OptimizerOptuna) or OptimizerBOHB?

  
  
Posted one year ago

Wow, that looks interesting 🙂 Thank you, AgitatedDove14

  
  
Posted one year ago

AgitatedDove14 in fact in our case we want to use simple strategies, RandomSearch is enough, but the problem is that we need to change the ranges dynamically

  
  
Posted one year ago

I assume this is the thing that we would need:
https://clear.ml/docs/latest/docs/references/sdk/hpo_parameters_discreteparameterrange

But I would need to re-init this class when the set of parameters changes, right?

  
  
Posted one year ago

Thanks! 🙂

  
  
Posted one year ago

RoundMosquito25 you are absolutely correct !

  
  
Posted one year ago

Regarding this last question - I know that there is possibility to set up some budget - for example seconds of running after which optimization stops. But is there a possibility to specify a boolean condition when work should stop?

  
  
Posted one year ago

RoundMosquito25 actually you can 🙂
# check the state every minute
while an_optimizer.wait(timeout=1.0):
    running_tasks = an_optimizer.get_active_experiments()
    for task in running_tasks:
        task.get_last_scalar_metrics()
        # do something here
Base line reference:
https://github.com/allegroai/clearml/blob/f5700728837188d7d6005726c581c9d74fd91164/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py#L127
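To make the "do something here" part concrete, here is a minimal sketch of the stopping condition asked about (stop once enough experiments report a metric below a threshold). It only uses the calls already mentioned in this thread (wait, get_active_experiments, get_last_scalar_metrics) plus stop() on the optimizer. The helper names count_hits and run_until_enough_hits, the metric title/series arguments and the required_hits parameter are made up for illustration. It assumes wait() returns True once the optimization has finished and False on timeout, so the loop polls with "while not", and that get_last_scalar_metrics() returns a nested title -> series -> {"last": value} dict.

```python
from typing import Dict, Set


def count_hits(last_metrics_by_task: Dict[str, dict], title: str,
               series: str, threshold: float) -> Set[str]:
    """Return the IDs of tasks whose last reported title/series value is below threshold."""
    hits = set()
    for task_id, metrics in last_metrics_by_task.items():
        # Assumed layout: {title: {series: {"last": value, ...}}}
        value = metrics.get(title, {}).get(series, {}).get("last")
        if value is not None and value < threshold:
            hits.add(task_id)
    return hits


def run_until_enough_hits(optimizer, title: str, series: str, threshold: float,
                          required_hits: int = 3, poll_minutes: float = 1.0) -> Set[str]:
    """Poll the optimizer and stop it once required_hits distinct tasks beat the threshold."""
    hit_ids: Set[str] = set()
    # Assumption: wait() returns True when the optimizer is done, False on timeout.
    while not optimizer.wait(timeout=poll_minutes):
        snapshot = {
            task.id: task.get_last_scalar_metrics()
            for task in optimizer.get_active_experiments()
        }
        hit_ids |= count_hits(snapshot, title, series, threshold)
        if len(hit_ids) >= required_hits:
            optimizer.stop()  # ends the search so the agents are freed
            break
    return hit_ids
```

Note that this only looks at experiments that are still active at polling time, so an experiment that crosses the threshold and finishes between two polls would be missed; a shorter polling interval or checking completed experiments as well would be needed to cover that case.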

  
  
Posted one year ago

SuccessfulKoala55 thank you for the response; what about the second part of question (stopping)?

  
  
Posted one year ago

SuccessfulKoala55 Thank you for the response! Let me elaborate a bit to check if I understand this correctly.
We have a time-consuming task T based on optimization for parameters. We want to run hyperparameter optimization for T, suppose that we want to run it for 100 sets of parameters.
We want to leverage the fact that we have n machines to make the work parallel.

So for that we use https://clear.ml/docs/latest/docs/references/sdk/hpo_optimization_hyperparameteroptimizer/ , we run Agents on n machines and we put those tasks to queues and tasks are run on n machines.

Do I understand that correctly? Also another question: suppose that we want to stop the search when some condition on a metric is satisfied (for example, some loss value is smaller than THRESHOLD). Is there such an option in ClearML?

  
  
Posted one year ago

In fact, as I assume, we need to write our custom HyperParameterOptimizer, am I right?

Yes, exactly! It should be very easy.
Just inherit from RandomSearch and change create_job
https://github.com/allegroai/clearml/blob/d45ec5d3e2caf1af477b37fcb36a81595fb9759f/clearml/automation/optimization.py#L1043
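As a rough illustration of the dynamic-range idea from the dummy example in this thread (x1 and x2 both sampled from [1,2,3]; once 3 is drawn for both, drop 3 from the x2 range), here is a plain-Python sketch of the sampling logic only. ShrinkingSampler and its trigger argument are hypothetical names and not part of ClearML; the assumption is that an overridden create_job in a RandomSearch subclass would call something like sample() to get the parameter overrides for the next job.

```python
import random
from typing import Dict, List, Optional


class ShrinkingSampler:
    """Hypothetical helper (not a ClearML class): random search with a shrinking range."""

    def __init__(self, values: List[int], trigger: int, seed: Optional[int] = None):
        self.x1_values = list(values)  # range for x1, never changes
        self.x2_values = list(values)  # range for x2, may shrink
        self.trigger = trigger
        self._rng = random.Random(seed)

    def sample(self) -> Dict[str, int]:
        """Draw one parameter set, then update the x2 range for future draws."""
        x1 = self._rng.choice(self.x1_values)
        x2 = self._rng.choice(self.x2_values)
        # Rule from the dummy example: once the trigger value is chosen twice
        # in the same draw, eliminate it from one of the sampling ranges.
        if x1 == self.trigger and x2 == self.trigger:
            self.x2_values.remove(self.trigger)
        return {"x1": x1, "x2": x2}
```

Keeping the range bookkeeping in a small object like this means the optimizer subclass only has to ask for the next parameter set, and the rule for shrinking the ranges can be changed without touching the job-creation code.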

  
  
Posted one year ago

AgitatedDove14 shouldn't it be
while not an_optimizer.wait(timeout=1.0):
instead of
while an_optimizer.wait(timeout=1.0):
in the first code block?

  
  
Posted one year ago

In fact, as I assume, we need to write our custom HyperParameterOptimizer, am I right?

  
  
Posted one year ago

You just repeat the process on every machine you'd like

  
  
Posted one year ago