Hi, Can I Run An

Answered


Hi, can I run a ClearML Agent on multiple computers (on-premise)? Is there an example in the documentation of how to do that?

  				
Posted 2 years ago by RoundMosquito25


Answers 19

RoundMosquito25 you are absolutely correct!

  				
Posted 2 years ago by AgitatedDove14

I assume this is also something we would need:
https://clear.ml/docs/latest/docs/references/sdk/hpo_parameters_discreteparameterrange

But I would need to re-initialize this class whenever the set of parameters changes, right?

  				
Posted 2 years ago by RoundMosquito25

SuccessfulKoala55 thank you for the response. What about the second part of the question (stopping)?

  				
Posted 2 years ago by RoundMosquito25

AgitatedDove14 one more question regarding this issue.
Is it possible to change the parameter space dynamically?
(Dummy) example:
Our optimization is a task in which we sample from [1,2,3] twice. Once 3 has been chosen twice, we eliminate 3 from one of the sampling ranges, so that x1 is sampled from [1,2,3] and x2 from [1,2].

  				
Posted 2 years ago by RoundMosquito25

You just repeat the process on every machine you'd like

  				
Posted 2 years ago by SuccessfulKoala55

So for that we use https://clear.ml/docs/latest/docs/references/sdk/hpo_optimization_hyperparameteroptimizer/ , we run Agents on n machines and we put those tasks to queues and tasks are run on n machines.
Do I understand that correctly?

Yes, indeed. You can have all of these agents listen to a single queue, where the HPO will place all experiments.
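
The setup confirmed above could be sketched roughly as follows. This is a minimal illustration rather than an official recipe: the queue name `hpo_queue`, the project and task names, the parameter name `General/x`, the metric title and series `loss`, and the helper functions `hpo_settings` and `build_optimizer` are all assumptions made for the example. `clearml` is imported inside the function so the settings helper can be inspected without a server. On each of the n machines you would start an agent that listens to the same queue, for example with `clearml-agent daemon --queue hpo_queue`.

```python
def hpo_settings(queue_name, n_machines, total_jobs=100):
    """Scalar settings: one shared queue, one concurrent job per machine."""
    return {
        "execution_queue": queue_name,                 # all agents listen here
        "max_number_of_concurrent_tasks": n_machines,  # keep every machine busy
        "total_max_jobs": total_jobs,                  # e.g. 100 parameter sets
    }


def build_optimizer(base_task_id, queue_name="hpo_queue", n_machines=4):
    """Create an optimizer that enqueues clones of task T into a shared queue."""
    # Imported lazily so this module loads even where clearml is not installed.
    from clearml import Task
    from clearml.automation import (
        HyperParameterOptimizer,
        RandomSearch,
        UniformIntegerParameterRange,
    )

    # The controller itself is registered as a task of type "optimizer".
    Task.init(
        project_name="HPO example",          # assumed project name
        task_name="parallel random search",  # assumed task name
        task_type=Task.TaskTypes.optimizer,
    )
    return HyperParameterOptimizer(
        base_task_id=base_task_id,  # ID of the time-consuming task T
        hyper_parameters=[
            # Assumed parameter name and range, for illustration only.
            UniformIntegerParameterRange("General/x", min_value=1, max_value=1000),
        ],
        objective_metric_title="loss",   # assumed metric title
        objective_metric_series="loss",  # assumed metric series
        objective_metric_sign="min",
        optimizer_class=RandomSearch,
        **hpo_settings(queue_name, n_machines),
    )
```

A typical use would be `optimizer = build_optimizer("<task id of T>")`, followed by `optimizer.start()`, `optimizer.wait()` and `optimizer.stop()`. Setting the number of concurrent tasks to n is a design choice that keeps each machine busy with one experiment at a time.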

  				
Posted 2 years ago by SuccessfulKoala55

AgitatedDove14 suppose that we are doing some optimization task (parameter search). In general we want to minimize some metric m, but it is enough to have, say, 3 occurrences where m<THRESHOLD. When that happens, we stop the search (and free the resources, which may be needed for a further step).

  				
Posted 2 years ago by RoundMosquito25

SuccessfulKoala55 Thank you for the response! Let me elaborate a bit to check whether I understand this correctly.
We have a time-consuming task T based on optimization over parameters. We want to run hyperparameter optimization for T; suppose that we want to run it for 100 sets of parameters.
We want to leverage the fact that we have n machines to parallelize the work.

So for that we use https://clear.ml/docs/latest/docs/references/sdk/hpo_optimization_hyperparameteroptimizer/ , we run Agents on n machines and we put those tasks to queues and tasks are run on n machines.

Do I understand that correctly? Another question: suppose that we want to stop the search when some condition on a metric is satisfied (for example, some loss value is smaller than THRESHOLD). Is there such an option in ClearML?

  				
Posted 2 years ago by RoundMosquito25

In fact, as I assume, we need to write our custom HyperParameterOptimizer, am I right?

  				
Posted 2 years ago by RoundMosquito25

RoundMosquito25 actually you can 🙂

    # check the state every minute
    while an_optimizer.wait(timeout=1.0):
        running_tasks = an_optimizer.get_active_experiments()
        for task in running_tasks:
            task.get_last_scalar_metrics()
            # do something here

Baseline reference:
https://github.com/allegroai/clearml/blob/f5700728837188d7d6005726c581c9d74fd91164/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py#L127
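
One way to turn this polling idea into the "stop after 3 results below THRESHOLD" behaviour is sketched below. It is only an illustration under stated assumptions: `ThresholdStopper` and `run_until_enough_hits` are hypothetical helper names, the threshold value is made up, and the callback signature is taken from the example script linked above. As far as I understand it, `wait(timeout=...)` takes minutes and returns `True` once the optimizer has finished and `False` on timeout, so the loop here uses `not`. The sketch also assumes the callback receives a single raw metric value.

```python
THRESHOLD = 0.1    # assumed target value for metric m
REQUIRED_HITS = 3  # stop once this many experiments beat the threshold


class ThresholdStopper:
    """Counts finished experiments whose objective is below a threshold."""

    def __init__(self, threshold=THRESHOLD, required_hits=REQUIRED_HITS):
        self.threshold = threshold
        self.required_hits = required_hits
        self.hits = set()  # IDs of experiments that satisfied m < threshold

    def on_job_complete(self, job_id, objective_value, objective_iteration,
                        job_parameters, top_performance_job_id):
        # Signature follows the callback in the linked example script.
        # Assumption: objective_value is a single raw metric value.
        if objective_value is not None and objective_value < self.threshold:
            self.hits.add(job_id)

    @property
    def should_stop(self):
        return len(self.hits) >= self.required_hits


def run_until_enough_hits(an_optimizer, stopper, poll_minutes=1.0):
    """Poll the optimizer and stop it once the boolean condition holds."""
    an_optimizer.start(job_complete_callback=stopper.on_job_complete)
    # Keep polling until the optimizer finishes by itself
    # or the stop condition is met.
    while not an_optimizer.wait(timeout=poll_minutes):
        if stopper.should_stop:
            break
    an_optimizer.stop()  # end the search so the machines are freed
    return sorted(stopper.hits)
```

Counting in a completion callback, rather than only inspecting `get_active_experiments()`, avoids missing experiments that finish between two polls. Any other boolean condition can be substituted in `should_stop`.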

  				
Posted 2 years ago by AgitatedDove14

Thanks! 🙂

  				
Posted 2 years ago by RoundMosquito25

Regarding this last question: I know that it is possible to set up a budget, for example a number of seconds after which the optimization stops. But is it possible to specify a boolean condition for when the work should stop?

  				
Posted 2 years ago by RoundMosquito25

Wow, that looks interesting 🙂 Thank you, AgitatedDove14

  				
Posted 2 years ago by RoundMosquito25

In fact, as I assume, we need to write our custom HyperParameterOptimizer, am I right?

Yes, exactly! It should be very easy.
Just inherit from RandomSearch and change create_job:
https://github.com/allegroai/clearml/blob/d45ec5d3e2caf1af477b37fcb36a81595fb9759f/clearml/automation/optimization.py#L1043
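
A rough sketch of that suggestion, applied to the dummy example of sampling x1 and x2 from [1,2,3], might look like this. It is an assumption-heavy illustration: `next_x2_values` and `make_dynamic_random_search` are hypothetical names, the parameter names `General/x1` and `General/x2` are invented, and the subclass relies on `helper_create_job` and the internal `_base_task_id` attribute as they appear in the linked `optimization.py`, which may differ between ClearML versions.

```python
import random


def next_x2_values(sample, x2_values, repeated=3):
    """Drop `repeated` from x2's range once both x1 and x2 drew it."""
    if sample["General/x1"] == repeated and sample["General/x2"] == repeated:
        return [v for v in x2_values if v != repeated]
    return x2_values


def make_dynamic_random_search(rng=None):
    """Return a RandomSearch subclass whose sampling ranges change over time."""
    # Imported lazily so the pure helper above works without clearml.
    from clearml.automation import RandomSearch

    rng = rng or random.Random()

    class DynamicRandomSearch(RandomSearch):
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            # Ranges are kept on the strategy itself, so they can be
            # changed between jobs without re-creating the optimizer.
            self._x1_values = [1, 2, 3]
            self._x2_values = [1, 2, 3]

        def create_job(self):
            sample = {
                "General/x1": rng.choice(self._x1_values),
                "General/x2": rng.choice(self._x2_values),
            }
            # Narrow the x2 range for all later jobs if 3 was drawn twice.
            self._x2_values = next_x2_values(sample, self._x2_values)
            return self.helper_create_job(
                base_task_id=self._base_task_id,  # internal attribute (assumed)
                parameter_override=sample,
            )

    return DynamicRandomSearch
```

The returned class would be passed as `optimizer_class=make_dynamic_random_search()` to `HyperParameterOptimizer`. Unlike the built-in `create_job`, this sketch does not skip parameter sets that were already tried and never returns `None` by itself, so a job or time limit on the optimizer is probably needed to end the search.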

  				
Posted 2 years ago by AgitatedDove14

AgitatedDove14 in fact, in our case we want to use simple strategies, and RandomSearch is enough. The problem is that we need to change the ranges dynamically.

  				
Posted 2 years ago by RoundMosquito25

AgitatedDove14 shouldn't it be

    while not an_optimizer.wait(timeout=1.0):

instead of

    while an_optimizer.wait(timeout=1.0):

in the first code block?

  				
Posted 2 years ago by RoundMosquito25

See here on how to set up an agent: https://clear.ml/docs/latest/docs/getting_started/mlops/mlops_first_steps

  				
Posted 2 years ago by SuccessfulKoala55

RoundMosquito25 this is a good point. In theory it could be done; the question is which Bayesian optimization you are actually using.
Is it Optuna (OptimizerOptuna) or OptimizerBOHB?

  				
Posted 2 years ago by AgitatedDove14

I know that it is possible to set up a budget, for example a number of seconds after which the optimization stops. But is it possible to specify a boolean condition for when the work should stop?

RoundMosquito25 you mean when you reach a limit such as loss<Threshold, or something similar?

  				
Posted 2 years ago by AgitatedDove14
