Hi, In My Setup I Run Multiple Experiments In Parallel From The Same Script. I Understand That There Can Only Be One Execution

Answered

Hi,
In my setup I run multiple experiments in parallel from the same script. I understand that there can only be one execution Task in a script. I would like trains to log each of those experiments separately. How can I do that when I can only initialize Task just once?
Thanks,

  				
Posted 5 years ago by SourSwallow36


Answers 15

Hi SourSwallow36
What do you mean by "log each experiment separately"? How would you differentiate between them?

  				
Posted 5 years ago by AgitatedDove14

Ok, cool. Thanks. This clears up things. I need to read more about the trains agent then. I have another question, I'll post it as a separate thread.

  				
Posted 5 years ago by SourSwallow36

Yes, every run is logged as a new experiment (with its own set of HPs). Do notice that the execution itself is done by the "trains-agent". Meaning the HP process creates experiments with a new set of HPs and puts them into the execution queue, then trains-agent pulls them from the queue and starts executing them. You can have multiple trains-agent instances on as many machines as you like, with specific GPUs etc. Each one will pull a single experiment and execute it; once it is done, it will pull the next one, and so on.

Oh ok! So if I have the base experiment say 'mnist1' and I run HPO which executes 10 experiments. Now will these 10 experiments be of different names? How will I know these are part of the 'mnist1' HPO case?

  				
Posted 5 years ago by SourSwallow36

For HPO (hyper-param opt), are all experiments which are part of the optimization process logged? I understand the HPO process takes a base experiment and runs subsequent experiments with the new HPs. Are these experiments logged too (with the train-valid curves, etc)?

  				
Posted 5 years ago by SourSwallow36

Mostly they differ by a set of user-defined hyper-parameters. I've been reading about hyper-param optimization since posting this. It seems like I would have to use hyper-param opt to achieve that.

  				
Posted 5 years ago by SourSwallow36

Great, yes that makes sense.

  				
Posted 5 years ago by SourSwallow36

Well, that depends on how you think about the automation. If you are running your experiments manually (i.e. you specifically call/execute them), then at the beginning of each experiment (or function) call Task.init, and when you are done call Task.close. This can be done in parallel if you are running them from separate processes.
If you want to automate the process, you can start using the trains-agent, which could help you spin up those experiments on as many machines as you like 🙂
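
A minimal sketch of the manual approach described above, assuming the `trains` package is installed and configured against a server. The project name, the `mnist1` base name, the `task_name_for` helper, and the placeholder loss values are all hypothetical and only there for illustration:

```python
from typing import Dict


def task_name_for(base_name: str, hp: Dict[str, float]) -> str:
    """Build a readable experiment name from a hyper-parameter dict (hypothetical helper)."""
    suffix = " ".join(f"{key}={value}" for key, value in sorted(hp.items()))
    return f"{base_name} {suffix}"


def run_experiment(hp: Dict[str, float]) -> None:
    """Run one experiment; meant to be called once per process."""
    # Imported inside the function so that each worker process sets up its own Task.
    from trains import Task

    # One Task per process: init at the start of the experiment ...
    task = Task.init(project_name="examples", task_name=task_name_for("mnist1", hp))
    task.connect(hp)  # store the hyper-parameters with this experiment
    logger = task.get_logger()
    for iteration in range(3):
        loss = 1.0 / (iteration + 1)  # placeholder standing in for real training
        logger.report_scalar(title="loss", series="train", value=loss, iteration=iteration)
    # ... and close when the experiment is done.
    task.close()
```

Each configuration could then be launched in its own process, for example with `multiprocessing.Process(target=run_experiment, args=(hp,))`, so that every process holds exactly one Task.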

  				
Posted 5 years ago by AgitatedDove14

Or a nicer one here:
https://demoapp.trains.allegro.ai/projects/97f6b5b53a0243c196d6f49c221cbdca/compare-experiments;ids=cdc2cc156ae042f08dab2b66756f468a,bb76b70520e046ebbcc21613926e7316,189a495824544718b4c271ce9575f32c/hyper-params/graph?hyper-params=graph

  				
Posted 5 years ago by AgitatedDove14

Obviously, if you click on them, you will be able to compare them based on specific metrics / parameters (either as a table or in parallel coordinates).

  				
Posted 5 years ago by AgitatedDove14

Are these experiments logged too (with the train-valid curves, etc)?

Yes, every run is logged as a new experiment (with its own set of HPs). Do notice that the execution itself is done by the "trains-agent". Meaning the HP process creates experiments with a new set of HPs and puts them into the execution queue, then trains-agent pulls them from the queue and starts executing them. You can have multiple trains-agent instances on as many machines as you like, with specific GPUs etc. Each one will pull a single experiment and execute it; once it is done, it will pull the next one, and so on.

SourSwallow36 how are you thinking of running those HP tests?
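
To make the queue / agent model above concrete, here is a toy simulation using only the Python standard library. It does not use the trains API at all: `agent_worker` and `run_agents` are hypothetical names, threads stand in for trains-agent instances, and "executing" an experiment is reduced to recording its name.

```python
import queue
import threading
from typing import Dict, List


def agent_worker(name: str, jobs: "queue.Queue[str]", done: Dict[str, List[str]]) -> None:
    """Toy stand-in for one agent: pull one experiment at a time until the queue is empty."""
    done[name] = []
    while True:
        try:
            experiment = jobs.get_nowait()
        except queue.Empty:
            return  # nothing left to pull
        done[name].append(experiment)  # a real agent would execute the experiment here
        jobs.task_done()


def run_agents(experiments: List[str], num_agents: int) -> Dict[str, List[str]]:
    """Enqueue all experiments, then let several workers drain the shared queue."""
    jobs: "queue.Queue[str]" = queue.Queue()
    for experiment in experiments:
        jobs.put(experiment)
    done: Dict[str, List[str]] = {}
    workers = [
        threading.Thread(target=agent_worker, args=(f"agent-{i}", jobs, done))
        for i in range(num_agents)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return done
```

Calling `run_agents([f"exp-{i}" for i in range(10)], num_agents=3)` returns a mapping from each worker to the experiments it pulled. Every experiment is handled exactly once, but which worker gets which one is not deterministic.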

  				
Posted 5 years ago by AgitatedDove14

Now will these 10 experiments be of different names? How will I know these are part of the 'mnist1' HPO case?

Yes (they will have the specific HP name/value combination).
FYI names are not unique so in theory you could have multiple experiments with the same name.

If you look under the Configuration tab, you will find all the configuration arguments for the experiment. You can also add specific arguments as columns in the experiment table (click the cogwheel at the top-right corner and select +hyper-parameters).

  				
Posted 5 years ago by AgitatedDove14

how are you thinking of running those HP tests?

I'm not sure I completely understand the question. Here is what I do presently. This may be achieved more efficiently in trains (that's why I'm trying to move to trains).

Example:
I have a set of 10 user-defined HP configurations. I have a scheduler that runs them independently in parallel. Once the training is complete, I run inference on the test set for these experiments. The data for both training and inference is logged under the respective experiment (there are 10 experiments in this case).

So I'm trying to emulate this process in trains.

  				
Posted 5 years ago by SourSwallow36

See example here:
https://demoapp.trains.allegro.ai/projects/97f6b5b53a0243c196d6f49c221cbdca/compare-experiments;ids=cdc2cc156ae042f08dab2b66756f468a,0aa6737817d0408ba22090a8cb076fdd/hyper-params/graph?hyper-params=graph

  				
Posted 5 years ago by AgitatedDove14

This is an example of how one can clone an experiment and change it from code:
https://github.com/allegroai/trains/blob/master/examples/automation/task_piping_example.py

A full HPO optimization process (basically the same idea only with optimization algorithms deciding on the next set of parameters) is also available:
https://github.com/allegroai/trains/blob/master/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py

  				
Posted 5 years ago by AgitatedDove14

SourSwallow36 okay, let's assume we have the base experiment (the original one before the HP process).
What we do is clone that experiment (in the UI, with code, or with code automation, i.e. an HP optimizer). Each clone of the original then gets a new set of HPs, and we enqueue the 10 experiments into the execution queue. In parallel, we run trains-agent on a machine and connect it to the queue. It will pull the experiments one after the other, run them, and log their results. We will end up with 10 "completed" experiments.
Make sense?
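
The clone / set HP / enqueue steps above can be sketched in code roughly as follows. This is only an illustration, assuming the `trains` package and a running trains-agent listening on a queue named `default`. The `expand_grid` and `clone_and_enqueue` helpers are hypothetical, and the hyper-parameter keys must match the names under which the base experiment actually stores its parameters.

```python
import itertools
from typing import Dict, List


def expand_grid(grid: Dict[str, list]) -> List[Dict[str, object]]:
    """Expand {'lr': [0.1, 0.01], ...} into one dict per HP combination (hypothetical helper)."""
    keys = sorted(grid)
    combos = itertools.product(*(grid[key] for key in keys))
    return [dict(zip(keys, values)) for values in combos]


def clone_and_enqueue(base_task_id: str, grid: Dict[str, list], queue_name: str = "default") -> List[str]:
    """Clone the base experiment once per HP combination and enqueue every clone."""
    from trains import Task  # imported here so the helper above works without trains

    base = Task.get_task(task_id=base_task_id)
    cloned_ids = []
    for hp in expand_grid(grid):
        # Name each clone after its HP combination and keep a link to the base experiment.
        name = base.name + " " + " ".join(f"{key}={value}" for key, value in hp.items())
        clone = Task.clone(source_task=base, name=name, parent=base.id)
        params = clone.get_parameters()
        params.update(hp)  # assumption: keys match the base experiment's parameter names
        clone.set_parameters(params)
        Task.enqueue(clone, queue_name=queue_name)  # a trains-agent on this queue will run it
        cloned_ids.append(clone.id)
    return cloned_ids
```

For instance, `clone_and_enqueue("<base experiment id>", {"lr": [0.1, 0.01], "batch_size": [32, 64]})` would create and enqueue four clones, each recording the base experiment as its parent.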

  				
Posted 5 years ago by AgitatedDove14
