Hi SarcasticSparrow10
You will need to have multiple trains-agents, but they will be sharing the same queue (i.e. pulling jobs from the same queue the HPO process is pushing to)
Make sense?
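For example, the HPO side might look roughly like this (a minimal sketch with the trains SDK; the task ID, queue name, and parameter name are all made up):
```python
from trains import Task

# The HPO process clones a base experiment per parameter set and pushes
# each clone into a shared queue; every trains-agent listening on that
# queue will pick up and execute whatever lands in it.
base_task = Task.get_task(task_id='1234abcd')  # hypothetical base experiment ID
clone = Task.clone(source_task=base_task, name='hpo trial 0')
clone.set_parameters({'General/learning_rate': 0.01})  # assumed parameter name
Task.enqueue(clone, queue_name='hpo_queue')  # all agents pull from 'hpo_queue'
```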
Hmm. So say I have a parameter NUM_PARALLEL_EXECUTIONS, I can programmatically launch that many trains-agents for every optimization run?!
Correct 🙂
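Something along these lines should work (a sketch assuming the trains-agent CLI; the queue name and one-GPU-per-agent assignment are just examples):
```python
import subprocess

NUM_PARALLEL_EXECUTIONS = 4
QUEUE = 'hpo_queue'  # the same queue the HPO process enqueues into

# Spin up one trains-agent daemon per parallel execution, all pulling
# from the same queue. Here each agent is pinned to its own GPU; pass
# the same GPU index to several agents to share a single device.
agents = [
    subprocess.Popen(
        ['trains-agent', 'daemon', '--queue', QUEUE, '--gpus', str(i)]
    )
    for i in range(NUM_PARALLEL_EXECUTIONS)
]
# The daemons run until stopped, so this blocks for the whole run.
for agent in agents:
    agent.wait()
```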
You can spin it in two modes, either venv or docker. Notice that even in docker mode it will still clone the code into the docker and install the packages inside it, but since it inherits the preinstalled system packages from the docker, the installation process is a lot faster, and you can still change packages without having to build an entire new docker image.
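Docker mode is the same daemon with a --docker flag (the image name here is only an example):
```python
import subprocess

# Same agent, but each task runs inside the given docker image; the
# agent still clones the code and installs missing packages in the
# container, on top of whatever the image already provides.
subprocess.Popen(
    ['trains-agent', 'daemon', '--queue', 'hpo_queue',
     '--docker', 'nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04']
)
```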
(Do notice that even though you can spin two agents on the same GPU, the nvidia drivers cannot share allocated GPU memory, so if one Task consumes too much memory the other will not have enough free GPU memory to run)
Basically the same restriction as manually launching two processes using the same GPU
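So each Task has to leave room for its neighbor. For instance, if the Tasks use PyTorch (1.8+), each one can cap its own share of the device (just a sketch, this is not something trains-agent does for you):
```python
import torch

# Each Task voluntarily limits itself to half the device's memory, so
# two Tasks sharing one GPU cannot starve each other. Allocations past
# the cap raise a CUDA out-of-memory error in this process only.
if torch.cuda.is_available():
    torch.cuda.set_per_process_memory_fraction(0.5, device=0)
```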
That makes sense. Currently, I use python multiprocessing to launch multiple experiments on the same GPU device. I'm guessing using trains-agent will be similar
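Something like this today (a toy version of the multiprocessing setup; train_one_experiment is a placeholder for the actual training entry point):
```python
from multiprocessing import Process

def train_one_experiment(learning_rate):
    # placeholder for the real training code; all the processes
    # share GPU 0 and compete for its memory
    ...

# Launch several experiments as separate processes on the same GPU;
# trains-agent would replace this with daemons pulling Tasks from a queue.
procs = [Process(target=train_one_experiment, args=(lr,))
         for lr in (0.1, 0.01, 0.001)]
for p in procs:
    p.start()
for p in procs:
    p.join()
```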
Got it. I haven't tried setting up trains-agent yet, so I don't know much about the overhead of launching the agent. I'd imagine if it has to create the full environment (installing requirements, etc.), the overhead might not be that low. But as I'm reading, it looks like I can use a docker image with the full env. Is my understanding correct?
Basically it solves the remote-execution problem, so you can scale to multiple machines relatively easily :)