Hi, I'm trying to get an understanding of how Pipeline Controller works

Answered

Hi, I'm trying to understand how the Pipeline Controller works, since there is currently no documentation for it and it hasn't been released yet. I've implemented my code following the structure in https://github.com/allegroai/trains/blob/master/examples/pipeline/pipeline_controller.py , but wanted to clarify a few things:
I need to execute the individual steps of the pipeline (done via execute_remotely) before running the file that contains the pipeline controller. However, after doing this, the controller experiment never completes: the next step in the pipeline remains 'pending' indefinitely while the controller is still 'running'. Do I need to install and configure trains-agent in order to execute the pipeline properly? Side note: all of this is being done on a local deployment on macOS.

  				
Posted 4 years ago by GiddyTurkey39


Answers 5

Hi GiddyTurkey39
Glad to see that you are already diving into the controllers (the stable release will be out early next week).
A bit of background on how the pipeline controller is designed:
All steps in the pipeline are experiments already registered in the system (i.e. you can see them in the UI). Regardless of how you created those experiments, they have to be there prior to the pipeline launch. The pipeline itself can be executed on any machine (it does very little and consumes almost no CPU), but the idea is to have it executed in the "services" queue so you do not have to keep your machine up and running all the time. All steps the pipeline creates are assumed to be executed using the trains-agent (i.e. experiments are cloned, adjusted, and enqueued into an execution queue). Without an agent pulling from that queue, an enqueued step stays pending.
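To make the "pending forever" symptom concrete, here is a minimal toy model in plain Python. It is not the trains implementation or its API: `ToyController`, `agent_pull_one`, and the step names `stage_data` and `stage_process` are hypothetical names chosen for illustration. It only sketches the idea described above, assuming the controller does nothing but enqueue steps whose parents have completed, and that something else (an agent) must actually run them.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Step:
    name: str
    parents: list = field(default_factory=list)
    status: str = "not started"  # -> "pending" (enqueued) -> "completed"


class ToyController:
    """Toy stand-in for a pipeline controller; NOT the trains implementation."""

    def __init__(self, steps, queue):
        self.steps = {s.name: s for s in steps}
        self.queue = queue

    def tick(self):
        """Enqueue every not-yet-started step whose parents have all completed."""
        for step in self.steps.values():
            ready = all(self.steps[p].status == "completed" for p in step.parents)
            if step.status == "not started" and ready:
                step.status = "pending"
                self.queue.append(step)

    def done(self):
        return all(s.status == "completed" for s in self.steps.values())


def agent_pull_one(queue):
    """Toy agent: take one job off the queue and run it to completion."""
    if queue:
        step = queue.popleft()
        step.status = "completed"
        return step.name
    return None


def run(with_agent, max_ticks=10):
    """Drive the toy controller for a few ticks, with or without an agent."""
    queue = deque()
    controller = ToyController(
        [Step("stage_data"), Step("stage_process", parents=["stage_data"])], queue
    )
    for _ in range(max_ticks):
        controller.tick()
        if with_agent:
            agent_pull_one(queue)
        if controller.done():
            break
    statuses = {name: s.status for name, s in controller.steps.items()}
    return statuses, controller.done()
```

In this sketch, `run(with_agent=False)` leaves `stage_data` stuck at "pending" and never starts `stage_process`, while `run(with_agent=True)` completes both steps. The controller loop itself is identical in both cases; only the presence of something draining the queue differs.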

  				
Posted 4 years ago by AgitatedDove14

AgitatedDove14 thanks, I'm new to Allegro, so I'm just trying to figure everything out. When you say trains agent, are you referring to the trains-agent command (so in this case, would it be trains-agent execute)? Is it sufficient to queue the experiments (using execute_remotely), or do I need to clone them as well?

  				
Posted 4 years ago by GiddyTurkey39

Hi GiddyTurkey39,

When you say trains agent, are you referring to the trains agent command ...

I mean running the trains-agent daemon on a machine. This means you have a daemon pulling jobs from the execution queue and executing them (either in a virtual environment or inside a Docker container).
You can read more at https://github.com/allegroai/trains-agent and https://allegro.ai/docs/concepts_arch/concepts_arch/

Is it sufficient to queue the experiments

Yes, there is no need for additional "cloning". Obviously, if you want to re-run the experiment, you can clone it and enqueue it again.

  				
Posted 4 years ago by AgitatedDove14

AgitatedDove14 Thanks, I just got the pipeline to run 🙂 Just one final question: the documentation says not to queue any training/inference tasks into the services queue, so should I create a different queue for training tasks, or is using the default queue okay?

  				
Posted 4 years ago by GiddyTurkey39

just got the pipeline to run

Nice!

using the default queue okay?

Using the default queue is fine. The queue that is different is the "services" queue: by default, the "trains-server" runs an agent that pulls jobs from it.
In "services" mode, an agent pulls jobs one right after the other (without waiting for the previous job to finish), as opposed to a regular queue (any other queue), where the trains-agent pulls a job only after the previous one has completed.
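The difference between the two pulling behaviours can be sketched with a small toy calculation. This is only an illustration of the scheduling idea described above, not trains-agent code: `start_times` is a hypothetical helper, job durations are made-up numbers, and the model assumes a single agent with no launch overhead and no concurrency limit.

```python
def start_times(durations, services_mode):
    """Return when each queued job would start under a simplified one-agent model."""
    starts = []
    clock = 0
    for duration in durations:
        starts.append(clock)
        if not services_mode:
            # Regular queue: the agent is busy until this job finishes,
            # so the next job can only start after `duration` has elapsed.
            clock += duration
        # Services mode: the agent pulls the next job right away,
        # so the clock does not advance between pulls.
    return starts


regular = start_times([5, 3, 2], services_mode=False)   # [0, 5, 8]
services = start_times([5, 3, 2], services_mode=True)   # [0, 0, 0]
```

Under these assumptions, every job in the services queue starts immediately and all of them run side by side on the same machine. That suits lightweight, long-lived jobs such as a pipeline controller, and is presumably why heavy training or inference tasks are better placed in a regular queue.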

  				
Posted 4 years ago by AgitatedDove14
