Hi GiddyTurkey39
Glad to see that you are already diving into the controllers (the stable release will be out early next week)
A bit of background on how the pipeline controller are designed:
All steps in the pipeline are experiments already registered in the system (i.e. you can see them in the UI). Regardless on how you created those experiments they have to be there prior to the pipeline launch. The pipeline itself can be executed on any machine (it does very little, and consumes almost no cpu), but the idea is to have it executed in the "services" queue so you do not have to have your machine up and running all the time. All steps the pipeline creates, are assumed to be executed using the trains-agent (i.e. experiments are cloned adjusted and enqueued into an execution queue).
AgitatedDove14 thanks, I'm new to Allegro here so just trying to figure everything out. When you say trains agent, are you referring to the trains agent command(so in this case , would it be trains-agent execute
?). Is it sufficient to queue the experiments(using execute_remotely
) or do I need to clone them as well?
Hi GiddyTurkey39 ,
When you say trains agent, are you referring to the trains agent command ...
I mean running the trains-agent daemon
on a machine. This means you have a daemon pulling jobs from the execution queue and executing them (either in virtual environment, or inside a docker)
You can read more about https://github.com/allegroai/trains-agent and https://allegro.ai/docs/concepts_arch/concepts_arch/
Is it sufficient to queue the experiments
Yes there is no need for additional "cloning". Obviously if you want to re-run the experiment, you can clone it and enqueue it again.
AgitatedDove14 Thanks- just got the pipeline to run 🙂 Just one final question- on the documentation, it says not to queue any training/inference tasks into the services queue, so should I be creating a different queue for training tasks or is using the default queue okay?
just got the pipeline to runÂ
Nice!
using the default queue okay?
Using the default queue is fine. The different queue is the "services" queue that by default the "trains-server" is running an agent the will pull jobs from there.
With "services" mode, an agent will pull jobs right after the other (not waiting for the previous job to finish), as opposed to regular queue (any other) that the trains-agent will pull a job only after the previous one completed .