It's built in 🙂 and it's for... "Services"
https://github.com/allegroai/trains-server#trains-agent-services--
Yeah, but I don't get what it is for - for now I have 2 agents, each listening to some queues. I have actually ignored the "services" queue until now
I don't get the difference between how I'm using my agents now - just starting them on machines and having them listen to queues - versus using the "services" mode
WackyRabbit7
The regular trains-agent modus operandi is one job at a time (i.e., until the Task is done, no other Tasks will be pulled from the queue).
When adding --services-mode, it is not 1-1 but 1-N, meaning a single trains-agent will launch as many Tasks as it can.
The trains-agent pulls a job from the queue, spins up a docker container (only dockers are supported for the time being), and lets the job run in the background (the job itself will be registered as another "worker" in the system). Then the trains-agent pulls the next job from the queue.
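To make this concrete, here is a sketch of how such an agent might be launched. The flags are assumptions based on the trains-agent docs linked above; verify them against `trains-agent daemon --help` for your version:

```shell
# Sketch of launching a services-mode agent (flags assumed, not verified here):
#   --services-mode  -> 1-N: keep pulling jobs, each runs in its own container
#   --queue services -> listen on the dedicated "services" queue
#   --docker IMAGE   -> default base image for the spawned containers
#   --cpu-only       -> services jobs typically do not need a GPU
trains-agent daemon --services-mode --queue services --docker ubuntu:18.04 --cpu-only
```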
Sorry... I still don't get it - when I'm launching an agent with the --docker
flag or with the --services-mode
flag, what is the difference? Can I use both flags together? What does it mean? 🤔
Does the services mode have a separate configuration for the base image?
Oh, I get it - that also makes sense with the docs directing this at inference jobs and avoiding GPUs, because of the 1-N thing
WackyRabbit7 It is conceptually different from actual training, etc.
The service agent is usually one without a GPU; it runs several tasks, each in its own container, for example: the autoscaler, or the orchestrators for our hyperparameter optimization and/or pipelines. I think it even runs (by default?) on the same hardware as the trains-server.
Also, if I'm not mistaken, some people are using it (or planning to?) to push models to production.
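As an example of the kind of job that ends up on that queue, here is a hedged sketch of enqueuing an orchestrator Task onto the "services" queue from the trains SDK, where a services-mode agent would pick it up. The project/task names are made up, and this assumes a configured trains SDK pointing at your trains-server:

```python
# Sketch: push a controller/orchestrator Task to the "services" queue.
# Project and task names below are illustrative placeholders.
from trains import Task

task = Task.init(project_name='examples', task_name='pipeline controller')

# Stop executing locally and re-launch this Task on the "services" queue;
# exit_process=True terminates the local run once the Task is enqueued.
task.execute_remotely(queue_name='services', exit_process=True)
```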
I wonder if anyone else can share their view since this is a relatively new feature (AHEM)
or is it the same place in the config file used for configuring the docker-mode agent's base image?
It's just another flag when running the trains-agent
You can have multiple services-mode instances; there is no actual limit 🙂