But from your other answer, I think I understand that you can have multiple agents on a single instance listening to the same queue.
Correct
So we could maybe initialize 4 instances of the agent on a single EC2 instance, which would let us handle a higher volume of small batches concurrently without tying up the entire instance.
Correct (that said, I do not understand why a single Task does not fully utilize the CPU; I was under the impression it is running a model, see details below)
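As a concrete illustration (not from this thread), here is a minimal sketch of launching several agents on one machine, all listening to the same queue. It assumes clearml-agent is installed and configured; the queue name "small_batches" and the agent count are hypothetical:

```python
# Spin up several clearml-agent daemons on the same machine, all pulling
# from the same queue. Each daemon runs one Task at a time, so four
# daemons allow four small-batch Tasks to run concurrently.
import subprocess

QUEUE = "small_batches"  # hypothetical queue name
NUM_AGENTS = 4

for _ in range(NUM_AGENTS):
    # --detached forks the daemon into the background so the loop continues.
    subprocess.run(
        ["clearml-agent", "daemon", "--queue", QUEUE, "--detached"],
        check=True,
    )
```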
By my understanding, a worker (which I assumed was an entire EC2 instance)
Basically the assumption is that you are able to maximize the CPU/GPU on that instance with the specific DL/ML component; the others you can run on other instances. The EC2 instances will not be shut down when they finish a single Task, but only after they have been idle for X minutes.
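To make the idle-timeout behavior concrete, here is a hedged sketch of the relevant settings, loosely modeled on the ClearML AWS autoscaler example script; the field names (max_idle_time_min, the queue-to-resource mapping) and all values are assumptions for illustration, not a verified API:

```python
# Hypothetical autoscaler settings: instances are terminated only after
# sitting idle for max_idle_time_min, not immediately after a Task ends.
hyper_params = {
    "max_idle_time_min": 15,         # the "X minutes" before an idle EC2 instance is shut down
    "polling_interval_time_min": 5,  # how often the autoscaler polls the queues
}

configurations = {
    "resource_configurations": {
        "compute_instance": {
            "instance_type": "g4dn.xlarge",  # hypothetical instance type
            "is_spot": False,
        }
    },
    # Map each queue to a resource and cap the number of concurrent
    # "compute" EC2 instances the autoscaler may spin up for it.
    "queues": {
        "compute": [("compute_instance", 4)],
    },
}
```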
The Pipeline Logic (as opposed to the pipeline components) runs on the "services" agent machine, which runs multiple pipelines at the same time. Each component runs on another machine (i.e. the pipeline logic launches it there), and the actual compute is done on that machine. The AWS autoscaler can limit the number of concurrent "compute" EC2 instances, and these just run the inference itself.
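As a sketch of that split, assuming the ClearML PipelineDecorator API: the decorated pipeline function (the pipeline logic) is attached to the "services" queue, while the component is enqueued to a hypothetical "compute" queue served by the autoscaled EC2 instances. The names and function bodies are illustrative only:

```python
from clearml import PipelineDecorator

@PipelineDecorator.component(execution_queue="compute")
def run_inference(batch):
    # Placeholder for the real model inference; this body executes on
    # whichever "compute" EC2 instance pulls the component Task.
    return [x * 2 for x in batch]

@PipelineDecorator.pipeline(
    name="inference pipeline",
    project="examples",
    version="1.0",
    pipeline_execution_queue="services",  # pipeline logic runs on the services agent
)
def pipeline_logic(batch):
    # Orchestration only: launch the component and pass results along;
    # no heavy compute runs on the services machine.
    return run_inference(batch)

if __name__ == "__main__":
    print(pipeline_logic([1, 2, 3]))
```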
Does that make sense?