Thank you! I think it does. It's just now dawning on me that, because a pipeline is composed of multiple tasks, different tasks in the pipeline could run on different machines. Or more specifically, they could run on different queues, and as you said in your other response, we could have one queue for smaller CPU-based instances and another queue for larger GPU-based instances.
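To make sure I'm picturing it right, something like this is what I have in mind for routing steps to queues (the project, task, and queue names below are just placeholders, not anything we've set up):

```python
from clearml import PipelineController

# Hypothetical sketch: route different pipeline steps to different queues.
pipe = PipelineController(name="etl_pipeline", project="examples", version="0.1")

# Heavy training step goes to the GPU queue
pipe.add_step(
    name="train_model",
    base_task_project="examples",
    base_task_name="train_model",
    execution_queue="gpu_queue",
)

# I/O-bound step (e.g. writing results to the warehouse) goes to the CPU queue
pipe.add_step(
    name="write_results",
    base_task_project="examples",
    base_task_name="write_results",
    parents=["train_model"],
    execution_queue="cpu_queue",
)

# The pipeline controller itself can run on the services queue
pipe.start(queue="services")
```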
I like the idea of having a queue dedicated to CPU-based instances with multiple agents running on it simultaneously, maybe four. Those agents could handle the more I/O-intensive tasks, such as writing results to our data warehouse. I think that would be a good use case for having a single resource handle multiple tasks concurrently.
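Roughly what I'm picturing on that instance (the queue name and worker ids are placeholders, and I'm assuming each daemon needs its own worker id when they share a machine):

```bash
# Hypothetical sketch: four concurrent agents on one CPU instance,
# all pulling from the same queue ("cpu_queue" is a made-up name).
for i in 1 2 3 4; do
    CLEARML_WORKER_ID="cpu-worker-$i" clearml-agent daemon --queue cpu_queue --detached
done
```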
Thanks for discussing this so thoroughly with me!
I will be starting with the AWS autoscaler script in the ClearML examples on GitHub. Do you happen to know whether, using that script, there is a straightforward way to provide a user-data.sh script? I imagine that's how we would do things like fetching secrets from AWS Secrets Manager and starting the concurrent agents.
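In case it helps, here's roughly the user-data.sh I have in mind (the secret id, region, JSON field names, and queue name are all placeholders, and I haven't tested any of this against the autoscaler yet):

```bash
#!/bin/bash
# Hypothetical sketch only; the secret id and JSON field names are made up.
# Assumes the AMI already has the AWS CLI, jq, and clearml-agent installed.

# Fetch the ClearML credentials from Secrets Manager
SECRET_JSON=$(aws secretsmanager get-secret-value \
    --secret-id clearml/agent-credentials \
    --region us-east-1 \
    --query SecretString \
    --output text)
export CLEARML_API_ACCESS_KEY=$(echo "$SECRET_JSON" | jq -r '.access_key')
export CLEARML_API_SECRET_KEY=$(echo "$SECRET_JSON" | jq -r '.secret_key')

# Start the concurrent agents (same loop as in the earlier sketch)
for i in 1 2 3 4; do
    CLEARML_WORKER_ID="cpu-worker-$i" clearml-agent daemon --queue cpu_queue --detached
done
```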