Ah ok! I think the link between the ClearML agent queues and the autoscaler is also important for resource monitoring etc. Sorry, I'm quite new to ClearML so I'm still trying to understand the architecture 🙈. AWS Batch has its own queue system too, which it uses to allocate jobs to training instances, but it's not customizable.
Hi TimelyPenguin76! Quick question: I was curious whether ClearML had at any point considered using AWS Batch for the autoscaling part? Submitting "training jobs" to AWS Batch would create/terminate EC2 instances automatically too, instead of ClearML having to implement the logic to spin instances up/down.
Currently we don’t have a GCP autoscaler. We’re more than happy to get contributions for GCP and other platforms.
The AWS autoscaler code is pretty generic, and in order to use it for GCP you need to implement a GCPAutoScaler similar to trains.automation.aws_auto_scaler.AWSAutoScaler, which basically has
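
To make the "implement a GCPAutoScaler" idea a bit more concrete, here is a rough, self-contained sketch. It assumes the cloud-specific part comes down to two hooks, one that starts a worker and one that stops it. Every name in it (CloudAutoScalerSketch, GCPAutoScalerSketch, spin_up_worker, spin_down_worker, scale) is hypothetical and made up for illustration, not taken from the trains source, and the GCP calls are faked so the snippet runs without any cloud account:

```python
from abc import ABC, abstractmethod
from typing import Dict


class CloudAutoScalerSketch(ABC):
    """Hypothetical stand-in for a generic autoscaler base class (not the trains API)."""

    def __init__(self, max_instances: int) -> None:
        self.max_instances = max_instances
        self.workers: Dict[str, str] = {}  # instance id -> queue it serves

    @abstractmethod
    def spin_up_worker(self, queue_name: str) -> str:
        """Start a cloud instance for the queue and return its id."""

    @abstractmethod
    def spin_down_worker(self, instance_id: str) -> None:
        """Terminate the given cloud instance."""

    def scale(self, pending_per_queue: Dict[str, int]) -> None:
        # First stop workers whose queue has nothing pending.
        for instance_id, queue_name in list(self.workers.items()):
            if pending_per_queue.get(queue_name, 0) == 0:
                self.spin_down_worker(instance_id)
                del self.workers[instance_id]
        # Then start workers for queues with more pending tasks than workers,
        # never exceeding the global instance cap.
        for queue_name, pending in pending_per_queue.items():
            serving = sum(1 for q in self.workers.values() if q == queue_name)
            for _ in range(max(0, pending - serving)):
                if len(self.workers) >= self.max_instances:
                    return
                self.workers[self.spin_up_worker(queue_name)] = queue_name


class GCPAutoScalerSketch(CloudAutoScalerSketch):
    """Hypothetical GCP flavour; only the two cloud-specific hooks differ."""

    def __init__(self, max_instances: int) -> None:
        super().__init__(max_instances)
        self._counter = 0

    def spin_up_worker(self, queue_name: str) -> str:
        # A real implementation would create a Compute Engine VM here;
        # this placeholder only fabricates an instance id.
        self._counter += 1
        return f"gcp-{queue_name}-{self._counter}"

    def spin_down_worker(self, instance_id: str) -> None:
        # A real implementation would delete the VM here.
        pass
```

With this split, the queue-watching logic stays in one place and a new platform only has to fill in the start/stop hooks, e.g. `GCPAutoScalerSketch(max_instances=2).scale({"gpu": 3})` would start two placeholder workers and stop at the cap.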