Thank you for the link, I'll learn how to do it.
AgitatedDove41 , forgot to add the link for the docs, here it is:
https://clear.ml/docs/latest/docs/guides/services/aws_autoscaler/
AgitatedDove41 , you can run as many instances as you'd like 🙂
Please read this to see how it's done with AWS. I don't believe you need much DevOps knowledge 🙂
Yes, on AWS or on anything as long as it's not too hard for someone who doesn't know much DevOps stuff.
Ideally I want it to automatically spawn new compute instances when required and terminate the instances when not in use.
AgitatedDove41 Hi!
If I understand correctly you would like to run the training on AWS?
It can be also on ClearML. I'm just evaluating options currently. Is the free version suitable for running big experiment like that? (50-100 trials concurrent for maybe an hour and then stop)
My workflow is that I like to try a bunch of parameters (50-100 trials at a time) and then develop idea of what to do next.