It can also be on ClearML. I'm just evaluating options currently. Is the free version suitable for running a big experiment like that (50-100 concurrent trials for maybe an hour, then stop)?
My workflow is that I like to try a bunch of parameters (50-100 trials at a time) and then develop an idea of what to do next.
Thank you for the link, I'll learn how to do it.
AgitatedDove41, I forgot to add the link for the docs. Here it is:
https://clear.ml/docs/latest/docs/guides/services/aws_autoscaler/
AgitatedDove41, you can run as many instances as you'd like 🙂
Please read this to see how it's done with AWS. I don't believe you need much DevOps knowledge 🙂
Yes, on AWS or on anything as long as it's not too hard for someone who doesn't know much DevOps stuff.
Ideally I want it to automatically spawn new compute instances when required and terminate the instances when not in use.
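To make that concrete, here is a rough sketch of the kind of scale-up / scale-down decision I have in mind. This is not ClearML's autoscaler code; `Instance`, `plan_scaling`, `max_instances`, and `max_idle_minutes` are hypothetical names made up for illustration, and the thresholds are arbitrary assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Instance:
    # Hypothetical record for one compute instance (not a ClearML type).
    name: str
    idle_minutes: float  # 0 means the instance is currently running a trial


def plan_scaling(
    pending_tasks: int,
    instances: List[Instance],
    max_instances: int = 100,        # assumed budget cap
    max_idle_minutes: float = 10.0,  # assumed idle timeout
) -> Tuple[int, List[str]]:
    """Return (how many instances to launch, names of instances to terminate)."""
    idle = [inst for inst in instances if inst.idle_minutes > 0]

    # Idle instances can pick up pending trials before anything new is launched.
    uncovered = max(0, pending_tasks - len(idle))
    room = max(0, max_instances - len(instances))
    to_launch = min(uncovered, room)

    # Only shut instances down when nothing is waiting in the queue.
    to_terminate: List[str] = []
    if pending_tasks == 0:
        to_terminate = [
            inst.name for inst in idle if inst.idle_minutes >= max_idle_minutes
        ]
    return to_launch, to_terminate


if __name__ == "__main__":
    # 50 trials queued, nothing running yet: launch 50 instances.
    print(plan_scaling(50, []))
    # Queue empty: terminate only instances idle past the timeout.
    fleet = [Instance("a", 15), Instance("b", 2), Instance("c", 0)]
    print(plan_scaling(0, fleet))
```

So for my case (50-100 trials for about an hour), I would expect something like this to bring instances up when the trials are queued and shut them down shortly after the queue drains.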
AgitatedDove41 Hi!
If I understand correctly, you would like to run the training on AWS?