
All good! Found the documentation.
BTW my first test worked great on a CPU instance. Might have been a temporary issue.
So I've gotten a little further by tweaking my VPC setup. I see that the autoscaler spins up new instances without a public IP, so that means specifying a private subnet and having a NAT gateway for that subnet, right? I tried running the scikit-learn joblib sample on a CPU instance, and the experiment definitely ran, but I see another error. Could there be some permissions missing somewhere?
2023-04-03 15:01:00
2023-04-03 13:00:51,087 - clearml.storage - ERROR - Exception encountered while ...
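On the private-subnet question above, here is a minimal sketch for checking whether a subnet's route table actually sends outbound traffic through a NAT gateway. It assumes the usual AWS layout, where the NAT gateway itself sits in a public subnet and the private subnet's route table has a `0.0.0.0/0` route pointing at it. The helper names `has_default_nat_route` and `subnet_routes_through_nat` are made up for illustration and are not part of ClearML or boto3; the dict shape follows what boto3's `describe_route_tables` returns. It says nothing about the storage error itself, which may well be a separate permissions issue.

```python
from typing import Any, Dict


def has_default_nat_route(route_table: Dict[str, Any]) -> bool:
    """Return True if the route table has an active 0.0.0.0/0 route via a NAT gateway.

    `route_table` is one entry of the "RouteTables" list returned by
    EC2 describe_route_tables.
    """
    for route in route_table.get("Routes", []):
        if (
            route.get("DestinationCidrBlock") == "0.0.0.0/0"
            and route.get("NatGatewayId")
            and route.get("State", "active") == "active"
        ):
            return True
    return False


def subnet_routes_through_nat(subnet_id: str) -> bool:
    """Hypothetical wrapper: look up the route tables explicitly associated
    with `subnet_id` and check them. Needs boto3 and AWS credentials.

    Caveat: a subnet with no explicit association uses the VPC's main route
    table, which this filter does not return, so False here is not conclusive.
    """
    import boto3  # imported lazily so the pure helper above works without it

    ec2 = boto3.client("ec2")
    resp = ec2.describe_route_tables(
        Filters=[{"Name": "association.subnet-id", "Values": [subnet_id]}]
    )
    return any(has_default_nat_route(rt) for rt in resp["RouteTables"])
```

If `subnet_routes_through_nat` comes back False for the subnet given to the autoscaler, instances without a public IP probably cannot reach the ClearML server or S3, which would be worth ruling out before digging into IAM permissions.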
Hi @SuccessfulKoala55 and @AgitatedDove14, sorry for the delay, I was travelling. Indeed, I see that the instance is not booting; it's sort of stuck there. I'm going to first find an AMI / instance type that boots via the AWS console, then try again via the AutoScaler. Thanks so much for the tips!
Sorry, I've been away and am only able to come back to this now. I'll run the scikit-learn joblib sample on my current setup and let you know if the error comes back.