Sorry, I've been away and am only able to come back to this now. I'll run the scikit-learn job sample on my current setup and let you know if it comes back.
Hi @SuccessfulKoala55 and @AgitatedDove14, sorry for the delay, I was travelling. Indeed, I see that the instance is not booting; it's sort of stuck there. I'm going to first find an AMI / instance type that boots via the AWS console, then try again via the AutoScaler. Thanks so much for the tips!
So I've gotten a little further by tweaking my VPC setup. I see that the autoscaler spins up new instances without a public IP, so that means specifying a private subnet and having a NAT gateway on the subnet, right? I tried running the scikit-learn joblib sample on a CPU instance, and the experiment definitely ran, but I see another error. Could there be some permissions missing somewhere?
2023-04-03 15:01:00
2023-04-03 13:00:51,087 - clearml.storage - ERROR - Exception encountered while ...
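For anyone checking the private subnet / NAT gateway question above, here is a rough sketch of one way to verify that a subnet's route table sends outbound traffic through a NAT gateway. It assumes a route table dict in the shape returned by the EC2 `describe_route_tables` call; the helper name `has_nat_default_route` and the sample IDs are made up for illustration and are not part of ClearML or the autoscaler.

```python
def has_nat_default_route(route_table: dict) -> bool:
    """Return True if the route table has an active 0.0.0.0/0 route via a NAT gateway."""
    for route in route_table.get("Routes", []):
        is_default = route.get("DestinationCidrBlock") == "0.0.0.0/0"
        via_nat = bool(route.get("NatGatewayId"))
        # A "blackhole" route points at a target that no longer exists.
        is_active = route.get("State", "active") != "blackhole"
        if is_default and via_nat and is_active:
            return True
    return False


# Hypothetical sample data; the IDs below are placeholders.
private_rt = {
    "Routes": [
        {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"},
        {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0123456789abcdef0", "State": "active"},
    ]
}
public_rt = {
    "Routes": [
        {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"},
        {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0123456789abcdef0", "State": "active"},
    ]
}

print(has_nat_default_route(private_rt))
print(has_nat_default_route(public_rt))
```

With boto3, the real route tables for a subnet can be fetched with `describe_route_tables` filtered on `association.subnet-id`. A subnet without an explicit association falls back to the VPC's main route table, so an empty result does not by itself mean there is no route.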
BTW my first test worked great on a CPU instance. Might have been a temporary issue.
All good! Found the documentation