Do you have any experience and things to watch out for?
Yes, for testing start with cheap node instances 🙂
If I remember correctly everything is preconfigured to support GPU instances (aka nvidia runtime).
You can take one of the templates from here as a starting point:
AgitatedDove14 - i had not used the autoscaler since it asks for access key. Mainly looking for GPU use cases - with sagemaker one can choose any instance they want and use it, autoscaler would need set instance configured right? need to revisit. Also I want to use the k8s glue if not for this. Suggestions?
Aws autoscaler will work with iam rules along as you have it configured on the machine itself. Sagemaker job scheduling (I'm assuming this is what you are referring to, and not the notebook) you need to select the instance as well (basically the same as ec2). What do you mean by using the k8s glue, like inherit and implement the same mechanism but for sagemaker I stead of kubectl ?