I'm curious what the opinions are on this! I asked myself the same question. In my limited experience, going through a workflow with SageMaker was a painful process, and one that required a ton of AWS-specific code and configuration. Compared to this, ClearML was easy and quick to set up, and provides a dashboard where everything from experiments to models to output is organised, queryable and comparable. Way less hassle for way more benefits.
@<1523701205467926528:profile|AgitatedDove14> you beautiful person, this is terrific! I do believe SageMaker has some nice monitoring/data drift capabilities that seem interesting, but these points you have here will be a fantastic starting point for my team's analysis of the products. I think this will help balance some of the over-enthusiasm towards using the native AWS solution.
Hi @<1541954607595393024:profile|BattyCrocodile47> and @<1523701225533476864:profile|ObedientDolphin41>
"we're already on AWS, why not use SageMaker?"
TBH, I've never gone through the ML workflow with SageMaker.
LOL I'm assuming this is why you are asking 🙂
- First, you can use SageMaker and still log everything to ClearML (a two-line integration). At least you will have visibility into everything that is running/failing 🙂
- A SageMaker job is a container, which means that for every job (which in a lot of cases is a one-time test) users need to build a container, push it to the registry, and then of course forget to remove it. This makes it hard to move from writing code to launching it, and the management costs are high (tons of containers no one is using and everyone is afraid of deleting)
- As mentioned, SageMaker does not support on-prem/hybrid resources
- SageMaker costs extra on top of the compute
- There is no good dashboard for monitoring and launching jobs in SageMaker. Basically it was designed for DevOps teams monitoring long-lasting servers, not ephemeral, constantly changing jobs, and it shows ...
- Multi-step pipelines are not supported in SageMaker (I mean you can hack it, but good luck figuring out later what really happened)
- SageMaker does not have a caching mechanism (i.e. rerunning the same job with the same data/args should reuse the previous result)
- SageMaker outputs by default are just more files in an S3 bucket, which is a mess to manage
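For reference, the "two-line integration" in the first bullet is just an import plus a `Task.init()` call at the top of your training script. A minimal sketch, where the project/task names are placeholders and the import is guarded so the script still runs where the `clearml` package is not installed:

```python
# Minimal sketch of the two-line ClearML integration mentioned above.
# Project/task names are placeholders; the import is guarded so training
# still runs (unlogged) if clearml is not installed.
try:
    from clearml import Task

    Task.set_offline(offline_mode=True)  # optional: log locally, no server needed
    task = Task.init(project_name="sagemaker-experiments", task_name="demo-run")
except ImportError:
    task = None  # clearml not installed; training proceeds without logging

print("ClearML logging enabled:", task is not None)
```

Once `Task.init()` runs, ClearML auto-captures the git diff, installed packages, stdout, and framework metrics for that job, regardless of where the container is launched from.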
I probably forgot a few, but you get the gist: SageMaker was built to launch containers on EC2, not to manage ML workflows. So other than launching containers (which it does very nicely), everything else is missing.
(just my 2 cents, but I might be a bit biased after having to work with it for a while 🙂 )