Oh, this is thought-provoking. Yeah, the idea of using ClearML for R&D is super appealing (to me, speaking as an MLOps engineer). And having the power of Metaflow's scheduler (on Step Functions with EventBridge, since we'd do the AWS-native deployment) also makes sense to me.
I'll keep asking questions about how we could do event-based jobs with alerting built in on ClearML in a different thread later on.
I pasted your points (anonymously) into the Metaflow Slack so they could speak to any updates that have happened in their product. If you care to read it, this is about as accurate a view as you can get of what Metaflow is today, since these replies were written by a Metaflow founder and a core contributor.
Person 1:
Point by point:
- not true: you can specify the image you want for each step
- accurate
- not sure what that means
- there are cards and UI and integrations with other tools like Comet. So probably more limited than some and less limited than others
- I'll let the OB folks comment on this but yes, I think kube support is probably the most fleshed out (pure AWS is also pretty good since that is where it started)
- correct, it's a feature actually. We did discuss this quite a bit and it is really hard to guarantee side-effect-free execution in Python
- I'll let OB comment on this.
Person 2:
- re: caching - `resume` does what most systems mean by caching, but like Romain mentioned, we don't make it overly magical as a feature
- re: kubernetes - `@batch` and `step-functions` are still great options which don't require K8s. I'd agree that the deployment is not trivial in the literal sense of the word. The Terraform templates make it quite easy though
- re: role-based access control - see Outerbounds Platform, which provides a layer of security and auth features required by enterprises
- "R&D to production acceleration" is what Metaflow has been about since the very beginning.
It is true though that there are plenty of tools targeting data scientists which provide a nice GUI that makes it easier to get started with a few clicks - DataRobot is a great example!
While tools like these seem appealing at first sight, they often have a hard time supporting real-world production use cases with constantly changing data, involved business logic, larger scale, and multiple people working together.
Real-world ML systems shouldn't be islands. They must work well with the surrounding infrastructure and policies. Metaflow is serious about providing a solution that balances requirements on both the engineering and the data science side, so data scientists can develop systems that engineers can happily approve, which might contribute to the impression that "Metaflow is designed with more 'devops' in mind".
tl;dr Metaflow is designed with both devops and data scientists in mind!