Thanks for replying, Martin! (as always)
Do you think ClearML is a strong option for running event-based training and batch inference jobs in production? That’d include monitoring and alerting. I’m afraid that Metaflow will look far more compelling to our teams for that reason.
Since Metaflow deploys onto AWS Step Functions, the scheduling is managed for you, and I believe alerts for failing jobs can be set up without adding custom code to every pipeline.
If that’s the case, then we’d probably only use ClearML for the R&D phase and then deploy with Metaflow. But idk if the DS teams would want to use two different tools that do somewhat similar tasks, so they may opt to use Metaflow for everything.
We use MLflow as our model registry. Maybe we could use ClearML for experiment tracking only, since the UI is much better for that. Or maybe we could switch completely to ClearML for the model registry. Other tools such as BentoML have integrations with MLflow, though, which makes us reluctant to do that :P