What I really like about ClearML is the potential for capturing development at an early stage: only minimal code changes are needed for a run to be captured, at the very least, as an experiment, even when it is run locally on one's machine.
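To illustrate, here is a minimal sketch (the project and task names are placeholders): a single `Task.init` call at the top of an otherwise unmodified training script is usually enough for ClearML to start recording the run.

```python
from clearml import Task

# One call registers the run with the ClearML server: it captures the
# source code and git state, installed packages, console output, and
# hyperparameters reported by supported frameworks.
task = Task.init(project_name="my-project", task_name="baseline-experiment")

# ... the rest of the training script runs unchanged ...
```

Even a locally executed script then appears in the ClearML web UI as an experiment that can later be compared with or cloned from.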
What we would ideally like is a system where development, training, and deployment are almost one and the same thing, to reduce the lead time from development code to production models. Removing as many translation layers as you can between development and serving makes things easier to maintain, and when things go wrong, you have fewer degrees of freedom to consider. Machine learning models are complex beasts: you have to consider the model parameters, the data and all the pre-processing, and if you have additional translation layers for deployment, all of those have to be ruled out before you can diagnose any problem.
My view is that if you can make those layers as few and as transparent as possible, and also allow very easy comparison between experiments (and that means everything: model, data, code, environment, etc.), then hopefully you can very quickly identify what has changed, and where to investigate when a model is not performing as expected.