AnxiousSeal95, absolutely agree with you!
When a production model has failed, or isn't performing as expected, and you as a company are deriving revenue from that service, you very quickly have to diagnose how severe the problem is and what is potentially causing it. As you clearly point out, the degrees of freedom behind why a given model may behave differently include the code itself, the data, the pre-processing steps, the training parameters and the deployment environment itself.
The ability to lock all of this down when publishing, and to diff an entire experiment against another, is very powerful and immediately narrows down the potential avenues of investigation for the cause of a model failure.
I have been there before: a version of numpy was bumped, the setup.py file didn't pin the exact version, and when the deployment environment was recreated it installed a different numpy, which led to diverging answers. And you won't be surprised to hear that during model development, checking package versions is not the first thing you do; you assume it's the code, or the data!
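For anyone who hasn't been bitten by this yet, the fix in my case boiled down to pinning the exact version in setup.py. A minimal sketch below, where the package name and the specific numpy version are purely illustrative:

```python
# setup.py -- minimal sketch; package name and numpy version are illustrative
from setuptools import setup, find_packages

setup(
    name="my-model-service",   # hypothetical package name
    version="0.1.0",
    packages=find_packages(),
    install_requires=[
        # Pin the exact version so a rebuilt deployment environment
        # can't silently pull in a different numpy and diverge.
        "numpy==1.21.4",
    ],
)
```

Whether you pin exact versions in setup.py or freeze them in a requirements file is a matter of taste, but either way the point is the same: the environment has to be reproducible, not just the code and the data.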