As for experimenting, I'd say (and this community can be my witness 🙂) that managing your own experiments isn't a great idea. First, you have to maintain the infra (whatever it is, a tool you wrote yourself or an Excel sheet), which isn't fun and consumes time. From what I've heard, it usually takes at least 50% more time than you initially think. And since there are so many tools out there that do it for free, the only reason I can imagine for doing it on your own would be if you have VERY unique needs.
I think ClearML has an advantage over MLflow and the like. First, the UI is a bit more up to date and modern, with nicer looks and better features. But then when you consider that ClearML also has data management, machine management and these goodies, it becomes an easier decision (in my eyes) not to spread yourself across several tools.
As for Kedro, first I'll say it's a great tool! If you use it and love it, keep on keeping on 🙂 I think these guys did a great job! While ClearML doesn't have all the features of Kedro (and obviously Kedro doesn't have all of ClearML's either, they are 2 different tools with 2 different goals), we do have the pipelining feature. The UI is still in the works and Kedro does look better! But if you use the ClearML agent, you can probably build better automations. That said, if I had to "advise", I'd say start with ClearML pipelines, and if something is missing (or you want the templating features, which are great but require resources from your side to conform to them), try integrating Kedro. The reason I'd say start with ClearML and not Kedro is that experiment tracking / management is a must.
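To give a feel for the pipelining feature, here is a minimal sketch using `PipelineController` from the ClearML SDK. The step functions, the project name "examples" and the pipeline name "toy-pipeline" are all made up for illustration, and it assumes `clearml` is installed and configured against a server.

```python
def load_data():
    """First step: return some raw values (stand-in for real data loading)."""
    return [1.0, 2.0, 3.0, 4.0]


def compute_mean(values):
    """Second step: reduce the values produced by the first step."""
    return sum(values) / len(values)


def build_pipeline(project="examples", name="toy-pipeline"):
    """Wire the two functions above into a ClearML pipeline (names are placeholders)."""
    # Imported lazily so the step functions can be used and tested without clearml.
    from clearml import PipelineController

    pipe = PipelineController(name=name, project=project, version="0.0.1")
    pipe.add_function_step(
        name="load_data",
        function=load_data,
        function_return=["values"],
    )
    pipe.add_function_step(
        name="compute_mean",
        function=compute_mean,
        # "${load_data.values}" refers to the "values" output of the previous step.
        function_kwargs={"values": "${load_data.values}"},
        function_return=["mean"],
        parents=["load_data"],
    )
    return pipe
```

For debugging, `build_pipeline().start_locally(run_pipeline_steps_locally=True)` should run everything on your own machine; `start()` instead enqueues the pipeline so a ClearML agent picks it up.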
Thanks a lot for your really informative answers. I like ClearML, to be honest, but I like the way Kedro (with its somewhat tight and forced project structure) can guide us in setting standards. So, do you think that we could benefit from both tools, and use ClearML for experiment tracking while having our code structured within Kedro pipelines?
As for git, I'm no git expert, but having your own git server is doable. I can't tell you how it would work in your organization though, as everyone has their own limitations and rules. And as I said, you can use SVN, but the connection between it and ClearML won't be as good as with git.
Regarding git/svn, I think the same as you, but I'm afraid it won't be an easy battle... Is it hard to configure a git server?
Hey There Jamie! I'm Erez from the ClearML team and I'd be happy to touch on some points that you mentioned.
First and foremost, I agree with the first answer that was given to you on Reddit. There's no "right" tool. Most tools are right for the right people, and if a tool is too much of a burden, then maybe it isn't right!
Second, I have to say the use of SVN is a "bit" of a hassle. The MLOps space HEAVILY leans towards git. We interface with git and so does every other tool I know of. That said, when you run code that is integrated with our SDK, we save the entire script (it's essentially the git diff without git) so you know what code ran for the model. Is it best practice? Nope, it's not, and I think if you can, you should check if some git server is an option, but it should work.
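For reference, a minimal sketch of what that SDK integration looks like. The project and task names are placeholders I made up, and it assumes `clearml` is installed and `clearml-init` has been run so the SDK can reach a server.

```python
def start_tracked_run(project_name="examples", task_name="svn-tracked-run"):
    """Register the current script as a ClearML task and return the Task object."""
    # Imported lazily so this file can still be imported where clearml is missing.
    from clearml import Task

    # Task.init records the code that is executing: with a git repo it stores the
    # repository, commit and uncommitted changes; without one it stores the script.
    return Task.init(project_name=project_name, task_name=task_name)


def main():
    task = start_tracked_run()
    # Parameters connected to the task are shown next to the stored code in the UI.
    params = task.connect({"learning_rate": 0.01, "epochs": 3})
    print("running with", dict(params))
    task.close()
```

Calling `main()` from a script that lives in an SVN checkout should still give you a task with the script contents attached, which is the fallback described above.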
As for features, I think ClearML fits in very nicely. You have a project explorer so you can review your own and others' work. You'll have (in a few weeks) project documentation in markdown so you can write notes on your work. You have a data versioning / feature store (you can use the terms interchangeably) to version your data. I think it's quite important if your data changes from time to time (if it stays the same for, let's say, a year+, maybe not... but then again, if you're using ClearML to track experiments, then why not spend 30 more minutes to version your data; we'll also cache it for you on the machines so you don't have to download it manually every time).
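A rough sketch of the data versioning flow with the `Dataset` class from the ClearML SDK. The helper names, the project "examples" and the dataset name "training-data" are invented for the example, and it assumes a configured `clearml` installation.

```python
from pathlib import Path


def publish_dataset_version(data_dir, project="examples", name="training-data"):
    """Upload the contents of data_dir as a new dataset version and return its ID."""
    # Fail early on a bad path, before touching the server.
    if not Path(data_dir).is_dir():
        raise FileNotFoundError(f"{data_dir} is not a directory")

    from clearml import Dataset  # lazy import, only needed for the actual upload

    dataset = Dataset.create(dataset_name=name, dataset_project=project)
    dataset.add_files(path=data_dir)  # register the files in this version
    dataset.upload()                  # push the files to storage
    dataset.finalize()                # close the version so it can no longer change
    return dataset.id


def fetch_dataset(project="examples", name="training-data"):
    """Return a local path to a cached copy of the dataset."""
    from clearml import Dataset

    dataset = Dataset.get(dataset_project=project, dataset_name=name)
    # The copy is cached locally, so repeated calls do not download it again.
    return dataset.get_local_copy()
```

In an experiment script you would typically call `fetch_dataset()` at the start and read the files from the returned folder.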
Yes, definitely. As I said, if you like Kedro, continue using it. Both tools live happily side by side.
I have replied on the thread too, to keep all the discussion in one place.