As for experimenting, I'd say (and this community can be my witness 🙂 ) that managing your own experiments isn't a great idea. First, you have to maintain the infra (whatever it is, a tool you wrote yourself or an Excel sheet), which isn't fun and consumes time. From what I've heard, it usually takes at least 50% more time than you initially think. And since there are so many tools out there that do it for free, the only reason I can imagine for doing it on your own would be if you have VERY unique needs.
I think ClearML has an advantage over MLflow and the like: first, the UI is a bit more up to date and modern, with nicer looks and better features. But then when you consider the fact that ClearML also has data management, machine management, and these goodies, it becomes an easier decision (in my eyes) not to spread yourself thin.
Hey there Jamie! I'm Erez from the ClearML team and I'd be happy to touch on some of the points you mentioned.
First and foremost, I agree with the first answer you got on Reddit: there's no "right" tool. Most tools are right for the right people, and if a tool is too much of a burden, then maybe it isn't right for you!
Second, I have to say that using SVN is a "bit" of a hassle. The MLOps space HEAVILY leans towards git; we interface with git and so does every other tool I know of. That said, when you run your code with our SDK integrated, we save the entire script (it's essentially the git diff without the git), so you know exactly what code produced the model. Is it best practice? Nope, and if you can, I think you should check whether some git server is an option, but it will work.
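Just to give you a feel for what that integration looks like in practice (the project and task names below are placeholders, not anything from your setup), it's roughly a two-line addition to your training script:

```python
from clearml import Task

# Registers the run as an experiment in the ClearML UI and starts auto-logging.
# If the script isn't inside a git repo, the SDK stores the full script contents
# instead of a commit hash + diff, so the run is still reproducible.
task = Task.init(project_name="examples", task_name="svn-based training run")

# ...the rest of your training code stays unchanged...
```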
As for features, I think ClearML fits in very nicely. you have a project explorer so you can review yours and others' work. You'll have (in a few weeks) project documentation in markdown so you can write notes on your work. You have a data versioning \ feature store (You can use them interchangeably) to version your data. I think it's quite important if you data changes from time to time (If it stays the same, let's say, for a year+ maybe not...but then again, if you're using ClearML to track experiments then why not spend 30 more minutes to version you data, we'll also cache it for you on the cloudera machines so you don't have to download it manually every time).
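To picture the data versioning side, here is a minimal sketch (dataset and project names are just placeholders):

```python
from clearml import Dataset

# Create a new dataset version and attach the local files to it
ds = Dataset.create(dataset_name="customer-churn", dataset_project="datasets")
ds.add_files(path="data/raw/")
ds.upload()    # push the files to the ClearML file server / your storage
ds.finalize()  # lock this version so it can be referenced reproducibly

# Later, from any machine (including the Cloudera nodes), fetch a cached local copy
local_path = Dataset.get(
    dataset_name="customer-churn", dataset_project="datasets"
).get_local_copy()
```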
Regarding git/svn, I think the same as you, but I'm afraid it won't be an easy battle... Is it hard to configure a git server?
As for git, I'm no git expert, but running your own git server is doable. I can't tell you what that means in terms of how it would work inside your organization, though, as everyone has their own limitations and rules. And as I said, you can use SVN, but the integration between it and ClearML won't be as good as with git.
Thanks a lot for your really informative answers. I like ClearML, to be honest, but I also like the way Kedro (with its somewhat tight, enforced project structure) can guide us in setting standards. So, do you think we could benefit from both tools, using ClearML for experiment tracking while keeping our code structured as Kedro pipelines?
As for Kedro, first I'll say it's a great tool! If you use it and love it, keep on keeping on 🙂 I think these guys did a great job! While ClearML doesn't have all of Kedro's features (and obviously Kedro doesn't have all of ClearML's either; they are two different tools with two different goals), we do have a pipelining feature. The pipeline UI is still in the works and Kedro's does look better! But if you use ClearML Agent, you can probably build better automations. That said, if I had to "advise", I'd say start with ClearML pipelines and, if something is missing (or you want the templating features, which are great but require resources on your side to conform to them), try integrating Kedro. The reason I'd say start with ClearML and not Kedro is that experiment tracking / management is a must.
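Just so you can picture what a ClearML pipeline looks like, here's a rough sketch of the controller-based flavour (project, task, and step names are made up for illustration; depending on your SDK version the import may live under clearml.automation instead):

```python
from clearml import PipelineController

# The pipeline is itself a ClearML task that orchestrates other tasks as steps
pipe = PipelineController(name="churn pipeline", project="examples", version="1.0.0")

# Each step clones an existing (template) task and runs it through a ClearML Agent queue
pipe.add_step(
    name="prepare_data",
    base_task_project="examples",
    base_task_name="data preparation",
)
pipe.add_step(
    name="train_model",
    parents=["prepare_data"],
    base_task_project="examples",
    base_task_name="model training",
)

# Launch the pipeline (by default the controller itself runs on the services queue)
pipe.start()
```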
Yes, definitely. As I said, if you like Kedro, continue using it. The two tools live happily side by side.
I have replied on the thread too, to keep the whole discussion in one place.