Hi @<1756488209237282816:profile|IdealCamel64> , I think ClearML would be perfect for that. You can also let users open their own remote sessions directly on the GPUs (even inside a container). I'd check out ClearML's orchestration layer and remote sessions.
Regarding what @<1576381444509405184:profile|ManiacalLizard2> said, he's wrong, I'm afraid. ClearML can run on top of K8s if needed, and of course the agent supports running inside Docker containers as well.
I think that one of ClearML's strengths is allowing you to manage/administer not only a single H100 server but many of those under whatever requirements you might have.
Feels like Docker or Kubernetes is a better fit for that purpose ...
@<1756488209237282816:profile|IdealCamel64> , to address your questions:
- Yes
- Yes, but as @<1576381444509405184:profile|ManiacalLizard2> said, let your users try it and I'm sure they'll prefer ClearML
If you want to replace MLflow with ClearML: do it!! It's like "Should I use sandals or running shoes for my next marathon ..."
Let your users try ClearML, and I am pretty sure all of them will want to switch over!!!
@<1523701070390366208:profile|CostlyOstrich36> Thanks for your reply!
I have a couple of follow-up questions:
- Are the features mentioned (orchestration layer, remote sessions, etc.) available for testing in the free version of ClearML?
- Given the following scenario: in our team, some members prefer ClearML for experiment tracking, while others want to use MLflow. Can we use ClearML to handle server monitoring and orchestration, while still letting users choose their preferred experiment tracking tool? Thanks again for the help!