Hi GreasyPenguin14
Could you tell me what the differences are and why we should use ClearML data?
The first difference is in the approach itself. DVC ties the data to the code (i.e. the git repo), whereas we (ClearML, but not just us) think data should be abstracted from the code base and become a standalone argument, allowing users to build/execute against different datasets/versions.
- ClearML Data becomes part of the workflow, as it is visible from the UI, including the ability to create structure with projects/sub-projects, naming conventions, tags, etc. (In upcoming versions we will be extending the UI visualization capabilities for even better visibility.)
- ClearML Data offers a full programmatic interface, allowing you to easily build automation processes directly from code.
- Triggers now support launching Tasks based on new datasets created/tagged in the system (i.e. automation is built in).
- Users can customize Datasets and add metrics/visualizations from code for increased visibility (e.g. plot the first few lines of a table, upload image samples, etc.).
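To make the "standalone argument" and "programmatic interface" points a bit more concrete, here is a rough sketch, not an official recipe. The project/dataset names, the argument names, and the helpers `parse_args`, `fetch_dataset` and `create_version` are all made up for illustration. I am also assuming a reasonably recent `clearml` release where `Dataset.get` accepts `dataset_version`, and a configured ClearML server for anything beyond argument parsing.

```python
import argparse


def parse_args(argv=None):
    """The dataset is a run-time argument, not something pinned in the git repo."""
    parser = argparse.ArgumentParser()
    # Hypothetical defaults, replace with your own project/dataset names
    parser.add_argument("--dataset-project", default="examples/data")
    parser.add_argument("--dataset-name", default="my_dataset")
    # None means "do not pin a version" (assumption: Dataset.get then picks the latest)
    parser.add_argument("--dataset-version", default=None)
    return parser.parse_args(argv)


def fetch_dataset(args):
    """Resolve the requested dataset and return the path of a local read-only copy."""
    # Imported lazily so the argument parsing above works without clearml installed
    from clearml import Dataset

    ds = Dataset.get(
        dataset_project=args.dataset_project,
        dataset_name=args.dataset_name,
        dataset_version=args.dataset_version,
    )
    return ds.get_local_copy()


def create_version(project, name, folder, preview_rows):
    """Create a new dataset version from a local folder and attach a small preview table.

    preview_rows is a list of rows (header row first), e.g. the first few lines of a CSV.
    """
    from clearml import Dataset

    ds = Dataset.create(dataset_project=project, dataset_name=name)
    ds.add_files(path=folder)
    # Extra visibility in the UI: report a few sample rows alongside the dataset
    ds.get_logger().report_table(
        title="preview", series="first rows", iteration=0, table_plot=preview_rows
    )
    ds.upload()
    ds.finalize()
    return ds.id
```

The same training code can then be pointed at different data without touching the repo, e.g. `fetch_dataset(parse_args(["--dataset-name", "my_dataset", "--dataset-version", "1.0.1"]))`.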
I probably missed a few points, but this is probably a good start 🙂