Hi SoggyFrog26, https://github.com/allegroai/clearml/blob/master/docs/datasets.md
Hi JitteryCoyote63,
Oh, you have something, nice!
I will look into that document, thanks!
Is there any example of how to use clearml-data?
(I am not part of the awesome ClearML team, just a happy user 🙂 )
There is an example in the https://github.com/allegroai/clearml/blob/master/docs/datasets.md#workflow section of the link I shared above
Does it handle data only in the form of regular files?
I will let the team answer you on that one 🙂
This is no coincidence - any data versioning tool you will find is somewhat close to how git works (dvc, etc.), since they all aim to solve a similar problem. In the end, datasets are just files.
Where clearml-data stands out imo is the straightforward CLI combined with the Pythonic API, which lets you register/retrieve datasets very easily
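To give an idea of what "register/retrieve" looks like from Python, here is a minimal sketch (not an official example). It assumes `clearml` is installed and configured against a server; the helper names `register_dataset` and `fetch_dataset` and their arguments are made up for illustration.

```python
def register_dataset(project, name, folder):
    """Create a dataset version from a local folder and return its ID."""
    # Imported lazily so the module loads even without clearml installed.
    from clearml import Dataset

    ds = Dataset.create(dataset_project=project, dataset_name=name)
    ds.add_files(path=folder)  # register every file found under `folder`
    ds.upload()                # push the files to the configured storage
    ds.finalize()              # close this version; it becomes read-only
    return ds.id


def fetch_dataset(project, name):
    """Return the path of a local, cached, read-only copy of the dataset."""
    from clearml import Dataset

    ds = Dataset.get(dataset_project=project, dataset_name=name)
    return ds.get_local_copy()
```

Usage would be something like `register_dataset("my_project", "my_dataset", "./data")` on one machine and `fetch_dataset("my_project", "my_dataset")` on another (project and dataset names are placeholders).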
Yeah, now that I know that, the CLI looks much more familiar to me.
JitteryCoyote63 Is there an example of how the learning pipeline can be triggered (started) by changes in a dataset?
BattyLion34 the closest I can think of is the monitoring class, which can easily be extended.
Datasets are a type of Task, so we can monitor a project and trigger an action when we see a change in the number of Tasks/Datasets that are completed.
Monitoring class:
https://github.com/allegroai/clearml/blob/master/clearml/automation/monitor.py
Monitoring example:
https://github.com/allegroai/clearml/blob/master/examples/services/monitoring/slack_alerts.py
I think a dataset monitoring example will be quite cool.
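Until there is one, here is a rough sketch of the idea (poll the number of completed datasets in a project, fire a callback when it changes). It does not use the `Monitor` class linked above, just a plain polling loop. `watch_dataset_count` and `count_completed_datasets` are hypothetical helpers, and the `Dataset.list_datasets` call with `dataset_project` / `only_completed` is written from memory, so check it against your clearml version.

```python
import time


def count_completed_datasets(project):
    """Count completed datasets in a project (assumes a configured clearml)."""
    # Imported lazily so the polling loop can be tried without clearml.
    from clearml import Dataset

    return len(Dataset.list_datasets(dataset_project=project, only_completed=True))


def watch_dataset_count(count_fn, on_change, poll_seconds=60.0,
                        max_polls=None, sleep=time.sleep):
    """Call on_change(old, new) whenever count_fn() returns a new value."""
    last = count_fn()
    polls = 0
    while max_polls is None or polls < max_polls:
        sleep(poll_seconds)
        current = count_fn()
        if current != last:
            on_change(last, current)  # e.g. enqueue the training pipeline here
        last = current
        polls += 1
```

For example: `watch_dataset_count(lambda: count_completed_datasets("my_project"), lambda old, new: print(old, "->", new))`, where the callback would be replaced by whatever starts the pipeline.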