Hi SoggyFrog26, see https://github.com/allegroai/clearml/blob/master/docs/datasets.md
(I am not part of the awesome ClearML team, just a happy user 🙂 )
I will let the team answer you on that one 🙂
JitteryCoyote63, is there an example of how the learning pipeline can be triggered (started) by changes in a dataset?
Yeah, now that I know that, the CLI looks much more familiar to me.
Hi JitteryCoyote63 ,
Oh, you have something, nice!
I will look into that document, thanks!
Is there any example of how to use clearml-data?
BattyLion34 the closest I can think of is the monitoring class, which can easily be extended.
Datasets are a type of Task, so we can monitor a project and trigger an action when we see a change in the number of Tasks/Datasets that are completed.
Monitoring class:
https://github.com/allegroai/clearml/blob/master/clearml/automation/monitor.py
Monitoring example:
https://github.com/allegroai/clearml/blob/master/examples/services/monitoring/slack_alerts.py
I think a dataset monitoring example will be quite cool.
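To make the idea above a bit more concrete, here is a minimal sketch of the "trigger when a new dataset is completed" logic. It deliberately does not call ClearML at all: `fetch_completed_ids` and `on_new` are hypothetical callables you would supply yourself (for example, something built on the monitoring class linked above for the first, and whatever starts your pipeline for the second). Treat it as an illustration of the polling logic, not as the ClearML way of doing it.

```python
from typing import Callable, Iterable, List, Set


class CompletedDatasetWatcher:
    """Fires a callback once for every completed dataset ID not seen before.

    Both callables are placeholders (assumptions, not ClearML APIs):
      fetch_completed_ids: returns the IDs of datasets currently completed.
      on_new: called with each newly completed ID (e.g. to start a pipeline).
    """

    def __init__(
        self,
        fetch_completed_ids: Callable[[], Iterable[str]],
        on_new: Callable[[str], None],
    ) -> None:
        self._fetch = fetch_completed_ids
        self._on_new = on_new
        self._seen: Set[str] = set()
        self._primed = False

    def poll(self) -> List[str]:
        """Check once and return the IDs that triggered the callback."""
        current = set(self._fetch())
        if not self._primed:
            # First poll only records the baseline, so datasets that
            # already existed do not trigger anything.
            self._seen = current
            self._primed = True
            return []
        new_ids = sorted(current - self._seen)
        for dataset_id in new_ids:
            self._on_new(dataset_id)
        self._seen |= current
        return new_ids
```

You would call `poll()` on a timer (every few minutes, say); the first call only records what already exists, and later calls fire `on_new` for anything that completed since.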
There is an example in the https://github.com/allegroai/clearml/blob/master/docs/datasets.md#workflow section of the link I shared above
Does it handle data just in the form of regular files?
This is no coincidence: any data versioning tool you will find is somewhat close to how git works (dvc, etc.), since they all aim to solve a similar problem. In the end, datasets are just files.
Where clearml-data stands out, imo, is the straightforward CLI combined with the Pythonic API, which lets you register/retrieve datasets very easily.
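For reference, this is roughly what register/retrieve looks like from Python. It is a sketch, assuming `clearml` is installed and configured against a server; the function names and their arguments are made up for illustration, and the linked docs remain the reference for the exact options.

```python
def register_dataset(project: str, name: str, folder: str) -> str:
    """Create a dataset version from a local folder and return its ID."""
    # Imported lazily so this file can be loaded without clearml installed.
    from clearml import Dataset

    ds = Dataset.create(dataset_name=name, dataset_project=project)
    ds.add_files(path=folder)  # register the files found under `folder`
    ds.upload()                # push the file contents to storage
    ds.finalize()              # close this version so it can be consumed
    return ds.id


def fetch_dataset(project: str, name: str) -> str:
    """Return the path of a local, read-only copy of the dataset."""
    from clearml import Dataset

    ds = Dataset.get(dataset_project=project, dataset_name=name)
    return ds.get_local_copy()
```

So on the producer side you would call something like `register_dataset("my_project", "my_dataset", "./data")`, and in the training code `fetch_dataset("my_project", "my_dataset")` to get a folder with the files (both names here are placeholders).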