BattyLion34 the closest I can think of is the monitoring class, which can easily be extended.
Datasets are a type of Task, so we can monitor a project and trigger an action when we see a change in the number of completed Tasks/Datasets.
Monitoring class:
https://github.com/allegroai/clearml/blob/master/clearml/automation/monitor.py
Monitoring example:
https://github.com/allegroai/clearml/blob/master/examples/services/monitoring/slack_alerts.py
I think a dataset monitoring example will be quite cool.
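Something along these lines, as a rough sketch only (not an official example). It assumes the same override points the Slack alerts example uses (`get_query_parameters` and `process_task`) and that a finalized dataset shows up as a Task with status `completed`. `NewCompletionTracker`, `watch_project` and the `on_new` hook are names I made up for illustration:

```python
from typing import Callable, Iterable, List, Set


class NewCompletionTracker:
    """Remembers which completed Task/Dataset IDs were already seen (hypothetical helper)."""

    def __init__(self, on_new: Callable[[str], None]) -> None:
        self._seen: Set[str] = set()
        # Hypothetical hook, e.g. "start the training pipeline for this dataset".
        self._on_new = on_new

    def update(self, completed_ids: Iterable[str]) -> List[str]:
        """Fire the hook once per ID not seen before and return the new IDs."""
        new_ids = []
        for task_id in completed_ids:
            if task_id not in self._seen:
                self._seen.add(task_id)
                new_ids.append(task_id)
                self._on_new(task_id)
        return new_ids


def watch_project(project_name: str, on_new: Callable[[str], None],
                  poll_sec: float = 60.0) -> None:
    """Block forever, polling one project through clearml's Monitor class."""
    # Imported lazily so the tracker above can be used without clearml installed.
    from clearml.automation.monitor import Monitor

    tracker = NewCompletionTracker(on_new)

    class CompletedMonitor(Monitor):
        def get_query_parameters(self) -> dict:
            # Assumption: query completed tasks instead of failed ones.
            return dict(status=["completed"], order_by=["-last_update"])

        def process_task(self, task) -> None:
            # Called by the monitor loop for every task matching the query.
            tracker.update([task.id])

    monitor = CompletedMonitor()
    monitor.set_projects(project_names=[project_name])
    monitor.monitor(pool_period=poll_sec)
```

Calling `watch_project("my_datasets_project", print)` would just print the ID of each newly completed Task/Dataset in that project (the project name is a placeholder). This does not filter out non-dataset tasks, so you would probably want to narrow the query further.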
This is no coincidence: any data versioning tool you will find (dvc, etc.) is somewhat close to how git works, since they all aim to solve a similar problem. In the end, datasets are just files.
Where clearml-data stands out imo is the straightforward CLI combined with the Pythonic API, which lets you register/retrieve datasets very easily.
There is an example in the https://github.com/allegroai/clearml/blob/master/docs/datasets.md#workflow section of the link I shared above.
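For the Python side, registering and retrieving could look roughly like this. It is a minimal sketch, assuming the `Dataset.create` / `add_files` / `upload` / `finalize` and `Dataset.get` / `get_local_copy` calls from the `clearml` package; the function names and the `dataset_cls` parameter are my own additions (the latter only so the flow can be tried with a stand-in class when no ClearML server is around):

```python
from typing import Any, Optional


def _resolve_dataset_cls(dataset_cls: Optional[Any] = None) -> Any:
    """Return the injected class, or clearml's Dataset if none was given."""
    if dataset_cls is not None:
        return dataset_cls
    # Imported lazily: only needed when talking to a real ClearML server.
    from clearml import Dataset
    return Dataset


def register_dataset(folder: str, name: str, project: str,
                     dataset_cls: Optional[Any] = None) -> str:
    """Create a dataset version from a local folder and return its ID."""
    ds = _resolve_dataset_cls(dataset_cls).create(
        dataset_name=name, dataset_project=project
    )
    ds.add_files(path=folder)  # stage the files found under `folder`
    ds.upload()                # push the staged files to storage
    ds.finalize()              # close the version so it can be consumed
    return ds.id


def fetch_dataset(name: str, project: str,
                  dataset_cls: Optional[Any] = None) -> str:
    """Look the dataset up by project/name and return a local copy path."""
    ds = _resolve_dataset_cls(dataset_cls).get(
        dataset_project=project, dataset_name=name
    )
    return ds.get_local_copy()
```

With a configured ClearML setup you would call `register_dataset("data/", "my_dataset", "my_project")` once, and `fetch_dataset("my_dataset", "my_project")` from the training code (names are placeholders).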
Hi SoggyFrog26 , https://github.com/allegroai/clearml/blob/master/docs/datasets.md
Hi JitteryCoyote63 ,
Oh, you have something, nice!
I will look into that document, thanks!
(I am not part of the awesome ClearML team, just a happy user 🙂 )
I will let the team answer you on that one 🙂
Is it handling data just in a form of regular files?
Yeah, now that I know that, the CLI looks much more familiar to me.
JitteryCoyote63 Is there an example of how the learning pipeline can be triggered (started) by changes in a dataset?
Is there any example of how to use clearml-data?