Transform feature engineering and data processing code into recurring data ingestion workflows. Start building data stores, develop, automate, and schedule complex data processing jobs.
Creating the Dataset on ClearML is the catalog: you can move datasets around, put them in sub-folders, add tags, add meta-data, search, etc. I think this qualifies as a dataset catalog, no?
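For example, something along these lines (the project and dataset names are placeholders, and add_tags / list_datasets assume a reasonably recent clearml version):

```python
from clearml import Dataset

# Register a new dataset version; project paths act as sub-folders in the catalog
ds = Dataset.create(
    dataset_name="customer-churn-raw",     # hypothetical name
    dataset_project="data-catalog/churn",  # hypothetical project path
)
ds.add_files(path="./raw_data")
ds.upload()
ds.finalize()

# Tag the version so it can be found later
ds.add_tags(["raw", "2023-Q1"])

# Search the catalog by project and tags
matches = Dataset.list_datasets(
    dataset_project="data-catalog/churn",
    tags=["raw"],
)
print(matches)
```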
Yeah that'll cover the first two points, but I don't see how it'll end up as a dataset catalogue as advertised.
I see. Is there a more elaborate code example that describes the above interactions?
The first is probably done using pipeline controllers, the second using Datasets or HyperDatasets. It's not very clear how the last one is achieved, especially the searchable data catalogs. (See the sketch after the quoted points below for the first one.)
Share data across R&D teams with searchable data catalogs available on any environment.
Create immutable and differentiable versions on-prem or in the cloud with our data agnostic solution.
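For the first point (recurring ingestion workflows), a rough sketch with a PipelineController could look like the following; the pipeline name, project, step name, ingest function, and source URL are all hypothetical:

```python
from clearml.automation import PipelineController

def ingest(source_url: str):
    # download / clean / write out processed data here (placeholder logic)
    return source_url

pipe = PipelineController(
    name="data-ingest-pipeline",  # hypothetical pipeline name
    project="data-catalog",       # hypothetical project
    version="1.0.0",
)
pipe.add_function_step(
    name="ingest",
    function=ingest,
    function_kwargs={"source_url": "s3://bucket/raw"},  # placeholder source
    function_return=["result"],
)
# Run locally for testing; enqueue to agents for automated / scheduled runs
pipe.start_locally(run_pipeline_steps_locally=True)
```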
Hi SubstantialElk6
Generally speaking here, the idea is that actual code creates a Dataset (i.e. a Dataset class instance created from code), plus you can add some metric reporting (like table reporting) to create a preview of the stored data for better visibility, or maybe create some statistics as part of the data-ingest script. Then this ingest code can be relaunched / automated. The created Dataset itself can be tagged, renamed, and given key/value pairs for better cataloging. wdyt?
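Concretely, that flow could look something like this (file paths and names are placeholders, and get_logger / set_metadata assume a recent clearml version):

```python
import pandas as pd
from clearml import Dataset

# Hypothetical ingest step: load the raw file we are about to version
df = pd.read_csv("raw/data.csv")

ds = Dataset.create(
    dataset_name="ingested-data",  # hypothetical name
    dataset_project="data-catalog",
)
ds.add_files(path="raw/data.csv")

# Attach a preview table and simple statistics to the dataset's own task,
# so they show up next to this version in the UI
logger = ds.get_logger()
logger.report_table(title="Preview", series="head", iteration=0, table_plot=df.head())
logger.report_table(title="Stats", series="describe", iteration=0, table_plot=df.describe())

# Key/value metadata and tags for cataloging
ds.set_metadata({"rows": len(df)})  # assumed dict support; see clearml docs
ds.add_tags(["ingested"])

ds.upload()
ds.finalize()
```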