Hi @<1727497172041076736:profile|TightSheep99>
Yes it can. It will upload the metadata as well as the files. It also de-duplicates: files that already exist in the dataset are not uploaded again, based on the hash of the file content.
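To illustrate the de-dup idea, here is a minimal sketch of content-hash filtering. It is not ClearML's implementation: the choice of SHA-256 and the helper names `file_sha256` and `files_to_upload` are assumptions made for this example only.

```python
import hashlib
from pathlib import Path
from typing import Iterable, List


def file_sha256(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's content, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_upload(candidates: Iterable[str], known_hashes: Iterable[str]) -> List[str]:
    """Return only the candidate paths whose content hash is not already known.

    `known_hashes` stands in for the hashes of files already in the dataset
    (hypothetical input; the real bookkeeping lives inside ClearML).
    """
    seen = set(known_hashes)
    pending = []
    for path in candidates:
        content_hash = file_sha256(Path(path))
        if content_hash not in seen:
            # Remember the hash so identical files in the same batch are skipped too.
            seen.add(content_hash)
            pending.append(path)
    return pending
```

Because the comparison is on content rather than file name, a renamed copy of an existing file is still skipped.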
But which function can I use to upload Hyper-datasets? So far I have only seen documentation for Dataset.upload(), which uploads a Dataset.
I want to store only my raw data in my blob storage, and I want to create a Hyperdataset with all the artifacts, metrics, frames, and other bells and whistles stored in ClearML, while the data itself is just a pointer to my blob storage. Is this possible?
I want to store only my raw data in my blob storage, and I want to create a Hyperdataset with all the artifacts, metrics, frames,
Yes that's exactly how it works.
This line adds a reference to the raw file (local or remote):
https://github.com/allegroai/clearml/blob/1b474dc0b057b69c76bc2daa9eb8be927cb25efa[…]es/hyperdatasets/data-registration/register_dataset_with_roi.py
This line adds some metadata
https://github.com/allegroai/clearml/blob/1b474dc0b057b69c76bc2daa9eb8be927cb25efa[…]es/hyperdatasets/data-registration/register_dataset_with_roi.py
This argument uploads the raw data. If you do not provide it, the link is assumed to point to a file that already exists (i.e. no upload is needed):
https://github.com/allegroai/clearml/blob/1b474dc0b057b69c76bc2daa9eb8be927cb25efa[…]es/hyperdatasets/data-registration/register_dataset_with_roi.py
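The three steps above (reference the raw file, attach metadata, optionally upload) can be sketched in plain Python. This is not the Hyperdatasets API: `FrameRecord`, `register_frames`, the `uploader` callback and the bucket names are all hypothetical, used only to show the "pointer unless an upload destination is given" behaviour.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class FrameRecord:
    """Hypothetical stand-in for a frame: a pointer to raw data plus metadata."""
    source: str  # local path or remote URI of the raw file
    metadata: Dict[str, object] = field(default_factory=dict)


def register_frames(
    frames: Iterable[FrameRecord],
    upload_destination: Optional[str] = None,
    uploader: Optional[Callable[[str, str], str]] = None,
) -> List[FrameRecord]:
    """Return the records as they would be stored.

    Without `upload_destination`, each source is kept as-is and assumed to
    point to an existing file. With it, `uploader(source, destination)` is
    called and the record points at the URI the uploader returns.
    """
    if upload_destination is not None and uploader is None:
        raise ValueError("an uploader is required when upload_destination is set")
    registered = []
    for frame in frames:
        source = frame.source
        if upload_destination is not None:
            source = uploader(frame.source, upload_destination)
        registered.append(FrameRecord(source=source, metadata=dict(frame.metadata)))
    return registered


# Pointer-only registration: the raw data stays in the (hypothetical) bucket.
frames = [FrameRecord("s3://my-bucket/raw/img_001.jpg", {"camera": "front"})]
stored = register_frames(frames)
```

In the pointer-only case nothing is copied; only the URI and the metadata are recorded, which matches the "data is a pointer to my blob storage" setup asked about earlier.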
OK, thank you, but these are custom parser examples. If I am building a tool that lets users create hyperdatasets and run models with their raw data on cloud storage, and they don't know how to write code, is this still possible? Or would I have to write a custom parser for each user? Also, it seems that Dataset.create() creates a Dataset, not a Hyperdataset. How do I create a Hyperdataset outside the web interface?
and they don't know how to write code, is this still possible?
Well, this means there is some standard for the data, right? What is that standard? Unfortunately, in our space there is no standard for data; it's just too generic, so everyone always ends up with custom parsing of some sort.
Does that make sense?