It would certainly be nice to have. Lately I've heard of groups that do slices of datasets for distributed training, or who "stream" data.
Hi SmallDeer34 👋
The dataset task will download the whole dataset when using a clearml-data task. Do you have both in the same one?
Is there any way to get just one dataset folder of a Dataset? e.g. only "train" or only "dev"?
They are usually stored in the same "zip" so basically you have to download both folders anyhow, but I guess if this saves space we could add this functionality, wdyt?
Any reason to not have those as two datasets?
Well, in my particular case the training data's got, like 200 subfolders, each with 2,000 files. I was just curious whether it was possible to pull down one of the subsets
Hmm, so maybe a glob-like parameter, e.g. get_local_copy(select_filter='subfolder/*')?
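Until something like that exists, a rough workaround is to filter the local copy after it has been downloaded. This is only a sketch: select_filter above is a proposal, not an existing parameter, and select_files below is a hypothetical helper, not part of clearml. It does not save any download time or disk space; it only narrows down which files you iterate over.

```python
from fnmatch import fnmatch
from pathlib import Path


def select_files(local_root, pattern):
    """Return the files under local_root whose path, relative to
    local_root and written with forward slashes, matches the glob pattern.

    Hypothetical helper for illustration; not part of clearml.
    Note that fnmatch lets '*' match across '/' as well, so
    'train/*' also picks up files in nested subfolders of train.
    """
    root = Path(local_root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and fnmatch(p.relative_to(root).as_posix(), pattern)
    )


# Possible usage, assuming the dataset was fetched in full first
# (the dataset id is a placeholder):
#
#   from clearml import Dataset
#   local_root = Dataset.get(dataset_id="<your dataset id>").get_local_copy()
#   train_files = select_files(local_root, "train/*")
```

With 200 subfolders, a pattern such as 'subfolder_17/*' (the folder name is made up) would pick out one subset from the full local copy.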
I suppose I could upload 200 different "datasets" rather than one dataset with 200 folders in it, but then clearml-data search would have 200 entries in it? It seemed like a good idea to put them all in one at the time.