Any reason to not have those as two datasets?
Well, in my particular case the training data has about 200 subfolders, each with 2,000 files. I was just curious whether it was possible to pull down only one of the subsets.
I suppose I could upload 200 different "datasets", rather than one dataset with 200 folders in it, but then clearml-data search would have 200 entries in it? It seemed like a good idea to put them all in one at the time
Is there any way to get just one dataset folder of a Dataset? e.g. only "train" or only "dev"?
They are usually stored in the same "zip", so basically you have to download both folders anyhow. But I guess if this saves space we could add this functionality, wdyt?
Hi SmallDeer34 👋
The dataset task downloads the entire dataset when you use clearml-data. Do you have both folders in the same dataset?
Hmm, so maybe a glob-like parameter, e.g. get_local_copy(select_filter='subfolder/*')?
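To make the glob idea concrete, here is a rough sketch of what such a filter could do on a folder that has already been downloaded. Note that select_filter is only the parameter name proposed above, not an existing ClearML argument, and select_files is a hypothetical helper written in plain Python. It does not avoid the download itself; it only picks the matching files afterwards.

```python
from fnmatch import fnmatchcase
from pathlib import Path


def select_files(root, select_filter):
    """Return the files under `root` whose path, relative to `root`,
    matches the glob pattern `select_filter` (e.g. 'train/*').

    Hypothetical helper: `select_filter` mirrors the parameter name
    proposed in this thread and is not part of any ClearML API.
    """
    root = Path(root)
    return sorted(
        path
        for path in root.rglob("*")
        # Compare POSIX-style relative paths so patterns look the same on every OS.
        if path.is_file()
        and fnmatchcase(path.relative_to(root).as_posix(), select_filter)
    )
```

With fnmatch-style matching, `*` also matches `/`, so 'train/*' would pick up files in nested folders under train as well.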
It would certainly be nice to have. Lately I've heard of groups that do slices of datasets for distributed training, or who "stream" data.
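For the distributed-training case, one common way to slice is to give each worker every N-th subfolder. This is just a plain-Python illustration of the idea, not something ClearML does for you here; shard_for_worker, rank and world_size are names made up for the example.

```python
def shard_for_worker(items, rank, world_size):
    """Return the slice of `items` assigned to worker `rank` out of `world_size`.

    Hypothetical helper for illustration only. Items are sorted first so
    that every worker computes the same, non-overlapping assignment.
    """
    if world_size < 1 or not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    # Round-robin: worker 0 gets items 0, N, 2N, ...; worker 1 gets 1, N+1, ...
    return sorted(items)[rank::world_size]
```

With 200 subfolders and 4 workers, each worker would end up with 50 subfolders, and no subfolder is assigned twice.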