Hi SubstantialElk6,
That's an interesting idea. If you want to preprocess a lot of data, I think the best approach is to use multiple datasets (one per process) or different versions of a dataset. I think you can also pull specific chunks of a single dataset and use just that one dataset, but I'm not sure about this last point.
What do you think?
SubstantialElk6, I think this is what you're looking for:
https://clear.ml/docs/latest/docs/references/sdk/dataset#get_local_copy
Dataset.get_local_copy(..., part=X)
Although I think you can also pull specific chunks of dataset
How do you do that with clearml-data?
https://clear.ml/docs/latest/docs/references/sdk/dataset/#get_num_chunks
I think this might also be helpful. Skim through the functions available in the documentation, I think you might find what you're looking for 🙂
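For anyone reading along, here is a rough sketch of how the two calls linked above could fit together so that each worker pulls only its own slice of one dataset. The function name fetch_dataset_part and its arguments are made up for illustration. The sketch assumes the dataset was already uploaded and that the number of workers is passed as num_parts. As far as I can tell from the SDK reference, parts are 0-indexed, and if num_parts is left out it defaults to the total number of chunks.

```python
def fetch_dataset_part(dataset_project, dataset_name, part, num_parts):
    """Download only one part of a ClearML dataset and return its local path.

    Hypothetical helper for illustration; not part of the ClearML SDK.
    """
    # Parts are 0-indexed: valid values run from 0 to num_parts - 1.
    if num_parts < 1 or not 0 <= part < num_parts:
        raise ValueError(
            f"part={part} is out of range for num_parts={num_parts}"
        )

    # Imported lazily so the argument check above works without clearml installed.
    from clearml import Dataset

    dataset = Dataset.get(
        dataset_project=dataset_project,
        dataset_name=dataset_name,
    )
    # Number of chunks stored for this dataset, useful for choosing num_parts.
    print("chunks in dataset:", dataset.get_num_chunks())
    # Only the chunks assigned to this part are downloaded.
    return dataset.get_local_copy(part=part, num_parts=num_parts)
```

With placeholder names, worker 0 out of 4 would call fetch_dataset_part("my_project", "my_dataset", part=0, num_parts=4) and get back the folder holding just its share of the files.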
Thanks CostlyOstrich36, how do I know how the parts are indexed in the first place? Or rather, how are chunks and parts defined, say in the context of images, videos, text documents, etc.?