Thank you for the quick response! StorageManager might help… I think what I actually need is a way to get the data in mini-batches, instead of caching the entire dataset with get_local_copy(). Because, yes, I will have to download the data anyway for training.
Hi @<1686909730389233664:profile|AmiableSheep6> , I could suggest using the StorageManager module to pull specific files from S3.
There is no option to download specific files from a dataset. I would suggest breaking it up into smaller dataset versions.
You would, however, need to pull the data locally for training anyway; wouldn't breaking it into smaller versions help with this?
Or even better, get a list of the paths to the data locations on S3 (captured within a tracked dataset), which I could then pull myself through boto3 etc.
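Not sure if this matches your setup exactly, but the flow could be sketched roughly like this, assuming the clearml Python SDK: list the files tracked in the dataset, chunk the paths, and pull each chunk with StorageManager only when it is needed. The dataset name, bucket name, and the `iter_dataset_batches` helper are placeholders I made up, not part of any real project:

```python
from itertools import islice


def batched(paths, batch_size):
    """Yield successive lists of at most batch_size items from paths."""
    it = iter(paths)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk


def iter_dataset_batches(dataset_name, bucket, batch_size=32):
    """Sketch: yield lists of locally cached files, one mini-batch at a time.

    Requires the clearml SDK plus valid S3 credentials; the dataset and
    bucket names passed in are placeholders, not real resources.
    """
    from clearml import Dataset, StorageManager  # lazy import, only needed at call time

    ds = Dataset.get(dataset_name=dataset_name)
    # list_files() returns the relative paths tracked inside the dataset
    for batch in batched(ds.list_files(), batch_size):
        # get_local_copy() downloads (and caches) each remote object locally
        yield [
            StorageManager.get_local_copy(remote_url=f"s3://{bucket}/{p}")
            for p in batch
        ]
```

After training on one yielded batch you could delete the cached files before requesting the next, so disk usage stays bounded by the batch size rather than the full dataset.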