Thank you for the quick response! StorageManager might help… I think what I actually need is a way to get the data in mini-batches, instead of caching the entire dataset with get_local_copy(). Because, yes, I will have to download the data anyway for training.
Or even better, get a list of the paths to the data locations on S3 (captured within a tracked dataset), which I could then pull through boto3 etc.
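Something like this is what I have in mind; just a rough sketch, assuming the dataset's files sit under a known bucket/prefix (both placeholders here):
```python
import boto3
from clearml import Dataset

# Placeholders -- the real bucket/prefix depend on where the dataset was uploaded
BUCKET = "my-bucket"
PREFIX = "datasets/my-dataset"

# Get a handle on the tracked dataset without downloading its contents
dataset = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")

# list_files() returns the relative paths of the files registered in the dataset
relative_paths = dataset.list_files()

# Pull individual files directly with boto3, e.g. one mini-batch at a time
s3 = boto3.client("s3")
for rel_path in relative_paths[:32]:  # e.g. a mini-batch of 32 files
    s3.download_file(BUCKET, f"{PREFIX}/{rel_path}", f"/tmp/{rel_path.replace('/', '_')}")
```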
Hi @<1686909730389233664:profile|AmiableSheep6>, I would suggest using the StorageManager module to pull specific files from S3.
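For example, assuming you know the object's S3 URL (the path below is purely illustrative):
```python
from clearml import StorageManager

# Downloads (and caches) a single object rather than the whole dataset
local_path = StorageManager.get_local_copy(
    remote_url="s3://my-bucket/datasets/my-dataset/file_001.npz"
)
```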
There is no option to download specific files from a dataset. I would suggest breaking it up into smaller versions.
You would, however, need to pull the data locally for training anyway; wouldn't breaking it into smaller versions help with this?
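As a rough sketch of what I mean by smaller versions (names and paths are placeholders), you could register each chunk of the data as its own dataset and pull only the chunk you need:
```python
from clearml import Dataset

# Register each chunk of the data as its own small dataset
for i, chunk_dir in enumerate(["./data/chunk_0", "./data/chunk_1"]):  # placeholder paths
    ds = Dataset.create(
        dataset_name=f"my_dataset_part_{i}",
        dataset_project="my_project",
    )
    ds.add_files(path=chunk_dir)
    ds.upload()
    ds.finalize()

# Later, fetch just the chunk needed for a given training step
part = Dataset.get(dataset_project="my_project", dataset_name="my_dataset_part_0")
local_dir = part.get_local_copy()  # downloads only that chunk
```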