One possible solution I could see as well is moving the data storage to an S3 bucket to improve download performance, since it's the same cloud provider, so there's no cross-provider transfer latency.
Just keep in mind that your bottleneck will be the transfer rate. So mounting will not save you anything, as you still need to transfer the whole dataset to your GPU instance sooner or later.
One solution is what Jake suggests. Another is to pre-download the data to your instance using a cheap CPU-only instance type, then restart the instance with a GPU instance type.
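For example, if the data happens to be versioned as a ClearML Dataset (an assumption; adapt to however you store it), you can warm the local cache from the cheap CPU instance and the copy will still be there after you switch the instance type to a GPU one:
```
# Sketch, assuming the data is a ClearML Dataset; project/name are placeholders.
# Run this on the cheap CPU-only instance -- the download lands on the attached
# volume, which survives stopping the instance and changing it to a GPU type.
from clearml import Dataset

ds = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")
local_path = ds.get_local_copy()  # downloads the full dataset into the local cache
print(f"Dataset cached at: {local_path}")
```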
Hi @<1556812486840160256:profile|SuccessfulRaven86> , using an S3 bucket in the same region will surely improve performance (it's also without transfer fees, so that's a big plus 🙂 ).
Regarding mounting external storage into a directory, you do not need to define any dedicated storage for that; simply make sure you point the ClearML SDK's storage.cache.default_base_dir setting to that folder, and the ClearML caching should take care of the rest.
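For example, a minimal clearml.conf sketch (the mount path is just a placeholder for wherever you mounted the storage):
```
# clearml.conf -- point the SDK cache at the mounted folder
sdk {
    storage {
        cache {
            default_base_dir: "/mnt/clearml-cache"
        }
    }
}
```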
BTW, a faster cache (usually faster than mounting an object storage bucket) could be to set up a cloud instance (an EC2 instance in AWS, for example) with attached block storage (EBS, in this example), run an NFS service on it, and mount that storage on each machine you spin up via NFS.
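A rough sketch of what that looks like on each GPU machine (the server address and export path are placeholders for your own setup):
```
# install the NFS client and mount the shared cache volume
sudo apt-get install -y nfs-common
sudo mkdir -p /mnt/clearml-cache
sudo mount -t nfs 10.0.0.10:/export/clearml-cache /mnt/clearml-cache
# then point sdk.storage.cache.default_base_dir at /mnt/clearml-cache (see above)
```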