Hi AmiableSeaturtle81! What function are you using to upload the data?
No, I specify where to upload.
I can see the data is being uploaded to the S3 bucket. It's just the log messages that are really confusing.
I'm also batch uploading, maybe that's the problem?
- The dataset is about 1 TB and contains 1 million files
- I don't have the SSD space locally to do the upload in one go
- So I download a part of the dataset, then call add_files() and upload() on that batch
- Upload the dataset
I noticed that each batch is slower than the one before.
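For reference, the batch workflow described above could be sketched roughly like this. It is only an illustration under assumptions: `batches_by_size`, `upload_in_batches`, `fetch_batch` and `bucket_uri` are hypothetical names that do not appear in the thread, and the ClearML calls (`Dataset.create`, `add_files`, `upload` with `output_url`, `finalize`) are used the way I understand the SDK, not necessarily the way the original script uses them.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def batches_by_size(
    files: Iterable[Tuple[str, int]], max_bytes: int
) -> Iterator[List[str]]:
    """Group (path, size_in_bytes) pairs into batches of at most max_bytes.

    A single file larger than max_bytes still gets a batch of its own.
    """
    batch: List[str] = []
    total = 0
    for path, size in files:
        # Start a new batch when adding this file would exceed the budget.
        if batch and total + size > max_bytes:
            yield batch
            batch, total = [], 0
        batch.append(path)
        total += size
    if batch:
        yield batch


def upload_in_batches(
    dataset_name: str,
    dataset_project: str,
    bucket_uri: str,  # assumption: a remote URI such as an S3 bucket path
    file_batches: Iterable[List[str]],
    fetch_batch: Callable[[List[str]], str],  # hypothetical: downloads a batch, returns a local dir
) -> None:
    """Add and upload one batch at a time so only one batch sits on local disk."""
    # Imported lazily so the batching helper above works without ClearML installed.
    from clearml import Dataset

    ds = Dataset.create(dataset_name=dataset_name, dataset_project=dataset_project)
    for batch in file_batches:
        local_dir = fetch_batch(batch)
        ds.add_files(path=local_dir)
        # Send the newly added files to the remote destination, not a local path.
        ds.upload(output_url=bucket_uri)
        # Freeing local_dir before the next batch is left to the caller.
    ds.finalize()
```

Timing each `add_files()` and `upload()` call separately inside the loop would show which of the two steps is the one getting slower per batch.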
AmiableSeaturtle81, note that we zip the files before uploading them as artifacts to the dataset task. Any chance you are specifying the default output URI as a local path, such as /tmp?