@<1590514584836378624:profile|AmiableSeaturtle81> note that we zip the files before uploading them as artifacts to the dataset task. Any chance you are specifying the default output URI as being a local path, such as /tmp?
Hi @<1590514584836378624:profile|AmiableSeaturtle81> ! What function are you using to upload the data?
I'm also batch uploading, maybe that's the problem?
- The dataset is about 1 TB and contains 1 million files
- I don't have the SSD space locally to do the upload in one go
- So I download a part of the dataset, call add_files() and then upload() for that batch
- Upload the dataset
I noticed that each batch gets slower and slower
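For reference, the batch loop described above can be sketched roughly like this. It is only a minimal sketch under assumptions, not the actual script: `batched`, `upload_in_batches`, `fetch_batch` and `cleanup` are hypothetical names, and the dataset object is only assumed to expose `add_files(path=...)` and `upload(output_url=...)` as the ClearML `Dataset` class does. It also times each batch, which is one way to confirm the slowdown.

```python
import time
from typing import Callable, Iterable, Iterator, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def upload_in_batches(
    dataset,                                   # assumed: a clearml.Dataset-like object
    batches: Iterable[List[T]],
    fetch_batch: Callable[[List[T]], str],     # hypothetical: downloads a batch, returns its local dir
    output_url: Optional[str] = None,          # e.g. an s3:// URL; None keeps the dataset default
    cleanup: Optional[Callable[[str], None]] = None,  # hypothetical: frees local disk space
) -> List[float]:
    """Add and upload one batch at a time; return seconds spent per batch."""
    timings: List[float] = []
    for batch in batches:
        local_dir = fetch_batch(batch)
        started = time.perf_counter()
        dataset.add_files(path=local_dir)
        dataset.upload(output_url=output_url)
        timings.append(time.perf_counter() - started)
        if cleanup is not None:
            cleanup(local_dir)
    return timings


# Possible wiring with ClearML (names and bucket are placeholders):
# from clearml import Dataset
# ds = Dataset.create(dataset_name="my-dataset", dataset_project="my-project")
# timings = upload_in_batches(ds, batched(all_keys, 10_000), fetch_batch, "s3://my-bucket/datasets")
# ds.finalize()
```

If the returned timings grow from batch to batch, that matches the slowdown described above; the sketch does not show where the extra time is spent.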
No, I specify where to upload.
I can see the data being uploaded to the S3 bucket. It's just that the log messages are really confusing.