Unanswered
Hi.
I have a job that processes images and creates ~5 GB of processed image files (lots of small ones).
At the end, it creates a
I ran another version of the above code where output_uri="./random_dataset_local_target"
(i.e. db target on local disk instead of GCP).
I still see large memory usage.
I also find it worrisome that, while generating the random dataset and writing it to disk took under 3 minutes, generating the hash took 9 minutes, and saving the files to a dataset target in an adjacent folder took 30 minutes (10 times longer than writing the original files)! Simply copying the files to an adjacent folder takes less than 1 minute, so disk I/O is not the bottleneck.
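For reference, here is a minimal sketch of how the baseline numbers (write, hash, plain copy) could be measured independently of any dataset library. It uses only the Python standard library; the function names, file counts, file sizes, and the choice of SHA-256 are assumptions for illustration, not the code or settings from my actual job.

```python
import hashlib
import os
import shutil
import tempfile
import time
from pathlib import Path


def make_random_files(root: Path, n_files: int, size_bytes: int) -> None:
    """Write n_files small files of random bytes under root (hypothetical stand-in data)."""
    root.mkdir(parents=True, exist_ok=True)
    for i in range(n_files):
        (root / f"file_{i:05d}.bin").write_bytes(os.urandom(size_bytes))


def hash_tree(root: Path, chunk_size: int = 1 << 20) -> str:
    """Hash every file under root in sorted order, reading in chunks.

    Streaming the content in chunks means at most chunk_size bytes
    of file data are held in memory at a time.
    """
    digest = hashlib.sha256()
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Include the relative path so renames change the hash too.
            digest.update(path.relative_to(root).as_posix().encode())
            with path.open("rb") as fh:
                for chunk in iter(lambda: fh.read(chunk_size), b""):
                    digest.update(chunk)
    return digest.hexdigest()


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def run_baseline(n_files: int = 200, size_bytes: int = 4096) -> dict:
    """Time writing, hashing, and copying a folder of small files."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "random_dataset"
        dst = Path(tmp) / "random_dataset_copy"
        _, write_s = timed(make_random_files, src, n_files, size_bytes)
        src_hash, hash_s = timed(hash_tree, src)
        _, copy_s = timed(shutil.copytree, src, dst)
        dst_hash = hash_tree(dst)  # sanity check: the copy is identical
        return {
            "write_s": write_s,
            "hash_s": hash_s,
            "copy_s": copy_s,
            "src_hash": src_hash,
            "dst_hash": dst_hash,
        }
```

Calling `run_baseline()` with a file count and size close to the real workload gives write, hash, and copy timings to compare against the time the dataset upload takes on the same disk.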