Generally speaking, yes: for exactly that reason, if you pass a list of files or a folder, ClearML actually zips them and uploads the single zip file. The same should apply to pipelines. BTW, I think you can change the number of parallel upload threads in StorageManager, but as you mentioned, zipping everything into one file is faster. Makes sense?
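A minimal standard-library sketch of the "zip the folder, upload one file" idea. This is not ClearML's internal implementation. The helper names are hypothetical, and `upload` is an assumed callable standing in for whatever single-file upload you use. It could, for example, wrap ClearML's `StorageManager.upload_file`.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path
from typing import Callable


def zip_folder_for_upload(folder: str, out_dir: str) -> str:
    """Pack every file under `folder` into one .zip archive inside `out_dir`.

    Hypothetical helper: one archive means one upload request instead of
    one request per small file.
    """
    base_name = str(Path(out_dir) / Path(folder).name)
    # make_archive appends ".zip" and stores member paths relative to root_dir
    return shutil.make_archive(base_name, "zip", root_dir=folder)


def upload_folder_as_zip(folder: str, upload: Callable[[str], None]) -> str:
    """Zip `folder` into a temporary directory and pass the archive to `upload`.

    `upload` is an assumed callable (e.g. a wrapper around a storage client's
    single-file upload). The archive is deleted when the temp dir is cleaned up,
    so `upload` must finish using the file before returning.
    """
    with tempfile.TemporaryDirectory() as tmp:
        archive = zip_folder_for_upload(folder, tmp)
        upload(archive)
        return Path(archive).name
```

The archive is written outside the source folder so it never ends up zipping itself.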
Hi. I spent some time this week trying to optimise file transfer time in and out of processes that use Google's GCS (in Vertex AI Pipelines).
It seems that when I have a lot of very small files, it makes more sense to tar.gz them and send one big blob than to use gsutil (or, presumably, clearml.StorageManager) to perform parallel (thread pool) transfers.
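To make the two options concrete, here is a local, standard-library sketch. It is an illustration under stated assumptions, not gsutil's or StorageManager's actual code. The first option packs many small files into a single tar.gz blob and unpacks it on the receiving side. The second is a per-file thread-pool transfer, where `shutil.copy2` is an assumed stand-in for a per-object cloud upload. All function names here are hypothetical.

```python
import shutil
import tarfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def pack_tar_gz(src_dir: str, archive_path: str) -> str:
    """Bundle all files under src_dir into one gzip-compressed tar blob.

    archive_path should live outside src_dir so the archive does not include itself.
    """
    src = Path(src_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        for p in sorted(src.rglob("*")):
            if p.is_file():
                # store paths relative to src_dir, e.g. "sub/g.txt"
                tar.add(str(p), arcname=str(p.relative_to(src)))
    return archive_path


def unpack_tar_gz(archive_path: str, dest_dir: str) -> None:
    """Restore the files on the receiving side (e.g. the next pipeline step)."""
    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # newer Pythons: reject absolute paths / paths escaping dest_dir
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)


def parallel_copy(src_dir: str, dest_dir: str, workers: int = 8) -> int:
    """Per-file transfer through a thread pool (the gsutil -m style approach).

    shutil.copy2 stands in for one upload request per object.
    Returns the number of files transferred.
    """
    src = Path(src_dir)
    files = [p for p in src.rglob("*") if p.is_file()]

    def copy_one(p: Path) -> None:
        target = Path(dest_dir) / p.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(p, target)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and re-raises any worker exception
        list(pool.map(copy_one, files))
    return len(files)
```

With a remote store, each call in `parallel_copy` would pay per-request overhead, which is why a single blob can win when the files are tiny. Locally, both functions just move bytes, so this sketch shows the mechanics, not the timing difference.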
I wonder what mechanism ClearML pipelines use to optimise passing data from one component to the next, and whether tarring / compression was considered.