Hi. I spent some time this week trying to optimise file transfer time in and out of processes that use Google's GCS (in Vertex AI pipelines).
It seems that when I have a lot of very small files, it makes more sense to tar.gz them and send one big blob than to use gsutil (or, presumably, clearml.StorageManager) to perform parallel (thread-pool) transfers.
I wonder what mechanism ClearML pipelines use to optimise passing data from one component to the next, and whether tarring / compression was considered.
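A minimal, stdlib-only sketch of the "tar.gz many small files into one blob" approach described above. It only covers packing and unpacking with Python's tarfile module; the actual upload of the resulting blob (gsutil, StorageManager, or anything else) is deliberately left out, and the function names are illustrative, not part of any library.

```python
import io
import tarfile
from pathlib import Path


def pack_dir_to_targz(src_dir: str) -> bytes:
    """Bundle every regular file under src_dir into one in-memory .tar.gz blob."""
    root = Path(src_dir)
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        # Store paths relative to src_dir so the tree can be rebuilt anywhere.
        for path in sorted(root.rglob("*")):
            if path.is_file():
                tar.add(path, arcname=path.relative_to(root).as_posix())
    return buf.getvalue()


def unpack_targz(blob: bytes, dest_dir: str) -> None:
    """Extract a blob produced by pack_dir_to_targz into dest_dir."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        # Use the safer "data" extraction filter when this Python version has it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
```

The point of the design is that thousands of tiny files become a single object, so the per-request overhead of the storage backend is paid once instead of once per file; compression is a secondary bonus.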

Posted one year ago

Generally speaking, for exactly that reason, if you pass a list of files or a folder, ClearML will actually zip them and upload the zip file. For pipelines specifically, it should be similar. BTW, I think you can change the number of parallel upload threads in StorageManager, but as you mentioned, it is faster to zip everything into one file. Does that make sense?
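To make the trade-off above concrete, here is a rough, stdlib-only sketch of the two strategies: one upload per file spread over a thread pool, versus zipping everything and uploading a single object. `fake_upload` is a hypothetical stand-in that just copies into a local directory playing the role of a bucket; it is not ClearML's StorageManager API, and this is not a description of how ClearML implements zipping internally.

```python
import shutil
import zipfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Iterable, List


def fake_upload(local_path: Path, remote_dir: Path) -> Path:
    """Hypothetical upload stand-in: copies the file into remote_dir."""
    remote_dir.mkdir(parents=True, exist_ok=True)
    target = remote_dir / local_path.name
    shutil.copyfile(local_path, target)
    return target


def upload_parallel(files: Iterable[Path], remote_dir: Path, max_workers: int = 8) -> List[Path]:
    """Strategy A: one upload call per file, run concurrently on a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda f: fake_upload(f, remote_dir), files))


def upload_zipped(files: Iterable[Path], archive: Path, remote_dir: Path) -> Path:
    """Strategy B: pack all files into one zip, then make a single upload call.

    Assumes the files have unique basenames, since only f.name is stored.
    """
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            zf.write(f, arcname=f.name)
    return fake_upload(archive, remote_dir)
```

With a real object store, strategy A pays a request round trip per file (more threads only hide some of that latency), while strategy B pays it once plus the cost of compressing, which is why bundling tends to win for many very small files.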

Posted one year ago