Unanswered
Clearml-Data - Incremental Changes And Hashing On Per-File Basis?


Hey Alon, thank you for the quick response! 🙂 This clarifies some points, and we have also experimented with it a bit more since.

Unfortunately, I don't think our use cases are completely covered.
Let's say we have a pool of >300k images, and growing. With database queries, we identify 80k images that should form a dataset. We can create a dataset A and have it stored in the cloud, managed by clearml-data. Now let's say we run another query and get 60k images. It is not trivial to create a new dataset B while uploading only the diff: we would need to declare the first dataset as the parent, remove all images from A that are not in B, and add the images that are new in B. Even if we went through this procedure, the complete dataset A would still need to be downloaded (since it is a compressed .zip) just to reuse a fraction of it. I don't think this would scale well.
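
To make the diff step concrete, here is a minimal sketch in plain Python. It only computes which files would have to be removed from and added to a child of A to arrive at B. The function name `plan_child_dataset` and the file names are hypothetical, and the actual parent/remove/add operations would still go through clearml-data itself, which is not shown here.

```python
def plan_child_dataset(parent_files, wanted_files):
    """Compare the parent's file list (A) with a new query result (B).

    Returns the files to remove from the child, the files to add to it,
    and the files that could in principle be reused from the parent.
    """
    parent = set(parent_files)
    wanted = set(wanted_files)
    return {
        "remove": sorted(parent - wanted),  # in A but not in B
        "add": sorted(wanted - parent),     # in B but not in A
        "reuse": sorted(parent & wanted),   # already stored with A
    }


# Toy example with hypothetical file names
dataset_a = ["img_001.jpg", "img_002.jpg", "img_003.jpg"]
query_b = ["img_002.jpg", "img_003.jpg", "img_004.jpg"]
plan = plan_child_dataset(dataset_a, query_b)
```

Only the files under `add` are new uploads, but as described above, getting at the `reuse` part still means downloading the parent archive.
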
All files are already hashed, right? I wonder why clearml-data does not keep files in a semi-flat hierarchy and simply group them into datasets. That way, the same file would only be uploaded or downloaded once if its hash matches, even if the datasets are not related to each other.
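
A minimal sketch of that idea, assuming a hash-addressed store where each dataset is just a manifest mapping file names to content hashes. This is not how clearml-data works internally; `ContentStore` and `build_manifest` are hypothetical names, and the store is kept in memory purely for illustration.

```python
import hashlib


class ContentStore:
    """Flat store keyed by content hash; each distinct blob is kept once."""

    def __init__(self):
        self.blobs = {}   # sha256 hex digest -> file content
        self.uploads = 0  # counts how many blobs were actually stored

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:  # skip the upload if the content is known
            self.blobs[digest] = data
            self.uploads += 1
        return digest


def build_manifest(store, files):
    """A dataset is only a mapping from file name to content hash."""
    return {name: store.put(data) for name, data in files.items()}


store = ContentStore()
# Two unrelated datasets that happen to share one image
manifest_a = build_manifest(store, {"cat.jpg": b"cat-bytes", "dog.jpg": b"dog-bytes"})
manifest_b = build_manifest(store, {"dog.jpg": b"dog-bytes", "fox.jpg": b"fox-bytes"})
```

Here the shared image is stored once, even though neither dataset is the parent of the other, and a client that already holds a blob with a given hash would not need to download it again.
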

  
  
Posted 2 years ago
93 Views
0 Answers