Clearml-Data - Incremental Changes And Hashing On Per-File Basis?


Hi EagerOtter28

Let's say we query another time and get 60k images. Now it is not trivial to create a new dataset B but only upload the diff: ...

Use Dataset.sync (or clearml-data sync) to check which files were changed/added.

All files are already hashed, right? I wonder why clearml-data does not keep files in a semi-flat hierarchy and groups them together to datasets?

It kind of does: it keeps a full listing of all the files in a version, each with its hash (SHA2) value, including a reference to the owning version, so it can immediately know which dataset versions it needs to download and how to link to them.
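To make the per-file hash listing idea concrete, here is a minimal standalone sketch. It is not the clearml-data implementation: the function names (hash_file, build_manifest, diff_manifests) are hypothetical, and SHA-256 is assumed as the SHA2 variant. It only shows how a path-to-hash listing lets you work out which files a new version would need to upload.

```python
import hashlib
from pathlib import Path


def hash_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its content hash."""
    return {
        p.relative_to(root).as_posix(): hash_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_manifests(parent: dict[str, str], current: dict[str, str]):
    """Compare two manifests and return (added, modified, removed) path lists."""
    added = sorted(k for k in current if k not in parent)
    modified = sorted(k for k in current if k in parent and current[k] != parent[k])
    removed = sorted(k for k in parent if k not in current)
    return added, modified, removed
```

With a manifest stored for dataset A, building one for the freshly queried folder and passing both to diff_manifests gives the files that would have to go into dataset B; everything else can stay a reference to the parent version.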
I think we are missing some interface for you to fully implement your use case, check here:
https://github.com/allegroai/clearml/blob/6a91374c2dd177b7bdf4c43efca8e6fb0d432648/clearml/datasets/dataset.py#L47
and let me know what you think is missing.

  
  
Posted 3 years ago