Examples: query, "exact match", wildcard*, wild?ard, wild*rd
Fuzzy search: cake~ (finds cakes, bake)
Term boost: "red velvet"^4, chocolate^2
Field grouping: tags:(+work -"fun-stuff")
Escaping: Escape characters +-&|!(){}[]^"~*?:\ with \, e.g. \+
Range search: properties.timestamp:[1587729413488 TO *] (inclusive), properties.title:{A TO Z}(excluding A and Z)
Combinations: chocolate AND vanilla, chocolate OR vanilla, (chocolate OR vanilla) NOT "vanilla pudding"
Field search: properties.title:"The Title" AND text
Unanswered
Clearml-Data - Incremental Changes And Hashing On Per-File Basis?


Hm OK 🤔
I am not sure whether it's heresy to say that here, but why wouldn't you use a mechanism comparable to what DVC does in the backend?

When you create a dataset, you could hash the individual files and upload them to a cache. Datasets are then groupings of file hashes. When you want to download a dataset, all you have to do is reproduce the folder structure with the files identified by hashes.

This way, it does not matter whether you recreate a dataset with the same files, they would not be reuploaded/downloaded if the hash is the same. And partial/full overlaps would not even have to be defined explicitly.

I know clearml-data has the paradigm "Data is Not Code" and that is fine. You don't need to take the checking in etc. of DVC but the caching architecture of DVC seems pretty cool to me.

  
  
Posted 3 years ago
155 Views
0 Answers
3 years ago
one year ago
Tags