ClearML Data - Incremental Changes and Hashing on a Per-File Basis?


Hi EagerOtter28 ,

The integration with cloud backing worked out of the box, so that was a smooth experience so far.

Great to read 🙂

When I create a dataset with 10 files, have it uploaded to e.g. S3, and then create a new dataset with the same files in a different folder structure, all files are re-uploaded.

For a few .csv files this does not matter, but we have datasets in the 100 GB-2 TB range.

Any specific reason for uploading the same dataset twice? clearml-data will create a different task with a different zip file for each dataset instance.

If I make a dataset a child of another dataset, will this avoid reuploading?

Yes, it should only upload the diff files.
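The diff works on per-file hashes: a child dataset compares the content hash of each of its files against the parent's manifest and only uploads files whose hash is new. A minimal stdlib sketch of that idea (illustrative only, not clearml-data's actual implementation; the manifest here is just a set of hex digests):

```python
import hashlib
from pathlib import Path


def file_hash(path: Path) -> str:
    """SHA-256 hex digest of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()


def diff_against_parent(folder: Path, parent_manifest: set[str]) -> list[Path]:
    """Files in `folder` whose content hash is not already in the parent.

    Only these would need to be uploaded for a child dataset; renamed or
    moved files keep the same hash, so they are not re-uploaded.
    """
    return [
        p
        for p in sorted(folder.rglob("*"))
        if p.is_file() and file_hash(p) not in parent_manifest
    ]
```

Because the hash depends only on file contents, reorganizing the folder structure of unchanged files produces an empty diff.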

Will clearml-data understand that it already holds a local copy of a file if the same file (with the same hash) is part of two datasets?

If it's from two different datasets, clearml-data will download each of them.
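If that duplicate download matters, you can put your own content-addressed cache in front of the fetch, so a file shared by two datasets is stored once and later copies are served locally. A rough stdlib sketch; the cache layout and function names are hypothetical, not something clearml provides:

```python
import hashlib
import shutil
from pathlib import Path


def cache_key(path: Path) -> str:
    """Content hash of a file (what a per-file manifest would record)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def fetch_into(dest: Path, src: Path, cache: Path) -> bool:
    """Copy `src` to `dest` via a content-addressed cache directory.

    Returns True when the file was already cached (e.g. it was part of a
    previously fetched dataset), so only a local copy is paid for.
    """
    cache.mkdir(parents=True, exist_ok=True)
    cached = cache / cache_key(src)
    hit = cached.exists()
    if not hit:
        shutil.copy2(src, cached)  # first dataset pays the fetch/copy cost
    shutil.copy2(cached, dest)     # every dataset copy comes from the cache
    return hit
```

The first dataset containing a given file populates the cache; any later dataset containing the same bytes gets a local copy instead of a second download.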

  
  
Posted 2 years ago