Hey, Is There Some Way / Workaround To Speed Up Working With Datasets With Large Number Of Files? Getting A Local Copy Of One Of Our Dataset With 70K Files Already Takes Longer Than Expected, But Working With A Dataset Of Around 100K Files That Has Multip

Hey, is there some way / workaround to speed up working with datasets that have a large number of files? Getting a local copy of one of our datasets with 70k files already takes longer than expected, but working with a dataset of around 100k files that has multiple parents is just unusable. Should we just avoid merging datasets with this many files? The datasets themselves are small, they're just split into a large number of files.

Posted one year ago


Hello, I am a data engineer but new to ClearML.
If you train in batches, then you should only need access to the documents in the current batch, not all 100k files at once. You could store the files in S3 and implement the fetch in the get_item method :)
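
Here is a minimal sketch of that idea, assuming "get_item" means the `__getitem__` method of a map-style dataset (as in PyTorch-style data loading). The class name `LazyFileDataset`, the `fetch` callable, and the cache layout are all hypothetical names made up for illustration; they are not part of ClearML. Each file is downloaded only the first time it is indexed and is then served from a local cache, so nothing requires a full local copy up front.

```python
import hashlib
from pathlib import Path
from typing import Callable, List


class LazyFileDataset:
    """Map-style dataset that fetches a file only when it is indexed.

    `fetch` is a hypothetical callable that takes a remote key and returns
    the file's bytes. With S3 it could wrap a boto3 client call such as
    client.get_object(Bucket=..., Key=key)["Body"].read().
    """

    def __init__(self, keys: List[str], fetch: Callable[[str], bytes], cache_dir: str):
        self.keys = keys
        self.fetch = fetch
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def __len__(self) -> int:
        return len(self.keys)

    def __getitem__(self, index: int) -> bytes:
        key = self.keys[index]
        # Hash the key so nested remote paths map to flat, safe file names.
        local = self.cache_dir / hashlib.sha1(key.encode()).hexdigest()
        if not local.exists():
            # First access: download once and keep a local copy.
            local.write_bytes(self.fetch(key))
        return local.read_bytes()
```

Because the class only needs `__len__` and `__getitem__`, it should slot into a batch-based training loop that indexes samples one at a time; whether this is faster overall depends on how often the same files are revisited.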

Posted one year ago