Answered

Hi.
Looking into ClearML support for datasets, I'd like to understand how to work with large datasets in cases where not all the data is downloaded at once, e.g.:
1. Each training epoch is performed on a (preferably random) sample of the data that is downloaded at the start of the epoch and discarded at the end of the epoch, or
2. The dataset is split such that each worker obtains part of the data.
I see that https://clear.ml/docs/latest/docs/references/sdk/dataset#get_local_copy has part and num_parts arguments that relate to chunks (which are determined at data upload time?).
Is that the mechanism I'm looking for?
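For concreteness, here is a rough sketch of how I imagine part/num_parts would be used to give each worker its own slice of the data (the project/dataset names and worker setup are made up for illustration):

```python
from clearml import Dataset

# Placeholder project/dataset names, just to illustrate the documented API.
dataset = Dataset.get(
    dataset_project="my_project",
    dataset_name="my_dataset",
)

# Worker i of N fetches only its share of the dataset's chunks;
# the chunking itself is fixed when the data is uploaded.
worker_rank = 0   # e.g. taken from the distributed launcher / environment
num_workers = 4
local_path = dataset.get_local_copy(part=worker_rank, num_parts=num_workers)
print(f"worker {worker_rank} data at: {local_path}")
```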

Posted 2 years ago

Answers 3


Cool. How can I get started with HyperDatasets? Is it part of the clearml package?
Is it limited to https://clear.ml/pricing/ accounts?

Posted 2 years ago

PanickyMoth78

"Is it limited to https://clear.ml/pricing/ accounts?"

Unfortunately, yes 😊, but I'm sure sales will be able to hook you up...

Posted 2 years ago

Hi PanickyMoth78, while ClearML Datasets are meant to handle cases where the entire dataset metadata fits in memory (or on disk), the use case you're describing is exactly where HyperDatasets come into play: they let you use backend-supported iterators to iterate (optionally at random) over your metadata, with automatic fetching and caching of the raw data as required. They can, of course, also be used where a data split is needed.
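To give a feel for it, here is a minimal sketch of that iteration, loosely following the DataView examples in the HyperDataset docs (HyperDatasets ship with the enterprise allegroai package, not the open-source clearml one, and the dataset/version names below are placeholders):

```python
from allegroai import DataView, IterationOrder

# Iterate over the registered frames in random order; the raw data behind
# each frame is fetched and cached locally on demand.
dataview = DataView(iteration_order=IterationOrder.random, iteration_infinite=False)
dataview.add_query(dataset_name="my_dataset", version_name="current")

for frame in dataview:
    local_file = frame.get_local_source()  # downloads and caches the raw data
    # ... feed local_file into the training loop ...
```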

Posted 2 years ago