
Hi all, quick question about ClearML Datasets.

Does anyone know whether it is possible to access a dataset directly from S3 (even just the paths to the data in a bucket) instead of downloading a local copy? We typically store and access large quantities of data directly from S3. I think we will run into a lot of problems if we have to cache a copy of the full datasets.

Thanks so much

  
  
Posted 26 days ago

Answers 3


Hi @AmiableSheep6, I could suggest using the StorageManager module to pull specific files from S3.

There is no option to download specific files from a dataset, so I would suggest breaking it into smaller versions.

You would still need to pull the data locally for training anyway, though. Wouldn't breaking it into smaller versions help with this issue?
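
For reference, the StorageManager suggestion could look roughly like this. It is a minimal sketch, assuming the `clearml` package is installed and S3 credentials are already configured for it; the `fetch_files` helper and the bucket URL in the comment are hypothetical. `StorageManager.get_local_copy` downloads a single remote object into the local cache and returns its local path.

```python
from typing import Callable, Iterable, List, Optional


def fetch_files(
    urls: Iterable[str],
    fetch: Optional[Callable[[str], Optional[str]]] = None,
) -> List[Optional[str]]:
    """Download only the listed objects and return their local paths."""
    if fetch is None:
        # Imported lazily so the helper can be tried out without clearml installed.
        from clearml import StorageManager

        fetch = StorageManager.get_local_copy
    return [fetch(url) for url in urls]


# Hypothetical usage (needs clearml and S3 access):
# local_paths = fetch_files(["s3://my-bucket/data/file_0001.parquet"])
```

Only the objects named in `urls` are transferred, so nothing else in the bucket is touched.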

  
  
Posted 26 days ago

Thank you for the quick response! StorageManager might help… I think what I actually need is a way to get the data in mini-batches instead of caching the entire dataset with get_local_copy(), because, yes, I will have to download the data for training anyway.
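
One way the mini-batch idea might be sketched: a generator that downloads a handful of files at a time rather than the whole dataset. This is not a ClearML feature; `iter_minibatches` is a hypothetical helper, and the `fetch` callable is whatever downloads a single URL (for example `StorageManager.get_local_copy`).

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def iter_minibatches(
    urls: Iterable[str],
    batch_size: int,
    fetch: Callable[[str], str],
) -> Iterator[List[str]]:
    """Yield lists of local paths, downloading batch_size files at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    remaining = iter(urls)
    while True:
        batch = list(islice(remaining, batch_size))
        if not batch:
            return
        # Files are fetched only when the consumer asks for the next batch.
        yield [fetch(url) for url in batch]


# Hypothetical usage (needs clearml and S3 access):
# from clearml import StorageManager
# for paths in iter_minibatches(all_urls, 32, StorageManager.get_local_copy):
#     train_on(paths)
```

Downloaded files still land in the local cache, so disk usage depends on how that cache is configured.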

  
  
Posted 26 days ago

Or, even better, a way to get a list of the paths to the data locations on S3 (captured within a tracked dataset), which I can then pull through boto3 etc.
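
A rough sketch of what that could look like, under some loud assumptions: the dataset was registered with `Dataset.add_external_files()` so that it tracks links to the S3 objects instead of uploaded copies, and the installed `clearml` version exposes the `link_entries_dict` property (worth checking before relying on it). The helper names and the project/dataset names are hypothetical.

```python
from typing import Dict, Tuple


def split_s3_url(url: str) -> Tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key) for boto3 calls."""
    prefix = "s3://"
    if not url.startswith(prefix):
        raise ValueError(f"not an S3 URL: {url!r}")
    bucket, _, key = url[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {url!r}")
    return bucket, key


def dataset_s3_links(dataset_project: str, dataset_name: str) -> Dict[str, str]:
    """Map each relative dataset path to the external link recorded for it."""
    # Lazy import; assumes the dataset only holds external links (see above).
    from clearml import Dataset

    dataset = Dataset.get(dataset_project=dataset_project, dataset_name=dataset_name)
    return {path: entry.link for path, entry in dataset.link_entries_dict.items()}


# Hypothetical usage (needs clearml, boto3 and S3 access):
# import boto3
# s3 = boto3.client("s3")
# for rel_path, url in dataset_s3_links("my_project", "my_dataset").items():
#     bucket, key = split_s3_url(url)
#     s3.download_file(bucket, key, "/tmp/" + rel_path.replace("/", "_"))
```

With this layout the dataset version acts as a tracked manifest of S3 locations, and the actual download is left to boto3.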

  
  
Posted 26 days ago