Answered
Is There a Way to Retrieve Specific Files From an Uploaded Dataset?

Hi there,
I am intending to work more often with datasets, but I am not sure if there is a way to retrieve specific files from an uploaded dataset. I saw that I can retrieve chunks of data, but I am not sure how that would work with a dataset of Parquet files. If I have a Parquet file with my data for each day, how could I store the dataset so that I can retrieve a single day on its own at some point? Is that possible?

Posted one year ago

Answers 5


Since the "grand" dataset will inherit from the child versions you wouldn't need to have data duplications

Posted one year ago

Could you supply a reference for this "dataset containing other datasets" functionality? I might have skipped it when reading the documentation, but I do not recall seeing it.

Posted one year ago

Apparently I found a solution:

    dataset_zip = dataset._task.artifacts['data'].get()

will return the path to the zip file containing all the files (the zip is downloaded to the local machine). After that:

    import zipfile
    zip_file = zipfile.ZipFile(dataset_zip, 'r')
    files = zip_file.namelist()  # retrieve the names of the files

Unzip using:

    import os
    os.system(f'unzip {dataset_zip}')  # in this case, into your script directory

and, using the files list, one can then open them selectively.
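
As a side note, zipfile alone can extract just the members you need, without shelling out to unzip; a minimal sketch, assuming dataset_zip from above and a hypothetical per-day file naming scheme:

    import zipfile

    with zipfile.ZipFile(dataset_zip, 'r') as zf:
        # keep only the files for one day (the naming scheme is an assumption)
        wanted = [name for name in zf.namelist() if '2020-04-24' in name]
        for name in wanted:
            zf.extract(name, path='.')  # extract into the current directory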

Posted one year ago

Is there a better way?

Posted one year ago

ShallowGoldfish8, I think the best approach would be to store them as separate datasets, one per day, and then have a "grand" dataset that includes all days, with new days added as you go (a sketch follows below).

What do you think?
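
To make the idea concrete, a minimal sketch assuming the clearml SDK, with placeholder project, dataset, and file names: each day lives in its own dataset, so a single day can be fetched on its own later.

    from clearml import Dataset

    # Create and upload one dataset per day
    day = Dataset.create(dataset_project='daily-data', dataset_name='day-2020-04-24')
    day.add_files(path='data/2020-04-24.parquet')
    day.upload()
    day.finalize()

    # Later: retrieve just that one day
    single_day = Dataset.get(dataset_project='daily-data', dataset_name='day-2020-04-24')
    local_path = single_day.get_local_copy()  # local folder with only that day's file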

Posted one year ago