Answered

Hi there,
I intend to work more often with datasets, but I am not sure whether there is a way to retrieve specific files from an uploaded dataset. I saw that I can retrieve chunks of data, but I am not sure how that would work with a dataset of parquet files. If I have one parquet file with my data for each day, how should I store them so that I can later retrieve a single day only? Is that possible?

  
  
Posted 2 years ago

Answers 5


Could you point me to any reference for a dataset that contains other datasets? I might have missed it when reading the documentation, but I do not recall seeing this functionality.

  
  
Posted 2 years ago

ShallowGoldfish8, I think the best approach would be to store each day as a separate dataset and then have a "grand" dataset that includes all days, with new days added as you go.

What do you think?
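
A rough sketch of how the per-day plus "grand" dataset layout could look with the ClearML `Dataset` API. The project name, the `daily-YYYY-MM-DD` naming scheme, and all helper function names here are assumptions made up for illustration, not something from this thread, and the sketch has not been run against a real server:

```python
from datetime import date


def daily_dataset_name(day, prefix="daily"):
    """Build a per-day dataset name such as 'daily-2020-04-24' (hypothetical scheme)."""
    return f"{prefix}-{day.isoformat()}"


def upload_day(parquet_path, day, project="my_project"):
    """Create, upload, and finalize a dataset holding one day's parquet file."""
    # Imported lazily so the naming helper above can be used without clearml installed.
    from clearml import Dataset

    ds = Dataset.create(dataset_name=daily_dataset_name(day), dataset_project=project)
    ds.add_files(path=parquet_path)
    ds.upload()
    ds.finalize()
    return ds.id


def build_grand_dataset(daily_ids, project="my_project", name="all-days"):
    """Create a dataset that only references the per-day datasets as parents."""
    from clearml import Dataset

    grand = Dataset.create(
        dataset_name=name,
        dataset_project=project,
        parent_datasets=list(daily_ids),
    )
    grand.upload()  # no new files are added here; the content comes from the parents
    grand.finalize()
    return grand.id


def fetch_day(day, project="my_project"):
    """Download a single day's dataset and return the local folder path."""
    from clearml import Dataset

    ds = Dataset.get(dataset_project=project, dataset_name=daily_dataset_name(day))
    return ds.get_local_copy()
```

With this layout, something like `fetch_day(date(2020, 4, 24))` would pull only that day's files, while getting the "all-days" dataset would pull everything.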

  
  
Posted 2 years ago

I apparently found a solution:
dataset_zip = dataset._task.artifacts['data'].get()
This returns the path to the zip file containing all the files (the zip is downloaded to the local machine).
After that, retrieve the names of the files:
import zipfile
zip_file = zipfile.ZipFile(dataset_zip, 'r')
files = zip_file.namelist()
Then unzip using:
import os
os.system(f'unzip {dataset_zip}')  # in this case, into your script directory
Using the files list, one can then open them selectively.
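
As a possible variation on the above, the standard-library `zipfile` module can extract only the wanted members, so there is no need to shell out to `unzip` and unpack everything. This is a minimal sketch; the helper name `extract_matching` and the assumption that each file name contains its date are mine, not part of the dataset API:

```python
import zipfile
from pathlib import Path


def extract_matching(zip_path, substring, dest_dir="."):
    """Extract only the archive members whose name contains `substring`."""
    extracted = []
    with zipfile.ZipFile(zip_path, "r") as archive:
        for name in archive.namelist():
            if substring in name:
                # extract() writes the member under dest_dir and returns its path
                extracted.append(Path(archive.extract(name, path=dest_dir)))
    return extracted
```

For example, `extract_matching(dataset_zip, "2020-04-24")` would unpack only the files whose names contain that date, assuming the parquet files are named by day.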

  
  
Posted 2 years ago

Is there a better way?

  
  
Posted 2 years ago

Since the "grand" dataset will inherit from the child versions, you wouldn't need to duplicate any data.

  
  
Posted 2 years ago