Hi there,
I intend to work more often with datasets, but I am not sure whether there is a way to retrieve specific files from an uploaded dataset. I saw that I can retrieve chunks of data, but I am not sure how that would work with a dataset of parquet files. If I have one parquet file with my data for each day, how should I store them so that I can later retrieve a single day only? Is that possible?

  				
Posted 3 years ago by ShallowGoldfish8

Answers 5

Apparently I found a solution:

    dataset_zip = dataset._task.artifacts['data'].get()

This returns the path to the zip file containing all the files (the zip is downloaded to the local machine). After that, retrieve the names of the files:

    import zipfile
    zip_file = zipfile.ZipFile(dataset_zip, 'r')
    files = zip_file.namelist()

Then unzip (in this case into your script directory):

    import os
    os.system(f'unzip {dataset_zip}')

Using the files list, one can then open the files selectively.
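Since the goal is to read a single day, it may not be necessary to unzip everything. The following is a minimal sketch using only the standard library `zipfile` module; the helper name `extract_matching` and the idea that each file name contains its date (for example `2022-01-15.parquet`) are assumptions made for illustration, not something stated in this thread.

```python
import zipfile
from pathlib import Path


def extract_matching(zip_path, pattern, dest_dir):
    """Extract only the archive members whose name contains `pattern`.

    Hypothetical helper: assumes the day is part of each file name.
    Returns the paths of the extracted files.
    """
    extracted = []
    with zipfile.ZipFile(zip_path, "r") as zf:
        for name in zf.namelist():
            # Skip directory entries and anything that does not match.
            if name.endswith("/") or pattern not in name:
                continue
            # extract() writes one member and returns the path it created.
            extracted.append(Path(zf.extract(name, path=dest_dir)))
    return extracted
```

With the `dataset_zip` path from the post above, a call such as `extract_matching(dataset_zip, "2022-01-15", "one_day")` would write only the matching member(s) to the `one_day` directory. The full zip is still downloaded first, so this saves disk space and unzip time, not bandwidth.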

  				
Posted 3 years ago by ShallowGoldfish8

Could you point me to a reference on a dataset containing other datasets? I might have skipped it when reading the documentation, but I do not recall seeing this functionality.

  				
Posted 3 years ago by ShallowGoldfish8

ShallowGoldfish8, I think the best approach would be to store them as separate datasets per day and then have a "grand" dataset that includes all days, with new days added as you go.

What do you think?

  				
Posted 3 years ago by CostlyOstrich36

Since the "grand" dataset will inherit from the child versions, you wouldn't need to duplicate any data.
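The layout suggested in this thread (one dataset per day plus a "grand" dataset that inherits from them) could be sketched roughly as below with the ClearML `Dataset` class. Note that in the SDK the per-day datasets are passed through the `parent_datasets` argument. The project name `daily-data`, the `day-<date>` naming scheme, and all function names are hypothetical; the code needs a configured ClearML server to actually run, so treat it as an outline, not a tested recipe.

```python
from pathlib import Path


def day_dataset_name(day):
    """Hypothetical naming scheme: one dataset per day, e.g. 'day-2022-01-15'."""
    return f"day-{day}"


def register_day(parquet_path, day, project="daily-data"):
    """Upload one day's parquet file as its own dataset; returns the dataset id."""
    from clearml import Dataset  # imported lazily: requires clearml to be installed

    ds = Dataset.create(dataset_name=day_dataset_name(day), dataset_project=project)
    ds.add_files(path=str(parquet_path))
    ds.upload()
    ds.finalize()
    return ds.id


def build_grand_dataset(day_ids, name="all-days", project="daily-data"):
    """Create a dataset that inherits from the per-day datasets (no files re-added)."""
    from clearml import Dataset

    grand = Dataset.create(
        dataset_name=name,
        dataset_project=project,
        parent_datasets=list(day_ids),  # ids of the per-day datasets
    )
    grand.upload()  # no new files are added here; kept for the usual upload/finalize order
    grand.finalize()
    return grand.id


def fetch_day(day, project="daily-data"):
    """Download a single day only and return the local folder holding its files."""
    from clearml import Dataset

    ds = Dataset.get(dataset_project=project, dataset_name=day_dataset_name(day))
    return Path(ds.get_local_copy())
```

Under these assumptions, retrieving one day is just `fetch_day("2022-01-15")`, while fetching the `all-days` dataset by name would bring in every day.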

  				
Posted 3 years ago by CostlyOstrich36

Is there a better way?

  				
Posted 3 years ago by ShallowGoldfish8
