Is it possible to download individual files from a ClearML dataset without downloading the entire dataset?

Answered

Hello everybody,

I would like to start off by saying that I absolutely love ClearML.
I am getting familiar with ClearML Datasets and I have a quick question. Is it possible to download individual files from a dataset without downloading the entire dataset? If so, how do you do that?

Posted 2 years ago by MassiveGoldfish6


Answers 3

I think that by default the zipped package files are 0.5 GB (you can control this; look for `--chunk-size`).
I think the missing part of the API is a way to find out which chunk your specific file is stored in.
You can do something like:

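```python
from clearml import Dataset
ds = Dataset.get(...)
# artifact_name is the name of the zip chunk that stores this file
the_artifact_chunk_I_need = ds.file_entries_dict["my/file/here"].artifact_name
```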

What do you think?
Maybe it's worth adding an interface for this?
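Along the lines of that interface idea, here is a minimal sketch of a helper that groups the files you want by the chunk that stores them, so you can see how many chunks you would actually have to fetch. `chunks_for_files` is hypothetical and not part of ClearML. It assumes each entry exposes `artifact_name` (as in the snippet above) and `parent_dataset_id`, and it keys on both because I assume chunk names such as `data` can repeat across dataset versions. The sample entries are stand-ins built with `SimpleNamespace`; with ClearML you would pass `ds.file_entries_dict` instead. It only answers the "which chunk" question and does not download anything.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List, Mapping, Tuple


def chunks_for_files(
    file_entries: Mapping[str, Any], wanted_paths: Iterable[str]
) -> Dict[Tuple[str, str], List[str]]:
    """Group wanted relative paths by (dataset version id, chunk artifact name).

    `file_entries` maps relative path -> entry object with `parent_dataset_id`
    and `artifact_name` attributes (assumed to match ClearML's file entries).
    """
    by_chunk: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for path in wanted_paths:
        entry = file_entries.get(path)
        if entry is None:
            raise KeyError(f"{path!r} is not in this dataset version")
        # The same artifact name may exist in several versions, so key on both.
        by_chunk[(entry.parent_dataset_id, entry.artifact_name)].append(path)
    return dict(by_chunk)


# Hypothetical stand-in entries: two files in version "v1", one in "v2".
entries = {
    "images/a.png": SimpleNamespace(parent_dataset_id="v1", artifact_name="data"),
    "images/b.png": SimpleNamespace(parent_dataset_id="v2", artifact_name="data"),
    "images/c.png": SimpleNamespace(parent_dataset_id="v1", artifact_name="data"),
}

print(chunks_for_files(entries, ["images/a.png", "images/c.png", "images/b.png"]))
# {('v1', 'data'): ['images/a.png', 'images/c.png'], ('v2', 'data'): ['images/b.png']}
```

Here three wanted files map to two chunks, so only those two chunks would need to be downloaded.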

Posted 2 years ago by AgitatedDove14

I would like to start off by saying that I absolutely love ClearML.

@MassiveGoldfish6 thank you for saying that! 😍

Is it possible to download individual files from a dataset without downloading the entire dataset? If so, how do you do that?

Well, by default files are packaged into multiple zip files. You can control the size of each zip file for finer granularity, but in the end, when you download, you download the entire packaged file that contains your file. If this is a version that contains only a few changes compared to its parent versions, that zip file might be relatively small.
Does that help? What is the use case? How large is the dataset?

Posted 2 years ago by AgitatedDove14

That is very useful. Thank you.
Here is a use case: I have a 200 GB dataset and I want to pull 3 files that are 20 MB each.
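A rough worst-case estimate for this use case, to show why a smaller chunk size helps. This is a back-of-the-envelope sketch, assuming 1 GB = 1024 MB, that no file is split across two chunks, that each 20 MB file fits inside one chunk, and that every chunk is roughly `chunk_size` MB. Under those assumptions the three files land in at most three different chunks, so you download at most three chunks. `worst_case_download_mb` is a hypothetical helper; I believe the `--chunk-size` value is given in MB, and 512 MB corresponds to the roughly 0.5 GB default mentioned above.

```python
def worst_case_download_mb(num_files: int, chunk_size_mb: int, dataset_mb: int) -> int:
    """Upper bound on MB downloaded when each wanted file costs one whole chunk.

    Assumes no file spans two chunks and each wanted file fits in one chunk;
    the total can never exceed the size of the whole dataset.
    """
    return min(num_files * chunk_size_mb, dataset_mb)


DATASET_MB = 200 * 1024  # the 200 GB dataset from the use case
WANTED_FILES = 3         # three files of about 20 MB each

for chunk_mb in (512, 128, 32):
    mb = worst_case_download_mb(WANTED_FILES, chunk_mb, DATASET_MB)
    print(f"chunk size {chunk_mb} MB -> at most {mb} MB ({mb / DATASET_MB:.2%} of the dataset)")
```

Even at the default size the worst case is about 1.5 GB rather than 200 GB; smaller chunks shrink it further, at the cost of more, smaller artifacts to upload and track.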

Posted 2 years ago by MassiveGoldfish6
