And after your modifications are made, you can use https://clear.ml/docs/latest/docs/references/sdk/dataset/#datasetsquash to squash your modified subset with the main dataset if you want to re-integrate it into your flow. But I don't remember whether squash requires both datasets to be present locally or not...
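Something along these lines (a minimal, untested sketch; the dataset IDs and the merged dataset name are placeholders):
```python
from clearml import Dataset

MAIN_DATASET_ID = "<main-dataset-id>"        # placeholder
SUBSET_DATASET_ID = "<modified-subset-id>"   # placeholder

# Squash the modified subset together with the main dataset into a new dataset.
# Note: I'm not sure whether squash needs local copies of both datasets to do the merge.
merged = Dataset.squash(
    dataset_name="main_dataset_merged",
    dataset_ids=[MAIN_DATASET_ID, SUBSET_DATASET_ID],
)
print(merged.id)
```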
I doubt there is a direct way to do it since they are stored as archive chunks 🙂
I suppose your worker is not persistent, so I might suggest having a very cheap instance as a persistent worker where you keep your dataset persistently synced using https://clear.ml/docs/latest/docs/references/sdk/dataset/#sync_folder, and then taking the subset of files that interests you and pushing it as a different dataset, marking it as a subset of your main dataset ID using a tag. Roughly like the sketch below.
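(A minimal sketch, untested; the dataset ID, project name, folder paths and tag format are placeholders, and I'm assuming `get_mutable_local_copy()` for pulling the main dataset down onto the persistent worker.)
```python
from clearml import Dataset

MAIN_DATASET_ID = "<main-dataset-id>"     # placeholder
LOCAL_COPY_DIR = "/data/main_dataset"     # persistent folder on the cheap worker
SUBSET_DIR = "/data/subset"               # files you actually want to work on

# Keep a local, writable copy of the main dataset on the persistent worker
main_ds = Dataset.get(dataset_id=MAIN_DATASET_ID)
main_ds.get_mutable_local_copy(LOCAL_COPY_DIR, overwrite=False)

# ... select/copy the files of interest from LOCAL_COPY_DIR into SUBSET_DIR ...

# Push the subset as its own dataset, tagged as a subset of the main dataset
subset_ds = Dataset.create(
    dataset_name="my_dataset_subset",
    dataset_project="datasets",
    dataset_tags=["subset-of:{}".format(MAIN_DATASET_ID)],
)
subset_ds.sync_folder(SUBSET_DIR)   # picks up everything currently in SUBSET_DIR
subset_ds.upload()
subset_ds.finalize()
```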
Ahh. This is a shame. I really want to use ClearML to efficiently compute features but it's proving a challenge!
Thanks