Hm OK 🤔
I am not sure whether it's heresy to say that here, but why wouldn't you use a mechanism comparable to what DVC does in the backend?
When you create a dataset, you could hash the individual files and upload them to a cache. Datasets are then groupings of file hashes. When you want to download a dataset, all you have to do is reproduce the folder structure with the files identified by hashes.
This way, it would not matter whether you recreate a dataset with the same files: they would not be re-uploaded or re-downloaded as long as the hashes match. And partial/full overlaps between datasets would not even have to be defined explicitly.
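Just to illustrate what I mean, here is a rough sketch of that kind of content-addressable cache (this is not ClearML's or DVC's actual API; the function names, cache layout, and manifest format are all made up for the example):

```python
import hashlib
import json
import shutil
from pathlib import Path


def file_hash(path: Path) -> str:
    """Content hash of a single file (SHA-256 of its bytes)."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def create_dataset(src_dir: Path, cache_dir: Path) -> dict:
    """A dataset is just a manifest mapping relative path -> content hash.
    Each file is stored in the cache under its hash, so identical files
    across dataset versions are stored (and uploaded) only once."""
    manifest = {}
    for path in sorted(src_dir.rglob("*")):
        if not path.is_file():
            continue
        digest = file_hash(path)
        manifest[str(path.relative_to(src_dir))] = digest
        cached = cache_dir / digest
        if not cached.exists():  # dedup: skip files already in the cache
            shutil.copy2(path, cached)
    return manifest


def materialize_dataset(manifest: dict, cache_dir: Path, dst_dir: Path) -> None:
    """Reproduce the original folder structure from the cached blobs."""
    for rel_path, digest in manifest.items():
        target = dst_dir / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(cache_dir / digest, target)  # could also hard-link to save space


if __name__ == "__main__":
    cache = Path("cache")
    cache.mkdir(exist_ok=True)
    manifest = create_dataset(Path("my_dataset_v1"), cache)
    Path("dataset_v1.json").write_text(json.dumps(manifest, indent=2))
    materialize_dataset(manifest, cache, Path("restored_v1"))
```

Since a dataset version is just a manifest of hashes, deduplication between overlapping versions falls out of the storage layout for free.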
I know clearml-data has the paradigm "Data is Not Code", and that is fine. You don't need to adopt DVC's check-in workflow and so on, but its caching architecture seems pretty cool to me.