Hi, I am trying to understand clearml-data and only found this article explaining it.


Hi SubstantialElk6

but in terms of data provenance, it's not clear how I can associate the data versions with the processes that created them.

I think DeliciousBluewhale87's approach is what we are aiming for, but with code.
So using clearml-data from the CLI is basically storing/versioning files (with differential-based storage etc., but still).
What you are after (I think) is to use the programmatic Dataset class in your preprocessing code and create the Dataset from code. This gives you both the storage capabilities and versioning, and also couples the dataset with the preprocessing code for provenance and automation.
The base assumption is that a Dataset is always a Task (with artifacts and a fancy interface, but a Task nonetheless). This gives you all the capabilities of a Task, such as adding metrics/stats on the data and automation with pipelines, plus the ability to later retrieve the data with a simple CLI command or code.
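
Something along these lines is what I have in mind; treat it as a rough sketch rather than the official recipe. The `preprocess`, `publish` and `fetch` helpers, the `clean.csv` file name and the drop-empty-rows logic are all made up for illustration; only the `Dataset` calls (`create`, `add_files`, `upload`, `finalize`, `get`, `get_local_copy`) come from the clearml package, and they need a configured ClearML server to actually run.

```python
import csv
from pathlib import Path


def preprocess(raw_csv: Path, out_dir: Path) -> Path:
    """Hypothetical preprocessing step: drop rows that contain an empty field."""
    out_dir.mkdir(parents=True, exist_ok=True)
    out_csv = out_dir / "clean.csv"
    with raw_csv.open(newline="") as src, out_csv.open("w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            # Keep only non-blank rows in which every field has content.
            if row and all(field.strip() for field in row):
                writer.writerow(row)
    return out_csv


def publish(out_dir: Path, project: str, name: str) -> str:
    """Version the preprocessed folder as a ClearML Dataset and return its ID."""
    # Imported lazily so preprocess() can run without clearml installed.
    from clearml import Dataset

    dataset = Dataset.create(dataset_project=project, dataset_name=name)
    dataset.add_files(path=str(out_dir))
    dataset.upload()    # push the files to the configured storage
    dataset.finalize()  # close this version so it can no longer be changed
    return dataset.id


def fetch(project: str, name: str) -> str:
    """Return the path of a read-only local copy of the dataset."""
    from clearml import Dataset

    return Dataset.get(dataset_project=project, dataset_name=name).get_local_copy()
```

Calling `publish()` right after `preprocess()` in the same script is what ties the dataset version to the code that produced it. Later on, the same version can be pulled with `fetch()` or from the CLI with `clearml-data get --id <dataset id>`.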
wdyt?

  				
Posted 4 years ago by AgitatedDove14
