
Answered

Dear all, great to join your community. We are working on plant growth stage models at BASF for farmers, and I was wondering whether ClearML can also be used for data versioning of tabular (structured) data. I would like to track whether a given row is part of dataset xyz. Is there a recommended practice for this?

  				
Posted 2 years ago
SorePelican79


Answers 4

How can I track in ClearML that a given row was part of experiment x because it belonged to training/test dataset y?

Hi SorePelican79,
The experiments themselves will have a link to the Dataset they used. From a dataset perspective, the idea is not to limit you: a Dataset packages all your files and retrieves them when you fetch it. As for specifying individual rows/samples, my suggestion is to mark those rows during training and, while training, create a new Dataset version containing the marked rows (or just the rows you actually used). This new dataset version will also be linked to the Task that created it, so you end up with full provenance and lineage of models and datasets. What do you think?
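A minimal sketch of the "mark the rows, then create a new version" idea for one large table. It is not a built-in ClearML row-tracking feature, and it stores the marked row IDs (one CSV per split) rather than copies of the rows. It assumes each row is a dict with a unique `id` field and a `split` label set during training; the helper names and the project/dataset names are hypothetical. Only `Dataset.create`, `add_files`, `upload` and `finalize` are ClearML SDK calls, and they need the `clearml` package plus a configured server.

```python
import csv
from pathlib import Path


def build_split_manifest(rows, id_field="id", split_field="split"):
    """Group row IDs by the split label each row was marked with.

    `rows` is any iterable of dicts (e.g. df.to_dict("records")); the
    "id" and "split" field names are assumptions for this sketch.
    """
    manifest = {}
    for row in rows:
        manifest.setdefault(row[split_field], []).append(row[id_field])
    return manifest


def write_manifest(manifest, out_dir):
    """Write one CSV per split (e.g. train_ids.csv) listing its row IDs."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for split, ids in sorted(manifest.items()):
        path = out_dir / f"{split}_ids.csv"
        with path.open("w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["id"])
            writer.writerows([row_id] for row_id in ids)
        paths.append(path)
    return paths


def register_split_version(out_dir, parent_dataset_id,
                           project="plant-growth", name="training-split"):
    """Store the manifest files as a new ClearML Dataset version.

    The project and dataset names are hypothetical placeholders.
    The parent is the Dataset holding the full table, so the new
    version records which rows were used on top of that data.
    """
    # Imported lazily so the pure helpers above work without clearml installed.
    from clearml import Dataset

    ds = Dataset.create(
        dataset_name=name,
        dataset_project=project,
        parent_datasets=[parent_dataset_id],
    )
    ds.add_files(path=str(out_dir))
    ds.upload()
    ds.finalize()
    return ds.id
```

To check later whether a given row was in the training set of a version, one could fetch that version with `Dataset.get(dataset_id=...).get_local_copy()` and look the ID up in `train_ids.csv`.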

  				
Posted 2 years ago
AgitatedDove14

Hi John, thank you. However, I could not find any hint there on how to version tabular data. Our data is essentially a huge data frame where each ground-truth data point is a row with a unique ID. How can I track in ClearML that a given row was part of experiment x because it belonged to training/test dataset y?

  				
Posted 2 years ago
SorePelican79

Hi SorePelican79, ClearML can certainly do that. For this, you have the Datasets feature.
This will allow you to version and track your data super easily 🙂
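ClearML Datasets track files, so a table stored as one CSV shows up as a single modified file between two versions. To see which rows changed, a sketch like the following could compare local copies of two versions by a unique ID column. This comparison is not part of ClearML; the `id` column name and the helper names are assumptions for illustration.

```python
import csv


def load_rows(path, id_field="id"):
    """Read a CSV into {row_id: row_dict}; assumes `id_field` is unique."""
    with open(path, newline="") as f:
        return {row[id_field]: row for row in csv.DictReader(f)}


def diff_versions(old_rows, new_rows):
    """Compare two {id: row} mappings and report row-level changes."""
    old_ids, new_ids = set(old_rows), set(new_rows)
    return {
        "added": sorted(new_ids - old_ids),
        "removed": sorted(old_ids - new_ids),
        # Rows present in both versions whose values differ.
        "changed": sorted(i for i in old_ids & new_ids if old_rows[i] != new_rows[i]),
    }
```

The two CSV paths would come from the local copies of each version, for example via `Dataset.get(dataset_id=...).get_local_copy()`.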

  				
Posted 2 years ago
CostlyOstrich36

Hi SorePelican79, I don't think you can track individual rows inside a dataset. Maybe SuccessfulKoala55 might have an idea.

  				
Posted 2 years ago
CostlyOstrich36


1K Views
