Answered

Dear all, great to join your community. We are working on plant growth stage models at BASF for farmers, and I was wondering whether ClearML can also be used for data versioning of tabular (structured) data. I would like to track whether this or that row is part of data set xyz. Do you have any experience with this?

  
  
Posted 2 years ago

Answers 4


Hi SorePelican79, ClearML can certainly do that. For this you have the Datasets feature.
This will allow you to version and track your data super easily 🙂
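As a rough illustration of the Datasets feature mentioned above, here is a minimal sketch that writes a small table to CSV and registers it as a dataset version. The project name, dataset name, column names, and the helpers `write_table` and `register_dataset` are all made up for this example, and the upload step assumes a configured ClearML server, so it is left commented out.

```python
import csv
import tempfile
from pathlib import Path


def write_table(rows, path):
    """Write a list of dicts (one per row, each with a unique 'id') to a CSV file."""
    path = Path(path)
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path


def register_dataset(csv_path, project="plant-growth", name="ground-truth"):
    """Upload the CSV as a ClearML dataset version (hypothetical project/name)."""
    # Imported lazily so write_table() also works without ClearML installed.
    from clearml import Dataset

    ds = Dataset.create(dataset_project=project, dataset_name=name)
    ds.add_files(path=str(csv_path))  # stage the file in this version
    ds.upload()                       # push the file to storage
    ds.finalize()                     # close the version so it cannot change
    return ds.id


# Toy rows standing in for the real data frame (illustrative values only).
rows = [
    {"id": "r1", "stage": 12, "temp": 18.5},
    {"id": "r2", "stage": 31, "temp": 21.0},
]
csv_path = write_table(rows, Path(tempfile.mkdtemp()) / "ground_truth.csv")
# dataset_id = register_dataset(csv_path)  # needs ClearML credentials configured
```

Note that this versions the file as a whole; it does not by itself record which rows a given experiment used.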

  
  
Posted 2 years ago

How can I track in ClearML that this and that row was part of experiment x because it belonged to test/training data set y?

Hi SorePelican79,
The experiments themselves will have a link to the Dataset they were using. From a dataset perspective, the idea is not to limit you, so essentially it will package all your files and retrieve them when you fetch the dataset. As for specifying a row / sample, my suggestion is to mark those rows when training, and while training create a new version with those marked rows (or maybe just the rows that you used). This new dataset version will also be linked to the creating Task, so you end up with full provenance and lineage of models/datasets. wdyt?
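One way to read the suggestion above, sketched under assumptions: during training, write a small manifest of the unique row ids that went into each split, then publish that manifest as a child version of the dataset. The file name `row_manifest.csv`, the helper names, and the toy ids are hypothetical; `publish_used_rows` needs a configured ClearML server and is not called here.

```python
import csv
import hashlib
import tempfile
from pathlib import Path


def write_split_manifest(split_ids, out_dir):
    """Write (row_id, split) pairs to a CSV; return its path and a SHA-256 digest."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    manifest = out_dir / "row_manifest.csv"
    # Sort so the same rows always produce the same file and digest.
    pairs = sorted((rid, split) for split, ids in split_ids.items() for rid in ids)
    with manifest.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["row_id", "split"])
        writer.writerows(pairs)
    digest = hashlib.sha256(manifest.read_bytes()).hexdigest()
    return manifest, digest


def rows_in_split(manifest, split):
    """Return the set of row ids recorded for one split in a manifest."""
    with Path(manifest).open(newline="") as f:
        return {r["row_id"] for r in csv.DictReader(f) if r["split"] == split}


def publish_used_rows(manifest, parent_dataset_id, project, name):
    """Create a child dataset version holding the manifest (requires a server)."""
    from clearml import Dataset  # lazy import: helpers above work without ClearML

    child = Dataset.create(
        dataset_project=project,
        dataset_name=name,
        parent_datasets=[parent_dataset_id],  # keeps lineage to the full table
    )
    child.add_files(path=str(manifest))
    child.upload()
    child.finalize()
    return child.id


# Toy ids standing in for the unique row ids of the real data frame.
split_ids = {"train": ["r1", "r3"], "test": ["r2"]}
manifest, digest = write_split_manifest(split_ids, tempfile.mkdtemp())
```

To answer "was row r2 in the test set of experiment x?", one would fetch the dataset version linked to that experiment and call `rows_in_split` on its manifest. The digest is just a convenient fingerprint to log alongside the experiment.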

  
  
Posted 2 years ago

Hi John, thank you. However, I could not find a hint there on how to version tabular data. Our data is essentially a huge data frame where each ground-truth data point is a row with a unique id. How can I track in ClearML that this and that row was part of experiment x because it belonged to test/training data set y?

  
  
Posted 2 years ago

Hi SorePelican79, I don't think you can track the data inside the dataset. Maybe SuccessfulKoala55 might have an idea.

  
  
Posted 2 years ago