
Dear all, great to join your community. We are working on plant growth stage models for farmers at BASF, and I was wondering whether ClearML can also be used for versioning tabular (structured) data. I would like to track whether this or that row is part of data set xyz. Do you have any experience with this?

  
  
Posted one year ago

Answers 4


Hi @<1543766544847212544:profile|SorePelican79> , ClearML can certainly do that. For this you have the Datasets feature.
This will allow you to version and track your data super easily 🙂

  
  
Posted one year ago

Hi John, thank you. However, I could not find any hint there on how to version tabular data. Our data is essentially a huge data frame in which each ground-truth data point is a row with a unique id. How can I track in ClearML that this and that row was part of experiment x because it belonged to test/training data set y?

  
  
Posted one year ago

Hi @<1543766544847212544:profile|SorePelican79> , I don't think you can track the data inside the dataset. Maybe @<1523701087100473344:profile|SuccessfulKoala55> might have an idea.

  
  
Posted one year ago

How can I track in ClearML that this and that row was part of experiment x because it belonged to test/training data set y?

Hi @<1543766544847212544:profile|SorePelican79>
the experiments themselves will have a link to the Dataset they were using. From a dataset perspective, the idea is not to limit you, so essentially it will package all your files and retrieve them when you fetch the dataset. In terms of specifying a row / sample, my suggestion is to mark those rows while training and then create a new version with those marked rows (or maybe just with the rows that you used). This new dataset version will also be linked to the creating Task, so you end up with full provenance and lineage of models/datasets, wdyt?
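
A rough sketch of the "mark the rows and create a new version" suggestion above, in Python. Everything named here is an assumption for illustration: the helpers `write_used_rows` and `publish_used_rows`, the `id` column, and the project/dataset names are hypothetical. The only ClearML calls used are `Dataset.create`, `add_files`, `upload` and `finalize`, and the second helper needs `clearml` installed and a configured server, so treat it as a starting point rather than the recommended way.

```python
import csv
from pathlib import Path


def write_used_rows(rows, used_ids, out_path, id_field="id"):
    """Write only the rows whose id is in used_ids to a CSV file.

    rows is a list of dicts (one per data-frame row); returns the ids written,
    in their original order. The 'id' column name is an assumption.
    """
    used_ids = set(used_ids)
    selected = [row for row in rows if row[id_field] in used_ids]
    if not selected:
        raise ValueError("no rows matched used_ids")
    with Path(out_path).open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(selected[0].keys()))
        writer.writeheader()
        writer.writerows(selected)
    return [row[id_field] for row in selected]


def publish_used_rows(csv_path, parent_dataset_id, project, name):
    """Register the CSV of used rows as a new dataset version (hypothetical helper).

    Requires clearml and a reachable ClearML server; imported lazily so that
    write_used_rows can be used and tested without it.
    """
    from clearml import Dataset

    child = Dataset.create(
        dataset_name=name,
        dataset_project=project,
        parent_datasets=[parent_dataset_id],  # keeps the lineage to the source dataset
    )
    child.add_files(path=str(csv_path))
    child.upload()
    child.finalize()
    return child.id
```

Called from the training script, for example `write_used_rows(rows, train_ids, "used_rows.csv")` followed by `publish_used_rows("used_rows.csv", parent_id, "my_project", "train_rows_run_x")`, this would give one dataset version per run listing exactly the rows that run consumed. Whether to store full rows or only their ids is a design choice; ids alone keep the versions small if the parent dataset already holds the data.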

  
  
Posted one year ago