Answered
Hi All, I Have A Newbie Question About Clear-Ml Data. I Have Four Data Sources That Get Combined To Train A Model. I Have Put Each Of These Datasets Into Clear Ml So That I Can Track Their Versions, And Then Create The Fifth 'Combined' Dataset Using The I

Hi all,
I have a newbie question about ClearML Data. I have four data sources that get combined to train a model. I have put each of these datasets into ClearML so that I can track their versions, and then created a fifth 'combined' dataset using the ids from the others. It works great.

My question is about the correct method for updating that fifth dataset when one of the other datasets changes. Say, for example, I create a new version of dataset1: what is the correct method for creating the updated version of combined_dataset?

  • Use the ids from each of the datasets the same way I did the first time (Dataset.create(..., parent_datasets=[dataset1.id, dataset2.id, ...])). In that case, will this actually be a second version of combined_dataset?
  • Do the same as above, but also include the id of the previous version of the combined dataset as a parent?
  • Just do Dataset.create(..., parent_datasets=[combined_dataset.id]) and assume that ClearML will take care of the rest?
  
  
Posted one year ago

Answers 2


After playing around in a test project, I'm pretty sure option 1 is right. There's no need to include the previous combined dataset's id, because the new version doesn't inherit anything from that dataset. Happy to be corrected, though.

  
  
Posted one year ago

Hello @PerfectSwan93, I tend to agree with you: option one is the best given your use case. If you keep the same name and project, it will result in a version bump on the combined dataset, but it will not point to the previous combined dataset as a parent.
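
For anyone landing here later, option one could be sketched roughly as below. This is only an illustration under some assumptions: the helper name rebuild_combined and its arguments are made up for this example, all datasets are assumed to live in one project, and Dataset.get is assumed to return the latest version when no version is given. The Dataset class is passed in as an argument so the helper can be tried without a ClearML server.

```python
from typing import Any, List, Sequence


def rebuild_combined(
    dataset_cls: Any,
    source_names: Sequence[str],
    project: str,
    combined_name: str,
) -> Any:
    """Create a new combined dataset whose parents are the source datasets.

    `dataset_cls` is expected to be `clearml.Dataset` (hypothetical helper,
    not part of ClearML itself).
    """
    # Look up each source dataset by name and collect its id.
    # Assumption: with no version given, the lookup returns the latest one.
    parent_ids: List[str] = [
        dataset_cls.get(dataset_project=project, dataset_name=name).id
        for name in source_names
    ]

    # Re-using the same name and project is what should trigger the
    # version bump described above. The previous combined dataset is
    # deliberately NOT listed as a parent.
    combined = dataset_cls.create(
        dataset_name=combined_name,
        dataset_project=project,
        parent_datasets=parent_ids,
    )

    # No new files are added here; the content comes from the parents.
    combined.upload()
    combined.finalize()
    return combined
```

With a configured ClearML setup, it would be called along the lines of rebuild_combined(Dataset, ["dataset1", "dataset2", "dataset3", "dataset4"], "my_project", "combined_dataset") after from clearml import Dataset, where the dataset and project names are placeholders.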

  
  
Posted one year ago