Answered

Hi there,
I am working on an audio analysis project and I am interested in using ClearML to manage our data versions and models. However, I have some questions about dataset versioning and reproducibility that I hope you can help me with.
Specifically, I have a large amount of audio data (hundreds of gigabytes) that I extract features from in a pipeline. Because the pipeline changes from time to time, I need to switch between versions of the datasets. How can I do this with ClearML? Does it require a rerun of the pipeline, and is the data saved in the different versions?
Furthermore, if I want to reproduce an experiment that I did in the past with manipulations on the data, is the dataset saved together with the experiment's results? How can I ensure that the experiment is reproducible?
I would greatly appreciate any insights or advice you can provide on these topics.
Thank you in advance for your help.
Best regards

  
  
Posted one year ago

Answers 2


Hi @PerplexedWalrus3, I'm not sure about the exact configuration of your setup, but I'm quite sure you could do this fairly easily with pipelines and datasets in ClearML. Have you tried playing with Datasets to get a feel for how they work?

  
  
Posted one year ago

A big part of the way Datasets work is to turn the data into a parameter rather than part of the code, so you will be able to reproduce experiments easily 🙂
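A minimal sketch of what "data as a parameter" could look like, assuming the `clearml` Python package is installed and configured against a server. The project and task names, the `dataset_id` parameter key, and the helper functions `register_version`, `resolve_dataset`, and `run_experiment` are hypothetical names chosen for illustration, not part of ClearML itself.

```python
from pathlib import Path


def register_version(features_dir, name, project, parent_id=None):
    """Store a folder of extracted features as a new dataset version."""
    from clearml import Dataset  # lazy import: only needed when actually registering

    ds = Dataset.create(
        dataset_name=name,
        dataset_project=project,
        parent_datasets=[parent_id] if parent_id else None,
    )
    ds.add_files(path=str(features_dir))
    ds.upload()
    ds.finalize()  # close the version so it can no longer be modified
    return ds.id


def resolve_dataset(params, get_dataset=None):
    """Return a local copy of the dataset version named in the parameters."""
    if get_dataset is None:
        from clearml import Dataset
        get_dataset = Dataset.get
    dataset = get_dataset(dataset_id=params["dataset_id"])
    return Path(dataset.get_local_copy())


def run_experiment(params):
    """Log the dataset id with the experiment, then fetch that exact version."""
    from clearml import Task

    # "audio-analysis" and "train-on-features" are placeholder names.
    task = Task.init(project_name="audio-analysis", task_name="train-on-features")
    params = task.connect(params)  # the dataset id is recorded with the experiment
    data_dir = resolve_dataset(params)
    # ... training code that reads feature files from data_dir goes here ...
    return data_dir
```

With this layout, switching to another version of the features means changing `params["dataset_id"]` rather than editing code, and an old experiment can be repeated by running it again with the dataset id that was logged for it.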

  
  
Posted one year ago