Hi there,
I am working on an audio analysis project and am interested in using ClearML to manage our dataset versions and models. I have some questions about dataset versioning and reproducibility that I hope you can help me with.
Specifically, I have a large amount of audio data (hundreds of gigabytes) from which I extract features in a pipeline. Because the pipeline changes from time to time, I need to switch between the dataset versions it has produced. How can I do this with ClearML? Does switching to an earlier version require rerunning the pipeline, or is the data for each version stored?
Furthermore, if I want to reproduce a past experiment that involved manipulating the data, is the dataset saved together with the experiment's results? How can I ensure that the experiment is reproducible?
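To make the question concrete, here is a minimal sketch of the workflow I have in mind, based on my reading of the ClearML Python SDK. The project name, dataset name, task name, and the two helper functions are hypothetical placeholders of mine, and I have not run this against a real server, so please correct anything that is wrong. My assumption is that each pipeline output gets registered as its own dataset version, and that an experiment records the ID of the dataset version it consumed.

```python
from typing import Optional


def register_feature_version(features_dir: str, parent_id: Optional[str] = None) -> str:
    """Register one pipeline output folder as a dataset version and return its ID.

    Hypothetical helper: the dataset and project names below are placeholders.
    """
    # Imported lazily; needs the clearml package and a configured server.
    from clearml import Dataset

    ds = Dataset.create(
        dataset_name="audio-features",      # placeholder name
        dataset_project="audio-analysis",   # placeholder project
        # Assumption: passing the previous version as parent links the versions.
        parent_datasets=[parent_id] if parent_id else None,
    )
    ds.add_files(path=features_dir)  # add the extracted feature files
    ds.upload()                      # upload the files to storage
    ds.finalize()                    # close this version to further changes
    return ds.id


def run_experiment(dataset_id: str) -> str:
    """Start a task that records which dataset version it used.

    Hypothetical helper: returns the local path of the fetched dataset copy.
    """
    from clearml import Dataset, Task

    task = Task.init(project_name="audio-analysis", task_name="train on features")
    # Connect the dataset ID as a parameter so it is stored with the experiment.
    params = task.connect({"dataset_id": dataset_id})
    return Dataset.get(dataset_id=params["dataset_id"]).get_local_copy()
```

If this is roughly right, I would call `register_feature_version` once per pipeline revision and pass the returned ID to `run_experiment`. What I cannot tell from this is whether fetching an older ID later returns the stored files as they were, without me rerunning the feature extraction.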
I would greatly appreciate any insights or advice you can provide on these topics.
Thank you in advance for your help.
Best regards