Hi community :)
I'm new to ClearML and seeking advice on best practices for managing datasets. I have two types of datasets:
(1) PDFs
(2) Tabular data stored in Excel.
Question 1: Tracking changes in different versions of Excel files
I frequently update my Excel datasets by adding new data and deleting old entries. Can I track these changes across different versions in ClearML?
For instance, if I upload an initial Excel file and later make modifications, is there a way to compare the versions to see what data was added or removed?
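To make the question concrete, here is a minimal sketch of the kind of comparison I have in mind. It is plain Python, not a ClearML API, and it assumes each Excel row has already been loaded as a tuple (for example via pandas). The `diff_rows` helper and the sample rows are hypothetical.

```python
def diff_rows(old_rows, new_rows):
    """Return (added, removed) rows between two versions of a table.

    Rows are compared as whole tuples, so an edited row shows up as
    one removal plus one addition.
    """
    old_set, new_set = set(old_rows), set(new_rows)
    added = sorted(new_set - old_set)
    removed = sorted(old_set - new_set)
    return added, removed


# Hypothetical sample data standing in for two versions of the same sheet.
v1 = [(1, "alice"), (2, "bob"), (3, "carol")]
v2 = [(1, "alice"), (3, "carol"), (4, "dave")]

added, removed = diff_rows(v1, v2)
print("added:", added)
print("removed:", removed)
```

Ideally ClearML would give me something equivalent to `added` and `removed` between two dataset versions, without my having to maintain this comparison myself.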
Question 2: Handling data stored in S3 without storing it in ClearML
I have an S3 bucket that stores PDFs, and I would prefer not to copy these files into ClearML storage. Is there a way to track changes to the files in this bucket, such as which files have been added or removed?
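As an illustration of what I mean by tracking, the sketch below compares two snapshots of the bucket listing. It is plain Python with no S3 or ClearML calls. The `diff_snapshots` helper is hypothetical, and I am assuming each snapshot maps an object key to some fingerprint such as the ETag from the bucket listing.

```python
def diff_snapshots(old, new):
    """Compare two {key: fingerprint} snapshots of a bucket listing.

    Returns three sorted lists: keys added, keys removed, and keys
    present in both snapshots whose fingerprint differs.
    """
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, removed, changed


# Hypothetical listings; the fingerprint values are made up.
snapshot_old = {"docs/a.pdf": "etag-1", "docs/b.pdf": "etag-2"}
snapshot_new = {"docs/b.pdf": "etag-9", "docs/c.pdf": "etag-3"}

added_keys, removed_keys, changed_keys = diff_snapshots(snapshot_old, snapshot_new)
print("added:", added_keys)
print("removed:", removed_keys)
print("changed:", changed_keys)
```

What I am hoping for is a way to have ClearML record this kind of difference per dataset version while the PDFs themselves stay in S3.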
Thanks!