Hi EagerOtter28,
The integration with cloud backing worked out of the box, so that was a smooth experience so far
Great to read 🙂
When I create a dataset with 10 files and have it uploaded to e.g. S3, and then create a new dataset with the same files in a different folder structure, all files are re-uploaded. For a few .csv files it does not matter, but we have datasets in the 100GB-2TB range.
Any specific reason for uploading the same dataset twice? clearml-data
will create a different task with a different zip file for each dataset instance.
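To illustrate, this is roughly what a dataset version looks like in code (a minimal sketch with placeholder project/bucket names and paths): every Dataset.create() + upload() produces its own task and its own archive, even if the files are identical.
```python
from clearml import Dataset

# Sketch only - names and paths are placeholders
ds = Dataset.create(dataset_name="my_dataset", dataset_project="datasets")
ds.add_files(path="/data/folder_a")                 # register the local files
ds.upload(output_url="s3://my-bucket/datasets")     # full upload of this dataset's archive
ds.finalize()                                       # close the dataset version
```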
If I make a dataset a child of another dataset, will this avoid re-uploading?
Yes, it should only add (and upload) the diff files.
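Roughly like this (placeholder names, and assuming the first dataset version was already finalized):
```python
from clearml import Dataset

# Sketch only - names are placeholders
parent = Dataset.get(dataset_name="my_dataset", dataset_project="datasets")
child = Dataset.create(
    dataset_name="my_dataset_v2",
    dataset_project="datasets",
    parent_datasets=[parent.id],                    # link to the existing dataset
)
child.add_files(path="/data/folder_b")              # unchanged files are matched against the parent
child.upload(output_url="s3://my-bucket/datasets")  # only the diff gets uploaded
child.finalize()
```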
Will clearml-data understand that it already holds a local copy of a file if the same file (with the same hash) is part of two datasets?
If it's from two different datasets, clearml-data
will download each of them
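For example (placeholder names), pulling a local copy of each dataset fetches each dataset's archives separately, so the shared file ends up in both local copies:
```python
from clearml import Dataset

# Sketch only - names are placeholders
ds_a = Dataset.get(dataset_name="dataset_a", dataset_project="datasets")
ds_b = Dataset.get(dataset_name="dataset_b", dataset_project="datasets")

path_a = ds_a.get_local_copy()  # downloads dataset_a's archives
path_b = ds_b.get_local_copy()  # downloads dataset_b's archives, including the duplicate file
```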