Hi FierceHamster54, how big is the version hierarchy? Can you provide some details on the structure? Also, how many files are in the dataset and what are their sizes?
Hey SuccessfulKoala55, this is a fairly small dataset with a linear hierarchy of ~300 versions and a size of ~2 GB
Hi FierceHamster54! Looks like we pull all the ancestors of a dataset when we finalize. I think this can be optimized. We will keep you posted when we make some improvements.
In the meantime, is there some way to set a retention policy for the dataset versions?
Or do I have to add a pipeline step to prune ancestors that are too old?
Pruning old ancestors sounds like the right move for now.
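For the pruning step, here is a minimal sketch of the selection logic only, in plain Python. Everything in it is an assumption made for illustration: the function name `select_versions_to_prune`, the `keep_last` and `max_age_days` parameters, and the record shape (a dict with an `id` and a timezone-aware `created` datetime). It does not call ClearML and does not delete anything; the actual removal (presumably via something like `Dataset.delete`) is left to the pipeline step, and it is worth checking first that child versions do not still depend on files stored in the ancestors being removed.

```python
from datetime import datetime, timedelta, timezone


def select_versions_to_prune(versions, keep_last=10, max_age_days=30, now=None):
    """Return the ids of versions that are safe to prune under a simple policy.

    A version is selected only if it is NOT among the `keep_last` newest
    versions AND it is older than `max_age_days`.

    `versions` is a hypothetical list of dicts shaped like
    {"id": str, "created": timezone-aware datetime}.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)

    # Newest first, so the first `keep_last` entries are always retained.
    ordered = sorted(versions, key=lambda v: v["created"], reverse=True)
    candidates = ordered[keep_last:]

    # Of the remaining versions, keep those still inside the age window.
    return [v["id"] for v in candidates if v["created"] < cutoff]
```

The returned ids could then be handed to whatever deletion call the pipeline uses, ideally after a dry run that only logs them.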