Thanks a lot @<1523701435869433856:profile|SmugDolphin23> ❤
Or do I have to add a pipeline step to prune ancestors that are too old?
In the meantime, is there some way to set a retention policy for the dataset versions?
Hi @<1523702000586330112:profile|FierceHamster54> , how big is the version hierarchy? Can you provide some details on the structure? Also, how many files are in the dataset and what are their sizes?
Pruning old ancestors sounds like the right move for now.
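For the pruning step, here is a rough sketch of how the retention rule itself could look. It only decides which versions are stale and does not touch ClearML at all. The function name `select_stale_versions`, the `keep_last` / `max_age_days` parameters, and the `"id"` / `"created"` dict keys are all hypothetical, so map them to whatever your pipeline actually gets back when it lists the dataset versions.

```python
from datetime import datetime, timedelta, timezone


def select_stale_versions(versions, keep_last=20, max_age_days=90, now=None):
    """Return the versions that a simple retention policy would prune.

    `versions` is a list of dicts with the hypothetical keys "id" and
    "created" (a timezone-aware datetime). Nothing here calls ClearML.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)

    # Sort newest first so the head of the list is what we always keep.
    newest_first = sorted(versions, key=lambda v: v["created"], reverse=True)

    # The newest `keep_last` versions are kept regardless of their age.
    candidates = newest_first[keep_last:]

    # Of the rest, only versions strictly older than the cutoff are stale.
    return [v for v in candidates if v["created"] < cutoff]


if __name__ == "__main__":
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    # Five fake versions, created 0, 30, 60, 90 and 120 days before `now`.
    fake = [
        {"id": f"v{i}", "created": now - timedelta(days=30 * i)}
        for i in range(5)
    ]
    stale = select_stale_versions(fake, keep_last=2, max_age_days=60, now=now)
    print([v["id"] for v in stale])
```

What to do with the stale versions is a separate question. Since each version builds on its parents, it is probably safer to squash old ancestors into a single version than to delete them outright, so please check the `Dataset.squash` and `Dataset.delete` docs before wiring this into a pipeline step.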
Hi @<1523702000586330112:profile|FierceHamster54>! It looks like we pull all the ancestors of a dataset when we finalize it. I think this can be optimized. We will keep you posted when we make some improvements.
Hey @<1523701087100473344:profile|SuccessfulKoala55>, this is a fairly small dataset with a linear hierarchy of ~300 versions and a size of ~2 GB.