Hi @<1523702000586330112:profile|FierceHamster54> ! Looks like we pull all the ancestors of a dataset when we finalize. I think this can be optimized. We will keep you posted when we make some improvements
Hey @<1523701087100473344:profile|SuccessfulKoala55> this is a fairly small dataset with a linear hierarchy of ~300 versions and a total size of ~2 GB
In the meantime, is there some way to set a retention policy for the dataset versions?
Pruning old ancestors sounds like the right move for now.
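Something along these lines could run as a periodic cleanup or as a final pipeline step (just a rough sketch, not something we ship -- the project/dataset names and the `KEEP_LAST` count are placeholders, and since each version only stores its delta against its parents you may want to consolidate with `Dataset.squash()` first if newer versions still depend on files that only exist in the old ancestors):
```python
from clearml import Dataset

# Assumed names/values -- replace with your own project, dataset and retention count
DATASET_PROJECT = "my_project"
DATASET_NAME = "my_dataset"
KEEP_LAST = 20  # how many of the most recent versions to keep

# List all finalized versions of this dataset
versions = Dataset.list_datasets(
    dataset_project=DATASET_PROJECT,
    partial_name=DATASET_NAME,
    only_completed=True,
)
# Sort oldest-first by creation time
versions.sort(key=lambda v: v["created"])

# Everything except the newest KEEP_LAST versions is a pruning candidate.
# Careful: versions are differential, so consolidate first (e.g. Dataset.squash)
# if the versions you keep still reference files from these old ancestors.
for old in versions[:-KEEP_LAST]:
    print("Deleting old dataset version:", old["id"], old["created"])
    Dataset.delete(dataset_id=old["id"], force=True)
```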
Thanks a lot @<1523701435869433856:profile|SmugDolphin23> ❤
Or do I have to add a pipeline step to prune ancestors that are too old?
Hi @<1523702000586330112:profile|FierceHamster54> , how big is the version hierarchy? Can you provide some details on the structure? Also, how many files are in the dataset and what are their sizes?