@<1668427963986612224:profile|GracefulCoral77> You can either create a child dataset or keep using the same dataset, as long as it is not finalized.
You can skip the finalization by using the --skip-close argument. Anyhow, I can see why the current workflow is confusing. I will discuss it with the team; maybe we should allow syncing unfinalized datasets as well.
Hi @<1668427963986612224:profile|GracefulCoral77>! The error is a bit misleading. What it actually means is that you shouldn't attempt to modify a finalized ClearML dataset (I suppose that is what you are trying to achieve). Instead, you should either create a new dataset that inherits from the finalized one and sync that dataset, or leave the dataset in an unfinalized state.
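For illustration, the "child dataset" route could look roughly like the sketch below when done through the Python SDK rather than the CLI. It is only a sketch under assumptions: the function name sync_as_child_version and the dataset_cls parameter are hypothetical (the latter exists only so the logic can be exercised without a server), and it assumes that Dataset.get, called with a project and name, returns the most recent version of the dataset.

```python
from typing import Any, Optional


def sync_as_child_version(folder: str, project: str, name: str,
                          finalize: bool = True,
                          dataset_cls: Optional[Any] = None) -> str:
    """Sync `folder` into a new child of an existing (finalized) dataset.

    Hypothetical helper: returns the ID of the newly created child dataset.
    """
    if dataset_cls is None:
        # Imported lazily so the sketch can be read without a configured server.
        from clearml import Dataset as dataset_cls

    # Look up the existing dataset by project and name
    # (assumed here to resolve to the latest version).
    parent = dataset_cls.get(dataset_project=project, dataset_name=name)

    # Create a new dataset with the same name that inherits the parent's files.
    child = dataset_cls.create(dataset_project=project, dataset_name=name,
                               parent_datasets=[parent.id])

    # Register added, modified and removed files relative to the parent.
    child.sync_folder(local_path=folder)
    child.upload()

    # Passing finalize=False leaves the child open for further syncs.
    if finalize:
        child.finalize()
    return child.id
```

Calling it with finalize=False corresponds to the other option mentioned above: the new version stays unfinalized, so it can be synced again later.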
Hi @<1523701435869433856:profile|SmugDolphin23>, and thank you for your prompt response.
For my understanding, what is the intended workflow if I want to keep the same dataset (which should therefore have the same name as before, with everything else unchanged), but generate a new version of that dataset? Is this what a child dataset is meant to be, or does it mean that I should not have finalised my dataset to begin with? If the latter, how am I supposed to know when I can finalise a dataset?
I am particularly puzzled because, according to the documentation of clearml-data sync, "This option is useful in case a user has a single point of truth (i.e. a folder) which updates from time to time", which to me means that I can use this regularly when I update my "truth folder", but the documentation also states "This command also uploads the data and finalizes the dataset automatically.", which means that I can then no longer use this command. Did I misunderstand something?
Thank you in advance for your support!