And after your modifications are made, you can use https://clear.ml/docs/latest/docs/references/sdk/dataset/#datasetsquash to squash your modified subset with the main dataset if you want to re-integrate it into your flow. But I don't remember whether squash requires both datasets to be present locally or not...
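Something along these lines (a minimal, untested sketch; the dataset IDs and the merged dataset name are placeholders):
```python
from clearml import Dataset

MAIN_DATASET_ID = "<main-dataset-id>"        # placeholder
SUBSET_DATASET_ID = "<modified-subset-id>"   # placeholder

# Squash the modified subset together with the main dataset into a new dataset.
# Note: I'm not sure whether squash needs local copies of both datasets to do the merge.
merged = Dataset.squash(
    dataset_name="main_dataset_merged",
    dataset_ids=[MAIN_DATASET_ID, SUBSET_DATASET_ID],
)
print(merged.id)
```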
I doubt there is a direct way to do it since they are stored as archive chunks 🙂
I suppose your worker is not persistent, so I might suggest having a very cheap instance as a persistent worker where you keep your dataset persistently synced using https://clear.ml/docs/latest/docs/references/sdk/dataset/#sync_folder, and then taking the subset of files that interests you and pushing it as a different dataset, marking it as a subset of your main dataset ID using a tag. Roughly like the sketch below.
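(A minimal sketch, untested; the dataset ID, project name, folder paths and tag format are placeholders, and I'm assuming `get_mutable_local_copy()` for pulling the main dataset down onto the persistent worker.)
```python
from clearml import Dataset

MAIN_DATASET_ID = "<main-dataset-id>"     # placeholder
LOCAL_COPY_DIR = "/data/main_dataset"     # persistent folder on the cheap worker
SUBSET_DIR = "/data/subset"               # files you actually want to work on

# Keep a local, writable copy of the main dataset on the persistent worker
main_ds = Dataset.get(dataset_id=MAIN_DATASET_ID)
main_ds.get_mutable_local_copy(LOCAL_COPY_DIR, overwrite=False)

# ... select/copy the files of interest from LOCAL_COPY_DIR into SUBSET_DIR ...

# Push the subset as its own dataset, tagged as a subset of the main dataset
subset_ds = Dataset.create(
    dataset_name="my_dataset_subset",
    dataset_project="datasets",
    dataset_tags=["subset-of:{}".format(MAIN_DATASET_ID)],
)
subset_ds.sync_folder(SUBSET_DIR)   # picks up everything currently in SUBSET_DIR
subset_ds.upload()
subset_ds.finalize()
```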
Ahh. This is a shame. I really want to use ClearML to efficiently compute features but it's proving a challenge!
Thanks