That makes sense. Yeah, it would be nice to have a way to exclude some files when calling sync_folder.
clearml-data also supports glob patterns, so if you have your dataset files in the same directory as the experiment code, you can do something like clearml-data add --files *.csv and only add the CSV files.
There's no .gitignore-like functionality because clearml-data is not meant to track everything, and you need to be deliberate about what exactly you're adding. Hope this clarifies things.
One more question has come up. Here is my situation: I make a mutable copy using the .get_mutable_local_copy() method and edit/add some files in the local folder. Ipynb checkpoints are created after this.
Then I want to synchronize the dataset in my storage, so I call .sync_folder(). The ipynb checkpoints get uploaded too, because this method has no wildcard argument. Could you look into this? :) I know I can use the add_files() method, but sync_folder seems more convenient in this scenario. It would be nice if you added an option for excluding some files in the sync_folder method.
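As a possible stopgap for the scenario above, the checkpoint folders could be deleted from the local copy before syncing. This is only a minimal sketch using the standard library: the helper name remove_ipynb_checkpoints is hypothetical and not part of ClearML, and it assumes that Jupyter stores its checkpoints in directories named .ipynb_checkpoints.

```python
import os
import shutil


def remove_ipynb_checkpoints(root):
    """Delete every .ipynb_checkpoints directory under root.

    Hypothetical helper, not part of ClearML. Returns the removed paths.
    """
    removed = []
    for dirpath, dirnames, _filenames in os.walk(root):
        if ".ipynb_checkpoints" in dirnames:
            target = os.path.join(dirpath, ".ipynb_checkpoints")
            shutil.rmtree(target)
            # Prune the entry so os.walk does not try to descend into it.
            dirnames.remove(".ipynb_checkpoints")
            removed.append(target)
    return removed
```

The idea would be to call this on the folder returned by .get_mutable_local_copy() right before calling .sync_folder() on the same folder, so that the checkpoints are no longer there to be picked up.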
@<1537605940121964544:profile|EnthusiasticShrimp49> , @<1523701435869433856:profile|SmugDolphin23> , thank you for the answer!
Hi @<1676038099831885824:profile|BlushingCrocodile88> ! We will soon try to merge a PR submitted via GitHub that will allow you to specify a list of files to be added to the dataset. You will then be able to do something like add_files(list(set(glob.glob("*")) - set(glob.glob("*.ipynb"))))
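To illustrate the "everything except notebooks" idea, here is a rough sketch of how such a file list could be built with the standard library alone. The helper name list_files_excluding and its parameters are assumptions made for this example, not ClearML API. It walks the tree with os.walk rather than glob because, by default, Python's glob does not match dot-prefixed names such as .ipynb_checkpoints, which makes the exclusion harder to reason about.

```python
import fnmatch
import os


def list_files_excluding(root, exclude_patterns=("*.ipynb",),
                         exclude_dirs=(".ipynb_checkpoints",)):
    """Return sorted paths (relative to root, '/'-separated) of files to keep.

    Hypothetical helper, not part of ClearML. A file is skipped when its name
    matches one of exclude_patterns or when it lives under one of exclude_dirs.
    """
    kept = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never enters them.
        dirnames[:] = [d for d in dirnames if d not in exclude_dirs]
        for name in filenames:
            if any(fnmatch.fnmatch(name, pat) for pat in exclude_patterns):
                continue
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            kept.append(rel.replace(os.sep, "/"))
    return sorted(kept)
```

Assuming the PR lands as described, the returned list is the kind of value that could be handed to add_files(). The exact form of the argument depends on what actually gets merged.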