@<1537605940121964544:profile|EnthusiasticShrimp49> , @<1523701435869433856:profile|SmugDolphin23> , thank you for the answer!
Hi @<1676038099831885824:profile|BlushingCrocodile88>! We will soon try to merge a PR submitted via GitHub that will allow you to specify a list of files to be added to the dataset. You will then be able to do something like add_files(list(set(glob.glob("*")) - set(glob.glob("*.ipynb"))))
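Written out as runnable Python, an "all files except notebooks" list is a set difference of two glob results. Python lists don't support "-", and glob patterns must be quoted strings. Below is a minimal sketch using only the standard library. The helper name files_excluding is hypothetical and not part of ClearML. Passing its result to add_files assumes the list-of-files support from the pending PR has landed.

```python
import glob
import os


def files_excluding(include="*", exclude="*.ipynb", root="."):
    """Return sorted file paths under `root` matching `include` but not `exclude`.

    Hypothetical helper: a set difference of two glob results, i.e. the
    runnable form of glob.glob("*") minus glob.glob("*.ipynb").
    """
    included = set(glob.glob(os.path.join(root, include)))
    excluded = set(glob.glob(os.path.join(root, exclude)))
    # glob("*") also returns directories, so keep regular files only
    return sorted(p for p in included - excluded if os.path.isfile(p))


# Assumed usage once add_files accepts a list (pending PR, not executed here):
# dataset.add_files(files_excluding())
```

By default glob does not match names starting with a dot, so hidden folders such as .ipynb_checkpoints are not picked up by "*" in the first place.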
clearml-data also supports glob patterns, so if you have your dataset files in the same directory as the experiment code, you can do something like clearml-data add --files *.csv and only add the CSV files.
There's no .gitignore-like functionality because clearml-data is not meant to track everything; you need to be deliberate about exactly what you're adding. Hope this clarifies things.
That makes sense. Yeah, it would be nice to have a way to exclude some files when calling sync_folder.
One more question has come up. Here is my situation: I make a mutable copy using the .get_mutable_local_copy() method and edit/add some files in the local folder. Jupyter then creates .ipynb checkpoint files.
Then I want to synchronise the dataset in my storage, so I call .sync_folder(). The .ipynb checkpoints are uploaded too, because this method has no wildcard argument. Could you look into this? :) I know I can use the add_files() method, but sync_folder seems more convenient in this scenario. It would be nice if you added an option to exclude some files in the sync_folder method.
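Until sync_folder offers an exclude option, one possible workaround is to delete Jupyter's .ipynb_checkpoints folders from the mutable local copy right before syncing. This is a minimal sketch under assumptions. The helper remove_ipynb_checkpoints is hypothetical and uses only the standard library. The commented usage assumes get_mutable_local_copy() returns the local folder path and that the dataset being synced is a new, not-yet-finalized version.

```python
import os
import shutil


def remove_ipynb_checkpoints(local_path):
    """Delete every .ipynb_checkpoints directory below local_path.

    Hypothetical helper; returns the removed directories so they can be logged.
    """
    removed = []
    # topdown=True lets us prune dirnames so os.walk skips the deleted folders
    for dirpath, dirnames, _filenames in os.walk(local_path, topdown=True):
        if ".ipynb_checkpoints" in dirnames:
            target = os.path.join(dirpath, ".ipynb_checkpoints")
            shutil.rmtree(target)
            removed.append(target)
            dirnames.remove(".ipynb_checkpoints")
    return removed


# Assumed usage with the ClearML SDK (not executed here):
# local_path = dataset.get_mutable_local_copy("/tmp/my_dataset")
# ... edit/add files, e.g. from a notebook ...
# remove_ipynb_checkpoints(local_path)
# dataset.sync_folder(local_path)
```

Deleting the folders, rather than filtering a file list, keeps the convenient sync_folder call unchanged. The trade-off is that it modifies the local copy in place.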