Answered
Hi! Is There Any Way To Add Git-Like Ignore File For Versioning Clearml Data? I Saw In Docs A Wildcard Argument When Files Are Added To A Dataset. How Can I Specify Ignoring Of Some File Types? For Example, I Want To Ignore Ipynb Checkpoints. How Can I Do

Hi!
Is there any way to add a git-like ignore file for versioning ClearML data? I saw in the docs a wildcard argument that applies when files are added to a dataset. How can I specify that some file types should be ignored? For example, I want to ignore ipynb checkpoints. How can I do this?

  
  
Posted 9 months ago

Answers 5


That makes sense. Yeah, it would be nice to have a way to exclude some files when calling sync_folder.

  
  
Posted 9 months ago

clearml-data also supports glob patterns, so if you have your dataset files in the same directory as the experiment code, you can do something like clearml-data add --files *.csv and only add the CSV files.

There's no .gitignore-like functionality because clearml-data is not meant to track everything, and you need to be deliberate in what exactly you're adding. Hope this clarifies things.

  
  
Posted 9 months ago

One more question has come up. Here is my situation: I make a mutable copy using the .get_mutable_local_copy() method and edit/add some files in the local folder. Ipynb checkpoints are created after this.
Then I want to synchronise the dataset in my storage, so I call .sync_folder(). The ipynb checkpoints will also be uploaded because this method has no wildcard argument. Could you check this issue? :) I know I can use the add_files() method, but it seems to me that sync_folder is more convenient in this scenario. It would be nice if you added an option for excluding some files in the sync_folder method.
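Until sync_folder has an exclude option, one possible workaround is to delete the checkpoint directories from the mutable copy right before syncing. This is only a sketch using the standard library: the helper name remove_checkpoint_dirs is made up for illustration, and it assumes Jupyter's default .ipynb_checkpoints directory name. Note that it permanently deletes those directories from the local folder.

```python
import shutil
from pathlib import Path


def remove_checkpoint_dirs(root, dirname=".ipynb_checkpoints"):
    """Delete every directory called `dirname` under `root`.

    Returns the removed directories as POSIX paths relative to `root`.
    Hypothetical helper, not part of ClearML.
    """
    root = Path(root)
    removed = []
    # sorted() materialises the matches first, so the tree is not
    # modified while it is still being walked; parents sort before children.
    for path in sorted(root.rglob(dirname)):
        if path.is_dir():  # False if a parent directory was already removed
            shutil.rmtree(path)
            removed.append(path.relative_to(root).as_posix())
    return removed


# Intended use (not executed here; "local_folder" is a placeholder path):
#   remove_checkpoint_dirs("local_folder")
#   dataset.sync_folder("local_folder")
```

Since sync_folder mirrors the folder contents, cleaning the folder first should keep the checkpoints out of the dataset, but it is worth checking the result on a small test dataset.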

  
  
Posted 9 months ago

@EnthusiasticShrimp49, @SmugDolphin23, thank you for the answer!

  
  
Posted 9 months ago

Hi @BlushingCrocodile88! We will soon try to merge a PR submitted via GitHub that will allow you to specify a list of files to be added to the dataset. So you will then be able to do something like add_files(list(set(glob.glob("*")) - set(glob.glob("*.ipynb"))))
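For reference, here is a rough sketch of how such a file list could be built so that it also skips anything inside checkpoint directories, not just files in the top-level folder. It uses only the standard library; list_files_excluding is a hypothetical helper name, and whether add_files accepts a list depends on the PR mentioned above being merged.

```python
import fnmatch
from pathlib import Path


def list_files_excluding(root, exclude_patterns):
    """Return sorted relative POSIX paths of the files under `root`,
    skipping any file whose name, or any parent directory name,
    matches one of `exclude_patterns` (fnmatch-style).

    Hypothetical helper, not part of ClearML.
    """
    root = Path(root)
    kept = []
    for path in root.rglob("*"):  # rglob also returns dotfiles and dot-directories
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        excluded = any(
            fnmatch.fnmatch(part, pattern)
            for part in rel.parts
            for pattern in exclude_patterns
        )
        if not excluded:
            kept.append(rel.as_posix())
    return sorted(kept)


# Intended use (not executed here; "local_folder" is a placeholder path,
# and passing a list assumes the PR mentioned above has been merged):
#   files = list_files_excluding("local_folder", [".ipynb_checkpoints"])
#   dataset.add_files(files)
```

Matching on every path component means a pattern like ".ipynb_checkpoints" drops the whole directory, while a pattern like "*.tmp" drops individual files anywhere in the tree.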

  
  
Posted 9 months ago
707 Views