Hi! Is There Any Way To Add Git-Like Ignore File For Versioning Clearml Data? I Saw In Docs A Wildcard Argument When Files Are Added To A Dataset. How Can I Specify Ignoring Of Some File Types? For Example, I Want To Ignore Ipynb Checkpoints. How Can I Do

Answered

Hi!
Is there any way to add a git-like ignore file for versioning ClearML data? I saw in the docs a wildcard argument that applies when files are added to a dataset. How can I specify that some file types should be ignored? For example, I want to ignore ipynb checkpoints. How can I do this?

  				
Posted one year ago by BlushingCrocodile88


Answers 5

@EnthusiasticShrimp49, @SmugDolphin23, thank you for the answer!

  				
Posted one year ago by BlushingCrocodile88

One more question has come up. Here is my situation: I make a mutable copy using the .get_mutable_local_copy() method and edit or add some files in the local folder. Ipynb checkpoints are created in the process.
Then I want to synchronise the dataset in my storage, so I call .sync_folder(). The ipynb checkpoints are uploaded as well, because this method has no wildcard argument. Could you look into this? :) I know I can use the add_files() method, but sync_folder seems more convenient in this scenario. It would be nice if you added an option to exclude some files in the sync_folder method.
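Until such an option exists, one possible workaround is to delete the checkpoint folders from the local copy before syncing. The sketch below uses only the Python standard library; the helper name remove_checkpoint_dirs is hypothetical and not part of ClearML, and it assumes Jupyter's usual behaviour of storing checkpoints in directories named .ipynb_checkpoints.

```python
import shutil
from pathlib import Path


def remove_checkpoint_dirs(root, dir_name=".ipynb_checkpoints"):
    """Delete every directory called `dir_name` under `root`; return the removed paths."""
    removed = []
    # Collect the matches first so nothing is deleted while the tree is still being walked.
    for path in sorted(Path(root).rglob(dir_name)):
        # A nested match may already be gone if its parent was removed earlier.
        if path.is_dir():
            shutil.rmtree(path)
            removed.append(path)
    return removed
```

Calling this on the folder produced by .get_mutable_local_copy() just before .sync_folder() should leave no checkpoint files for the sync to pick up. Note that it deletes the checkpoints from disk, which may not be what you want if you rely on them.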

  				
Posted one year ago by BlushingCrocodile88

That makes sense. Yes, it would be nice to have a way to exclude some files when calling sync_folder.

  				
Posted one year ago by SmugDolphin23

Hi @BlushingCrocodile88! We will soon try to merge a PR submitted via GitHub that will allow you to specify a list of files to be added to the dataset. You will then be able to do something like add_files(list(set(glob.glob("*")) - set(glob.glob("*.ipynb")))).
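As a rough illustration of the file-list idea in this reply, the sketch below builds a list of files while skipping notebook checkpoints. It uses only the Python standard library; the helper name list_files_excluding and its default patterns are hypothetical and not part of ClearML, and whether add_files accepts such a list depends on the PR mentioned above being merged.

```python
from fnmatch import fnmatch
from pathlib import Path


def list_files_excluding(root, exclude_patterns=(".ipynb_checkpoints", "*-checkpoint.ipynb")):
    """Return sorted relative paths of files under `root`, skipping excluded names."""
    root = Path(root)
    kept = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        # Skip the file if any component of its relative path matches an exclude pattern.
        if any(fnmatch(part, pat) for part in rel.parts for pat in exclude_patterns):
            continue
        kept.append(rel.as_posix())
    return sorted(kept)
```

Matching on every path component means that a file is skipped both when its own name matches a pattern and when it sits anywhere inside an excluded directory.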

  				
Posted one year ago by SmugDolphin23

clearml-data also supports glob patterns, so if you have your dataset files in the same directory as the experiment code, you can do something like clearml-data add --files *.csv and only add the CSV files.

There's no .gitignore-like functionality because clearml-data is not meant to track everything, and you need to be deliberate in what exactly you're adding. Hope this clarifies things.

  				
Posted one year ago by EnthusiasticShrimp49
