Hi All, Is There A Limit To The Maximum Size, Or Number Of Files A Dataset Can Have When Uploading To Clearml Self-Hosted? We Got This Error When

Unanswered

Thanks for your reply 🙂

We worked around the bug by calling Dataset.add_files once per folder that contains files (~120 calls), using a wildcard, rather than once for each individual file (~75,000 calls).
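For anyone hitting the same error, here is a minimal sketch of that workaround. The helper name add_folders and the "*.npy" pattern are my own illustration, not part of ClearML. It only assumes the dataset object exposes add_files with the path, wildcard, dataset_path and recursive arguments.

```python
from pathlib import Path


def add_folders(dataset, root, pattern="*.npy"):
    """Call dataset.add_files once per folder under root that holds matching files.

    Hypothetical helper. Returns the number of add_files calls made.
    """
    root = Path(root)
    calls = 0
    # Consider root itself plus every nested directory, in a stable order.
    folders = [root] + sorted(p for p in root.rglob("*") if p.is_dir())
    for folder in folders:
        # Skip folders that have no matching files directly inside them.
        if not any(f.is_file() for f in folder.glob(pattern)):
            continue
        rel = folder.relative_to(root).as_posix()
        dataset.add_files(
            path=str(folder),
            wildcard=pattern,
            # Keep the folder layout inside the dataset (None for root itself).
            dataset_path=None if rel == "." else rel,
            recursive=False,  # nested folders get their own call
        )
        calls += 1
    return calls


# Usage with ClearML (dataset/project names and the path are placeholders):
#   from clearml import Dataset
#   ds = Dataset.create(dataset_name="audio-features", dataset_project="examples")
#   add_folders(ds, "/data/features")
#   ds.upload()
#   ds.finalize()
```

With our layout this makes roughly 120 add_files calls instead of 75,000.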

I am unsure what effect this has, but I assume each add_files call was creating some log entry or other metadata, so calling it fewer times made the MongoDB document smaller.

MongoDB can store data larger than the 16 MB document limit using GridFS, which may be a solution for large documents. Alternatively, an optimisation could reduce the size of this document.
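As a rough sanity check on the per-call metadata guess, here is a back-of-the-envelope calculation. It assumes, without confirmation, that each add_files call adds a fixed amount to a single document and that nothing else in the document is significant.

```python
import math

# MongoDB's maximum BSON document size is 16 MiB.
MAX_BSON_BYTES = 16 * 1024 * 1024
PER_FILE_CALLS = 75_000  # one add_files call per file (the failing run)
PER_FOLDER_CALLS = 120   # one add_files call per folder (the workaround)

# Smallest whole number of bytes per call that would exceed the limit
# after 75,000 calls, under the fixed-cost-per-call assumption.
bytes_per_call = math.ceil(MAX_BSON_BYTES / PER_FILE_CALLS)

# The same per-call cost with only 120 calls.
workaround_bytes = bytes_per_call * PER_FOLDER_CALLS

print(bytes_per_call)    # 224
print(workaround_bytes)  # 26880
```

So a couple of hundred bytes of metadata per call would be enough to hit the limit at 75,000 calls, while 120 calls would stay in the tens of kilobytes.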

I will create an issue. I am working on a code snippet that reproduces the problem in a repeatable way with dummy data.

We are working with a custom dataset made up of numpy files that contain audio features. We have 75,000 files in this particular dataset. Each file is at most about 500 kB.

The bug seems to be related to the number of times add_files is called, rather than to the size or number of files.

  				
Posted 2 years ago

StaleLeopard22
