Hi all, is there a limit to the maximum size or number of files a dataset can have when uploading to ClearML self-hosted? We got this error when …


Thanks for your reply 🙂

We worked around the bug by calling Dataset.add_files only once per folder that contains files (~120 folders), using a wildcard, rather than once for each individual file (~75,000 files).
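A minimal sketch of that per-folder workaround, assuming the files are .npy files spread over subfolders of one root directory. The helper names folders_containing and add_files_per_folder are hypothetical; the add_files keyword arguments (wildcard, dataset_path, recursive) are those of ClearML's Dataset.add_files in recent versions, but check them against your installed version. The dataset argument is duck-typed so the logic can be tried without a ClearML server.

```python
from pathlib import Path


def folders_containing(root, pattern="*.npy"):
    """Return the sorted directories under root that directly contain files matching pattern."""
    root = Path(root)
    return sorted({p.parent for p in root.rglob(pattern) if p.is_file()})


def add_files_per_folder(dataset, root, pattern="*.npy"):
    """Call dataset.add_files once per folder instead of once per file.

    `dataset` is assumed to be a clearml.Dataset (or any object with a
    compatible add_files method). Returns the number of add_files calls made.
    """
    root = Path(root)
    folders = folders_containing(root, pattern)
    for folder in folders:
        dataset.add_files(
            path=str(folder),
            wildcard=pattern,
            # keep the same folder layout inside the dataset
            dataset_path=str(folder.relative_to(root)),
            # only this folder's own files; each subfolder gets its own call
            recursive=False,
        )
    return len(folders)
```

With ClearML this would be used roughly as: create the dataset with Dataset.create(dataset_name=..., dataset_project=...), pass it to add_files_per_folder together with the root folder, then call upload() and finalize() on it. For ~75,000 files in ~120 folders this makes ~120 add_files calls instead of ~75,000.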

I am unsure what effect this has, but I assume the add_files method was creating some log or other metadata on each call, and calling it fewer times made the MongoDB document smaller?

MongoDB can store data larger than its 16 MB document size limit using GridFS, which may be a solution for large documents; alternatively, the size of this document could perhaps be optimised.
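A back-of-the-envelope check of the hypothesis above, assuming (unconfirmed) that every add_files call appends a roughly fixed amount of metadata to a single MongoDB document, and using MongoDB's 16 MiB (16,777,216 byte) BSON document limit:

```latex
\[
\frac{16\,777\,216\ \text{bytes}}{75\,000\ \text{calls}} \approx 224\ \text{bytes per call},
\qquad
\frac{16\,777\,216\ \text{bytes}}{120\ \text{calls}} \approx 139\,810\ \text{bytes per call}.
\]
```

Under that assumption, only about 224 bytes of metadata per call would be enough to hit the limit with one call per file, while one call per folder would need about 140 kB per call, which would be consistent with the workaround succeeding.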

I will create an issue; I'm working on a code snippet that reproduces the problem reliably with dummy data.
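A sketch of what such a dummy-data reproduction could look like, assuming NumPy is installed. make_dummy_dataset and add_files_one_by_one are hypothetical helpers, and the default of 120 folders × 625 files = 75,000 files only mirrors the counts in this thread. The files are deliberately small, since the bug appears to depend on the number of add_files calls rather than on file size.

```python
from pathlib import Path

import numpy as np


def make_dummy_dataset(root, n_folders=120, files_per_folder=625, shape=(100, 40), seed=0):
    """Write n_folders * files_per_folder .npy files of random float32 'features'."""
    rng = np.random.default_rng(seed)
    root = Path(root)
    paths = []
    for f in range(n_folders):
        folder = root / f"folder_{f:03d}"
        folder.mkdir(parents=True, exist_ok=True)
        for i in range(files_per_folder):
            path = folder / f"features_{i:05d}.npy"
            np.save(path, rng.standard_normal(shape, dtype=np.float32))
            paths.append(path)
    return paths


def add_files_one_by_one(dataset, root, paths):
    """The pattern that triggered the error: one add_files call per file.

    `dataset` is assumed to be a clearml.Dataset (or a compatible object);
    dataset_path keeps same-named files from different folders apart.
    """
    root = Path(root)
    for p in paths:
        dataset.add_files(path=str(p), dataset_path=str(p.parent.relative_to(root)))
```

Against a self-hosted server, the repro would create a dataset with Dataset.create(...), call add_files_one_by_one on it with the generated paths, and then upload() to see whether the error appears.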

We are working with a custom dataset made up of NumPy files that contain audio features. This particular dataset has 75,000 files, each at most about 500 kB.

The bug seems to be related to the number of times add_files is called rather than to the size or number of files.

  
  
Posted one year ago