Answered
"clearml.Task - ERROR - Action failed <500/0: tasks.edit/v1.0 (Update failed (BSONObj size: 18330801 (0x117B4B1) is invalid. Size must be between 0 and 16793600(16MB) F"

  
  
Posted 2 years ago

Answers 25


Hi DrabOwl94
If I understand you correctly, you have a lot of chunks (which translates to a lot of links to small 1MB files, because this is how you set up the chunk size). Apparently you have reached the maximum number of chunks per specific Dataset version (in the end this metadata is stored in a document with a limited size, specifically 16MB).
How many chunks do you have there?
(In other words what's the size of the entire dataset in MBs)

  
  
Posted 2 years ago

what I meant is that we have 1,000,000 small files in the dataset

  
  
Posted 2 years ago

DrabOwl94, how many 1MB files did you end up having?

  
  
Posted 2 years ago

the files are uploaded but metadata is absent 😞

  
  
Posted 2 years ago

all the metadata that is standard for a ClearML dataset: hashes, timestamps and names of the 1M uploaded files

  
  
Posted 2 years ago

Sure, AgitatedDove14!

I will get to it next week. Thank you for the answer!

  
  
Posted 2 years ago

CharmingStarfish14 can you check something from code, just to see if this would solve the issue?

  
  
Posted 2 years ago

Check the latest RC, it solved an issue with dataset uploading.
Let me check if it also solved this issue.

  
  
Posted 2 years ago

Hi DrabOwl94, how did you create/save/finalize the dataset?

  
  
Posted 2 years ago

clearml-data create --name [Dataset Name] --project [Project Name] --output-uri
clearml-data add --files [FILE_PATH] --id [Id]
clearml-data close
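
For readers who prefer the Python SDK, the same create / add / close sequence can be sketched roughly as follows. This is a minimal sketch and not the poster's actual code: the helper name `create_and_close_dataset` and its parameters are hypothetical, the `--output-uri` value is left out because it is not given in the post, and the 512 MB default is an assumption taken from the chunk size mentioned elsewhere in this thread.

```python
def create_and_close_dataset(name, project, files_path, chunk_size_mb=512):
    """Create a dataset, add files, then upload and finalize it.

    Hypothetical helper mirroring `clearml-data create / add / close`.
    """
    # Imported lazily so this module loads even where clearml is not installed.
    from clearml import Dataset

    dataset = Dataset.create(dataset_name=name, dataset_project=project)
    dataset.add_files(path=files_path)
    # chunk_size is given in MB and sets the size of each uploaded archive.
    dataset.upload(chunk_size=chunk_size_mb)
    dataset.finalize()
    return dataset.id
```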

  
  
Posted 2 years ago

~2000 files in each chunk

  
  
Posted 2 years ago

so the correct numbers are:

  
  
Posted 2 years ago

78GB

  
  
Posted 2 years ago

so 78000 entries ...
Wow, that's a lot! Would it make sense to do 1GB chunks? Any reason for the initial 1MB chunk size?

  
  
Posted 2 years ago

AgitatedDove14
Hello, Martin. Any news about this issue?

We really want to use ClearML for datasets that are hundreds of GB worth of data.

Are you saying that ClearML is not able to do that?

  
  
Posted 2 years ago

So you are saying 156 chunks, with each chunk holding about 6500 files?
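
The chunk arithmetic in this thread can be checked with a few lines of Python. This is only a sketch: the function name `chunk_stats` is hypothetical, and it assumes binary units (1 GB = 1024 MB) and files spread evenly across chunks, neither of which is stated in the thread.

```python
import math


def chunk_stats(total_gb, chunk_mb, n_files):
    """Return (number of chunks, average number of files per chunk)."""
    total_mb = total_gb * 1024  # assumption: 1 GB = 1024 MB
    n_chunks = math.ceil(total_mb / chunk_mb)
    return n_chunks, n_files / n_chunks


# Figures first quoted in the thread: 78GB, 512 MB chunks, 1,000,000 files.
print(chunk_stats(total_gb=78, chunk_mb=512, n_files=1_000_000))

# Figures the poster gave as the correct numbers: 256GB in total.
print(chunk_stats(total_gb=256, chunk_mb=512, n_files=1_000_000))
```

Under these assumptions the first call gives 156 chunks, and the second gives 512 chunks of roughly 2000 files each, which is in line with the "500 chunks" and "~2000 files in each chunk" reported by the poster.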

  
  
Posted 2 years ago

Martin, you didn't get me right. We have 1 million small files, which we upload in chunks of 512 MB.

  
  
Posted 2 years ago

As we see it, the only way is to split this dataset into smaller sub-datasets.
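
A rough sketch of what such a split could look like on the file-list side, in plain Python. The helper name `split_into_batches`, the file names and the batch size of 250,000 are all hypothetical; the thread does not confirm which batch size, if any, keeps the per-dataset metadata under the 16MB limit.

```python
def split_into_batches(paths, max_files_per_dataset):
    """Yield consecutive slices of `paths`, each holding at most the given number of files."""
    if max_files_per_dataset <= 0:
        raise ValueError("max_files_per_dataset must be positive")
    for start in range(0, len(paths), max_files_per_dataset):
        yield paths[start:start + max_files_per_dataset]


# Hypothetical file names standing in for the 1,000,000 small files.
paths = [f"file_{i:07d}.bin" for i in range(1_000_000)]
batches = list(split_into_batches(paths, 250_000))
print(len(batches), len(batches[0]))  # 4 250000
```

Each batch could then be registered as its own dataset, so that no single dataset has to describe all one million files.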

  
  
Posted 2 years ago

DrabOwl94, can you attach a code snippet? This error basically means you've hit the maximum size allowed for the task's BSON document, but the dataset itself should be uploaded as an artifact.
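
To see how far over the limit the document was, the two numbers can be pulled straight out of the error message. This is an illustrative sketch only: the helper name `bson_overshoot` and the regular expression are assumptions written against the one message quoted in this thread, not a general parser for server errors.

```python
import re

ERROR = (
    "clearml.Task - ERROR - Action failed <500/0: tasks.edit/v1.0 "
    "(Update failed (BSONObj size: 18330801 (0x117B4B1) is invalid. "
    "Size must be between 0 and 16793600(16MB) F"
)

# Captures the reported document size and the upper bound from the message.
PATTERN = re.compile(
    r"BSONObj size: (\d+) \(0x[0-9A-Fa-f]+\) is invalid\. "
    r"Size must be between 0 and (\d+)"
)


def bson_overshoot(message):
    """Return (size, limit, size - limit) in bytes, or None if the message does not match."""
    match = PATTERN.search(message)
    if match is None:
        return None
    size, limit = int(match.group(1)), int(match.group(2))
    return size, limit, size - limit


size, limit, excess = bson_overshoot(ERROR)
print(f"document: {size} bytes, limit: {limit} bytes, over by {excess} bytes")
```

For the message above, the document is 1,537,201 bytes over the limit, i.e. roughly 9% too large.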

  
  
Posted 2 years ago

chunk size: 512 MB

  
  
Posted 2 years ago

so probably the metadata was too large to fit... Any way to describe the metadata and its scope?

  
  
Posted 2 years ago

or we create two datasets, parent and child, splitting the set into two parts

  
  
Posted 2 years ago

AgitatedDove14

  
  
Posted 2 years ago

500 chunks in total

  
  
Posted 2 years ago

256GB of data in total

  
  
Posted 2 years ago