Answered
"clearml.Task - ERROR - Action failed <500/0: tasks.edit/v1.0 (Update failed (BSONObj size: 18330801 (0x117B4B1) is invalid. Size must be between 0 and 16793600(16MB) F"
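For reference, the numbers quoted in the error can be checked directly: the task document ClearML tried to store is about 18.3 MB, which is over the ~16 MB BSON document cap the server enforces.

```python
# Both figures below come straight from the error message above.
bson_size = 0x117B4B1   # 18330801 bytes: the size of the rejected task document
bson_limit = 16793600   # the maximum BSON document size the server allows (~16 MB)

print(bson_size)               # 18330801
print(bson_size > bson_limit)  # True: the dataset metadata no longer fits
```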

  
  
Posted one year ago

Answers 25


the files are uploaded but metadata is absent 😞

  
  
Posted one year ago

Martin, you didn't get me right. We have 1 million small files, which we upload in chunks of 512 MB

  
  
Posted one year ago

So you are saying 156 chunks, with each chunk about ~6500 files?

  
  
Posted one year ago

all the metadata that is standard for a ClearML dataset: hashes, timestamps, and names of the 1M uploaded files

  
  
Posted one year ago

500 chunks in total

  
  
Posted one year ago

clearml-data create --name [Dataset Name] --project [Project Name] --output-uri
clearml-data add --files [FILE_PATH] --id [Id]
clearml-data close

  
  
Posted one year ago

Hi DrabOwl94 , how did you create/save/finalize the dataset?

  
  
Posted one year ago

or we create two datasets (parent and child), splitting the set into two parts

  
  
Posted one year ago

DrabOwl94 how many 1M files did you end up having?

  
  
Posted one year ago

Hi DrabOwl94
I think that if I understand you correctly, you have a lot of chunks (which translates to a lot of links to small 1MB files, because this is how you set up the chunk size). Now apparently you have reached the maximum number of chunks per specific Dataset version (in the end this meta-data is stored in a document of limited size, specifically 16MB).
How many chunks do you have there?
(In other words, what's the size of the entire dataset in MBs?)
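To make the 16 MB ceiling concrete, here is a rough estimate of when a dataset version's file metadata (hash, name, timestamp per file) stops fitting. The ~180 bytes-per-entry figure is an illustrative assumption, not a measured ClearML value.

```python
# Rough feasibility check for the 16 MB metadata document described above.
# BYTES_PER_ENTRY is an assumption for illustration (hash + name + timestamp
# + BSON overhead), not a number taken from the ClearML implementation.

BSON_LIMIT = 16793600   # the limit quoted in the error message (~16 MB)
BYTES_PER_ENTRY = 180   # assumed per-file metadata footprint

def metadata_fits(num_files: int, bytes_per_entry: int = BYTES_PER_ENTRY) -> bool:
    """Return True if the estimated file-entry metadata stays under the BSON cap."""
    return num_files * bytes_per_entry < BSON_LIMIT

print(metadata_fits(78_000))     # True: ~14 MB of entries still fits
print(metadata_fits(1_000_000))  # False: 1M entries blow well past 16 MB
```

Under this assumption, anything much past ~90k file entries per dataset version would hit the cap, which is consistent with 1M files failing.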

  
  
Posted one year ago

AgitatedDove14

  
  
Posted one year ago

so 78,000 entries ...
wow, a lot! Would it make sense to do 1GB chunks? Any reason for the initial 1MB chunk size?

  
  
Posted one year ago

256GB in total of data

  
  
Posted one year ago

so the correct numbers are:

  
  
Posted one year ago

chunk size: 512 MB
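Putting the figures from this thread together (256 GB of data, 512 MB chunks, 1,000,000 files), the chunk arithmetic works out roughly as quoted:

```python
import math

# Back-of-the-envelope chunk arithmetic from the numbers in this thread.
total_mb = 256 * 1024   # 256 GB of data
chunk_mb = 512          # 512 MB per chunk
num_files = 1_000_000   # 1M small files

num_chunks = math.ceil(total_mb / chunk_mb)
files_per_chunk = num_files // num_chunks

print(num_chunks)       # 512 chunks (the "500 in total" above, rounded)
print(files_per_chunk)  # 1953 files per chunk (the "~2000" figure)
```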

  
  
Posted one year ago

AgitatedDove14
Hello, Martin. Any news about this issue?

We really want to use ClearML for datasets that are hundreds of GB worth of data.

Are you saying that ClearML is not able to do that?

  
  
Posted one year ago

~2000 files in each chunk

  
  
Posted one year ago

Check the latest RC; it solved an issue with dataset uploading.
Let me check if it also solved this issue.

  
  
Posted one year ago

CharmingStarfish14 can you check something from code, just to see if this would solve the issue?

  
  
Posted one year ago

Sure, AgitatedDove14 !

I will get to it next week. Thank you for the answer!

  
  
Posted one year ago

what I meant is that we have 1,000,000 small files in the dataset

  
  
Posted one year ago

78GB

  
  
Posted one year ago

so probably the metadata was too large to fit... Any way to describe the metadata and its scope?

  
  
Posted one year ago

as we see it, the only way is to split this dataset into smaller sub-datasets
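A minimal sketch of that splitting idea, assuming you partition the file list up front and register each part as its own dataset version (the ClearML registration calls themselves are omitted; the part size of 80,000 is a hypothetical value chosen to stay under the metadata cap):

```python
# Sketch only: partition the file list so each sub-dataset's metadata stays
# well under the 16 MB document limit. Creating the actual ClearML datasets
# (e.g. as parent/child versions) would happen per sub-list and is left out.

def split_files(files, max_files_per_dataset):
    """Yield consecutive sub-lists, each at most max_files_per_dataset long."""
    for start in range(0, len(files), max_files_per_dataset):
        yield files[start:start + max_files_per_dataset]

parts = list(split_files([f"file_{i}.bin" for i in range(1_000_000)], 80_000))
print(len(parts))      # 13 sub-datasets
print(len(parts[0]))   # 80000 files in the first part
```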

  
  
Posted one year ago

DrabOwl94 can you attach a code snippet? This error basically means you've hit the maximum size allowed for the task's BSON document, but the dataset itself should be uploaded as an artifact.

  
  
Posted one year ago