Answered
Cannot Upload A Dataset With A Parent - Seems Very Odd!

Cannot upload a dataset with a parent - seems very odd!

clearml versions I tried: 1.6.1, 1.6.2

scenario:

  • Create parent dataset (with storage on S3)

  • Upload data

  • Close dataset

  • Create child dataset (tried with storage on both S3 or on clearml server)

  • add single file, or folder to child

  • close child

  • Get exception (see below)

` clearml-data - Dataset Management & Versioning CLI
Finalizing dataset id d80b190d84ca41e1b139c841427dd241
id=d80b190d84ca41e1b139c841427dd241 disable_upload=False chunk_size=512
2022-08-09 07:01:54,819 - clearml.storage - INFO - Downloading: 5.00MB / 5.92MB @ 29.85MBs from
2022-08-09 07:01:54,825 - clearml.storage - INFO - Downloaded 5.92 MB successfully from , saved to /home/ec2-user/.clearml/cache/storage_manager/datasets/2ff81b56341faaaad7796344472ec8d2.state.json
Pending uploads, starting dataset upload to
Compressing /home/ec2-user/xxx/yyy/zzz.npy
Uploading dataset changes (1 files compressed to 1.67 MiB) to
File compression and upload completed: total size 1.67 MiB, 1 chunked stored (average size 1.67 MiB)

Error: unsupported operand type(s) for +=: 'int' and 'NoneType' `
Any idea? This seems like a really basic scenario, and I am sure it worked for me in the past.
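For readers who hit the same message: it is the standard Python `TypeError` raised when `None` is added to an integer accumulator. The thread does not show where inside clearml this happens, so the sketch below is only a generic illustration of the error and of a defensive sum. The function names `total_size` and `total_size_safe` are hypothetical and are not part of clearml.

```python
from typing import Optional, Sequence


def total_size(sizes: Sequence[Optional[int]]) -> int:
    """Sum sizes naively; fails as soon as one entry is None."""
    total = 0
    for size in sizes:
        total += size  # raises TypeError when size is None
    return total


def total_size_safe(sizes: Sequence[Optional[int]]) -> int:
    """Sum sizes, counting a missing (None) size as zero."""
    return sum(size or 0 for size in sizes)


# Hypothetical data: one entry has no recorded size.
try:
    total_size([1024, None])
except TypeError as exc:
    message = str(exc)
    print(message)

print(total_size_safe([1024, None, 2048]))
```

Whether treating a missing size as zero is appropriate depends on why the size is missing; in a real fix the missing value would more likely be recomputed than ignored.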

  
  
Posted one year ago

Answers 8


Tried with 1.6.0, doesn’t work

` #this is the parent
clearml-data create --project xxx --name yyy --output-uri
clearml-data add folder1
clearml-data close

#this is the child, where XYZ is the parent's id
clearml-data create --project xxx --name yyy1 --parents XYZ --output-uri
clearml-data add folder2
clearml-data close
#now I get the error above `

  
  
Posted one year ago

It seems to work fine when the parent is on clear.ml storage (tried with a toy dataset).

  
  
Posted one year ago

Can you try it with clearml==1.6.0 please?
Also, can you list the exact commands you ran?

  
  
Posted one year ago

I tested it again with much smaller data and it seems to work.
I am not sure what the difference between the use cases is. It seems like something about this particular (big) parent doesn’t agree with clearml…

  
  
Posted one year ago

RoughTiger69 , do you have a rough estimate on the size that breaks it?

  
  
Posted one year ago

Hi RoughTiger69! Can you try adding the files using a Python script so that we can get an exception traceback? Something like this:
` from clearml import Dataset

# or just use the ID of the dataset you previously created instead of creating a new one
parent_dataset = Dataset.create(dataset_name="xxxx", dataset_project="yyyyy", output_uri=" ")
parent_dataset.add_files("folder1")
parent_dataset.upload()
parent_dataset.finalize()

child_dataset = Dataset.create(dataset_name="xxxx", dataset_project="yyyyy", output_uri=" ", parent_datasets=[parent_dataset.id]) # or just use the ID of the dataset you previously created
child_dataset.add_files("folder2")
child_dataset.upload()
child_dataset.finalize() `
Also, how many files are in the parent dataset?
Thanks
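If it is easier to paste the traceback as text, a small wrapper along these lines could capture it. This is only a sketch using the standard `traceback` module; `capture_traceback` and `failing_step` are hypothetical names, and `failing_step` merely stands in for the dataset calls so that the example runs without clearml installed.

```python
import traceback
from typing import Callable, Optional


def capture_traceback(step: Callable[[], None]) -> Optional[str]:
    """Run step and return the formatted traceback if it raises, else None."""
    try:
        step()
    except Exception:
        return traceback.format_exc()
    return None


def failing_step() -> None:
    """Stand-in for the dataset calls; fails with the same kind of error."""
    total = 0
    total += None


report = capture_traceback(failing_step)
print(report)
```

In practice `step` would be a function that runs the `add_files`, `upload` and `finalize` calls on the child dataset.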

  
  
Posted one year ago

no, I tried either with very small files or with 20GB as the parent

  
  
Posted one year ago

quick update, still trying to reproduce ...

  
  
Posted one year ago