
Cannot upload a dataset with a parent - seems very odd!

clearml versions I tried: 1.6.1, 1.6.2

scenario:

  • Create parent dataset (with storage on S3)

  • Upload data

  • Close dataset

  • Create child dataset (tried with storage both on S3 and on the clearml server)

  • Add a single file or folder to the child

  • Close the child

  • Get exception (see below)

`
clearml-data - Dataset Management & Versioning CLI
Finalizing dataset id d80b190d84ca41e1b139c841427dd241
id=d80b190d84ca41e1b139c841427dd241 disable_upload=False chunk_size=512
2022-08-09 07:01:54,819 - clearml.storage - INFO - Downloading: 5.00MB / 5.92MB @ 29.85MBs from
2022-08-09 07:01:54,825 - clearml.storage - INFO - Downloaded 5.92 MB successfully from , saved to /home/ec2-user/.clearml/cache/storage_manager/datasets/2ff81b56341faaaad7796344472ec8d2.state.json
Pending uploads, starting dataset upload to
Compressing /home/ec2-user/xxx/yyy/zzz.npy
Uploading dataset changes (1 files compressed to 1.67 MiB) to
File compression and upload completed: total size 1.67 MiB, 1 chunked stored (average size 1.67 MiB)

Error: unsupported operand type(s) for +=: 'int' and 'NoneType'
`
Any idea? This seems like a really basic scenario, and I am sure it worked for me in the past.
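For what it's worth, the error itself looks like a generic Python TypeError rather than anything S3-specific; a minimal sketch of how such an error can arise (illustrative names only, not ClearML internals):
`
# illustrative only - not ClearML code
total_size = 0                     # running total of bytes
chunk_sizes = [1024, None, 2048]   # one entry was never filled in

for size in chunk_sizes:
    total_size += size  # TypeError: unsupported operand type(s) for +=: 'int' and 'NoneType'
`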

  
  
Posted 2 years ago

Answers 8


Hi RoughTiger69 ! Can you try adding the files using a Python script, so that we can get a full exception traceback? Something like this:
`
from clearml import Dataset

# or just use the ID of the dataset you previously created instead of creating a new one
parent_dataset = Dataset.create(dataset_name="xxxx", dataset_project="yyyyy", output_uri=" ")
parent_dataset.add_files("folder1")
parent_dataset.upload()
parent_dataset.finalize()

child_dataset = Dataset.create(dataset_name="xxxx", dataset_project="yyyyy", output_uri=" ", parent_datasets=[parent_dataset.id])  # or just use the ID of the dataset you previously created
child_dataset.add_files("folder2")
child_dataset.upload()
child_dataset.finalize()
`
Also, how many files are in the parent dataset?
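If it is easier, you can also wrap the last two calls so the full traceback is printed even if it gets swallowed somewhere (plain Python, nothing ClearML-specific):
`
import traceback

try:
    child_dataset.upload()
    child_dataset.finalize()
except TypeError:
    traceback.print_exc()  # full stack, showing exactly where the None comes from
    raise
`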
Thanks

  
  
Posted 2 years ago

Can you try it with clearml==1.6.0 please?
Also, can you list the exact commands you ran?

  
  
Posted 2 years ago

It seems to work fine when the parent is on clear.ml storage (tried with a toy example of data)

  
  
Posted 2 years ago

Tried with 1.6.0, doesn’t work

`
# this is the parent
clearml-data create --project xxx --name yyy --output-uri
clearml-data add folder1
clearml-data close

# this is the child, where XYZ is the parent's id
clearml-data create --project xxx --name yyy1 --parents XYZ --output-uri
clearml-data add folder2
clearml-data close
# now I get the error above
`
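To double-check that the parent itself looks intact, it can also be loaded by ID from Python and its file list inspected (a sketch using `Dataset.get()` and `list_files()` from the SDK; `XYZ` stands for the parent's ID as above):
`
from clearml import Dataset

parent = Dataset.get(dataset_id="XYZ")  # XYZ = the parent's ID used in the CLI above
files = parent.list_files()             # file names registered in the parent version
print(f"parent has {len(files)} files")
`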

  
  
Posted 2 years ago

No - I only tried it either with very small files or with 20GB as the parent

  
  
Posted 2 years ago

RoughTiger69, do you have a rough estimate of the size that breaks it?

  
  
Posted 2 years ago

I tested it again with much smaller data and it seems to work.
I am not sure what the difference is between the use cases; it seems like something specific to that particular (big) parent doesn't agree with clearml…
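One thing I can try to narrow it down is rebuilding the parent from synthetic files of growing size and seeing where the child's close starts failing; a rough sketch reusing the same SDK calls as above (project name, sizes and the S3 URI are placeholders):
`
import os
from clearml import Dataset

def make_dummy_folder(path, size_mb):
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, "blob.bin"), "wb") as f:
        for _ in range(size_mb):
            f.write(os.urandom(1024 * 1024))  # write 1 MB at a time

for size_mb in (10, 100, 1000):  # grow the parent until the failure shows up
    parent_folder = f"parent_{size_mb}mb"
    make_dummy_folder(parent_folder, size_mb)

    parent = Dataset.create(dataset_name=f"parent_{size_mb}mb", dataset_project="repro",
                            output_uri="s3://my-bucket/datasets")  # placeholder URI
    parent.add_files(parent_folder)
    parent.upload()
    parent.finalize()

    child = Dataset.create(dataset_name=f"child_{size_mb}mb", dataset_project="repro",
                           output_uri="s3://my-bucket/datasets",  # placeholder URI
                           parent_datasets=[parent.id])
    child.add_files("folder2")  # the same small folder as in the original repro
    child.upload()
    child.finalize()
`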

  
  
Posted 2 years ago

Quick update: still trying to reproduce...

  
  
Posted 2 years ago