"clearml-data sync --folder ." doesn't work

  
  
Posted one year ago

Answers 8


I have a dataset of ~24GB and I've tried multiple times uploading it with the sync function.

  • The cache doesn't work, it attempts to download the dataset every time.
  • It "misses" some files somehow. So once the job runs it fails due to missing files.
  • I ran verify afterwards (from the machine I used to upload the data) and it reported that everything was fine. However, when I inspect the zip files on the server (looking for each file in the specific zip that the state JSON says it is in), the files are indeed missing.
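The manual zip inspection described above could be scripted roughly as follows. This is only a sketch using the Python standard library, not anything from clearml itself: the helper names (files_in_zips, local_files, missing_from_zips) are made up for illustration, and it assumes the archive member names are paths relative to the dataset root, which may not match how the zips on your server are laid out.

```python
import zipfile
from pathlib import Path
from typing import Iterable, List, Set


def files_in_zips(zip_paths: Iterable[str]) -> Set[str]:
    """Collect the names of all files stored in the given zip archives."""
    names: Set[str] = set()
    for zip_path in zip_paths:
        with zipfile.ZipFile(zip_path) as archive:
            # Skip directory entries, which end with a slash.
            names.update(n for n in archive.namelist() if not n.endswith("/"))
    return names


def local_files(root: str) -> Set[str]:
    """List every file under root as a POSIX-style path relative to root."""
    base = Path(root)
    return {p.relative_to(base).as_posix() for p in base.rglob("*") if p.is_file()}


def missing_from_zips(root: str, zip_paths: Iterable[str]) -> List[str]:
    """Return local files that appear in none of the archives.

    Assumption: archive member names are relative to the dataset root.
    """
    return sorted(local_files(root) - files_in_zips(zip_paths))
```

Calling missing_from_zips("path/to/local/folder", list_of_downloaded_zips) would then give the list of files to look into; both arguments are placeholders for your own paths.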
  
  
Posted one year ago

Once I used the clearml-data add --folder * CLI command, everything worked correctly (though all files, recursively, ended up in the root; luckily they all had different names).

  
  
Posted one year ago

@<1523701205467926528:profile|AgitatedDove14> Any ideas on this issue? Thanks!

  
  
Posted one year ago

Hi @<1631102016807768064:profile|ZanySealion18>
Sorry, I missed that one.

The cache doesn't work, it attempts to download the dataset every time.

Just making sure: does the dataset itself contain all the files?

Once I used clearml-data add --folder * CLI everything works correctly (though all files recursively ended up in the root, I had luck all were named differently).

Not sure I follow here. Is the problem the creation of the dataset or fetching it? Is this a single version or multiple versions?

  
  
Posted one year ago

OSX 12.5.1
Python 3.8.1.
Clearml 1.13.1

"clearml-data add --folder ./*" always flattens everything; I can reproduce that 100% of the time.

  
  
Posted one year ago

Single version. The issue seems to be the creation. If I use "clearml-data sync --folder ." it says it uploaded all the files. Running "clearml-data verify --folder ." says it's all good. Metadata on the WebUI reports the expected number of files. However, once I extract the zips (or download the dataset through Python API or CLI) not all the files are there.

"clearml-data add --folder ./*" seems to fix this issue though it doesn't preserve my directory structure, so I'd have to write a script to do that manually. That shouldn't be necessary, since as far as I understand clearml-data sync should already do it, but it seems to have a bug there.
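For anyone landing here, the "script to do it manually" could look something like the sketch below. It uses the clearml Python SDK calls Dataset.create, add_files (with its dataset_path argument), upload and finalize. The helper names plan_add_calls and upload_preserving_structure are hypothetical, and whether this actually avoids the flattening on 1.13.1 is an assumption I have not verified.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def plan_add_calls(root: str) -> List[Tuple[str, Optional[str]]]:
    """Pair each top-level entry under root with the dataset path it should keep."""
    plan: List[Tuple[str, Optional[str]]] = []
    for entry in sorted(Path(root).iterdir()):
        # Directories keep their own name as the target folder inside the dataset;
        # loose files stay at the dataset root (dataset_path=None).
        plan.append((str(entry), entry.name if entry.is_dir() else None))
    return plan


def upload_preserving_structure(root: str, name: str, project: str) -> str:
    """Create a dataset and add each top-level entry under its own name.

    Assumption: passing dataset_path per entry keeps the folder layout intact.
    """
    # Imported lazily so plan_add_calls can be used without clearml installed.
    from clearml import Dataset

    dataset = Dataset.create(dataset_name=name, dataset_project=project)
    for path, dataset_path in plan_add_calls(root):
        dataset.add_files(path=path, dataset_path=dataset_path)
    dataset.upload()
    dataset.finalize()
    return dataset.id
```

Printing plan_add_calls(".") first is a cheap way to check what would be added, and under which dataset path, before uploading anything.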

  
  
Posted one year ago

However, once I extract the zips (or download the dataset through Python API or CLI) not all the files are there.

And are all the files registered in the metadata? Could you add --verbose to the sync command to see what it is doing?

"clearml-data add --folder ./*" seems to fix this issue though it doesn't preserve my directory structure

This is also odd; it should not flatten the folder structure. What is your OS / Python / clearml version?
Is this reproducible? If so, how could we reproduce and debug it?

  
  
Posted one year ago

Clearml 1.13.1

Could you try the latest (1.16.2)? I remember there was a fix specific to Datasets

  
  
Posted one year ago