"clearml-data sync --folder ." doesn't work


  
  
Posted 6 months ago

Answers 8


However, once I extract the zips (or download the dataset through Python API or CLI) not all the files are there.

And are all the files registered in the metadata? Could you add --verbose to the sync command to see what it is doing?

"clearml-data add --folder ./*" seems to fix this issue though it doesn't preserve my directory structure

This is also odd; it should not flatten the folder structure. What is your OS / Python / clearml version?
Is this reproducible? If so, how could we reproduce and debug it?

  
  
Posted 6 months ago

I have a dataset of ~24 GB and I've tried uploading it multiple times with the sync function.

  • The cache doesn't work, it attempts to download the dataset every time.
  • It "misses" some files somehow. So once the job runs it fails due to missing files.
  • I've run verify afterwards (from the machine I used to upload the data) and it says it's all good. However, when I inspect the zip files on the server (looking for the files in the specific zip the state JSON says they're in), the files are indeed missing.
  
  
Posted 6 months ago

@AgitatedDove14 Any ideas on this issue? Thanks!

  
  
Posted 6 months ago

Single version. The issue seems to be with the creation. If I use "clearml-data sync --folder .", it says it uploaded all the files. Running "clearml-data verify --folder ." says it's all good, and the metadata in the Web UI reports the expected number of files. However, once I extract the zips (or download the dataset through the Python API or CLI), not all the files are there.
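To pin down exactly which files go missing, a minimal sketch like the one below compares the original folder with an extracted or downloaded copy by relative path, using only the Python standard library. The helper names relative_files and missing_files are hypothetical and not part of ClearML; how the copy is produced (extracting the zips, or downloading through the CLI or Python API) is left open.

```python
import sys
from pathlib import Path


def relative_files(root):
    """Return every file under root as a '/'-separated path relative to root."""
    root = Path(root)
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def missing_files(source_dir, copy_dir):
    """List files present in source_dir but absent from copy_dir, sorted by path."""
    return sorted(relative_files(source_dir) - relative_files(copy_dir))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_trees.py ORIGINAL_FOLDER COPY_FOLDER
    missing = missing_files(sys.argv[1], sys.argv[2])
    print(f"{len(missing)} file(s) missing from the copy")
    for path in missing:
        print("  " + path)
```

Only names are compared, not contents, so a file that exists but was truncated would not be reported; adding a size or hash check per path would catch that case.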

"clearml-data add --folder ./*" seems to fix this issue, though it doesn't preserve my directory structure, so I'd have to write a script to do it manually. That shouldn't be necessary: as far as I understand, clearml-data sync should already be doing that, but it seems to have a bug there.
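For the manual script mentioned above, a rough sketch using the ClearML Python Dataset API might look like this. It assumes Dataset.create, add_files (with its dataset_path and recursive arguments), upload and finalize behave as in recent clearml releases, and that add_files keeps the structure below each folder it is given; plan_additions and create_dataset are hypothetical names. Each top-level sub-folder is added under a dataset path of the same name, so it should not end up flattened the way "./*" did.

```python
from pathlib import Path


def plan_additions(root):
    """Return (local_path, dataset_path, recursive) tuples for a folder tree.

    Files sitting directly in root go to the dataset root (dataset_path=None,
    non-recursive); each sub-folder is added recursively under its own name.
    """
    root = Path(root)
    plan = [(str(root), None, False)]
    for entry in sorted(root.iterdir()):
        if entry.is_dir():
            plan.append((str(entry), entry.name, True))
    return plan


def create_dataset(root, dataset_name, dataset_project):
    """Hypothetical helper: build and finalize one dataset version from root."""
    # Imported here so plan_additions can be used without clearml installed.
    from clearml import Dataset

    ds = Dataset.create(dataset_name=dataset_name, dataset_project=dataset_project)
    for local_path, dataset_path, recursive in plan_additions(root):
        ds.add_files(path=local_path, dataset_path=dataset_path, recursive=recursive)
    ds.upload()
    ds.finalize()
    return ds.id
```

For example, create_dataset("./data", "my-dataset", "my-project") would return the new dataset ID; the folder, name and project are placeholders. Running the comparison against a downloaded copy afterwards is a cheap way to confirm nothing was dropped.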

  
  
Posted 6 months ago

Hi @ZanySealion18
Sorry, missed that one.

The cache doesn't work, it attempts to download the dataset every time.

Just making sure: does the dataset itself contain all the files?

Once I used the clearml-data add --folder * CLI, everything works correctly (though all files recursively ended up in the root; luckily they were all named differently).

Not sure I follow here: is the problem the creation of the dataset or fetching it? Is this a single version or multiple versions?

  
  
Posted 6 months ago

Once I used the clearml-data add --folder * CLI, everything works correctly (though all files recursively ended up in the root; luckily they were all named differently).

  
  
Posted 6 months ago

OSX 12.5.1
Python 3.8.1
Clearml 1.13.1

"clearml-data add --folder ./*" always flattens everything; I can reproduce that 100% of the time.

  
  
Posted 6 months ago

Clearml 1.13.1

Could you try the latest (1.16.2)? I remember there was a fix specific to Datasets.
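A quick way to confirm which clearml version is actually installed, before and after upgrading, is sketched below with the standard-library importlib.metadata (Python 3.8+). The 1.16.2 threshold comes from the reply above; version_tuple and needs_upgrade are hypothetical helpers that ignore pre-release suffixes such as "rc0".

```python
import re
from importlib.metadata import PackageNotFoundError, version


def version_tuple(v):
    """Turn '1.13.1' into (1, 13, 1); non-numeric suffixes like 'rc0' are dropped."""
    parts = []
    for piece in v.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def needs_upgrade(installed, minimum="1.16.2"):
    """True if the installed version is older than the minimum."""
    return version_tuple(installed) < version_tuple(minimum)


if __name__ == "__main__":
    try:
        installed = version("clearml")
    except PackageNotFoundError:
        print("clearml is not installed in this environment")
    else:
        status = "older than" if needs_upgrade(installed) else "at least"
        print(f"clearml {installed} is {status} 1.16.2")
```

Running it in the same environment that executes clearml-data matters, since a different interpreter or virtualenv may carry an older copy.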

  
  
Posted 6 months ago
611 Views