
"clearml-data sync --folder ." doesn't work

  
  
Posted 8 months ago

Answers 8


OSX 12.5.1
Python 3.8.1
ClearML 1.13.1

"clearml-data add --folder ./*" always flattens the directory structure; I can reproduce that 100% of the time.

  
  
Posted 8 months ago

Hi ZanySealion18
Sorry, I missed that one.

The cache doesn't work, it attempts to download the dataset every time.

Just making sure: does the dataset itself contain all the files?

Once I used clearml-data add --folder * CLI everything works correctly (though all files recursively ended up in the root; luckily they all had different names).

Not sure I follow here. Is the problem the creation of the dataset or fetching it? Is this a single version or multiple versions?

  
  
Posted 8 months ago

Single version. The issue seems to be the creation. If I use "clearml-data sync --folder ." it says it uploaded all the files. Running "clearml-data verify --folder ." says it's all good. Metadata on the WebUI reports the expected number of files. However, once I extract the zips (or download the dataset through Python API or CLI) not all the files are there.
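
The check described above (comparing the folder that was synced against the extracted or downloaded copy) can be scripted. This is a minimal sketch using only the standard library; the function names are hypothetical, and the downloaded directory is assumed to come from something like `Dataset.get(...).get_local_copy()` in the Python API or from manually extracted zips.

```python
from pathlib import Path


def relative_files(root):
    """Return the set of file paths under root, relative to root (POSIX style)."""
    root = Path(root)
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def missing_files(source_dir, downloaded_dir):
    """Return a sorted list of files found in source_dir but not in downloaded_dir."""
    return sorted(relative_files(source_dir) - relative_files(downloaded_dir))


# Hypothetical usage: "." is the folder that was synced and
# "/path/to/local/copy" is wherever the dataset was downloaded or extracted.
# for path in missing_files(".", "/path/to/local/copy"):
#     print("missing:", path)
```

Because paths are compared relative to each root, the same script also shows whether the directory structure was flattened: flattened files appear as missing under their original sub-directory paths.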

"clearml-data add --folder ./*" seems to fix this issue though it doesn't preserve my directory structure so I'd have to write a script to do it manually, but that shouldn't be necessary as clearml-data sync should already be doing that as far as I understand but it seems to have a bug there.

  
  
Posted 8 months ago

Once I used clearml-data add --folder * API everything works correctly (though all files recursively ended up in the root; luckily they all had different names).

  
  
Posted 8 months ago

Clearml 1.13.1

Could you try the latest (1.16.2)? I remember there was a fix specific to Datasets.

  
  
Posted 8 months ago

However, once I extract the zips (or download the dataset through Python API or CLI) not all the files are there.

And are all the files registered in the metadata? Could you add --verbose to the sync command to see what it is doing?

"clearml-data add --folder ./*" seems to fix this issue though it doesn't preserve my directory structure

This is also odd; it should not flatten the folder structure. What is your OS / Python / clearml version?
Is this reproducible? If so, how could we reproduce and debug it?

  
  
Posted 8 months ago

I have a dataset of ~24GB and I've tried uploading it multiple times with the sync function.

  • The cache doesn't work, it attempts to download the dataset every time.
  • It "misses" some files somehow. So once the job runs it fails due to missing files.
  • I've run verify afterwards (from the machine I used to upload the data) and it says it's all good. However, once I inspect the zip files on the server (looking for the files in the specific zip the state json says they're in), the files are indeed missing.
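
The manual zip inspection in the last bullet could be automated roughly as follows. This is only a sketch: `missing_from_zip` is a hypothetical helper, the list of expected paths is supplied by hand (the layout of the state json is not assumed here), and it assumes the entry names inside the zip match the relative paths being looked up.

```python
import zipfile


def missing_from_zip(zip_path, expected_paths):
    """Return the expected paths that are not entries in the given zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        present = set(archive.namelist())
    return sorted(p for p in expected_paths if p not in present)


# Hypothetical usage: the zip name and the expected paths are placeholders
# for whatever the state json lists for that archive.
# print(missing_from_zip("some_chunk.zip", ["images/0001.png", "labels/0001.txt"]))
```

An empty result means every listed file is in that archive; any path returned is one the metadata claims is there but the zip does not contain.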
  
  
Posted 8 months ago

AgitatedDove14 Any ideas on this issue? Thanks!

  
  
Posted 8 months ago