Answered
Is it possible to download individual files from a ClearML dataset without downloading the entire dataset?

Hello everybody,

I would like to start off by saying that I absolutely love clearml.
I am getting familiar with clearml datasets and I have a quick question. Is it possible to download individual files from a dataset without downloading the entire dataset? If so, how do you do that?

  
  
Posted one year ago

Answers 3


I would like to start off by saying that I absolutely love clearml.

@MassiveGoldfish6 thank you for saying that! 😍

Is it possible to download individual files from a dataset without downloading the entire dataset? If so, how do you do that?

Well, by default files are packaged into multiple zip files. You can control the size of each zip file for finer granularity, but in the end, when you download, you are downloading an entire packaged file. If this is a version that contains only a few changes compared to its parent versions, that zip file might be relatively small.
Does that help? What is the use case? How large is the dataset?

  
  
Posted one year ago

That is very useful. Thank you.
A use case would be the following: I have a 200 GB dataset and I want to pull 3 files that are 20 MB each.

  
  
Posted one year ago

I think that by default the zipped package files are 0.5 GB
(you can control this; look for --chunk-size).
I think the missing part of the API is finding out which chunk your specific file is stored in.
You can do something like:

from clearml import Dataset
ds = Dataset.get(...)
# name of the zipped chunk (artifact) that holds this file
the_artifact_chunk_I_need = ds.file_entries_dict["myt/file/here"].artifact_name

wdyt?
Maybe it is worth adding an interface for this?
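
As a rough idea of what such an interface could look like, here is a minimal sketch that groups the files you want by the chunk they live in, using only the artifact_name lookup shown above. The helper name chunks_for_files is hypothetical (it is not part of clearml), and the entries below are made-up stand-ins so the snippet runs without a server; with a real dataset you would pass ds.file_entries_dict instead.

```python
from types import SimpleNamespace


def chunks_for_files(file_entries, wanted_paths):
    """Group the wanted files by the chunk (artifact) that stores them.

    `file_entries` is any mapping of relative path -> object with an
    `artifact_name` attribute, e.g. `ds.file_entries_dict`.
    Hypothetical helper, not part of the clearml API.
    """
    chunks = {}
    for path in wanted_paths:
        entry = file_entries[path]  # KeyError if the path is not in the dataset
        chunks.setdefault(entry.artifact_name, []).append(path)
    return chunks


# Stand-in entries for illustration only; paths and chunk names are invented.
fake_entries = {
    "images/a.png": SimpleNamespace(artifact_name="chunk_a"),
    "images/b.png": SimpleNamespace(artifact_name="chunk_b"),
    "labels/a.json": SimpleNamespace(artifact_name="chunk_a"),
}

needed = chunks_for_files(fake_entries, ["images/a.png", "labels/a.json"])
print(needed)  # {'chunk_a': ['images/a.png', 'labels/a.json']}
```

This only tells you which chunks you would need; actually fetching just those chunks is the part that is still missing from the interface.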

  
  
Posted one year ago