Answered
Does Dataset.add_files support uploading from an S3 URI?

Does Dataset.add_files support uploading from an S3 URI?
I have no problem uploading to S3, but I can't use data that is already in S3. Or am I doing something wrong?
I read in the documentation that add_external_files supports this, but I want to be able to zip the files that are on S3. If I have millions of images, I prefer a zipped, chunked dataset instead of waiting for millions of individual images to download.
Posted 7 months ago

Answers 6


Yes, but does add_external_files make chunked zips like add_files does?

Posted 7 months ago

Hi @AmiableSeaturtle81
I think you should use add_external_files instead of add_files (which is for local files).
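For reference, here is a minimal sketch of registering files that already live in S3 as external links; the bucket and project/dataset names are hypothetical:

```python
from clearml import Dataset

# Create a new dataset version (project/name are hypothetical)
dataset = Dataset.create(
    dataset_project="my_project",
    dataset_name="s3_images",
)

# Register the S3 objects as external links - only metadata
# (URL, size) is recorded; the files themselves are not copied
dataset.add_external_files(source_url="s3://my-bucket/images/")

dataset.upload()
dataset.finalize()
```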

Posted 7 months ago

Our datasets are more than 1TB in size and will grow (probably to 4TB and up), which means we also need 4TB of local storage

Yes, because somewhere you will have to store your unzipped files.
Or you point to the S3 bucket and fetch the data only when you need to access it (or prefetch it), using the S3 links the Dataset stores, i.e. the data is downloaded only when accessed.
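As a rough sketch of that lazy pattern (project/dataset names are hypothetical), the stored S3 links are only fetched when a local copy is requested:

```python
from clearml import Dataset

# Fetching the dataset only pulls its metadata
dataset = Dataset.get(
    dataset_project="my_project",
    dataset_name="s3_images",
)

# Inspect the contents without downloading anything
print(dataset.list_files())

# Only this call actually downloads the linked S3 objects
local_path = dataset.get_local_copy()
```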

Posted 7 months ago

Our datasets are more than 1TB in size and will grow (probably to 4TB and up), which means we also need 4TB of local storage just to upload the dataset back in zipped format. This is not a good solution.

What we can do, I guess, is download the files locally in chunks?
Download 100 files locally, add them to the ClearML dataset, and repeat.

Posted 7 months ago

I need the zipping and chunking to manage millions of files

Posted 7 months ago

Yes, but does add_external_files make chunked zips like add_files does?

No, it only references them (i.e. it stores metadata and doesn't actually do anything with the files themselves).

I need the zipping and chunking to manage millions of files

That makes sense. In that case you will have to download those files anyway, and then add them with add_files.
You can use the StorageManager to download them, and then add them from the local copy (this will zip/chunk them).
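A rough sketch of that download-then-add flow, batched as suggested above; the bucket URL, batch size, and names are hypothetical, and it assumes a recent clearml version (where StorageManager.list is available) with credentials that can list the bucket:

```python
from clearml import Dataset, StorageManager

BUCKET = "s3://my-bucket/images/"  # hypothetical bucket
BATCH = 100                        # files per batch

dataset = Dataset.create(
    dataset_project="my_project",
    dataset_name="s3_images_zipped",
)

# List the objects in the bucket as full URLs
urls = StorageManager.list(BUCKET, return_full_path=True)

for i in range(0, len(urls), BATCH):
    # Download this batch into the local cache and register the local copies
    for url in urls[i:i + BATCH]:
        local_file = StorageManager.get_local_copy(url)
        dataset.add_files(local_file)

# Zip and upload the dataset in chunks (chunk_size is in MB)
dataset.upload(chunk_size=512)
dataset.finalize()
```

To keep local disk usage bounded with terabyte-scale data, you would presumably also upload each batch (upload() can be called repeatedly before finalize()) and clear the local cache between batches.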

Posted 7 months ago