Answered
Question About The Storage Manager. Assuming I Have An Object That Updates Frequently And Always Saved At The Same Path (E.G.

Question about the storage manager. Assume I have an object that updates frequently and is always saved at the same path (e.g. my_bucket/my_data.csv), and I want StorageManager.get_local_copy to download the updated version rather than fetch the outdated one from the local cache.

How does the StorageManager evaluate whether it needs to redownload? Only by name?
If so, how should I address this kind of use case (redownload if changed)?

  
  
Posted 4 years ago

Answers 20


Legit, if you have a cached_file (i.e. exists and accessible), you can return it to the caller

  
  
Posted 4 years ago

I might. I'll look at the internals later, because at a glance I didn't really get the logic inside get_local_copy ... the if there ends with if ... not cached_file: return cached_file, which from reading doesn't make much sense

  
  
Posted 4 years ago

-_- why isn't there a link to the source in the docs?

  
  
Posted 4 years ago

In the larger context I'd look at how other object stores treat similar problems; I'm not that advanced in these topics.

But adding a simple force_download flag to the get_local_copy method could solve many of the cases I can think of. For example, I'd set it to true in my case, since the file is quite small and I don't mind re-downloading it when not strictly necessary (currently I always delete the local file, but that looks pretty ugly)
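
A minimal sketch of what such a flag could look like, using a toy cache keyed only by file name rather than the real StorageManager. ToyCache, its downloader argument, and this force_download parameter are hypothetical names for illustration, not the library's API.

```python
import os
from typing import Callable


class ToyCache:
    """Toy cache keyed only by the remote file name (hypothetical, not the real StorageManager)."""

    def __init__(self, downloader: Callable[[str, str], None], cache_dir: str) -> None:
        # downloader(remote_url, local_path) is assumed to write the remote object to local_path.
        self._downloader = downloader
        self._cache_dir = cache_dir

    def get_local_copy(self, remote_url: str, force_download: bool = False) -> str:
        local_path = os.path.join(self._cache_dir, os.path.basename(remote_url))
        # Reuse the cached file unless it is missing or the caller asks for a fresh copy.
        if force_download or not os.path.exists(local_path):
            self._downloader(remote_url, local_path)
        return local_path
```

With the flag left at its default the behaviour matches the name-only caching described in this thread; passing force_download=True replaces the "delete the local file first" workaround.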

  
  
Posted 4 years ago

But adding a simple force_download flag to the get_local_copy

That sounds like a good idea

  
  
Posted 4 years ago

👍

  
  
Posted 4 years ago

Do you want to PR it? should be a quick fix

  
  
Posted 4 years ago

Well, I guess you can say this is definitely not a self-explanatory line 😉
but it is actually asking whether we should extract the archive. Think of it as:
if extract_archive and cached_file: return cls._extract_to_cache(cached_file, name)
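
To see why the two readings agree, here is a small sketch comparing a guard-clause form with the positive form above. The exact condition in the source is elided in this thread ("if ... not cached_file"), so the "not extract_archive or" part is an assumption, and extract_to_cache is a hypothetical stand-in for cls._extract_to_cache.

```python
from typing import Optional


def extract_to_cache(cached_file: str) -> str:
    """Hypothetical stand-in for cls._extract_to_cache: pretend to unpack the archive."""
    return cached_file + ".extracted"


def finish_guard_style(cached_file: Optional[str], extract_archive: bool) -> Optional[str]:
    # Guard clause (assumed shape): nothing to extract, so return what we have,
    # which is None when the download failed.
    if not extract_archive or not cached_file:
        return cached_file
    return extract_to_cache(cached_file)


def finish_positive_style(cached_file: Optional[str], extract_archive: bool) -> Optional[str]:
    # Positive form, as suggested in the answer above.
    if extract_archive and cached_file:
        return extract_to_cache(cached_file)
    return cached_file
```

Under that assumption the early return is not a bug: it hands back the file (or None) whenever extraction is not needed or not possible, and both functions return the same value for every input.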

  
  
Posted 4 years ago

are you referring to the same line? 47 in cache.py?

  
  
Posted 4 years ago

link to the line please 🙂

  
  
Posted 4 years ago

Legit, if you have a cached_file (i.e. exists and accessible), you can return it to the caller

I agree, so shouldn't it be if cached_file: return cached_file instead of if not cached_file: return cached_file?

  
  
Posted 4 years ago

BTW, is the if not cached_file: return cached_file legit or a bug?

  
  
Posted 4 years ago

😄

  
  
Posted 4 years ago

We should probably change it so it is more human readable 🙂

  
  
Posted 4 years ago

WackyRabbit7
Long story short, yes, only by name (hashing might be too slow on large files).
The easiest solution: if the hash is incorrect, delete the local copy it returns and ask again; it will then download the file again.
I'm not sure if the hashing is exposed, but if it is not, we can add it.
What do you think?
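
A rough sketch of that workaround, assuming the caller already knows the expected SHA-256 of the current remote object (for example, published next to it). The helper names are hypothetical; fetch stands in for something like StorageManager.get_local_copy and is assumed to re-download when the local file is missing.

```python
import hashlib
import os
from typing import Callable


def sha256_of(path: str) -> str:
    """Hash the file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def get_verified_copy(fetch: Callable[[str], str], remote_url: str, expected_sha256: str) -> str:
    """Return a local copy, re-fetching once if the cached file's hash is stale."""
    local_path = fetch(remote_url)
    if sha256_of(local_path) != expected_sha256:
        # Stale cache entry: delete it and ask again so the fetcher downloads a fresh copy.
        os.remove(local_path)
        local_path = fetch(remote_url)
    return local_path
```

This only re-downloads when the content actually changed, at the cost of hashing the local file on every call.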

  
  
Posted 4 years ago

I mean usually it would read if cached_file: return cached_file

  
  
Posted 4 years ago