Regarding the “Classic” Datasets (Not Hyper Datasets): Is There an Option to Do Something Equivalent to DVC’s “

Regarding the “classic” datasets (not hyper datasets):
Is there an option to do something equivalent to DVC’s “managing external data” feature ( https://dvc.org/doc/user-guide/managing-external-data )?
i.e., track URLs rather than file bytes, but still know when an (external) file has changed?
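To picture the semantics being asked about: keep a manifest that maps each external URL (or path) to a fingerprint recorded at registration time, and report a file as changed when its current fingerprint differs. This is a hypothetical sketch, not a ClearML or DVC API; `ExternalManifest` and `stat_fingerprint` are illustrative names, and the fingerprint here assumes a locally mounted path (e.g. NFS) and uses only size and modification time.

```python
import os
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


def stat_fingerprint(path: str) -> Tuple[int, int]:
    # Hypothetical fingerprint: (size in bytes, mtime in ns).
    # Cheap because it reads only metadata, but it can miss a rewrite
    # that keeps the same size and timestamp.
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


@dataclass
class ExternalManifest:
    """Hypothetical manifest that tracks URLs/paths, not file bytes."""
    fingerprint: Callable[[str], object] = stat_fingerprint
    entries: Dict[str, object] = field(default_factory=dict)

    def register(self, url: str) -> None:
        # Record the fingerprint now; no data is copied.
        self.entries[url] = self.fingerprint(url)

    def changed(self) -> List[str]:
        # Return tracked URLs whose fingerprint differs or that vanished.
        out = []
        for url, recorded in self.entries.items():
            try:
                current = self.fingerprint(url)
            except FileNotFoundError:
                out.append(url)
                continue
            if current != recorded:
                out.append(url)
        return out
```

The catch, as the replies below point out, is that any fingerprint stronger than metadata needs either a server-side hash or a full read of the data.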

  
  
Posted 2 years ago

Answers 6


Regarding “you can run md5 on the file as stored in the remote storage (nfs or s3)”:

S3 support is implementation-specific (e.g., MinIO, Weka, Wasabi, etc. might not support it), and I'm actually not sure regarding NFS (you can run it, but that means you are actually reading the data; that said, I'm assuming NFS by definition offers relatively fast access).
wdyt?
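To make the NFS point concrete: computing an MD5 of a file on a mounted share streams every byte through the client, even though only a short digest is kept. A minimal sketch using Python's standard `hashlib`, reading in fixed-size chunks so memory stays bounded; the function name and the 1 MiB chunk size are arbitrary choices for illustration.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 of a file, reading it in chunks.

    Every byte is read from storage, so on NFS the whole file
    crosses the network even though only a digest is kept.
    """
    h = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

On S3-compatible stores the server may expose a stored hash (the ETag) that can be checked without downloading, but it is not guaranteed to be a plain MD5 of the content (for example, multipart uploads produce a different ETag), which matches the caveat above that this is implementation-specific.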


Hi RoughTiger69
I'm actually not sure DVC supports this either: according to these links, syncing and registering creates a link, not an immutable copy.
And syncing between local and remote seems to download the remote data and compare it to the local copy.
Basically, adding a remote source does not mean DVC will create an immutable copy of the content; it's just a pointer to a bucket (feel free to correct me if I misunderstood their capability).
https://dvc.org/doc/command-reference/remote/modify#available-parameters-for-all-remotes
https://dvc.org/doc/command-reference/status#comparison-against-remote-storage


Also, how would one ensure immutability?
I guess this is the big question: assuming we "know" a file was changed, that change invalidates all versions using it, which is exactly why the current implementation stores an immutable copy. Or are you suggesting a smarter "sync" function?
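The invalidation concern can be sketched as a dependency lookup: if each dataset version records the hashes of the external files it used, a change to one file invalidates every version that pinned the old hash. An immutable copy sidesteps this because the stored content cannot drift. The data model and the `invalidated_versions` helper below are hypothetical, for illustration only.

```python
from typing import Dict, List


def invalidated_versions(
    versions: Dict[str, Dict[str, str]],
    current: Dict[str, str],
) -> List[str]:
    """Return IDs of versions whose recorded file hashes no longer match.

    versions: version ID -> {file URL -> hash recorded for that version}
    current:  file URL -> hash observed now (a missing URL means the file is gone)
    """
    stale = []
    for version_id, files in versions.items():
        # One changed or missing file is enough to invalidate the version.
        if any(current.get(url) != h for url, h in files.items()):
            stale.append(version_id)
    return sorted(stale)


versions = {
    "v1": {"s3://bucket/a.csv": "h1"},
    "v2": {"s3://bucket/a.csv": "h1", "s3://bucket/b.csv": "h2"},
    "v3": {"s3://bucket/b.csv": "h2"},
}
# a.csv changed, so both versions that used it are stale.
print(invalidated_versions(versions, {"s3://bucket/a.csv": "h1x", "s3://bucket/b.csv": "h2"}))
```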


Hi RoughTiger69

“but still get the semantics of knowing when an (external) file changed?”

How would you know it changed?
This implies you have a way to verify the hash, which means downloading the data, no?


AgitatedDove14 I haven’t done a full design for this 😉
Just referring to how DVC claims it can detect and invalidate changes in large remote files.
So I take it there is no such feature in http://clear.ml 🙂


AgitatedDove14 nope… you can run md5 on the file as stored in the remote storage (nfs or s3)
