Hi RoughTiger69
but still get the semantics of knowing when an (external) file changed?
How would you know it changed?
This implies you have a way to verify hash, which means you download the data , no?
AgitatedDove14 nope… you can run md5 on the file as stored in the remote storage (nfs or s3)
you can run md5 on the file as stored in the remote storage (nfs or s3)
s3 is implementation specific (i.e. minio weka wassaby etc, might not support it) and I'm actually not sure regrading nfs (I mean you can run it, but it actually means you are reading the data, that said, nfs by definition I'm assuming is relatively fast access)
wdyt?
Also, how would one ensure immutability ?
I guess this is the big question, assuming we "know" a file was changed, this will invalidate all versions using it, this is exactly why the current implementation stores an immutable copy. Or are you suggesting a smarter "sync" function ?
AgitatedDove14 I haven’t done a full design for this 😉
Just referring to how DVC claims it can detect and invalidate changes in large remote files.
So I take it there is no such feature in http://clear.ml 🙂
Hi RoughTiger69
I'm actually not sure about DVC support as well, see in these links, syncing and registering is a link, not creating an immutable copy.
And the sync between the local and remote seems like it is downloading the remote and comparing to the local copy.
Basically adding remote source Does not mean DVC will create an immutable copy of the content, it's just a pointer to a bucket (feel free to correct me if I misunderstood their capability)
https://dvc.org/doc/command-reference/remote/modify#available-parameters-for-all-remotes
https://dvc.org/doc/command-reference/status#comparison-against-remote-storage