All datasets contain sensitive data, and I wish there was some way to use a remote network drive as a cache (sounds weird, but is there a better way?)
FreshParrot56 We could add this capability, but the main caveat is that if your version depends on multiple parent versions, you still need to download and extract all the parent versions, which means that when you clear them you might hurt later performance. Does that make sense? What is the use-case / scenario for you?
I want to fetch a copy of a dataset containing sensitive data onto a remote network drive, and I don't want this data, or any part of it, to remain on the machine from which I execute the request (even at the cost of performance)
Hi FreshParrot56 ! This is currently not supported 🙁
FreshParrot56 You could modify this entry in your clearml.conf to point to your drive: sdk.storage.cache.default_base_dir
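For reference, a minimal sketch of what that section of clearml.conf could look like (the mount point /mnt/remote/drive is just a placeholder for your actual drive):

```
sdk {
    storage {
        cache {
            # All dataset downloads and extractions will land here
            default_base_dir: "/mnt/remote/drive/clearml_cache"
        }
    }
}
```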
Or, if you don't want to touch your conf file, you could set the env var CLEARML_CACHE_DIR to your remote drive before you call get_local_copy. See this example:

```python
import os

from clearml import Dataset

dataset = Dataset.get(DATASET_ID)  # DATASET_ID is your dataset's ID

# Change the clearml cache, make it point to your remote drive
os.environ["CLEARML_CACHE_DIR"] = "/mnt/remote/drive"
copy_path = dataset.get_local_copy()
print(copy_path)  # the path will point to your remote drive

# Delete the env var, now clearml will once again cache data to your local machine
del os.environ["CLEARML_CACHE_DIR"]
copy_path = dataset.get_local_copy()
print(copy_path)
```
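As the example shows, ClearML picks up CLEARML_CACHE_DIR per call, so you can switch it on and off within a single process. If you'd rather not touch it from code at all, you could instead set it for the whole process up front, something like this (fetch_dataset.py is a hypothetical script name):

```bash
CLEARML_CACHE_DIR=/mnt/remote/drive python fetch_dataset.py
```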