Answered

Any info on the lifecycle of datasets downloaded to $HOME/.clearml/cache/storage_manager/datasets via get_local_copy?

I have a task running and was watching the above path. Datasets were being downloaded and then they were all removed. For one particular task this even happens before the task is done, and I get a file not found error. Are the local copies cleared during a task?

  
  
Posted 4 years ago

Answers 12


"So was definitely related to the symlinks in some form"

Could it be that it actually deleted the cache? How many agents are running on the same machine?

  
  
Posted 4 years ago

Also, it’s not happening when running locally, only when running remotely on an agent

  
  
Posted 4 years ago

Ok that wasn’t it 😞

  
  
Posted 4 years ago

The number of entries in the dataset cache can be controlled via clearml.conf: sdk.storage.cache.default_cache_manager_size
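As a rough illustration of where that setting would sit, here is a minimal clearml.conf fragment. The nesting simply follows the dotted path given above, and the value 200 is an arbitrary example rather than a recommendation.

```
sdk {
    storage {
        cache {
            # Maximum number of entries kept in the local cache
            # (200 is an illustrative value only)
            default_cache_manager_size: 200
        }
    }
}
```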

  
  
Posted 4 years ago

Fix - use shutil.copy instead of shutil.copy2 - verifying now.

  
  
Posted 4 years ago

And I think the default is 100 entries, so it should not get cleaned.

"and then they are all removed and for a particular task it even happens before my task is done"

Is this reproducible? Who is cleaning it, and when?

  
  
Posted 4 years ago

Will try it out. A weird one this.

  
  
Posted 4 years ago

AgitatedDove14 - worked with mutable copy! So was definitely related to the symlinks in some form

  
  
Posted 4 years ago

AgitatedDove14 - this was an interesting one. I think I have found the issue, but I am still verifying the fix.

One of the devs was using shutil.copy2 to copy parts of the dataset to a temporary directory in a with block - something like:

with TemporaryDirectory(dir=temp_dir) as certificates_directory:
    for file in test_paths:
        shutil.copy2(f"{dataset_local}/{file}", f"{certificates_directory}/file")
My suspicion is that, since copy2 copies with full data and symlinks, it’s messing with the way clearml-data sets up datasets with symlinks to parents etc., and when the temp directories are cleaned up at the end of the with blocks, the local copies are cleared too
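One way to test this suspicion in isolation is a small standalone script that mimics the layout: a "parent" folder holding the real file, a "dataset" folder holding a symlink to it, and a scratch TemporaryDirectory that receives the copy. This is only a sketch; the folder and file names are made up, it does not touch ClearML at all, and it needs a platform where os.symlink is permitted. It reports whether the copy is itself a symlink and whether the originals are still there once the scratch directory has been cleaned up.

```python
import os
import shutil
from tempfile import TemporaryDirectory


def copy_and_check(use_copy2: bool) -> dict:
    """Copy a symlinked file into a scratch directory and report what survives."""
    copier = shutil.copy2 if use_copy2 else shutil.copy
    with TemporaryDirectory() as root:
        # Hypothetical stand-ins for a parent dataset version and the cached copy.
        parent = os.path.join(root, "parent_version")
        dataset = os.path.join(root, "dataset_local")
        os.makedirs(parent)
        os.makedirs(dataset)

        real_file = os.path.join(parent, "cert.pem")
        with open(real_file, "w") as fh:
            fh.write("dummy")
        link = os.path.join(dataset, "cert.pem")
        os.symlink(real_file, link)  # the cached copy points at the parent version

        with TemporaryDirectory(dir=root) as scratch:
            copied = copier(link, os.path.join(scratch, "cert.pem"))
            copied_is_link = os.path.islink(copied)

        # The scratch directory has been removed at this point.
        return {
            "copied_is_link": copied_is_link,
            "link_survives": os.path.islink(link),
            "target_survives": os.path.exists(real_file),
        }


if __name__ == "__main__":
    for use_copy2 in (False, True):
        print("copy2" if use_copy2 else "copy", copy_and_check(use_copy2))
```

If both variants report the same thing on the agent machine, the difference between copy and copy2 is probably not the whole story and something else is removing the cache.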

  
  
Posted 4 years ago

Hmm, notice that it does store symlinks to parent data versions (to save on multiple copies of the same file). If you call get_mutable_local_copy() you will get a standalone copy
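A minimal sketch of that suggestion, assuming the clearml package is installed and configured. The helper name fetch_standalone_copy and its arguments are hypothetical; the dataset ID and target folder have to come from your own setup.

```python
def fetch_standalone_copy(dataset_id: str, target_folder: str) -> str:
    """Return the path of a writable, self-contained copy of a ClearML dataset."""
    # Imported here so the sketch can be loaded without clearml installed.
    from clearml import Dataset

    dataset = Dataset.get(dataset_id=dataset_id)
    # get_local_copy() would return the shared, read-only cache folder;
    # get_mutable_local_copy() copies the files into target_folder instead.
    return dataset.get_mutable_local_copy(target_folder)
```

Code that copies or deletes files can then work on the returned folder without touching the shared cache.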

  
  
Posted 4 years ago

Could that be it ?

  
  
Posted 4 years ago

Only one. Will replicate it in detail and see what’s actually up

  
  
Posted 4 years ago