Answered

Any info on the lifecycle of datasets downloaded to $HOME/.clearml/cache/storage_manager/datasets via get_local_copy?

I have a task running and was watching the above path. Datasets were being downloaded and then they were all removed. For one particular task this even happened before the task was done, and I got a file-not-found error. Are the local copies cleared during a task?

  
  
Posted one year ago

Answers 12


So it was definitely related to the symlinks in some form

Could it be that it actually deleted the cache? How many agents are running on the same machine?

  
  
Posted one year ago

Only one. Will replicate it in detail and see what’s actually up

  
  
Posted one year ago

AgitatedDove14 - this was an interesting one. I think I have found the issue and am verifying the fix now.

One of the devs was using shutil.copy2 to copy parts of the dataset to a temporary directory in a with block, something like:

    import shutil
    from tempfile import TemporaryDirectory

    # temp_dir, test_paths and dataset_local are defined elsewhere in the task
    with TemporaryDirectory(dir=temp_dir) as certificates_directory:
        for file in test_paths:
            shutil.copy2(f"{dataset_local}/{file}", f"{certificates_directory}/{file}")

My suspicion is that, since copy2 copies with full data and symlinks, it’s messing with the way clearml-data sets up datasets with symlinks to parents etc., and when the temp directories are cleaned up at the end of the with blocks, the local copies are cleared too
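
One way to check that suspicion without involving ClearML is the stdlib-only sketch below. It copies a file reached through a symlink into a TemporaryDirectory with shutil.copy2 and inspects the originals after the with block exits. The folder and file names (parent_version, dataset_local, cert.pem) are made up for illustration and do not reflect the real cache layout. With default arguments, copy2 follows the symlink and writes a regular file, so removing the temporary directory should leave the source untouched.

```python
import os
import shutil
from tempfile import TemporaryDirectory


def copy_then_cleanup():
    """Copy a symlinked file into a temp dir, let it be cleaned up, report what is left."""
    with TemporaryDirectory() as root:
        # Hypothetical stand-ins for a parent dataset version and a cached child version.
        parent = os.path.join(root, "parent_version")
        cache = os.path.join(root, "dataset_local")
        os.makedirs(parent)
        os.makedirs(cache)

        real_file = os.path.join(parent, "cert.pem")
        with open(real_file, "w") as fh:
            fh.write("payload")

        # The cache entry is only a link to the parent's file.
        link = os.path.join(cache, "cert.pem")
        os.symlink(real_file, link)

        with TemporaryDirectory(dir=root) as certificates_directory:
            copied = shutil.copy2(link, os.path.join(certificates_directory, "cert.pem"))
            copy_is_link = os.path.islink(copied)

        # certificates_directory has been removed at this point; check the originals.
        return {
            "copy_is_link": copy_is_link,
            "link_survives": os.path.islink(link),
            "target_survives": os.path.exists(real_file),
        }


result = copy_then_cleanup()
print(result)
```

If the copy turns out to be a regular file and both the link and its target survive, the cleanup of the temporary directory alone is unlikely to be what empties the dataset cache.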

  
  
Posted one year ago

Also, it’s not happening when running locally, only when running remotely on an agent

  
  
Posted one year ago

The number of entries in the dataset cache can be controlled via clearml.conf: sdk.storage.cache.default_cache_manager_size

  
  
Posted one year ago

And I think the default is 100 entries, so it should not get cleaned.

and then they are all removed and for a particular task it even happens before my task is done

Is this reproducible? Who is cleaning it and when?

  
  
Posted one year ago

Hmm, notice that it does store symlinks to parent data versions (to save on multiple copies of the same file). If you call get_mutable_local_copy() you will get a standalone copy.
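
To illustrate why a symlinked local copy is more fragile than a standalone one, here is a small stdlib-only simulation. The folder names (parent_version, child_version, standalone) are hypothetical and the layout is an assumption, not ClearML's actual cache structure. It shows a link going dangling once its target folder is removed, while a copy made by resolving the links keeps working.

```python
import os
import shutil
from tempfile import TemporaryDirectory


def simulate_parent_eviction():
    """Compare a symlinked folder with a standalone copy after the link target is removed."""
    with TemporaryDirectory() as root:
        parent = os.path.join(root, "parent_version")
        child = os.path.join(root, "child_version")
        standalone = os.path.join(root, "standalone")
        os.makedirs(parent)
        os.makedirs(child)

        src = os.path.join(parent, "data.txt")
        with open(src, "w") as fh:
            fh.write("rows")

        # The child version reuses the parent's file through a symlink.
        os.symlink(src, os.path.join(child, "data.txt"))

        # copytree defaults to symlinks=False, so it copies file contents, not links.
        shutil.copytree(child, standalone)

        # Simulate the parent version being removed from the cache.
        shutil.rmtree(parent)

        return {
            # os.path.exists is False for a dangling symlink.
            "child_readable": os.path.exists(os.path.join(child, "data.txt")),
            "standalone_readable": os.path.exists(os.path.join(standalone, "data.txt")),
        }


state = simulate_parent_eviction()
print(state)
```

In ClearML terms, the standalone folder plays the role of what get_mutable_local_copy() returns: files that do not depend on other entries in the cache.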

  
  
Posted one year ago

AgitatedDove14 - worked with mutable copy! So it was definitely related to the symlinks in some form

  
  
Posted one year ago

Fix - use shutil.copy instead of shutil.copy2 - verifying now.

  
  
Posted one year ago

Ok that wasn’t it 😞

  
  
Posted one year ago

Will try it out. A weird one this.

  
  
Posted one year ago

Could that be it ?

  
  
Posted one year ago