Answered
I'm Having Difficulty Understanding How To Handle Modified Files On S3

I'm having difficulty understanding how to handle modified files on S3.

  • I have a file None; its relative_path is /raw/a.png. It is stored in ClearML dataset "A" (added with add_external_files).
  • I create a dataset "B" whose parent is "A".
  • I resize the image and store it in None; its relative path is the same, /raw/a.png. It is stored in ClearML dataset "B" (added with add_external_files).
  • When I look in the ClearML UI, it shows that 1 new file was added instead of 1 file modified.

Have I misunderstood something? I get that there are technically 2 files, but they have the same relative_path (I get it from the LinkEntry objects).

Does this also introduce a bug?
When I call clearml.Dataset.get("id").list_files(),
it now shows:
raw/a.png
raw/a.png/a.png

  
  
Posted 4 months ago

Answers 3


When I look at the LinkEntry object, the link property is correct, with no duplicates. It's the relative_path that is duplicated, and also the key name in _dataset_link_entries.

  
  
Posted 4 months ago

OK, then, I have a solution, but it still produces duplicated names:

  • new_dataset._dataset_link_entries = {} # clear all raw/a.png entries
  • Resize a.png and save it in another location as a_resized.png
  • Add back the other files I need (excluding raw/a.png) by adding them to new_dataset._dataset_link_entries
  • Use add_external_files to include it in the dataset. I'm also using dataset_path=[a list of relative paths]

What I would expect:
100 files removed (all a.png)
100 files added (all a_resized.png)

What I get:

When I call new_dataset.list_files(), it now returns doubled file names like this: raw/a_resized.jpg/a_resized.jpg
What's up with this?
I've already checked all the paths; at no point do I pass doubled file names.
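
One possible explanation, offered as an assumption rather than a confirmed diagnosis: the add_external_files documentation describes dataset_path as the target folder inside the dataset, so the file name taken from the source URL would be appended to it. The pure-Python sketch below makes no ClearML calls; `link_relative_path` is a hypothetical helper and the S3 URL is made up. It only illustrates how passing a full file path as dataset_path could yield a doubled name.

```python
import posixpath


def link_relative_path(source_url: str, dataset_path: str) -> str:
    """Hypothetical model (not ClearML code): treat dataset_path as a folder
    and append the file name taken from the end of the source URL."""
    file_name = source_url.rstrip("/").split("/")[-1]
    return posixpath.join(dataset_path.lstrip("/"), file_name)


url = "s3://bucket/resized/a_resized.png"  # made-up URL for illustration

# Passing the full relative file path as dataset_path doubles the file name.
doubled = link_relative_path(url, "raw/a_resized.png")

# Passing only the target folder gives the intended relative path.
intended = link_relative_path(url, "raw")

print(doubled)
print(intended)
```

If this assumption holds, passing only the folder part (for example "raw") in dataset_path would avoid the doubled entries.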

  
  
Posted 4 months ago

Hi @AmiableSeaturtle81, the reason is that each file is hashed, and that hash is how the feature compares versions. If you're looking to keep track of specific links, then HyperDatasets might be what you're looking for - None
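
To make the hash-based comparison in the answer above concrete, here is a simplified model in plain Python. It is a sketch under stated assumptions, not ClearML's implementation: `content_hash` and `diff_versions` are hypothetical helpers, and each dataset version is modelled as a dict mapping relative path to content hash.

```python
import hashlib
from typing import Dict, List


def content_hash(data: bytes) -> str:
    """Hash file contents (SHA-256 is chosen here only for illustration)."""
    return hashlib.sha256(data).hexdigest()


def diff_versions(parent: Dict[str, str], child: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two versions, each a mapping of relative path -> content hash."""
    added = sorted(p for p in child if p not in parent)
    removed = sorted(p for p in parent if p not in child)
    modified = sorted(p for p in child if p in parent and child[p] != parent[p])
    return {"added": added, "removed": removed, "modified": modified}


parent = {"raw/a.png": content_hash(b"original pixels")}

# Same relative path, different content: reported as modified.
same_key = diff_versions(parent, {"raw/a.png": content_hash(b"resized pixels")})

# The child inherits the parent's entry and stores the new file under a
# different key: reported as added, and both paths are listed.
child = dict(parent)
child["raw/a.png/a.png"] = content_hash(b"resized pixels")
other_key = diff_versions(parent, child)

print(same_key)
print(other_key)
```

In this model a file only counts as modified when its relative path matches an entry in the parent version; a file stored under any other key shows up as added, which is consistent with the behaviour described in the question.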

  
  
Posted 4 months ago