When I look at the LinkEntry object, the link property is correct, with no duplicates. It's the relative_path that's duplicated, and also the key name in _dataset_link_entries.
OK, then I have a solution, but it still produces duplicate names:
- new_dataset._dataset_link_entries = {} # Cleaning all raw/a.png files
- Resize a.png and save it in another location as a_resized.png
- Add back the other files I need (excluding raw/a.png) by adding them to new_dataset._dataset_link_entries
- Use add_external_files to include the resized file in the dataset. I'm also passing dataset_path=[a list of relative paths]
What I would expect:
100 Files removed (all a.png)
100 Files added (all a_resized.png)
What I get:
When I call new_dataset.list_files(), it now returns these doubled filenames: raw/a_resized.jpg/a_resized.jpg
What's up with this?
I've already checked all the paths; at no point do I pass doubled file names.
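One possible cause, offered as a guess rather than a confirmed diagnosis: as far as I understand it, dataset_path in add_external_files names the folder inside the dataset, and the file name from the source URL is appended to it. If that holds, passing a full relative path (file name included) would produce exactly this doubling. The sketch below only models that assumed behaviour in plain Python; stored_relative_path, folder_for, and the s3 URL are hypothetical names for illustration, not ClearML API.

```python
import posixpath


def stored_relative_path(source_url: str, dataset_path: str) -> str:
    """Model of the ASSUMED behaviour: dataset_path is treated as a folder
    and the file name taken from source_url is appended to it."""
    file_name = posixpath.basename(source_url)
    return posixpath.join(dataset_path, file_name)


def folder_for(relative_path: str) -> str:
    """Return only the folder part of the path we want inside the dataset."""
    return posixpath.dirname(relative_path)


source = "s3://bucket/resized/a_resized.jpg"  # hypothetical external link
wanted = "raw/a_resized.jpg"                  # path we want list_files() to show

# Passing the full relative path as dataset_path doubles the file name.
doubled = stored_relative_path(source, wanted)

# Passing only the folder gives the intended path.
fixed = stored_relative_path(source, folder_for(wanted))

print(doubled)
print(fixed)
```

If this matches what you see, passing only the folder part (here raw) for each entry in dataset_path would be worth trying.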
Hi @<1590514584836378624:profile|AmiableSeaturtle81> , the reason is that each file is hashed, and this is how the feature compares between versions. If you want to keep track of specific links, HyperDatasets might be what you're looking for.
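To illustrate the idea of comparing versions by file hash, here is a minimal sketch. It is not ClearML's implementation: the choice of SHA-256, the diff_versions helper, and the in-memory dicts standing in for dataset versions are all assumptions made for the example.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hash file content; SHA-256 is an arbitrary choice for this sketch."""
    return hashlib.sha256(data).hexdigest()


def diff_versions(old: dict, new: dict):
    """Compare two versions given as {relative_path: file_bytes}.

    Returns sorted lists of added, removed, and modified paths, where
    'modified' means the same path with a different content hash."""
    old_hashes = {path: content_hash(data) for path, data in old.items()}
    new_hashes = {path: content_hash(data) for path, data in new.items()}
    added = sorted(p for p in new_hashes if p not in old_hashes)
    removed = sorted(p for p in old_hashes if p not in new_hashes)
    modified = sorted(
        p for p in new_hashes
        if p in old_hashes and new_hashes[p] != old_hashes[p]
    )
    return added, removed, modified


# Hypothetical contents standing in for the parent and child versions.
parent = {"raw/a.png": b"original pixels", "raw/b.png": b"untouched"}
child = {"raw/a_resized.png": b"resized pixels", "raw/b.png": b"untouched"}

added, removed, modified = diff_versions(parent, child)
print(added, removed, modified)
```

In this model a renamed and resized file shows up as one removal plus one addition, which is the result expected above for a.png and a_resized.png.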