
Hi guys, I'm running into an issue when creating a new ClearML dataset version. I want to create a new version of a dataset from a local folder, but I don't want all of the files in the folder to be included, so I cannot use dataset.sync_folder(). Instead, I'm removing all the files with dataset.remove_files() and then adding the ones I want back with dataset.add_files(). If I then list the added, modified, and removed files with dataset.list_added_files() etc., they return correct results (only the files that were actually modified, etc.).

But when uploading, even files that were not modified or added are being compressed and uploaded. The ClearML UI then lists all of the dataset files as modified. I would like to upload and store only the files that were actually modified.

To reproduce
Set up a folder with at least 2 files:

import os

import clearml

# create a new dataset version
dataset = clearml.Dataset.create(
    dataset_project=project,
    dataset_name=self.header['Name'],
    dataset_version=new_version,
    parent_datasets=parent_ids,
)

# remove all files currently under the folder
dataset.remove_files(folder + "/*")

# add the files we want to include
dataset.add_files(os.path.join(local_path, folder), dataset_path=folder)

# this works as expected
added = dataset.list_added_files()
removed = dataset.list_removed_files()
modified = dataset.list_modified_files()

# upload and finalize
dataset.upload(verbose=True, max_workers=1)
dataset.finalize(verbose=True)

Modify one file and repeat the steps above. added, removed, and modified will have the correct values, but ClearML will upload both files and the UI will report both of them as modified. Is there another way to sync only part of a folder with a dataset?

  
  
Posted one month ago

Answers 2


Hi VividSpider84! Thank you for reporting, we were able to reproduce the issue. We will fix it in the next version.

  
  
Posted one month ago

I managed to solve it by not removing all the files, only the ones that are no longer present locally or are excluded. This is also what sync_folder does internally.
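
For reference, a minimal sketch of that approach, reusing the variable names from the reproduction above (project, local_path, folder, parent_ids); is_included() is a hypothetical filter and dataset_name a placeholder, so adjust both to your setup:

import os

import clearml

def is_included(rel_path):
    # hypothetical filter: decide which local files belong in the dataset
    return not rel_path.endswith(".tmp")

dataset = clearml.Dataset.create(
    dataset_project=project,
    dataset_name=dataset_name,
    parent_datasets=parent_ids,
)

# remove only the dataset entries that are gone locally or excluded,
# instead of removing everything under the folder
for entry in dataset.list_files(dataset_path=folder + "/*"):
    rel = os.path.relpath(entry, folder)
    local_file = os.path.join(local_path, folder, rel)
    if not os.path.isfile(local_file) or not is_included(rel):
        dataset.remove_files(entry)

# re-add the wanted files; files left untouched stay inherited from the parent
dataset.add_files(os.path.join(local_path, folder), dataset_path=folder)

dataset.upload(verbose=True, max_workers=1)
dataset.finalize(verbose=True)

Because unchanged files are never removed, add_files() recognizes them as identical to the parent version, so only the genuinely new or modified files get compressed and uploaded.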

  
  
Posted one month ago