Answered

Question on using clearml-data to manage the contents of datasets. I’m having an issue deleting a directory from a dataset that has already been uploaded. Here is what I’ve tried so far. First, I created a new dataset with the original as its parent and ran remove --files <path to folder>. That doesn’t work; it only works on individual files. Second, I tried sync: I created a new child dataset, got a local copy, ran sync on that folder, made changes, then performed the upload. When I download the dataset, the folder is still there. Any ideas?

  
  
Posted one year ago

Answers 5


For the record, this is a minimal reproducible example:

Local folder structure:
├── remove_folder
│   ├── batch_0
│   │   ├── file_0_0.txt
│   │   ├── file_0_1.txt
│   │   ├── file_0_2.txt
│   │   ├── file_0_3.txt
│   │   ├── file_0_4.txt
│   │   ├── file_0_5.txt
│   │   ├── file_0_6.txt
│   │   ├── file_0_7.txt
│   │   ├── file_0_8.txt
│   │   └── file_0_9.txt
│   └── batch_1
│       ├── file_1_0.txt
│       ├── file_1_1.txt
│       ├── file_1_2.txt
│       ├── file_1_3.txt
│       ├── file_1_4.txt
│       ├── file_1_5.txt
│       ├── file_1_6.txt
│       ├── file_1_7.txt
│       ├── file_1_8.txt
│       └── file_1_9.txt
└── remove_folder.ipynb
```python
from clearml import Dataset

# Create the dataset
ds = Dataset.create(
    dataset_project='issues',
    dataset_name='remove_folder_test'
)
ds.add_files('remove_folder')
ds.finalize(auto_upload=True)

# Create a child dataset (you could create a new one and provide the parent,
# but writable_copy will do this for you!)
ds = Dataset.get(
    dataset_project='issues',
    dataset_name='remove_folder_test',
    writable_copy=True
)
print(ds.list_files())
# Will print both batch_0 and batch_1 files

ds.remove_files(dataset_path='batch_0/*')
print(ds.list_files())
# Will print only batch_1 files

ds.finalize(auto_upload=True)

# Now check for certain with a local copy
import os
os.listdir(Dataset.get(dataset_id=ds.id).get_local_copy())
# Should return only 'batch_1'
```
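
If you want a quick way to confirm which folders are still present without downloading anything, something like the sketch below could be run on the list of paths the dataset reports. It assumes the listing is a plain list of relative paths with forward slashes, and `top_level_entries` is a hypothetical helper written for this example, not part of clearml.

```python
from pathlib import PurePosixPath


def top_level_entries(relative_paths):
    """Return the sorted, de-duplicated first component of each relative path."""
    return sorted({PurePosixPath(p).parts[0] for p in relative_paths if p})


# Stand-in for the file listing of the child dataset after the removal
# (assumed shape: relative paths as strings).
listed = ["batch_1/file_1_0.txt", "batch_1/file_1_1.txt"]
print(top_level_entries(listed))
```

If `batch_0` still shows up in the result, the removal did not take effect on the dataset's file listing, which narrows down where the problem is.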

  
  
Posted one year ago

The above works for me, so if the command-line version does not work when you try it, there might be a bug. Please post the exact commands you used 🙂

  
  
Posted one year ago

Hi ExuberantParrot61! Can you try using a wildcard? E.g. ds.remove_files(dataset_path='folder_to_delete/*')
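
To show why the trailing wildcard can make a difference, here is a small standalone sketch using Python's `fnmatch`. It assumes the path filter is matched shell-style against each file's relative path; that is an assumption for illustration only, not a description of how ClearML implements it, and `select_for_removal` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def select_for_removal(paths, pattern):
    """Return the relative paths that match a shell-style pattern."""
    return [p for p in paths if fnmatchcase(p, pattern)]


# Made-up relative paths mirroring the folder layout discussed in this thread.
files = [
    "batch_0/file_0_0.txt",
    "batch_0/file_0_1.txt",
    "batch_1/file_1_0.txt",
]

# The bare folder name does not match any file path.
print(select_for_removal(files, "batch_0"))

# With the wildcard, every file under the folder matches.
print(select_for_removal(files, "batch_0/*"))
```

Under that assumption, a bare folder name selects nothing because no file path equals it exactly, while `folder/*` selects everything beneath the folder.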

  
  
Posted one year ago

Thanks, will try. I was using the command line, also with wildcards.

  
  
Posted one year ago

You got it, thanks in advance.

  
  
Posted one year ago