Answered
Question on using clearml-data to manage contents of datasets: deleting a directory within an uploaded dataset

Question on using clearml-data to manage the contents of datasets. I'm having an issue deleting a directory within an uploaded dataset. Here are a few ways I've tried. First, creating a new dataset with a parent and running remove --files <path to folder>: that doesn't work, it only works on individual files. Another way is with sync: I create a new child dataset, get a local copy, run sync on that folder, make the changes, then perform the upload. When I download the dataset, the folder still remains. Any ideas?
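For reference, the CLI workflow described above looks roughly like this (a sketch only; the project, dataset name, and folder paths are placeholders, and the exact flags should be verified against clearml-data --help):

# Create a child dataset of the one to modify (placeholder names and id)
clearml-data create --project my_project --name my_dataset_v2 --parents <parent_dataset_id>

# Attempt 1: remove the folder from the child dataset
clearml-data remove --files path/to/folder

# Attempt 2: sync a local copy in which the folder was deleted
clearml-data sync --folder path/to/local_copy

# Upload and finalize the child dataset
clearml-data upload
clearml-data close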

  
  
Posted one year ago

Answers 5


Hi ExuberantParrot61! Can you try using a wildcard? E.g. ds.remove_files(dataset_path='folder_to_delete/*')
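If you are using the command line rather than the SDK, the equivalent would presumably be to pass the same wildcard pattern to remove (an untested sketch; quote the pattern so the shell does not expand it):

clearml-data remove --files 'folder_to_delete/*'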

  
  
Posted one year ago

For the record, this is a minimal reproducible example:

Local folder structure:
├── remove_folder
│   ├── batch_0
│   │   ├── file_0_0.txt
│   │   ├── file_0_1.txt
│   │   ├── file_0_2.txt
│   │   ├── file_0_3.txt
│   │   ├── file_0_4.txt
│   │   ├── file_0_5.txt
│   │   ├── file_0_6.txt
│   │   ├── file_0_7.txt
│   │   ├── file_0_8.txt
│   │   └── file_0_9.txt
│   └── batch_1
│       ├── file_1_0.txt
│       ├── file_1_1.txt
│       ├── file_1_2.txt
│       ├── file_1_3.txt
│       ├── file_1_4.txt
│       ├── file_1_5.txt
│       ├── file_1_6.txt
│       ├── file_1_7.txt
│       ├── file_1_8.txt
│       └── file_1_9.txt
└── remove_folder.ipynb
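To reproduce this structure locally, a small helper such as the following can be used (a sketch; only the folder and file names shown in the listing above are assumed):

from pathlib import Path

# Build remove_folder/batch_0 and remove_folder/batch_1 with ten text files each
for batch in range(2):
    batch_dir = Path('remove_folder') / f'batch_{batch}'
    batch_dir.mkdir(parents=True, exist_ok=True)
    for i in range(10):
        (batch_dir / f'file_{batch}_{i}.txt').write_text(f'dummy content {batch}_{i}\n')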
from clearml import Dataset

# Create the dataset
ds = Dataset.create(
    dataset_project='issues',
    dataset_name='remove_folder_test'
)
ds.add_files('remove_folder')
ds.finalize(auto_upload=True)

# Create a child dataset (create new and provide a parent, but writable_copy will do this for you!)
ds = Dataset.get(
    dataset_project='issues',
    dataset_name='remove_folder_test',
    writable_copy=True
)
print(ds.list_files())  # will print both batch_0 and batch_1 files

ds.remove_files(dataset_path='batch_0/*')
print(ds.list_files())  # will print only batch_1 files

ds.finalize(auto_upload=True)

# Now check for certain with a local copy
import os
os.listdir(Dataset.get(dataset_id=ds.id).get_local_copy())
# Should return only 'batch_1'

  
  
Posted one year ago

Thanks, will try. I was using the command line, also with wildcards.

  
  
Posted one year ago

The above works for me, so if you try it and the command-line version does not work, there might be a bug. Please post the exact commands you use when you try it 🙂

  
  
Posted one year ago

You got it, thanks in advance.

  
  
Posted one year ago
794 Views
5 Answers