Answered
Question on using clearml-data to manage contents of datasets: deleting a directory within an uploaded dataset

Question on using clearml-data to manage the contents of datasets. I'm having an issue deleting a directory within an uploaded dataset. Here are a few ways I've tried. First, creating a new dataset with a parent and running remove --files <path to folder>: that doesn't work, it only works on individual files. Another way is with sync: I create a new child dataset, get a local copy, run sync on that folder, make the changes, then perform the upload. When I download the dataset, the folder still remains. Any ideas?
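For reference, the CLI workflow described above looks roughly like this (a sketch only; the project, dataset name, and folder paths are placeholders, and the exact flags should be verified against clearml-data --help):

# Create a child dataset of the one to modify (placeholder names and id)
clearml-data create --project my_project --name my_dataset_v2 --parents <parent_dataset_id>

# Attempt 1: remove the folder from the child dataset
clearml-data remove --files path/to/folder

# Attempt 2: sync a local copy in which the folder was deleted
clearml-data sync --folder path/to/local_copy

# Upload and finalize the child dataset
clearml-data upload
clearml-data close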

  
  
Posted one year ago

Answers 5


Hi ExuberantParrot61! Can you try using a wildcard? E.g. ds.remove_files(dataset_path='folder_to_delete/*')
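If you are using the command line rather than the SDK, the equivalent would presumably be to pass the same wildcard pattern to remove (an untested sketch; quote the pattern so the shell does not expand it):

clearml-data remove --files 'folder_to_delete/*'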

  
  
Posted one year ago

For the record, this is a minimal reproducible example:

Local folder structure:
├── remove_folder
│   ├── batch_0
│   │   ├── file_0_0.txt
│   │   ├── file_0_1.txt
│   │   ├── file_0_2.txt
│   │   ├── file_0_3.txt
│   │   ├── file_0_4.txt
│   │   ├── file_0_5.txt
│   │   ├── file_0_6.txt
│   │   ├── file_0_7.txt
│   │   ├── file_0_8.txt
│   │   └── file_0_9.txt
│   └── batch_1
│       ├── file_1_0.txt
│       ├── file_1_1.txt
│       ├── file_1_2.txt
│       ├── file_1_3.txt
│       ├── file_1_4.txt
│       ├── file_1_5.txt
│       ├── file_1_6.txt
│       ├── file_1_7.txt
│       ├── file_1_8.txt
│       └── file_1_9.txt
└── remove_folder.ipynb
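To reproduce this structure locally, a small helper such as the following can be used (a sketch; only the folder and file names shown in the listing above are assumed):

from pathlib import Path

# Build remove_folder/batch_0 and remove_folder/batch_1 with ten text files each
for batch in range(2):
    batch_dir = Path('remove_folder') / f'batch_{batch}'
    batch_dir.mkdir(parents=True, exist_ok=True)
    for i in range(10):
        (batch_dir / f'file_{batch}_{i}.txt').write_text(f'dummy content {batch}_{i}\n')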
from clearml import Dataset

# Create the dataset
ds = Dataset.create(
    dataset_project='issues',
    dataset_name='remove_folder_test'
)
ds.add_files('remove_folder')
ds.finalize(auto_upload=True)

# Create a child dataset (create new and provide a parent, but writable_copy will do this for you!)
ds = Dataset.get(
    dataset_project='issues',
    dataset_name='remove_folder_test',
    writable_copy=True
)
print(ds.list_files())  # will print both batch_0 and batch_1 files

ds.remove_files(dataset_path='batch_0/*')
print(ds.list_files())  # will print only batch_1 files

ds.finalize(auto_upload=True)

# Now check for certain with a local copy
import os
os.listdir(Dataset.get(dataset_id=ds.id).get_local_copy())
# Should return only 'batch_1'

  
  
Posted one year ago

Thanks, will try. I was using the command line, also with wildcards.

  
  
Posted one year ago

The above works for me, so if you try it and the command-line version does not work, there might be a bug. Please post the exact commands you use when you try it 🙂

  
  
Posted one year ago

You got it, thanks in advance.

  
  
Posted one year ago
794 Views
5 Answers