Answered
Hi

Hi 👋 I'm trying to remove a few files from a dataset that has already been finalized with .finalize(). With both $ clearml-data sync and $ clearml-data remove, the dataset ends up in a weird state: the file counts on the dashboard are updated, but when I download the dataset, all the files, including the removed ones, are still there. I guess this is due to the finalized state. Is the file count change simply a bug, or is there a proper way to remove files from a finalized dataset that I'm missing?

  
  
Posted one year ago

Answers 5


Hi UpsetCrow72,

Can you please explain which steps you took to make this happen? I'm not sure I understand what exactly happened.

  
  
Posted one year ago

Hi CostlyOstrich36, sorry, let me make it clearer.

I simply upload a bunch of files as a new dataset using the Python API. Then, using the CLI, I get a local copy and remove a few of the files. At this step I tried both deleting them from the file system and then running $ clearml-data sync, and using $ clearml-data remove. I get an "invalid status" error, but the result is the same as I described above: the file count is updated, but when I get a fresh local copy of the dataset, I get all of the files, so it seems the removal didn't happen.

Is there a step I'm missing? Is it even possible to remove files from a finalized dataset?

  
  
Posted one year ago

How are you getting the data locally? Can you paste the error here?

  
  
Posted one year ago

I'm getting it with $ clearml-data get --id <id> --copy <local_path>, and this is the log output of $ clearml-data remove --id <id> --files <files>:

clearml-data - Dataset Management & Versioning CLI
Removing files/folder from dataset id 2a0eb9ab619c442abc204775f217d0b9
2022-09-19 13:44:12,964 - clearml.Task - ERROR - Action failed <400/110: tasks.add_or_update_artifacts/v2.10 (Invalid task status: expected=created, status=completed)> (task=2a0eb9ab619c442abc204775f217d0b9, artifacts=[{'key': 'state', 'type': 'dict', 'uri': '***', 'content_size': 30814, 'hash': 'beb96389b3d7a374115a1e340dff51fc6ce65c511a0a93411e0ea85fe8dbfc08', 'timestamp': 1663587852, 'type_data': {'preview': 'Dataset state\nFiles added/modified: 107 - total size 98.51 KB\nCurrent dependency graph: {\n "2a0eb9ab619c442abc204775f217d0b9": []\n}\n', 'content_type': 'application/json'}, 'display_data': [('files modified', '0'), ('files added', '108'), ('files removed', '0')]}], force=True)
2022-09-19 13:44:13,656 - clearml.Task - ERROR - Action failed <400/110: tasks.add_or_update_artifacts/v2.10 (Invalid task status: expected=created, status=completed)> (task=2a0eb9ab619c442abc204775f217d0b9, artifacts=[{'key': 'state', 'type': 'dict', 'uri': '***', 'content_size': 30529, 'hash': 'a848da39685d766ed1a5b650510ecc2b5fb1472acef13151572713668ba12161', 'timestamp': 1663587853, 'type_data': {'preview': 'Dataset state\nFiles added/modified: 106 - total size 98.33 KB\nCurrent dependency graph: {\n "2a0eb9ab619c442abc204775f217d0b9": []\n}\n', 'content_type': 'application/json'}, 'display_data': [('files modified', '0'), ('files added', '108'), ('files removed', '1')]}], force=True)
2022-09-19 13:44:14,333 - clearml.Task - ERROR - Action failed <400/110: tasks.add_or_update_artifacts/v2.10 (Invalid task status: expected=created, status=completed)> (task=2a0eb9ab619c442abc204775f217d0b9, artifacts=[{'key': 'state', 'type': 'dict', 'uri': '***', 'content_size': 30243, 'hash': 'aead122684666ae652e4bb6621e8bb9cb1a11230b21111cd0ca1e1d1851adc6a', 'timestamp': 1663587854, 'type_data': {'preview': 'Dataset state\nFiles added/modified: 105 - total size 97.94 KB\nCurrent dependency graph: {\n "2a0eb9ab619c442abc204775f217d0b9": []\n}\n', 'content_type': 'application/json'}, 'display_data': [('files modified', '0'), ('files added', '108'), ('files removed', '2')]}], force=True)
2022-09-19 13:44:14,935 - clearml.Task - ERROR - Action failed <400/110: tasks.add_or_update_artifacts/v2.10 (Invalid task status: expected=created, status=completed)> (task=2a0eb9ab619c442abc204775f217d0b9, artifacts=[{'key': 'state', 'type': 'dict', 'uri': '***', 'content_size': 29957, 'hash': '1842864e81ff882db7573e95d3c3712cdcad925433bfc30e49b5d3782cbc6d8f', 'timestamp': 1663587854, 'type_data': {'preview': 'Dataset state\nFiles added/modified: 104 - total size 97.43 KB\nCurrent dependency graph: {\n "2a0eb9ab619c442abc204775f217d0b9": []\n}\n', 'content_type': 'application/json'}, 'display_data': [('files modified', '0'), ('files added', '108'), ('files removed', '3')]}], force=True)
2022-09-19 13:44:15,556 - clearml.Task - ERROR - Action failed <400/110: tasks.add_or_update_artifacts/v2.10 (Invalid task status: expected=created, status=completed)> (task=2a0eb9ab619c442abc204775f217d0b9, artifacts=[{'key': 'state', 'type': 'dict', 'uri': '***', 'content_size': 29672, 'hash': 'df6fbf7bc6410930e80e1b8b9c4c9d9e3999b2b49fb040d1b0bf38d2c7d77d47', 'timestamp': 1663587855, 'type_data': {'preview': 'Dataset state\nFiles added/modified: 103 - total size 97.05 KB\nCurrent dependency graph: {\n "2a0eb9ab619c442abc204775f217d0b9": []\n}\n', 'content_type': 'application/json'}, 'display_data': [('files modified', '0'), ('files added', '108'), ('files removed', '4')]}], force=True)
5 files removed

  
  
Posted one year ago

tasks.add_or_update_artifacts/v2.10 (Invalid task status: expected=created, status=completed)>

Hi UpsetCrow72,
How come you are trying to sync a "completed" (finalized) dataset?
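
The error in the log says the server expected a task in status "created" but found "completed", which matches the dataset having been finalized. One approach that may fit here, not confirmed anywhere in this thread, is to leave the finalized dataset untouched and create a new child version with the unwanted files removed. The sketch below assumes the clearml Python SDK's Dataset.create, remove_files, upload and finalize calls; the helper name create_version_without_files and the dataset_cls parameter are hypothetical and only exist for illustration (dataset_cls lets you pass a stand-in class when trying the function without a server).

```python
from typing import Any, Iterable, Optional


def create_version_without_files(
    parent_id: str,
    files_to_remove: Iterable[str],
    dataset_name: str,
    dataset_project: str,
    dataset_cls: Optional[Any] = None,
) -> str:
    """Create a child version of a finalized dataset with some files removed.

    Hypothetical helper: the parent dataset is never modified; the removals
    are recorded only in the new child version.
    """
    if dataset_cls is None:
        # Assumption: the clearml package is installed and configured.
        # Imported lazily so the helper can be tried with a stand-in class.
        from clearml import Dataset as dataset_cls

    # The child starts in an editable state and inherits the parent's files.
    child = dataset_cls.create(
        dataset_name=dataset_name,
        dataset_project=dataset_project,
        parent_datasets=[parent_id],
    )

    # Paths are relative to the dataset root.
    for path in files_to_remove:
        child.remove_files(dataset_path=path)

    # Push the updated state, then close the new version.
    child.upload()
    child.finalize()
    return child.id


# Example call (placeholders, not values from this thread):
# new_id = create_version_without_files(
#     "<id>", ["<file_1>", "<file_2>"], "<dataset_name>", "<dataset_project>"
# )
```

Fetching the returned id with $ clearml-data get --id <new_id> --copy <local_path> should then give a copy without the removed files, while the original finalized version stays as it was.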

  
  
Posted one year ago