Answered
Hi, I'M Having A Hard Time Trying To Understand The Dataset Class. What I Need Is To Be Able To Get The Dataset, Delete A File, And Upload It Again. But The Problem Is When I Call The

Hi, I'm having a hard time trying to understand the Dataset class.
What I need is to be able to get the dataset, delete a file, and upload it again. But the problem is when I call the remove_files I get that no file was removed

  
  
Posted 3 years ago

Answers 30


NICE! MagnificentSeaurchin79 could you PR this fix?

  
  
Posted 3 years ago

I don't understand what I'm doing wrong..how are you supposed to call the remove_files function?

  
  
Posted 3 years ago

if I change line 217 in dataset.py from
fnmatch(k, '*/' + wildcard)
to
fnmatch(k, wildcard)
then it works :-)

  
  
Posted 3 years ago

sure, but I don't know if this doesn't break something else

  
  
Posted 3 years ago

it's my first PR to an opensource project 😁

  
  
Posted 3 years ago

awesome! will you do the PR or should I?

  
  
Posted 3 years ago

and in the plots section:

  
  
Posted 3 years ago

Oh, fork the repository (this will create a copy on your GitHub account), this is done from GitHub's web page
Then commit to your repository (on the master branch)
Then in the GitHub page of the repository on your account, you will have a green button suggesting you to PR it 🙂

  
  
Posted 3 years ago

but I don't see any change...where is the link to the file removed from

In the meta data section, check the artifacts "state" object

How are these two datasets different?

Like comparing two experiments :)

  
  
Posted 3 years ago

is this ok?

  
  
Posted 3 years ago

example

  
  
Posted 3 years ago

MagnificentSeaurchin79 are you using the latest RC ?
(I think this was exactly the issue)
EDIT:
try to create the version with the file removed after you upgrade to the latest RC (0.17.5rc3); in the summary you should see 1 file removed.

  
  
Posted 3 years ago

Thanks!
I think this one will cover both cases (the issue is with files on the root of the dataset)
if not (fnmatch(k, path) and fnmatch(k if '/' in k else '/{}'.format(k), '*/' + wildcard))}
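
To see why the root-level file is missed, here is a minimal standalone sketch of just the wildcard test, not the actual code in dataset.py. The helper names (matches_original, matches_patched, remove_matching) and the sample entries are made up for illustration; only fnmatch from the standard library is real.

```python
from fnmatch import fnmatch


def matches_original(key, wildcard):
    # Pattern '*/logo.png' requires a '/' somewhere in the key,
    # so a file stored at the dataset root never matches.
    return fnmatch(key, '*/' + wildcard)


def matches_patched(key, wildcard):
    # Prefix root-level keys with '/' so the same pattern also covers them.
    normalized = key if '/' in key else '/{}'.format(key)
    return fnmatch(normalized, '*/' + wildcard)


def remove_matching(entries, wildcard, matcher):
    # Keep every entry the matcher rejects; report how many were dropped.
    kept = {k: v for k, v in entries.items() if not matcher(k, wildcard)}
    return kept, len(entries) - len(kept)


# Hypothetical file entries, mirroring the two root-level files in the thread.
entries = {'example.jpeg': 'entry-1', 'logo.png': 'entry-2'}

kept_original, removed_original = remove_matching(entries, 'logo.png', matches_original)
kept_patched, removed_patched = remove_matching(entries, 'logo.png', matches_patched)

print(removed_original, sorted(kept_original))
print(removed_patched, sorted(kept_patched))
```

With the original test nothing is removed (the same 0 as in the script output in this thread), while the patched test drops only logo.png. Files inside a subfolder, e.g. a hypothetical 'images/logo.png', match in both versions.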

  
  
Posted 3 years ago

if I list_files the new dataset, I see the same files 😕

  
  
Posted 3 years ago

ah, I see..so I do it in master or in 0.17.5rc3?

  
  
Posted 3 years ago

but I like f-strings better

  
  
Posted 3 years ago

I left it as you wrote it

  
  
Posted 3 years ago

Thanks!

  
  
Posted 3 years ago

are you planning on changing to f-strings incrementally?

There is still py 2.7 & 3.5 support (apparently enough users have legacy code)...
Hopefully we will be able to drop both, then we will probably switch to the nicer f-strings 🙂

  
  
Posted 3 years ago

IMHO, the remove_files('logo.png') shouldn't return 0..and I think the problem is that the file passed as argument is not correctly matched with the files stored in the dataset.

  
  
Posted 3 years ago

but I don't see any change...where is the link to the file removed from?
How are these two datasets different?
Thanks 🙂

  
  
Posted 3 years ago

Thanks MagnificentSeaurchin79 ! This code snippet is exactly what I needed, let me check if I can reproduce it.

  
  
Posted 3 years ago

are you planning on changing to f-strings incrementally?

  
  
Posted 3 years ago

You are doing great 🙂 don't worry about it

  
  
Posted 3 years ago

ah, I see master is the same as 0.17.5rc3

  
  
Posted 3 years ago

I did, but I still have the same issue..

tglema@mvd0000xlrndtl2 clearml-src git:(28b8502) ✗ git status
HEAD detached at 0.17.5rc3

I did a python setup.py develop, and ran the script:
```python
from clearml import Dataset

dataset = Dataset.create(dataset_project='test', dataset_name='example')
dataset.add_files('/home/tglema/example.jpeg')
dataset.add_files('/home/tglema/logo.png')
print(dataset.list_files())
dataset.upload()
dataset.finalize()

dataset_new = Dataset.create(dataset_project='test', dataset_name='example_without_logo', parent_datasets=[dataset.id])
print(dataset_new.list_files())
print(dataset_new.remove_files('logo.png'))
print(dataset_new.list_removed_files())
dataset_new.upload()
dataset_new.finalize()
```
and this is the output

['example.jpeg', 'logo.png']
Uploading compressed dataset changes (2 files, total 49.07 KB) to

Upload completed (49.07 KB)
2021-02-05 17:04:38,333 - clearml.Task - INFO - Waiting to finish uploads
2021-02-05 17:04:38,334 - clearml.Task - INFO - Finished uploading
!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!
['example.jpeg', 'logo.png']
0
[]
2021-02-05 17:04:41,339 - clearml - INFO - No pending files, skipping upload.
2021-02-05 17:04:41,935 - clearml.Task - INFO - Waiting to finish uploads
2021-02-05 17:04:41,935 - clearml.Task - INFO - Finished uploading
!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!

(I added the print('!'*15) to check that it was using that module)

  
  
Posted 3 years ago

Please go ahead with the PR 🙂

  
  
Posted 3 years ago

so how do I make a PR? 😅
I don't have write access..

  
  
Posted 3 years ago

Hi MagnificentSeaurchin79
Yes this is a bit confusing 🙂
Datasets are stored as delta changes from parent versions.

A dataset contains a list of files and a list of artifacts where these files exist. This means that if we create a new dataset from a parent dataset and want to add a file, we have to add a link to the file, and have a new artifact containing just the delta (i.e. the new file) from the parent version. When you delete a file you just remove the link to the file (no need to change the parent zip).
Make sense ?

btw: Make sure you are using the latest RC (I think we fixed a bug in some edge case I can't remember regarding Datasets)
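
A toy model of the delta idea, in case it helps. This is only a sketch under my own assumptions and not how ClearML implements it: the DatasetVersion class, its methods, and the artifact naming are all hypothetical.

```python
class DatasetVersion:
    """Toy dataset version that stores only its changes relative to a parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.added = {}       # path -> name of the artifact holding the file
        self.removed = set()  # inherited paths whose link was dropped here

    def add_file(self, path):
        # A new file goes into this version's own artifact (the delta).
        self.added[path] = 'artifact-of-{}'.format(self.name)
        self.removed.discard(path)

    def remove_file(self, path):
        # Removing a file only drops the link; the parent artifact is untouched.
        if path not in self.links():
            return 0
        self.added.pop(path, None)
        if self.parent is not None and path in self.parent.links():
            self.removed.add(path)
        return 1

    def links(self):
        # Resolve the parent chain, then apply this version's removals and additions.
        resolved = dict(self.parent.links()) if self.parent else {}
        for path in self.removed:
            resolved.pop(path, None)
        resolved.update(self.added)
        return resolved

    def list_files(self):
        return sorted(self.links())


parent = DatasetVersion('example')
parent.add_file('example.jpeg')
parent.add_file('logo.png')

child = DatasetVersion('example_without_logo', parent=parent)
removed_count = child.remove_file('logo.png')

print(removed_count, child.list_files(), parent.list_files())
```

The child version stores no new artifact at all, only the fact that the link to logo.png is gone, and the parent still lists both files.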

  
  
Posted 3 years ago