sure, but I don't know whether this breaks something else
I did, but I still have the same issue..
tglema@mvd0000xlrndtl2 clearml-src git:(28b8502) git status
HEAD detached at 0.17.5rc3
I ran `python setup.py develop`, and then ran this script:
`from clearml import Dataset
dataset = Dataset.create(dataset_project='test', dataset_name='example')
dataset.add_files('/home/tglema/example.jpeg')
dataset.add_files('/home/tglema/logo.png')
print(dataset.list_files())
dataset.upload()
dataset.finalize()
dataset_new = Dataset.create(dataset_project='test', dataset_name='example_without_logo', parent_datasets=[dataset.id])
print(dataset_new.list_files())
print(dataset_new.remove_files('logo.png'))
print(dataset_new.list_removed_files())
dataset_new.upload()
dataset_new.finalize()`
and this is the output:
['example.jpeg', 'logo.png']
Uploading compressed dataset changes (2 files, total 49.07 KB) to
Upload completed (49.07 KB)
2021-02-05 17:04:38,333 - clearml.Task - INFO - Waiting to finish uploads
2021-02-05 17:04:38,334 - clearml.Task - INFO - Finished uploading
!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!
['example.jpeg', 'logo.png']
0
[]
2021-02-05 17:04:41,339 - clearml - INFO - No pending files, skipping upload.
2021-02-05 17:04:41,935 - clearml.Task - INFO - Waiting to finish uploads
2021-02-05 17:04:41,935 - clearml.Task - INFO - Finished uploading
!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!
(I added the print('!'*15) to check that it was using that module)
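An alternative to adding print statements is to ask Python where it would load a module from. This is only a minimal sketch: `module_origin` is a hypothetical helper name, and `json` is used as a stand-in so the snippet runs anywhere; swapping in `'clearml'` should show whether the development checkout is the copy being imported.

```python
import importlib.util


def module_origin(module_name):
    """Return the file a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec is not None else None


# 'json' is a stand-in here; use 'clearml' to check the develop install.
print(module_origin('json'))
```

If the printed path points inside the cloned source directory, the edited module is the one in use.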
awesome! will you do the PR or should I?
Thanks MagnificentSeaurchin79! This code snippet is exactly what I needed, let me check if I can reproduce it.
MagnificentSeaurchin79 are you using the latest RC?
(I think this was exactly the issue)
EDIT:
Try to create the version with the file removed after you upgrade to the latest RC (0.17.5rc3); in the summary you should see 1 file removed.
IMHO, remove_files('logo.png') shouldn't return 0, and I think the problem is that the file passed as an argument is not correctly matched against the files stored in the dataset.
but I don't see any change...where is the link to the file removed from?
How are these two datasets different?
Thanks
Oh, fork the repository (this creates a copy under your GitHub account); this is done from GitHub's web page
Then commit to your repository (on the master branch)
Then on the GitHub page of the repository in your account, you will see a green button offering to open a PR
but I don't see any change...where is the link to the file removed from
In the metadata section, check the artifacts "state" object
How are these two datasets different?
Like comparing two experiments :)
it's my first PR to an open source project
ah, I see..so I do it in master or in 0.17.5rc3?
ah, I see master is the same as 0.17.5rc3
are you planning on changing to f-strings incrementally?
There is still py 2.7 & 3.5 support...
Hopefully we will be able to drop both at some point (apparently enough users still have legacy code), and then we will probably switch to the nicer f-strings
NICE! MagnificentSeaurchin79 could you PR this fix?
so how do I make a PR?
I don't have write access..
You are doing great, don't worry about it
if I list_files the new dataset, I see the same files
Thanks!
I think this one will cover both cases (the issue is with files in the root of the dataset):
`if not (fnmatch(k, path) and fnmatch(k if '/' in k else '/{}'.format(k), '*/' + wildcard))}`
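To see why root-level files matter here, the three matching variants mentioned in this thread can be compared side by side. This is a sketch only: the helper names are hypothetical, the key list is made up for illustration, and the `fnmatch(k, path)` half of the condition is left out because `path` is not defined anywhere in the thread.

```python
from fnmatch import fnmatch


def original_match(key, wildcard):
    # Pattern quoted in the thread: needs a '/' somewhere before the file name.
    return fnmatch(key, '*/' + wildcard)


def simplified_match(key, wildcard):
    # Local change tried in the thread: compare against the bare wildcard.
    return fnmatch(key, wildcard)


def root_aware_match(key, wildcard):
    # Proposed condition: prefix root-level keys with '/' so '*/' can match them.
    return fnmatch(key if '/' in key else '/{}'.format(key), '*/' + wildcard)


# Hypothetical dataset keys: one file in the root, one nested, one unrelated.
keys = ['logo.png', 'images/logo.png', 'example.jpeg']
variants = [
    ('original', original_match),
    ('simplified', simplified_match),
    ('root_aware', root_aware_match),
]
results = {
    name: [k for k in keys if match(k, 'logo.png')]
    for name, match in variants
}
for name, matched in results.items():
    print(name, matched)
```

In this toy setup the original pattern misses the root-level file, the bare wildcard misses the nested one, and the root-aware variant matches both.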
Hi MagnificentSeaurchin79
Yes, this is a bit confusing
Datasets are stored as delta changes from parent versions.
A dataset contains a list of files and a list of artifacts where these files are stored. This means that when we create a new dataset from a parent dataset and add a file, we add a link to the file and create a new artifact containing just the delta (i.e. the new file) relative to the parent version. When you delete a file, you just remove the link to the file (no need to change the parent zip).
Make sense?
btw: Make sure you are using the latest RC (I think we fixed a bug in some edge case I can't remember regarding Datasets)
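The delta idea described above can be pictured with a toy model. This is not the ClearML API: `ToyDatasetVersion` and its methods are hypothetical names invented for illustration, and the real storage format is certainly more involved.

```python
class ToyDatasetVersion:
    """Hypothetical stand-in for a dataset version; not the ClearML API."""

    def __init__(self, name, parent=None):
        self.name = name
        # Each link maps a file path to the version whose artifact holds the file.
        self.links = dict(parent.links) if parent else {}
        # Files stored in this version's own artifact (the delta from the parent).
        self.delta = []

    def add_file(self, path):
        self.links[path] = self.name
        self.delta.append(path)

    def remove_file(self, path):
        # Only the link is dropped; the parent's artifact is left untouched.
        return 1 if self.links.pop(path, None) is not None else 0

    def list_files(self):
        return sorted(self.links)


parent = ToyDatasetVersion('example')
parent.add_file('example.jpeg')
parent.add_file('logo.png')

child = ToyDatasetVersion('example_without_logo', parent=parent)
removed = child.remove_file('logo.png')

print(parent.list_files())  # parent still lists both files
print(child.list_files())   # child no longer links to logo.png
print(child.delta)          # nothing new to upload for the child
```

In this model the child version has nothing to upload after a removal, since only a link changed and no new file was added.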
are you planning on changing to f-strings incrementally?
if I change line 217 in dataset.py from
`fnmatch(k, '*/' + wildcard)`
to `fnmatch(k, wildcard)`
then it works :-)
I don't understand what I'm doing wrong... how are you supposed to call the remove_files function?