Issue #337 opened in the clearml repository.
Maybe it's the Azure upload that has a weird size bug?!
If my memory serves me correctly, I think it happened on weights saving as well, let me just check an experiment log and see.
Just ran a model which pulled the dataset from Azure Blob Storage, and that seemed to look correct.
2021-06-04 13:34:21,708 - clearml.storage - INFO - Downloading: 13.00MB / 550.10MB @ 32.59MBs from Birds%2FDatasets/cub200_2011_train_dataset.37a8f00931b04952a1500e3ada831022/artifacts/data/dataset.37a8f00931b04952a1500e3ada831022.zip
2021-06-04 13:34:21,754 - clearml.storage - INFO - Downloading: 21.00MB / 550.10MB @ 175.54MBs from Birds%2FDatasets/cub200_2011_train_dataset.37a8f00931b04952a1500e3ada831022/artifacts/data/dataset.37a8f00931b04952a1500e3ada831022.zip
2021-06-04 13:34:21,791 - clearml.storage - INFO - Downloading: 29.00MB / 550.10MB @ 218.32MBs from Birds%2FDatasets/cub200_2011_train_dataset.37a8f00931b04952a1500e3ada831022/artifacts/data/dataset.37a8f00931b04952a1500e3ada831022.zip
2021-06-04 13:34:21,819 - clearml.storage - INFO - Downloading: 37.00MB / 550.10MB @ 282.70MBs from Birds%2FDatasets/cub200_2011_train_dataset.37a8f00931b04952a1500e3ada831022/artifacts/data/dataset.37a8f00931b04952a1500e3ada831022.zip
2021-06-04 13:34:21,843 - clearml.storage - INFO - Downloading: 45.00MB / 550.10MB @ 334.24MBs from Birds%2FDatasets/cub200_2011_train_dataset.37a8f00931b04952a1500e3ada831022/artifacts/data/dataset.37a8f00931b04952a1500e3ada831022.zip
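In case it helps with checking a saved log for odd sizes, here is a minimal sketch that pulls the numbers out of progress lines like the ones above and flags anything inconsistent. The regular expression is based only on the line format shown in this log; `parse_progress` and `find_inconsistencies` are hypothetical helper names, not part of clearml, and whether upload lines use the same layout is an assumption.

```python
import re

# Pattern inferred from the log lines above, e.g.
# "Downloading: 13.00MB / 550.10MB @ 32.59MBs".
# Assumption: other transfer messages (such as uploads) follow the same layout.
PROGRESS_RE = re.compile(
    r"(?P<action>\w+): "
    r"(?P<done>\d+(?:\.\d+)?)MB / (?P<total>\d+(?:\.\d+)?)MB "
    r"@ (?P<rate>\d+(?:\.\d+)?)MBs"
)


def parse_progress(line):
    """Return the progress numbers found in one log line, or None."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    return {
        "action": match["action"],
        "done_mb": float(match["done"]),
        "total_mb": float(match["total"]),
        "rate_mb_s": float(match["rate"]),
    }


def find_inconsistencies(lines):
    """Return human-readable descriptions of suspicious progress values."""
    problems = []
    previous = None
    for number, line in enumerate(lines, start=1):
        progress = parse_progress(line)
        if progress is None:
            continue  # not a progress line
        if progress["done_mb"] > progress["total_mb"]:
            problems.append(f"line {number}: transferred size exceeds total size")
        if previous is not None:
            if progress["total_mb"] != previous["total_mb"]:
                problems.append(f"line {number}: total size changed mid-transfer")
            if progress["done_mb"] < previous["done_mb"]:
                problems.append(f"line {number}: transferred size went backwards")
        previous = progress
    return problems
```

Feeding it the lines of a saved console log (for example via `open(path).read().splitlines()`) returns an empty list when the reported sizes are at least self-consistent. It only works for a single transfer at a time, since it compares each progress line with the one before it.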
Hmmmm, I thought it logged it with the terminal results when it was uploading weights, but perhaps that's only the live version and the saved version is pruned? Or my memory is wrong.... it is Friday after all!
Can't find any more references to it, sorry.
AgitatedDove14
Just compared two uploads of the same dataset, one to Azure Blob and the other to local storage on clearml-server.
The local storage didn't report any statistics, so it might be confined to the cloud storage method, and specifically Azure.
That is odd ...
Could you open a GitHub issue?
Is this on any upload? How do I reproduce it?
This was the code:
` import os
import argparse

# ClearML modules
from clearml import Dataset

parser = argparse.ArgumentParser(description='CUB200 2011 ClearML data uploader - Ed Morris (c) 2021')
parser.add_argument(
    '--dataset-basedir',
    dest='dataset_basedir',
    type=str,
    help='The directory to the root of the dataset',
    default='/home/edmorris/projects/image_classification/caltech_birds/data/images')
parser.add_argument(
    '--clearml-project',
    dest='clearml_project',
    type=str,
    help='The name of the clearml project that the dataset will be stored and published to.',
    default='Caltech Birds/Datasets')
parser.add_argument(
    '--clearml-dataset-url',
    dest='clearml_dataset_url',
    type=str,
    help='Location of where the dataset files should be stored. Default is Azure Blob Storage. Format is ',
    default='')
args = parser.parse_args()

for task_type in ['train', 'test']:
    print('[INFO] Versioning and uploading {0} dataset for CUB200 2011'.format(task_type))
    dataset = Dataset.create('cub200_2011_{0}_dataset'.format(task_type), dataset_project=args.clearml_project)
    dataset.add_files(path=os.path.join(args.dataset_basedir, task_type), verbose=False)
    dataset.upload(output_url=args.clearml_dataset_url)
    print('[INFO] {0} Dataset finalized....'.format(task_type), end='')
    dataset.finalize()
    print('[INFO] {0} Dataset published....'.format(task_type), end='')
    dataset.publish() `
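For anyone trying to reproduce this, it may help to know the real on-disk size of each dataset folder before comparing it with whatever the upload reports. Below is a rough sketch using only the standard library; `directory_size_mb` is a hypothetical helper, and treating 1 MB as 1024 * 1024 bytes is an assumption about how the sizes in the log are computed. The uploaded artifact is a zip, so its size will usually differ from this raw total.

```python
import os


def directory_size_mb(root):
    """Sum the sizes of all regular files under root, in MB.

    Assumption: 1 MB = 1024 * 1024 bytes. The log output may use a
    different convention, so treat the result as an approximation.
    """
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so linked files are not counted twice.
            if not os.path.islink(path):
                total_bytes += os.path.getsize(path)
    return total_bytes / (1024 * 1024)
```

Calling it on `os.path.join(args.dataset_basedir, task_type)` inside the loop above would give a number to hold against the reported upload size.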