This Wasn'T A Big Deal, But I Noticed When Pushing A Dataset To The Server, With Cloud Storage, That The Upload Information Looked A Bit Bonkers In Terms Of Units:

Answered

This wasn't a big deal, but I noticed when pushing a dataset to the server, with cloud storage, that the upload information looked a bit bonkers in terms of units:

2021-06-04 13:05:46,366 - clearml.storage - INFO - Uploading: 36180.00MB / 550.10MB @ 6140.80MBs from /tmp/dataset.37a8f00931b04952a1500e3ada831022.zip 2021-06-04 13:05:46,440 - clearml.storage - INFO - Uploading: 36720.00MB / 550.10MB @ 7300.81MBs from /tmp/dataset.37a8f00931b04952a1500e3ada831022.zip 2021-06-04 13:05:46,486 - clearml.storage - INFO - Uploading: 37264.00MB / 550.10MB @ 11711.66MBs from /tmp/dataset.37a8f00931b04952a1500e3ada831022.zip 2021-06-04 13:05:46,537 - clearml.storage - INFO - Uploading: 37812.00MB / 550.10MB @ 10796.55MBs from /tmp/dataset.37a8f00931b04952a1500e3ada831022.zip 2021-06-04 13:05:46,582 - clearml.storage - INFO - Uploading: 38362.10MB / 550.10MB @ 12276.23MBs from /tmp/dataset.37a8f00931b04952a1500e3ada831022.zip

  				
Posted 
	4 years ago

					More
				  		
  Report
		
					VivaciousPenguin66
				
					0
					 × 1

Votes Newest

Answers 10

No worries, I'll see what I can do 🙂

  				
Posted 
	4 years ago

					More
				  		
  Report
		
					AgitatedDove14
				
					0
					 × 1

Maybe it's the Azure upload that has a weird size bug?!

  				
Posted 
	4 years ago

					More
				  		
  Report
		
					AgitatedDove14
				
					0
					 × 1

Hmmmm, I thought it logged it with the terminal results when it was uploading weights, but perhaps that's only the live version and the saved version is pruned? Or my memory is wrong.... it is Friday after all!
Can't find anymore reference to it, sorry.

  				
Posted 
	4 years ago

					More
				  		
  Report
		
					VivaciousPenguin66
				
					0
					 × 1

Issue #337 opened in the clearml repository.

https://github.com/allegroai/clearml/issues/377

  				
Posted 
	4 years ago

					More
				  		
  Report
		
					VivaciousPenguin66
				
					0
					 × 1

This was the code:

` import os
import argparse

# ClearML modules
from clearml import Dataset

parser = argparse.ArgumentParser(description='CUB200 2011 ClearML data uploader - Ed Morris (c) 2021')
parser.add_argument(
    '--dataset-basedir',
    dest='dataset_basedir',
    type=str,
    help='The directory to the root of the dataset',
    default='/home/edmorris/projects/image_classification/caltech_birds/data/images')
parser.add_argument(
    '--clearml-project',
    dest='clearml_project',
    type=str,
    help='The name of the clearml project that the dataset will be stored and published to.',
    default='Caltech Birds/Datasets')
parser.add_argument(
    '--clearml-dataset-url',
    dest='clearml_dataset_url',
    type=str,
    help='Location of where the dataset files should be stored. Default is Azure Blob Storage. Format is ',
    default='')
args = parser.parse_args()

for task_type in ['train','test']:
    print('[INFO] Versioning and uploading {0} dataset for CUB200 2011'.format(task_type))
    dataset = Dataset.create('cub200_2011_{0}_dataset'.format(task_type), dataset_project=args.clearml_project)
    dataset.add_files(path=os.path.join(args.dataset_basedir,task_type), verbose=False)
    dataset.upload(output_url=args.clearml_dataset_url)
    print('[INFO] {0} Dataset finalized....'.format(task_type), end='')
    dataset.finalize()

print('[INFO] {0} Dataset published....'.format(task_type), end='')
dataset.publish() `

  				
Posted 
	4 years ago

					More
				  		
  Report
		
					VivaciousPenguin66
				
					0
					 × 1

That is odd ...
Could you open a GitHub issue?
Is this on any upload, how do I reproduce it ?

  				
Posted 
	4 years ago

					More
				  		
  Report
		
					AgitatedDove14
				
					0
					 × 1

Just ran a model which pulled the dataset from the Azure Blob Storage and that seemed to looked correct.

2021-06-04 13:34:21,708 - clearml.storage - INFO - Downloading: 13.00MB / 550.10MB @ 32.59MBs from Birds%2FDatasets/cub200_2011_train_dataset.37a8f00931b04952a1500e3ada831022/artifacts/data/dataset.37a8f00931b04952a1500e3ada831022.zip 2021-06-04 13:34:21,754 - clearml.storage - INFO - Downloading: 21.00MB / 550.10MB @ 175.54MBs from Birds%2FDatasets/cub200_2011_train_dataset.37a8f00931b04952a1500e3ada831022/artifacts/data/dataset.37a8f00931b04952a1500e3ada831022.zip 2021-06-04 13:34:21,791 - clearml.storage - INFO - Downloading: 29.00MB / 550.10MB @ 218.32MBs from Birds%2FDatasets/cub200_2011_train_dataset.37a8f00931b04952a1500e3ada831022/artifacts/data/dataset.37a8f00931b04952a1500e3ada831022.zip 2021-06-04 13:34:21,819 - clearml.storage - INFO - Downloading: 37.00MB / 550.10MB @ 282.70MBs from Birds%2FDatasets/cub200_2011_train_dataset.37a8f00931b04952a1500e3ada831022/artifacts/data/dataset.37a8f00931b04952a1500e3ada831022.zip 2021-06-04 13:34:21,843 - clearml.storage - INFO - Downloading: 45.00MB / 550.10MB @ 334.24MBs from Birds%2FDatasets/cub200_2011_train_dataset.37a8f00931b04952a1500e3ada831022/artifacts/data/dataset.37a8f00931b04952a1500e3ada831022.zip

  				
Posted 
	4 years ago

					More
				  		
  Report
		
					VivaciousPenguin66
				
					0
					 × 1

If my memory serves me correctly, I think it happened on weights saving as well, let me just check an experiment log and see.

  				
Posted 
	4 years ago

					More
				  		
  Report
		
					VivaciousPenguin66
				
					0
					 × 1

AgitatedDove14
Just compared two uploads of the same dataset, one to Azure Blob and the other to local storage on clearml-server.
The local storage didn't report any statistics, so it might be confined to the cloud storage method, and specifically Azure.

  				
Posted 
	4 years ago

					More
				  		
  Report
		
					VivaciousPenguin66
				
					0
					 × 1

👍

  				
Posted 
	4 years ago

					More
				  		
  Report
		
					AgitatedDove14
				
					0
					 × 1

Write your answer

2K Views

10 Answers

4 years ago

2 years ago