Answered
This wasn't a big deal, but I noticed when pushing a dataset to the server, with cloud storage, that the upload information looked a bit bonkers in terms of units:

2021-06-04 13:05:46,366 - clearml.storage - INFO - Uploading: 36180.00MB / 550.10MB @ 6140.80MBs from /tmp/dataset.37a8f00931b04952a1500e3ada831022.zip
2021-06-04 13:05:46,440 - clearml.storage - INFO - Uploading: 36720.00MB / 550.10MB @ 7300.81MBs from /tmp/dataset.37a8f00931b04952a1500e3ada831022.zip
2021-06-04 13:05:46,486 - clearml.storage - INFO - Uploading: 37264.00MB / 550.10MB @ 11711.66MBs from /tmp/dataset.37a8f00931b04952a1500e3ada831022.zip
2021-06-04 13:05:46,537 - clearml.storage - INFO - Uploading: 37812.00MB / 550.10MB @ 10796.55MBs from /tmp/dataset.37a8f00931b04952a1500e3ada831022.zip
2021-06-04 13:05:46,582 - clearml.storage - INFO - Uploading: 38362.10MB / 550.10MB @ 12276.23MBs from /tmp/dataset.37a8f00931b04952a1500e3ada831022.zip
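Those figures only make sense if the running counter and the total are being scaled with different divisors. A purely illustrative sketch (not the actual clearml.storage code, and the KB-vs-MB mix-up is an assumption for demonstration) of how that produces an "uploaded" figure that dwarfs the total:

```python
# Purely illustrative sketch -- NOT the actual clearml.storage code.
# It shows how a line like "36180.00MB / 550.10MB" can appear when the
# running counter is scaled with the wrong divisor (here KB, as an
# assumed example) while the total is scaled correctly.
MB = 1024 * 1024
KB = 1024

def progress_line(sent_bytes, total_bytes, divisor):
    """Format an upload progress line, scaling sent_bytes by `divisor`."""
    return "Uploading: {:.2f}MB / {:.2f}MB".format(
        sent_bytes / divisor, total_bytes / MB)

total = int(550.10 * MB)  # the real payload size from the log
sent = 37 * MB            # 37 MB actually transferred so far

buggy = progress_line(sent, total, KB)  # "uploaded" overshoots the total
fixed = progress_line(sent, total, MB)  # consistent units
print(buggy)
print(fixed)
```

With the mismatched divisor the "uploaded" number runs roughly three orders of magnitude ahead of the total, which matches the shape of the log above.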

Posted 2 years ago

Answers 10


Maybe it's the Azure upload that has a weird size bug?!

Posted 2 years ago

This was the code:

import os
import argparse

# ClearML modules
from clearml import Dataset

parser = argparse.ArgumentParser(description='CUB200 2011 ClearML data uploader - Ed Morris (c) 2021')
parser.add_argument(
    '--dataset-basedir',
    dest='dataset_basedir',
    type=str,
    help='The directory to the root of the dataset',
    default='/home/edmorris/projects/image_classification/caltech_birds/data/images')
parser.add_argument(
    '--clearml-project',
    dest='clearml_project',
    type=str,
    help='The name of the ClearML project that the dataset will be stored and published to.',
    default='Caltech Birds/Datasets')
parser.add_argument(
    '--clearml-dataset-url',
    dest='clearml_dataset_url',
    type=str,
    help='Location where the dataset files should be stored. Default is Azure Blob Storage. Format is ',
    default='')
args = parser.parse_args()

for task_type in ['train', 'test']:
    print('[INFO] Versioning and uploading {0} dataset for CUB200 2011'.format(task_type))
    # Create a new dataset version, add the files, and upload to the target storage
    dataset = Dataset.create('cub200_2011_{0}_dataset'.format(task_type), dataset_project=args.clearml_project)
    dataset.add_files(path=os.path.join(args.dataset_basedir, task_type), verbose=False)
    dataset.upload(output_url=args.clearml_dataset_url)
    print('[INFO] {0} Dataset finalized....'.format(task_type), end='')
    dataset.finalize()

    print('[INFO] {0} Dataset published....'.format(task_type), end='')
    dataset.publish()

Posted 2 years ago

Hmmmm, I thought it logged it with the terminal results when it was uploading weights, but perhaps that's only in the live version and the saved version is pruned? Or my memory is wrong... it is Friday after all!
Can't find any more references to it, sorry.

Posted 2 years ago

AgitatedDove14
Just compared two uploads of the same dataset, one to Azure Blob and the other to local storage on clearml-server.
The local storage didn't report any statistics, so it might be confined to the cloud storage method, and specifically Azure.

Posted 2 years ago

If my memory serves me correctly, I think it happened on weights saving as well, let me just check an experiment log and see.

Posted 2 years ago

That is odd ...
Could you open a GitHub issue?
Is this on any upload, how do I reproduce it ?

Posted 2 years ago

No worries, I'll see what I can do 🙂

Posted 2 years ago

Issue #377 opened in the clearml repository.

https://github.com/allegroai/clearml/issues/377

Posted 2 years ago

Just ran a model which pulled the dataset from the Azure Blob Storage, and that seemed to look correct:

2021-06-04 13:34:21,708 - clearml.storage - INFO - Downloading: 13.00MB / 550.10MB @ 32.59MBs from Birds%2FDatasets/cub200_2011_train_dataset.37a8f00931b04952a1500e3ada831022/artifacts/data/dataset.37a8f00931b04952a1500e3ada831022.zip
2021-06-04 13:34:21,754 - clearml.storage - INFO - Downloading: 21.00MB / 550.10MB @ 175.54MBs from Birds%2FDatasets/cub200_2011_train_dataset.37a8f00931b04952a1500e3ada831022/artifacts/data/dataset.37a8f00931b04952a1500e3ada831022.zip
2021-06-04 13:34:21,791 - clearml.storage - INFO - Downloading: 29.00MB / 550.10MB @ 218.32MBs from Birds%2FDatasets/cub200_2011_train_dataset.37a8f00931b04952a1500e3ada831022/artifacts/data/dataset.37a8f00931b04952a1500e3ada831022.zip
2021-06-04 13:34:21,819 - clearml.storage - INFO - Downloading: 37.00MB / 550.10MB @ 282.70MBs from Birds%2FDatasets/cub200_2011_train_dataset.37a8f00931b04952a1500e3ada831022/artifacts/data/dataset.37a8f00931b04952a1500e3ada831022.zip
2021-06-04 13:34:21,843 - clearml.storage - INFO - Downloading: 45.00MB / 550.10MB @ 334.24MBs from Birds%2FDatasets/cub200_2011_train_dataset.37a8f00931b04952a1500e3ada831022/artifacts/data/dataset.37a8f00931b04952a1500e3ada831022.zip
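As a quick sanity check (illustrative arithmetic only, not clearml code), the download figures are internally consistent: dividing a byte delta by the corresponding timestamp delta roughly reproduces the logged rate, unlike the upload log above, where the "uploaded" counter overshoots the 550.10MB total.

```python
# Illustrative arithmetic: compare the throughput implied by two adjacent
# download log lines against the rate the log itself reports.
delta_mb = 21.00 - 13.00   # MB transferred between the first two lines
delta_s = 21.754 - 21.708  # seconds elapsed between their timestamps
rate = delta_mb / delta_s  # ~173.9 MB/s, close to the logged 175.54MBs
print(round(rate, 1))
```

The small discrepancy presumably comes from the logger measuring over a slightly different window than two whole log lines.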

Posted 2 years ago

👍

Posted 2 years ago