Answered
Is There A Way To Control How Many Parallel Connections Are Used When Downloading From

Is there a way to control how many parallel connections are used when downloading from azure.storage? It seems clearml is using a single connection, so downloads take a long time.

  
  
Posted 3 years ago

Answers 8


well that's much faster, from 1mb/s to 60mb/s 🙂

  
  
Posted 3 years ago

AgitatedDove14 that looks good, i'd like to request an addition to control the other max_connections i see on that file, as i also noticed that uploads are sometimes slow, and i see here max_connections=2

  
  
Posted 3 years ago

okay let's PR this fix ?

  
  
Posted 3 years ago

in clearml.conf we could have:

    azure.storage {
        max_connections = 10
        # containers: [
        #     {
        #         account_name: "clearml"
        #         account_key: "secret"
        #         # container_name:
        #     }
        # ]
    }

Then in AzureContainerConfigurations:

    from os import getenv

    class AzureContainerConfigurations(object):
        def __init__(self, container_configs=None, max_connections=None):
            super(AzureContainerConfigurations, self).__init__()
            self._container_configs = container_configs or []
            self.max_connections = max_connections

        @classmethod
        def from_config(cls, configuration):
            default_account = getenv("AZURE_STORAGE_ACCOUNT")
            default_key = getenv("AZURE_STORAGE_KEY")

            default_container_configs = []
            if default_account and default_key:
                default_container_configs.append(AzureContainerConfig(
                    account_name=default_account, account_key=default_key
                ))

            # Handle a missing config section before reading from it
            if configuration is None:
                return cls(default_container_configs, 10)

            max_connections = configuration.get("max_connections", 10)
            containers = configuration.get("containers", list())
            container_configs = [AzureContainerConfig(**entry) for entry in containers] + default_container_configs

            return cls(container_configs, max_connections)

And finally, in _AzureBlobServiceStorageDriver.download_object(...):

    _ = container.blob_service.get_blob_to_path(
        container.name,
        obj.blob_name,
        local_path,
        max_connections=container.max_connections or 10,
        progress_callback=callback_func,
    )

ShakyJellyfish91 wdyt?

  
  
Posted 3 years ago

as i also noticed that uploads are sometimes slow, and i see here max_connections=2

Makes sense to me, please go ahead and add that as well (basically the same thing on _AzureBlobServiceStorageDriver.upload_object and an additional variable on the AzureContainerConfigurations class).
Could you PR a tested draft? We will be able to take it from there.
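A rough sketch of what "the same thing on upload" could look like, so the shape of the change can be tested without an Azure account. Everything here is a stand-in: `AzureLimits`, `FakeBlobService`, and the `upload_max_connections` key are hypothetical names for illustration, not existing clearml settings. The fake service only mimics the `max_connections` keyword that the thread shows on the download call, and assumes the upload call accepts the same keyword.

```python
class AzureLimits:
    """Hypothetical holder for separate download/upload connection limits."""

    def __init__(self, download_max_connections=10, upload_max_connections=2):
        self.download_max_connections = download_max_connections
        self.upload_max_connections = upload_max_connections

    @classmethod
    def from_config(cls, configuration):
        # Treat a missing config section as empty so .get() is always safe
        configuration = configuration or {}
        return cls(
            download_max_connections=configuration.get("max_connections", 10),
            # "upload_max_connections" is an assumed key name, not a real option
            upload_max_connections=configuration.get("upload_max_connections", 2),
        )


class FakeBlobService:
    """Records the max_connections it was called with instead of hitting Azure."""

    def __init__(self):
        self.calls = []

    def get_blob_to_path(self, container_name, blob_name, file_path,
                         max_connections=2, **kwargs):
        self.calls.append(("download", max_connections))

    def create_blob_from_path(self, container_name, blob_name, file_path,
                              max_connections=2, **kwargs):
        self.calls.append(("upload", max_connections))


def download_object(service, limits, container_name, blob_name, local_path):
    # Forward the configured limit instead of a hard-coded value
    service.get_blob_to_path(
        container_name, blob_name, local_path,
        max_connections=limits.download_max_connections,
    )


def upload_object(service, limits, container_name, blob_name, local_path):
    service.create_blob_from_path(
        container_name, blob_name, local_path,
        max_connections=limits.upload_max_connections,
    )


limits = AzureLimits.from_config({"max_connections": 16, "upload_max_connections": 8})
service = FakeBlobService()
download_object(service, limits, "container", "blob.bin", "/tmp/blob.bin")
upload_object(service, limits, "container", "blob.bin", "/tmp/blob.bin")
```

Keeping the two limits separate lets uploads and downloads be tuned independently, which matches the observation that the two code paths currently use different hard-coded values.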

  
  
Posted 3 years ago

Hi ShakyJellyfish91

It seems clearml is using a single connection, so downloads take a long time.

Hmm, I found this one:
https://github.com/allegroai/clearml/blob/1cb5dbb276026644ae20fef63d58256cdc887818/clearml/storage/helper.py#L1763

Does max_connections=10 mean 10 concurrent connections ?
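For intuition on what a setting like this usually controls: as far as I understand the legacy azure-storage-blob SDK, a `max_connections` of 2 or more makes it fetch a large blob in byte-range chunks on that many threads. The snippet below is only a conceptual sketch of that pattern using the standard library; `split_ranges`, `parallel_download`, and `fetch_range` are made-up names, and the "blob" is an in-memory byte string rather than a real Azure call.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(total_size, chunk_size):
    """Return (start, end) byte ranges, end exclusive, covering total_size."""
    return [
        (start, min(start + chunk_size, total_size))
        for start in range(0, total_size, chunk_size)
    ]


def parallel_download(fetch_range, total_size, chunk_size, max_connections):
    """Fetch all ranges with at most max_connections concurrent workers."""
    ranges = split_ranges(total_size, chunk_size)
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        # pool.map yields results in input order, so chunks reassemble correctly
        chunks = list(pool.map(lambda r: fetch_range(*r), ranges))
    return b"".join(chunks)


# In-memory stand-in for a remote blob
blob = bytes(range(256)) * 40


def fetch_range(start, end):
    # A real client would issue a ranged GET request here
    return blob[start:end]


result = parallel_download(fetch_range, len(blob), chunk_size=1024, max_connections=10)
```

With a single worker the ranges are fetched one after another, which is the slow case described in the question; raising the worker count lets several ranges be in flight at once.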

  
  
Posted 3 years ago

Ohh wow

  
  
Posted 3 years ago

hm, maybe, i will try to override this to see what happens, thanks!
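One way to try such an override without editing the library source is to wrap the SDK method so the keyword is forced to a chosen value. This is a generic sketch, not clearml code: `force_max_connections` and `FakeBlobService` are hypothetical names, and it assumes the caller passes `max_connections` as a keyword argument, as in the linked line.

```python
import functools


def force_max_connections(method, max_connections):
    """Wrap a method so every call uses the given max_connections."""

    @functools.wraps(method)
    def wrapper(*args, **kwargs):
        # Overwrite whatever value the caller hard-coded
        kwargs["max_connections"] = max_connections
        return method(*args, **kwargs)

    return wrapper


class FakeBlobService:
    """Stand-in that returns the max_connections it received."""

    def get_blob_to_path(self, container_name, blob_name, file_path,
                         max_connections=2, **kwargs):
        return max_connections


service = FakeBlobService()
# Patch only this instance; other instances keep the original behaviour
service.get_blob_to_path = force_max_connections(service.get_blob_to_path, 32)

# The caller still asks for 10, but the wrapper replaces it
used = service.get_blob_to_path("container", "blob.bin", "/tmp/blob.bin",
                                max_connections=10)
```

This is handy for a quick experiment, but a config option as proposed in the thread is the cleaner long-term fix.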

  
  
Posted 3 years ago