Answered
Hi, I Am Creating A Custom Image Dataset. Does Anyone Know How To Upload Files To A Remote URI (S3) Without Compressing The Data?

Hi,
I am creating a custom image dataset. Does anyone know how to upload files to a remote URI (S3) without compressing the data?

from clearml import Dataset
dataset = Dataset.create(
    dataset_name="sample",
    dataset_project="test",
    output_uri="
",
    description="sample testing dataset",
)

dataset.add_files(
    path="sample_dataset",
    wildcard="*.jpg",
    recursive=True,
)

dataset.upload(
    show_progress=True,
    verbose=True,
    compression=None,
    retries=3,
)

Even though I passed compression=None, the files in the S3 bucket are still compressed in ZIP format.

  
  
Posted 11 months ago

Answers 5


I think, based on the source file dataset.py, that compression=False is not possible; the compression argument only selects the algorithm for the zipped dataset file:

def upload(
        self,
        show_progress=True,
        verbose=False,
        output_url=None,
        compression=None,
        chunk_size=None,
        max_workers=None,
        retries=3,
    ):
        # type: (bool, bool, Optional[str], Optional[str], int, Optional[int], int) -> ()
        """
        Start file uploading, the function returns when all files are uploaded.

        :param show_progress: If True, show upload progress bar
        :param verbose: If True, print verbose progress report
        :param output_url: Target storage for the compressed dataset (default: file server)
            Examples: `s3://bucket/data`, `gs://bucket/data`, `azure://bucket/data`, `/mnt/share/data`
        :param compression: Compression algorithm for the Zipped dataset file (default: ZIP_DEFLATED)
        :param chunk_size: Artifact chunk size (MB) for the compressed dataset,
            if not provided (None) use the default chunk size (512mb).
            If -1 is provided, use a single zip artifact for the entire dataset change-set (old behaviour)
        :param max_workers: Numbers of threads to be spawned when zipping and uploading the files.
            If None (default) it will be set to:

          - 1: if the upload destination is a cloud provider ('s3', 'gs', 'azure')
          - number of logical cores: otherwise
        :param int retries: Number of retries before failing to upload each zip. If 0, the upload is not retried.

        :raise: If the upload failed (i.e. at least one zip failed to upload), raise a `ValueError`
        """
  
  
Posted 10 months ago
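
A note on what compression actually controls: per the docstring above, it selects the algorithm for the zipped dataset file (default ZIP_DEFLATED); it does not turn zipping off. Below is a minimal sketch of passing a different algorithm, zipfile.ZIP_STORED, which packs files without deflating them. This is an illustration, not a verified fix: the result is still .zip artifacts in the bucket, and some ClearML versions may treat ZIP_STORED (which equals 0) as "not set" and fall back to ZIP_DEFLATED.

import zipfile
from clearml import Dataset

dataset = Dataset.create(dataset_name="sample", dataset_project="test")
dataset.add_files(path="sample_dataset", wildcard="*.jpg", recursive=True)

# compression only picks the zip algorithm; ZIP_STORED stores the files
# without deflating them, but they are still packaged into .zip artifacts.
# Assumption: some ClearML versions replace a falsy value (ZIP_STORED == 0)
# with the default ZIP_DEFLATED.
dataset.upload(
    show_progress=True,
    compression=zipfile.ZIP_STORED,
    retries=3,
)
dataset.finalize()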

QuizzicalFlamingo74, did you find a solution?

  
  
Posted 10 months ago

Hi QuizzicalFlamingo74, try compression=False

  
  
Posted 10 months ago

MinuteMouse44 unfortunately no. I created my own upload method with Python/Django and passed the S3 directories to the ClearML dataset to keep track of the datasets.

  
  
Posted 10 months ago

It's not an ideal solution, but for now it's working.

  
  
Posted 10 months ago
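
For reference, the workaround described above (uploading the files outside ClearML and registering the existing S3 objects with the dataset) can look roughly like the sketch below. It assumes boto3 for the upload and uses Dataset.add_external_files, which records the S3 URIs instead of zipping the files; the bucket name, prefix and local folder are placeholders, not the poster's actual setup.

import os

import boto3
from clearml import Dataset

BUCKET = "my-bucket"          # placeholder
PREFIX = "datasets/sample"    # placeholder
LOCAL_DIR = "sample_dataset"  # placeholder

# 1) Upload the raw images directly, so nothing gets zipped.
s3 = boto3.client("s3")
for name in os.listdir(LOCAL_DIR):
    if name.endswith(".jpg"):
        s3.upload_file(os.path.join(LOCAL_DIR, name), BUCKET, f"{PREFIX}/{name}")

# 2) Register the already-uploaded objects with a ClearML dataset.
#    add_external_files stores links to the objects; the files stay as-is on S3.
dataset = Dataset.create(dataset_name="sample", dataset_project="test")
dataset.add_external_files(source_url=f"s3://{BUCKET}/{PREFIX}/")
dataset.upload()    # no local files were added, so nothing is zipped here
dataset.finalize()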