Answered
Hi I Am Creating A Custom Image Dataset And Does Anyone Know How To Upload Files Into Remote Uri (S3) Without Compressing The Data?

Hi,
I am creating a custom image dataset. Does anyone know how to upload files to a remote URI (S3) without compressing the data?

from clearml import Dataset
dataset = Dataset.create(
    dataset_name="sample",
    dataset_project="test",
    output_uri="
",
    description="sample testing dataset",
)

dataset.add_files(
    path="sample_dataset",
    wildcard="*.jpg",
    recursive=True,
)

dataset.upload(
    show_progress=True,
    verbose=True,
    compression=None,
    retries=3,
)

Even though I passed compression=None, the files in the S3 bucket are still compressed in ZIP format.

  
  
Posted 7 months ago

Answers 5


I think, based on the source file dataset.py, that compression=False is not possible:

def upload(
        self,
        show_progress=True,
        verbose=False,
        output_url=None,
        compression=None,
        chunk_size=None,
        max_workers=None,
        retries=3,
    ):
        # type: (bool, bool, Optional[str], Optional[str], int, Optional[int], int) -> ()
        """
        Start file uploading, the function returns when all files are uploaded.

        :param show_progress: If True, show upload progress bar
        :param verbose: If True, print verbose progress report
        :param output_url: Target storage for the compressed dataset (default: file server)
            Examples: `s3://bucket/data`, `gs://bucket/data`, `azure://bucket/data`, `/mnt/share/data`
        :param compression: Compression algorithm for the Zipped dataset file (default: ZIP_DEFLATED)
        :param chunk_size: Artifact chunk size (MB) for the compressed dataset,
            if not provided (None) use the default chunk size (512mb).
            If -1 is provided, use a single zip artifact for the entire dataset change-set (old behaviour)
        :param max_workers: Numbers of threads to be spawned when zipping and uploading the files.
            If None (default) it will be set to:

          - 1: if the upload destination is a cloud provider ('s3', 'gs', 'azure')
          - number of logical cores: otherwise
        :param int retries: Number of retries before failing to upload each zip. If 0, the upload is not retried.

        :raise: If the upload failed (i.e. at least one zip failed to upload), raise a `ValueError`
        """
  
  
Posted 7 months ago

Hi @<1698868530394435584:profile|QuizzicalFlamingo74> , try compression=False

  
  
Posted 7 months ago

@<1706116294329241600:profile|MinuteMouse44> unfortunately no. I created my own upload method with Python/Django and passed the S3 directories to the ClearML dataset so it keeps track of the datasets.
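
Roughly along these lines (a sketch; the bucket path is a placeholder), using add_external_files so ClearML records links to the files that my own code already uploaded to S3 instead of zipping and re-uploading them:

from clearml import Dataset

dataset = Dataset.create(
    dataset_name="sample",
    dataset_project="test",
)

# Register the files already uploaded to S3 (by the Django upload code)
# as external links; nothing is zipped or re-uploaded.
dataset.add_external_files(
    source_url="s3://...",  # placeholder: bucket/prefix written by the custom uploader
    recursive=True,
)
dataset.finalize()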

  
  
Posted 6 months ago

It's not an ideal solution, but for now it's working.

  
  
Posted 6 months ago

@<1698868530394435584:profile|QuizzicalFlamingo74> did you find a solution?

  
  
Posted 6 months ago