Answered
Hi I Am Creating A Custom Image Dataset And Does Anyone Know How To Upload Files Into Remote Uri (S3) Without Compressing The Data?

Hi,
I am creating a custom image dataset. Does anyone know how to upload files to a remote URI (S3) without compressing the data?

from clearml import Dataset
dataset = Dataset.create(
    dataset_name="sample",
    dataset_project="test",
    output_uri="
",
    description="sample testing dataset",
)

dataset.add_files(
    path="sample_dataset",
    wildcard="*.jpg",
    recursive=True,
)

dataset.upload(
    show_progress=True,
    verbose=True,
    compression=None,
    retries=3,
)

Even though I passed compression=None, the files in the S3 bucket are still compressed in ZIP format.

  
  
Posted 7 months ago

Answers 5


@<1698868530394435584:profile|QuizzicalFlamingo74> did you find a solution?

  
  
Posted 6 months ago

Looking at the source file dataset.py, I think disabling the compression is not possible:

def upload(
        self,
        show_progress=True,
        verbose=False,
        output_url=None,
        compression=None,
        chunk_size=None,
        max_workers=None,
        retries=3,
    ):
        # type: (bool, bool, Optional[str], Optional[str], int, Optional[int], int) -> ()
        """
        Start file uploading, the function returns when all files are uploaded.

        :param show_progress: If True, show upload progress bar
        :param verbose: If True, print verbose progress report
        :param output_url: Target storage for the compressed dataset (default: file server)
            Examples: `s3://bucket/data`, `gs://bucket/data`, `azure://bucket/data`, `/mnt/share/data`
        :param compression: Compression algorithm for the Zipped dataset file (default: ZIP_DEFLATED)
        :param chunk_size: Artifact chunk size (MB) for the compressed dataset,
            if not provided (None) use the default chunk size (512mb).
            If -1 is provided, use a single zip artifact for the entire dataset change-set (old behaviour)
        :param max_workers: Numbers of threads to be spawned when zipping and uploading the files.
            If None (default) it will be set to:

          - 1: if the upload destination is a cloud provider ('s3', 'gs', 'azure')
          - number of logical cores: otherwise
        :param int retries: Number of retries before failing to upload each zip. If 0, the upload is not retried.

        :raise: If the upload failed (i.e. at least one zip failed to upload), raise a `ValueError`
        """
  
  
Posted 7 months ago

@<1706116294329241600:profile|MinuteMouse44> unfortunately no. I created my own upload method with Python/Django and passed the S3 directories to the ClearML dataset so it keeps track of the datasets.
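
Roughly the idea looks like this (a simplified sketch of the workaround: plain boto3 instead of the Django code, and the bucket/prefix names are placeholders):

import os
import boto3
from clearml import Dataset

# upload the raw images to S3 ourselves, so nothing gets zipped
s3 = boto3.client("s3")
for name in os.listdir("sample_dataset"):
    if name.endswith(".jpg"):
        s3.upload_file(os.path.join("sample_dataset", name), "my-bucket", "datasets/sample/" + name)

# register the already-uploaded objects so ClearML tracks them as a dataset;
# add_external_files() links the S3 objects without copying or compressing them
dataset = Dataset.create(dataset_name="sample", dataset_project="test")
dataset.add_external_files(source_url="s3://my-bucket/datasets/sample/", recursive=True)
dataset.upload()    # only the dataset state is uploaded, the files stay where they are
dataset.finalize()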

  
  
Posted 6 months ago

Hi @<1698868530394435584:profile|QuizzicalFlamingo74>, try compression=False

  
  
Posted 7 months ago

It's not an ideal solution, but for now it's working.

  
  
Posted 6 months ago