Hello! Is There Any Way To Use Original Files In ClearML Datasets? I Have A Batch Of Tar Archives And Want To Create A Dataset From Them, However ClearML Compresses Them. I Tried To Use

Answered

Hello! Is there any way to use the original files in ClearML datasets? I have a batch of tar archives and want to create a dataset from them, but ClearML compresses them. I tried to use compression=None, but it didn't help:

dataset.upload(verbose=True, output_url='...', compression=None, chunk_size=-1)

  				
Posted one year ago by TeenyBeetle18


Answers 4

Hi @TeenyBeetle18, if they are already on GS, then you can use add_external_files to log them.
What do you think?
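A minimal sketch of what that suggestion could look like, assuming the archives already sit under a bucket prefix. The function name, bucket URL, dataset name and project name are hypothetical placeholders, not values from this thread. The idea is that add_external_files records links to the existing objects instead of packaging them:

```python
def register_external_archives(source_url, dataset_name, dataset_project):
    """Register files that already live in a bucket as links in a new dataset."""
    # Imported lazily so this sketch can be loaded without clearml installed.
    from clearml import Dataset

    dataset = Dataset.create(
        dataset_name=dataset_name,
        dataset_project=dataset_project,
    )
    # Record links to the objects under source_url instead of adding local copies.
    dataset.add_external_files(source_url=source_url)
    # No local files were added, so there is nothing to package here;
    # upload() and finalize() close the dataset version.
    dataset.upload()
    dataset.finalize()
    return dataset.id
```

It could then be called as register_external_archives("gs://example-bucket/archives/", "tar-archives", "examples"), where all three arguments are made-up values to replace with your own.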

  				
Posted one year ago by CostlyOstrich36

Why does it matter how clearml stores datasets? If you get the dataset locally, all files will be unzipped.

  				
Posted one year ago by AstonishingOx62

It seems this doesn't let me use ClearML's ability to track and version datasets. I mean, I can't create the next version of a dataset from a dataset with external files.

  				
Posted one year ago by TeenyBeetle18

Why does it matter how clearml stores datasets? If you get the dataset locally, all files will be unzipped.

Compression takes time: 8 archives of 5 GB each take half an hour.
Also, I can stream archives from the bucket directly to the network for training without getting them locally, which saves storage space.

  				
Posted one year ago by TeenyBeetle18
