Hi! Trying to create a dataset (we’re running it on an on-prem ClearML server), I run the following command:

Answered

Hi!
I'm trying to create a dataset (we’re running it on an on-prem ClearML server). I run the following command:
`clearml-data add --files data_20210613/`
and get the following error response:
`
clearml-data - Dataset Management & Versioning CLI
Adding files/folder to dataset id cc5be76bf29a42f694eed5caadf7d50d
Generating SHA2 hash for 56875 files
Hash generation completed
2021-07-18 09:19:22,655 - clearml.storage - ERROR - Exception encountered while uploading Failed uploading object files.clearml.<...>/artifacts/state/state.json (413): <html>

<head><title>413 Request Entity Too Large</title></head> <body> <center><h1>413 Request Entity Too Large</h1></center> <hr><center>nginx/1.19.9</center> </body> </html>

Error: Failed uploading object files.clearml.<...>/artifacts/state/state.json (413): <html>

<head><title>413 Request Entity Too Large</title></head> <body> <center><h1>413 Request Entity Too Large</h1></center> <hr><center>nginx/1.19.9</center> </body> </html> `
What am I doing wrong?

Thx!

  				
Posted 3 years ago by FunnyTurkey96


Answers 9

It’s a big dataset.

  				
Posted 3 years ago by FunnyTurkey96

Thx!

  				
Posted 3 years ago by FunnyTurkey96

Well, you'll need to configure the default output_uri to be an S3 bucket.
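
For readers who would rather set the destination per dataset than through a global default, here is a minimal sketch. It assumes the SDK's `Dataset.upload()` accepts an `output_url` argument and that S3 credentials are already configured for the client. The helper name `upload_dataset_to_storage` and its parameters are hypothetical, introduced only for illustration. Note that `dataset_path` in `add_files` is, as far as I know, a relative path inside the dataset and not a storage target, so it is not used to pick the bucket here.

```python
def upload_dataset_to_storage(dataset_cls, local_path, project, name, storage_uri):
    """Create a dataset, register local files, and upload them to storage_uri.

    dataset_cls is expected to be clearml.Dataset; it is passed in as an
    argument (an assumption of this sketch) so the flow can be exercised
    without a running ClearML server.
    """
    # Create a new dataset version in the given project.
    dataset = dataset_cls.create(dataset_project=project, dataset_name=name)

    # Register the local files; this only records them, nothing is uploaded yet.
    dataset.add_files(path=local_path)

    # Upload the registered files to the requested storage (e.g. an s3:// URI)
    # instead of the default files server.
    dataset.upload(output_url=storage_uri)

    # Close the dataset version so it can be consumed by others.
    dataset.finalize()
    return dataset


# Typical use (requires the clearml package and a configured server):
#   from clearml import Dataset
#   upload_dataset_to_storage(Dataset, "<some-local-path>", "project_name",
#                             "trial_01", "s3://<some-bucket-path>")
```

Uploading straight to the bucket keeps the large payload away from the files server, which is where the 413 response from nginx came from.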

  				
Posted 3 years ago by SuccessfulKoala55

Well, I couldn’t find where to add the output_uri.
I have now tried the following:
` import clearml
local_path = '<some-local-path>'
s3_path = 's3://<some-bucket-path>'

dataset = clearml.Dataset.create(dataset_project='project_name', dataset_name='trial_01')

dataset.add_files(path=local_path, dataset_path=s3_path) `
but I don’t see the files in the S3 bucket.

I also tried this:
`dataset.sync_folder(local_path=local_path, dataset_path=s3_path)`
and still no success. It seems to be uploading the files to the ClearML server:
`>> dataset.get_default_storage()
' ' `
It would be great if you could help me understand how to direct the dataset to upload the files to the s3_path.

  				
Posted 3 years ago by FunnyTurkey96

Thx! Will try it tomorrow.

  				
Posted 3 years ago by FunnyTurkey96

Hey, it seems the dataset is simply too large to upload to the fileserver... How big is the dataset?

  				
Posted 3 years ago by SuccessfulKoala55

See here: https://clear.ml/docs/latest/docs/faq#git-and-storage

  				
Posted 3 years ago by SuccessfulKoala55

Well, in your clearml.conf file, set sdk.development.default_output_uri to the desired value (see https://github.com/allegroai/clearml/blob/fb6fd9ac4a6820b4d1d3b8d6dcc60208a45d0718/docs/clearml.conf#L163).
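
To make the setting concrete, here is a small sketch that prints the fragment one might merge into clearml.conf. It assumes the usual nested `sdk { development { ... } }` layout of that file. The helper name `render_default_output_uri` and the bucket `s3://my-bucket/clearml` are hypothetical placeholders, and S3 credentials still have to be configured separately.

```python
def render_default_output_uri(uri: str) -> str:
    """Return a clearml.conf fragment setting sdk.development.default_output_uri."""
    # A storage URI needs a scheme such as s3://; reject obviously wrong input.
    if "://" not in uri:
        raise ValueError(f"expected a URI with a scheme, got {uri!r}")
    return (
        "sdk {\n"
        "    development {\n"
        f'        default_output_uri: "{uri}"\n'
        "    }\n"
        "}\n"
    )


# Hypothetical bucket name, used only for illustration.
fragment = render_default_output_uri("s3://my-bucket/clearml")
print(fragment)
```

If the file already has an `sdk` section, only the `default_output_uri` line needs to change inside the existing `development` block.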

  				
Posted 3 years ago by SuccessfulKoala55

I want to upload the dataset to S3. Is there a flag that tells it to do so?

  				
Posted 3 years ago by FunnyTurkey96
