Hi, I Am Using The AWS Autoscaler To Train A Model. I Have A Fairly Large Dataset (400G) And The Data Is Private So I Can't Really Store It In A ClearML Dataset. Every Time I Launch A Job, It Takes A Very Long Time To Download The Data From S3. Is There

Answered

Hi, I am using the AWS autoscaler to train a model. I have a fairly large dataset (400G), and the data is private, so I can't really store it in a ClearML dataset. Every time I launch a job, it takes a very long time to download the data from S3. Is there a way I can specify a permanent AWS EBS volume to store the data and reuse it every time I launch a job?

  				
Posted 
	2 years ago

					EnchantingPenguin77
				


Answers

Hi @EnchantingPenguin77, in theory you could use EFS, but as far as I know the performance will not be very good, and using AWS-native object storage (i.e. S3) in the same region should provide better performance.
An EBS volume can only be attached to a single machine at a time, so this can't be done with the autoscaler (a single volume can't be shared by all the machines the autoscaler spins up).
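
To illustrate the S3 route: a minimal sketch of a parallel download step that a job could run at startup, assuming `boto3` is installed and the instance has credentials for the bucket. The function names `list_objects` and `sync_prefix`, the worker count, and the bucket, prefix, and path in the usage comment are hypothetical and not part of ClearML. Files that already exist locally with a matching size are skipped, so the step is cheap to repeat on a machine that still holds the data.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def list_objects(s3, bucket, prefix):
    """Yield (key, size) for every object under prefix, skipping folder markers."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if not obj["Key"].endswith("/"):
                yield obj["Key"], obj["Size"]


def sync_prefix(s3, bucket, prefix, dest, workers=32):
    """Download objects under prefix into dest in parallel; return how many were fetched."""

    def fetch(item):
        key, size = item
        # Path relative to the prefix; fall back to the file name if key == prefix.
        rel = key[len(prefix):].lstrip("/") or os.path.basename(key)
        target = os.path.join(dest, rel)
        # Skip files already present with the expected size.
        if os.path.exists(target) and os.path.getsize(target) == size:
            return 0
        os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
        s3.download_file(bucket, key, target)
        return 1

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(fetch, list_objects(s3, bucket, prefix)))


# Hypothetical usage (bucket, prefix, and destination are placeholders):
#   import boto3
#   fetched = sync_prefix(boto3.client("s3"), "my-private-bucket", "train/", "/data/train")
```

Many small parallel requests tend to use the available bandwidth better than a sequential loop, but the right worker count depends on the instance type and object sizes, so treat 32 as a starting point.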

  				
Posted 
	2 years ago

					SuccessfulKoala55
				
