Autoscaler Parallelization Issue: I Have An Aws Autoscaler Set Up With A Resource That Has A Max Of 3 Instances Assigned To The

Unanswered

erm,
this parallelization has led to the pipeline task issuing a bunch of:
model_path/run_2022_07_20T22_11_15.209_0.zip , err: [Errno 28] No space left on deviceand quitting on me.
my train_image_classifier_component is programmed to save model files to a local path which is returned (and, thanks to clearml, the path's contents are zipped uploded to the files service).

I take it that these files are also brought into pipeline tasks's local disk?
Why is that? If that is indeed what's happening, it's creating a bottleneck.

I'd have expected the tasks that use these files (the evaluation components) to pull the data directly from the files service..

  				
Posted 
	2 years ago

					More
				  		
  Report
		
					PanickyMoth78
				
					0
					 × 1

176 Views

0 Answers

2 years ago

one year ago