Unanswered
Autoscaler Parallelization Issue:
I have an AWS autoscaler set up with a resource that has a max of 3 instances assigned to the
Erm, this parallelization has led to the pipeline task issuing a bunch of errors like:
model_path/run_2022_07_20T22_11_15.209_0.zip, err: [Errno 28] No space left on device
and then quitting on me.
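For anyone hitting the same [Errno 28], here is a minimal diagnostic sketch using only the Python standard library. It is not part of ClearML; the helper names (disk_report, zip_footprint, has_room_for) are hypothetical, and which directory to inspect on the machine running the pipeline task is an assumption you would need to fill in yourself.

```python
import errno
import shutil
import tempfile
from pathlib import Path


def disk_report(path):
    """Return (free_bytes, total_bytes) for the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return usage.free, usage.total


def zip_footprint(directory):
    """Sum the sizes of all regular .zip files under `directory`, recursively."""
    return sum(
        p.stat().st_size
        for p in Path(directory).rglob("*.zip")
        if p.is_file()
    )


def has_room_for(path, needed_bytes, safety_margin=0.1):
    """True if the filesystem at `path` can hold `needed_bytes` plus a margin."""
    free, _ = disk_report(path)
    return free >= needed_bytes * (1 + safety_margin)


if __name__ == "__main__":
    # Hypothetical stand-in directory; point this at the real download location.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "run_0.zip").write_bytes(b"x" * 1024)
        free, total = disk_report(tmp)
        print(f"free: {free / 1e9:.1f} GB of {total / 1e9:.1f} GB")
        print("zip bytes in directory:", zip_footprint(tmp))
        print("room for another 1 KiB:", has_room_for(tmp, 1024))
        # Errno 28 is ENOSPC, the code behind "No space left on device".
        print("ENOSPC code:", errno.ENOSPC)
```

Running something like this on the instance that executes the pipeline task, before and after the parallel components finish, would show whether the zipped model folders really are accumulating on that disk.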
My train_image_classifier_component is programmed to save model files to a local path, which it returns (and, thanks to ClearML, the path's contents are zipped and uploaded to the files service).
I take it that these files are also brought onto the pipeline task's local disk?
Why is that? If that is indeed what's happening, it's creating a bottleneck.
I'd have expected the tasks that use these files (the evaluation components) to pull the data directly from the files service.
2 years ago
one year ago