Thank you @<1523701087100473344:profile|SuccessfulKoala55> for the ideas. The AMI is ami-0025f0db847eb6254; it may indeed not have Docker, I will check that.
My main assumption is that the AWS Autoscaler can't establish an SSH connection with the EC2 instance, because the instance is created without a public IP address (not sure if that's the clue).
The issue was in my Terraform VPC configuration: I missed `enable_nat_gateway = true`, so the EC2 instance was not able to even update OS packages. The "Instance log files" on the AWS Autoscaler page pointed me to that issue.
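For reference, a minimal sketch of the fix, assuming the common terraform-aws-modules/vpc module (the module name, AZ, and CIDRs here are hypothetical placeholders, not my actual config):

```hcl
# Private subnets route outbound traffic through a NAT gateway, so instances
# without a public IP can still pull OS packages and Docker images.
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "autoscaler-vpc"   # hypothetical name
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a"]
  public_subnets  = ["10.0.1.0/24"]
  private_subnets = ["10.0.101.0/24"]

  enable_nat_gateway = true   # the flag I was missing
  single_nat_gateway = true   # one NAT gateway is enough (and cheaper) here
}
```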
SSH connection is not required, yep.
Seems I found the issue. On my MacBook I got `torch==2.1.0` in requirements.txt, but on the AWS P3 instance I get `torch==2.1.0+cu121` after reinstallation, and the GPU works fine. Hopefully it will now work in a Docker container as well.
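If you'd rather pin the CUDA build explicitly than rely on reinstallation, one option (a sketch using PyTorch's public cu121 wheel index) is to add the extra index directly to requirements.txt:

```text
# requirements.txt
--extra-index-url https://download.pytorch.org/whl/cu121
torch==2.1.0+cu121
```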
Thank you for the reply, @<1523701070390366208:profile|CostlyOstrich36>. I will try the image.
The initial issue was:
```
CUDA initialization: The NVIDIA driver on your system is too old (found version 11040). Please update your GPU driver by downloading and installing a new version from the URL:
Alternatively, go to:
to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions...
```
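A quick way to check what you actually got inside the container (a small diagnostic sketch, nothing ClearML-specific):

```python
import torch

# CUDA toolkit version the installed torch wheel was compiled against
print("torch:", torch.__version__, "built for CUDA", torch.version.cuda)

# False here, together with a "driver too old" warning, means the host
# NVIDIA driver is older than what this wheel requires
print("cuda available:", torch.cuda.is_available())

if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```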
For anybody who gets stuck on this issue in the future: None
Hi guys! I get the same error, but it looks like ultralytics (YOLO) depends on opencv-python. How did you resolve this dependency? In my case I get `ModuleNotFoundError: No module named 'cv2'` after simply replacing opencv-python with opencv-python-headless.
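In case it helps with debugging, a small sketch to confirm which OpenCV distribution actually ended up in the environment (both opencv-python and opencv-python-headless provide the same cv2 module, so a missing or stale wheel is easy to hit after swapping them):

```python
import cv2

# Fails with ModuleNotFoundError if neither wheel is currently installed
print(cv2.__version__)

# The build information shows whether GUI backends (GTK/Qt) were compiled in:
# headless wheels are built without them, the regular wheels with them.
print(cv2.getBuildInformation())
```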