ClearML FAQ | Hello everyone! I'm using ClearML, YOLO v8 and ClearML GPU Compute (for orchestration). The issue is that I can't find a compatible configuration to use the GPU during YOLO training (on a remote instance, not local). My local machine is a MacBook with an M2 chip

Answered


Hello everyone!

I'm using ClearML, YOLO v8 and ClearML GPU Compute (for orchestration).
The issue is that I can't find a compatible configuration that lets YOLO training use the GPU (on a remote instance, not locally). My local machine is a MacBook with an M2 chip, which may be the main reason 🙂 Can anybody share a working configuration? I'm interested in the Docker image tag for the agent and the versions of the ultralytics and torch pip packages.

  				
Posted one year ago by HurtRaccoon43


Answers 4

Thank you for the reply, CostlyOstrich36. I will try the image.

The initial issue was the following:

CUDA initialization: The NVIDIA driver on your system is too old (found version 11040). Please update your GPU driver by downloading and installing a new version from the URL:

 Alternatively, go to:

 to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)

I found that torch==2.1.0 is not compatible with the nvidia/cuda:11.4.3-cudnn8-runtime-ubuntu20.04 image, which is the default image provided by ClearML GPU Compute.

After that I tried nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu20.04 and got the following error:

UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)

Googling the problem, I found that finding compatible versions of CUDA, PyTorch and the GPU driver is a common pain point. So I need some advice on how to resolve this compatibility issue so that I can use the GPU.
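As a rough illustration of the first error: the `11040` in the message is CUDA's integer version encoding (1000 * major + 10 * minor), so the driver reports CUDA 11.4. The sketch below decodes that number and compares it with the CUDA version a PyTorch wheel was built for. The helper names are made up for this example, the `12.1` value assumes the wheel was a CUDA 12.1 build (the `+cu121` tag mentioned elsewhere in this thread), and the comparison is deliberately conservative: CUDA's minor-version compatibility can relax it within one major release.

```python
from typing import Tuple


def decode_cuda_version(code: int) -> Tuple[int, int]:
    """Decode CUDA's integer version encoding (1000 * major + 10 * minor)."""
    return code // 1000, (code % 1000) // 10


def parse_cuda_version(text: str) -> Tuple[int, int]:
    """Parse a 'major.minor[.patch]' string such as '12.1' or '11.4.3'."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def driver_is_new_enough(driver_code: int, runtime: str) -> bool:
    """Conservative check: the driver's CUDA version must be >= the runtime's."""
    return decode_cuda_version(driver_code) >= parse_cuda_version(runtime)


print(decode_cuda_version(11040))              # (11, 4)
print(driver_is_new_enough(11040, "12.1"))     # False: driver older than a CUDA 12.1 build
print(driver_is_new_enough(11040, "11.4.3"))   # True
```

Under these assumptions, a driver that only supports CUDA 11.4 would need either a newer driver on the host or a PyTorch build compiled for an older CUDA release.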

  				
Posted one year ago by HurtRaccoon43

It seems I found the issue. On my MacBook, requirements.txt contained torch==2.1.0. On an AWS P3 instance, however, I get torch==2.1.0+cu121 after reinstalling, and the GPU works fine. Hopefully it will now work in a Docker container as well.
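A minimal sketch for checking which build ended up installed, assuming only that the version string follows the `X.Y.Z+tag` pattern quoted above. `cuda_build_tag` and `describe_torch` are illustrative helper names, not part of any library. The second helper uses `torch.__version__`, `torch.version.cuda` and `torch.cuda.is_available()`, and falls back to a message when torch is not installed.

```python
from typing import Optional


def cuda_build_tag(version: str) -> Optional[str]:
    """Return the local build tag after '+', e.g. 'cu121', or None if absent."""
    _, sep, local = version.partition("+")
    return local if sep else None


def describe_torch() -> str:
    """Summarize the installed torch build, if torch can be imported."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    return (
        f"torch {torch.__version__}, built for CUDA {torch.version.cuda}, "
        f"GPU available: {torch.cuda.is_available()}"
    )


print(cuda_build_tag("2.1.0+cu121"))  # cu121
print(cuda_build_tag("2.1.0"))        # None
print(describe_torch())               # output depends on the machine
```

Running `describe_torch()` inside the remote container, rather than on the laptop, shows what the agent actually installed.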

  				
Posted one year ago by HurtRaccoon43

What specific compatibility issues are you getting?

  				
Posted one year ago by CostlyOstrich36

Hi HurtRaccoon43, I'd suggest trying this docker image: nvcr.io/nvidia/pytorch:23.03-py3
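One possible way to request that image from code is sketched below. It assumes the installed clearml version exposes `Task.set_base_docker` with a `docker_image` keyword. The helper name and the project and task names in the comments are hypothetical. The image can also be chosen outside the script, so treat this as one option rather than the required setup.

```python
DEFAULT_IMAGE = "nvcr.io/nvidia/pytorch:23.03-py3"  # image suggested in this thread


def request_docker_image(task, image: str = DEFAULT_IMAGE) -> str:
    """Ask the agent to run this task inside the given Docker image.

    `task` is expected to be a clearml Task; the call below assumes that
    Task.set_base_docker accepts a `docker_image` keyword argument.
    """
    task.set_base_docker(docker_image=image)
    return image


# Typical use (needs a configured ClearML server, so it is left commented out):
# from clearml import Task
# task = Task.init(project_name="yolo-demo", task_name="train")  # hypothetical names
# request_docker_image(task)
```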

  				
Posted one year ago by CostlyOstrich36

