This has been resolved now! Thank you for your help CostlyOstrich36
But the process is still hanging and not proceeding to actually run the ClearML task
Just try as-is first with this docker image, and verify that the code can access the CUDA driver independently of the agent
If I run `nvidia-smi` it returns valid output, and it reports CUDA version 11.2
CUDA here refers to the driver itself. The agent doesn't install CUDA; it installs a compatible torch, assuming the CUDA driver is already properly installed.
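The compatibility the agent relies on can be illustrated with a small sketch (a hypothetical helper, not part of ClearML): as a rule of thumb, a driver reporting CUDA X.Y can run runtimes built for X.Z with Z ≤ Y, so a cu111 torch wheel works on a driver reporting 11.2, while a cu12 wheel does not.

```python
# Hypothetical helper, NOT part of ClearML: illustrates the rule of thumb
# that a wheel's bundled CUDA runtime must not exceed the driver's reported
# CUDA version (within CUDA 11.x, minor-version compatibility relaxes this).
def torch_wheel_compatible(driver_cuda: str, wheel_cuda: str) -> bool:
    """Rough check: compare (major, minor) of wheel runtime vs. driver."""
    d_major, d_minor = (int(x) for x in driver_cuda.split("."))
    w_major, w_minor = (int(x) for x in wheel_cuda.split("."))
    return (w_major, w_minor) <= (d_major, d_minor)

# A cu111 wheel on a driver reporting CUDA 11.2: fine.
print(torch_wheel_compatible("11.2", "11.1"))  # True
# A cu12 wheel on the same driver: mismatch, torch.cuda fails at runtime.
print(torch_wheel_compatible("11.2", "12.1"))  # False
```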
CostlyOstrich36 I'm now running the agent with `--docker`, and I'm using `task.create(docker="nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04")`
It seems to detect CUDA 11, but then it installs CUDA 12 packages:
Torch CUDA 111 index page found, adding ``
PyTorch: Adding index `` and installing `torch ==2.4.0.*`
Looking in indexes: , ,
Collecting torch==2.4.0.*
Using cached torch-2.4.0-cp310-cp310-manylinux1_x86_64.whl (797.2 MB)
2024-08-12 12:40:37
Collecting clearml
Using cached clearml-1.16.3-py2.py3-none-any.whl (1.2 MB)
Collecting triton==3.0.0
Using cached (209.4 MB)
2024-08-12 12:40:42
Collecting nvidia-nccl-cu12==2.20.5
Using cached nvidia_nccl_cu12-2.20.5-py3-none-manylinux2014_x86_64.whl (176.2 MB)
Collecting nvidia-curand-cu12==10.3.2.106
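The `nvidia-*-cu12` packages being collected here are the CUDA 12 runtime dependencies of the default torch 2.4.0 wheel, which won't match a driver reporting CUDA 11.2. One possible workaround is to tell the agent which CUDA version to assume, or to point pip at a CUDA 11 wheel index, in `clearml.conf` — a sketch, assuming the standard agent configuration keys (please verify the exact key names against your agent version's default `clearml.conf`):

```
agent {
    # Force the CUDA version the agent assumes when resolving torch wheels
    # (here matching what nvidia-smi reports inside the container)
    cuda_version: "11.2"

    package_manager {
        # Optionally also point pip at a CUDA 11 PyTorch wheel index
        extra_index_url: ["https://download.pytorch.org/whl/cu111"]
    }
}
```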