Hey guys! I'm having some issues with PyTorch and ClearML. I am starting a new task using `Task.create` and setting PyTorch as a requirement under `packages`. For some reason PyTorch built for CUDA 12 is being installed, but I need CUDA 11. Do you know how to make it install the CUDA 11 build?

  				
Posted one year ago by RattyBluewhale45

Answers 41

I can install the correct torch version with this command:
`pip install --pre torchvision --force-reinstall --index-url None`

  				
Posted one year ago by RattyBluewhale45

It means that there is an issue with the drivers. I suggest trying this Docker image: nvcr.io/nvidia/pytorch:23.04-py3

  				
Posted one year ago by CostlyOstrich36

pip install --pre torchvision --force-reinstall --index-url None

  				
Posted one year ago by RattyBluewhale45

docker="nvidia/cuda:11.8.0-base-ubuntu20.04"

  				
Posted one year ago by RattyBluewhale45

I have set `agent.package_manager.pip_version=""`, which resolved that message.

  				
Posted one year ago by RattyBluewhale45

I think it tries to install the latest build. Are you using the agent in docker mode? You can also control this via clearml.conf with `agent.cuda_version`.

  				
Posted one year ago by CostlyOstrich36

Thank you, I will try that.

  				
Posted one year ago by RattyBluewhale45

I am trying `Task.create` like so:

from clearml import Task

task = Task.create(
    script="test_gpu.py",
    packages=["torch"],
)
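
For reference, `Task.create` also accepts a `docker` argument, so the base image can be chosen in the same call as the packages. Below is a minimal sketch, assuming the agent runs in docker mode. The project and task names are placeholders, the helper names are made up for illustration, and the image tag is the one mentioned elsewhere in this thread.

```python
def build_create_kwargs(script, packages, docker_image):
    """Collect the keyword arguments for clearml's Task.create in one place."""
    return {
        "project_name": "examples",  # placeholder project name (assumption)
        "task_name": "test gpu",     # placeholder task name (assumption)
        "script": script,
        "packages": list(packages),
        "docker": docker_image,      # only used when the agent runs in docker mode
    }


def create_task(kwargs):
    """Create the task; clearml is imported lazily so the helper above works without it."""
    from clearml import Task

    return Task.create(**kwargs)


kwargs = build_create_kwargs(
    script="test_gpu.py",
    packages=["torch"],
    docker_image="nvidia/cuda:11.8.0-base-ubuntu20.04",
)
```

Calling `create_task(kwargs)` on a machine with clearml configured would then register the task. Whether the agent picks a CUDA 11 wheel for `torch` still depends on the CUDA version it detects or is configured with.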

  				
Posted one year ago by RattyBluewhale45

ERROR: This container was built for NVIDIA Driver Release 530.30 or later, but
       version 460.32.03 was detected and compatibility mode is UNAVAILABLE.

       [[System has unsupported display driver / cuda driver combination (CUDA_ERROR_SYSTEM_DRIVER_MISMATCH) cuInit()=803]]
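
The message compares the host driver with the minimum release the container was built for. A rough sketch of that comparison, using only the two version strings from the error above. The function names are illustrative, and the `nvidia-smi` query is an optional way to read the host driver version that is not called here.

```python
import subprocess


def parse_driver_version(text):
    """Turn a driver version string such as '460.32.03' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_sufficient(detected, required):
    """Return True when the detected driver is at least the required release."""
    return parse_driver_version(detected) >= parse_driver_version(required)


def detect_driver_version():
    """Ask nvidia-smi for the host driver version (needs an NVIDIA driver installed)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    # One line per GPU; the driver version is the same for all of them.
    return out.stdout.splitlines()[0].strip()


# Values taken from the error message: detected 460.32.03, required 530.30.
print(driver_is_sufficient("460.32.03", "530.30"))  # prints False
```

A `False` result here means the image is too new for the host driver, so the options are an older image or a driver upgrade.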

  				
Posted one year ago by RattyBluewhale45

I think in the config file it should be something like this: `agent.cuda_version="11.2"`
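
To confirm which build the agent actually installed after changing the setting, the script itself can report it. A minimal sketch that could go in `test_gpu.py`; the helper names are illustrative and not part of ClearML, and `torch` is imported lazily so the parsing helper can be used on its own.

```python
def cuda_major(version):
    """Return the major CUDA version from a string like '11.8', or None if unset."""
    if not version:
        return None
    return int(str(version).split(".")[0])


def report_torch_cuda():
    """Print the CUDA version the installed torch wheel was built against."""
    import torch  # imported here so the helper above works without torch

    built_for = torch.version.cuda  # a string such as '11.8', or None for CPU-only builds
    print("torch", torch.__version__, "built for CUDA", built_for)
    print("CUDA available:", torch.cuda.is_available())
    return cuda_major(built_for)
```

Calling `report_torch_cuda()` at the top of the script and checking that it returns `11` gives a quick signal in the task log before any training starts.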

  				
Posted one year ago by CostlyOstrich36

Collecting pip<20.2
Using cached pip-20.1.1-py2.py3-none-any.whl (1.5 MB)
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 20.0.2
Not uninstalling pip at /usr/lib/python3/dist-packages, outside environment /usr
Can't uninstall 'pip'. No files were found to uninstall.

  				
Posted one year ago by RattyBluewhale45
