Just try as is first with this docker image + verify that the code can access cuda driver unrelated to the agent
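A quick way to verify that, e.g. from a python shell inside the container (this assumes torch is already installed in the image, which isn't guaranteed for the plain CUDA runtime image):

import torch  # assumes torch is present in the image

print("torch:", torch.__version__)
print("cuda available:", torch.cuda.is_available())
print("cuda version torch was built against:", torch.version.cuda)
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))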
Isn't the problem that CUDA 12 is being installed?
CostlyOstrich36 I'm now running the agent with --docker, and I'm using task.create(docker="nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04")
If I run nvidia-smi it returns valid output and says the CUDA version is 11.2
But the process is still hanging and never proceeds to actually running the ClearML task
Collecting pip<20.2
Using cached pip-20.1.1-py2.py3-none-any.whl (1.5 MB)
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 20.0.2
Not uninstalling pip at /usr/lib/python3/dist-packages, outside environment /usr
Can't uninstall 'pip'. No files were found to uninstall.
I am trying Task.create like so:

from clearml import Task

task = Task.create(
    script="test_gpu.py",
    packages=["torch"],
)
What I don't understand is how to tell ClearML to install this version of pytorch and torchvision, with cu118
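Something like this is what I have in mind, if that's even the right approach (a sketch only: the version numbers are placeholders, and I'm assuming the agent's extra index URL, agent.package_manager.extra_index_url, would have to point at the cu118 wheel index for the +cu118 pins to resolve):

from clearml import Task

# sketch: pin the cu118 builds explicitly; versions below are placeholders,
# and the cu118 wheel index is assumed to be configured on the agent side
task = Task.create(
    script="test_gpu.py",
    packages=[
        "torch==2.0.1+cu118",
        "torchvision==0.15.2+cu118",
    ],
)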
It means that there is an issue with the drivers. I suggest trying this docker image - nvcr.io/nvidia/pytorch:23.04-py3
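On the task side that would just mean pointing the container at it, e.g. (reusing your test_gpu.py script):

from clearml import Task

# same task, but run inside the NGC PyTorch image, which ships its own torch + CUDA stack
task = Task.create(
    script="test_gpu.py",
    docker="nvcr.io/nvidia/pytorch:23.04-py3",
)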
I have set agent.package_manager.pip_version="", which resolved that message
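i.e. in the agent's clearml.conf it now looks roughly like this:

agent {
    package_manager {
        # empty string = don't force a specific pip version inside the task environment
        pip_version: ""
    }
}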
CUDA support comes from the driver itself. The agent doesn't install CUDA; it installs a compatible torch, assuming CUDA is already properly installed.