Hi ReassuredTiger98,
I think it is something that was logged during the initial run; the clearml-agent
then simply recreates the environment 🙂
You're right, I forgot: the ClearML-Agent also tries to match the version to something that will work on the system it's running on.
Nvm, I think it's my mistake. I will investigate.
I have to correct myself: I do not even have CUDA installed on the host. Only the driver is there; everything CUDA-related is provided by the docker container. This works with a container that has CUDA 11.4, but now I have one with CUDA 11.6 (the latest NVIDIA PyTorch docker).
However, even after changing the clearml.conf and overriding with the CUDA_VERSION env variable, the clearml-agent still prints agent.cuda_version = 114 inside the docker container! (Other changes to the clearml.conf on the agent are reflected in the docker container, so only the CUDA version has this issue.)
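For reference, this is the kind of override I mean. A minimal sketch of both variants, assuming a queue called "default"; I'm also not 100% sure whether the expected value format is "11.6" or "116":
```
# Variant 1: force the CUDA version in the agent's clearml.conf
#   agent {
#       cuda_version: "11.6"
#   }

# Variant 2: set the env variable before starting the agent in docker mode
export CUDA_VERSION=11.6
clearml-agent daemon --queue default --docker
```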
Hi CostlyOstrich36, thank you for answering so quickly. I don't think that's how it works, because if it were, one would always have to match the local machine to the servers. Afaik ClearML finds the correct PyTorch version, but I was not sure how (whether custom logic or pip does it).
I am wondering because, when used in docker mode, the docker container may have a CUDA version that differs from the host's version. However, ClearML seems to use the host version instead of the docker container's version, which is sometimes a problem.
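For example, this is the setup I mean (just a sketch; the image tag is only an example of a CUDA 11.6 container, not necessarily the exact one I use):
```
# agent runs in docker mode; the container ships its own CUDA (11.6 here),
# while the host only has the NVIDIA driver
clearml-agent daemon --queue default --docker nvcr.io/nvidia/pytorch:22.03-py3
```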
I used the wrong docker container; the one I used had CUDA 11.4. Interestingly, the override from clearml.conf and the CUDA_VERSION env variable did not work there.
With the correct docker container everything works fine. Shame on me.
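In case someone else hits this: a quick way to double-check which CUDA toolkit a container actually ships (image tag again just an example):
```
# print the CUDA toolkit version bundled in the image
docker run --rm nvcr.io/nvidia/pytorch:22.03-py3 nvcc --version
```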
Tested with clearml-agent 1.0.1rc4/1.2.2 and clearml 1.3.2