Hello! Since Today I Get

Answered

Hello!
Since today I have been getting "AssertionError: Torch not compiled with CUDA enabled" with PyTorch 1.8.
Tasks that I submitted yesterday to the queue are also not working, even though they ran yesterday. PyTorch 1.7 based tasks work fine. Any idea what I could have done wrong?
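For anyone hitting the same error, a quick way to confirm whether the installed PyTorch is a CPU-only build is to inspect it from inside the task's environment. This is only a minimal sketch; the helper name cuda_report is made up for illustration, and it assumes nothing beyond the standard torch attributes torch.version.cuda and torch.cuda.is_available().

```python
import importlib.util


def cuda_report():
    """Summarise whether the installed PyTorch build can use CUDA.

    cuda_report is an illustrative helper name, not part of any library.
    """
    # Avoid an ImportError if torch is missing from this environment.
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    return {
        "torch_installed": True,
        "version": torch.__version__,
        # None here means the package is a CPU-only build.
        "built_with_cuda": torch.version.cuda,
        # False can also mean a CUDA build without a usable driver or GPU.
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If built_with_cuda prints None, the package itself is the CPU variant, which matches the "Torch not compiled with CUDA enabled" assertion.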

  				
Posted 3 years ago by ReassuredTiger98


Answers 161

Hi @ReassuredTiger98, when you get to it, please download the wheel, then install it with:

pip3 install -U clearml_agent-0.17.3rc0-py3-none-any.whl

Then run the daemon with the additional --debug argument, basically:

clearml-agent --debug daemon --foreground ...

Once the agent is running please send the Task's log from your console 🙂

  				
Posted 3 years ago by AgitatedDove14

btw: I also tested the clearml-agent running on a different machine and with python 3.8 and I get the same problems.

  				
Posted 3 years ago by ReassuredTiger98

Quick question: Where again does clearml place the venv? I wanna take a look into it after the task has failed

  				
Posted 3 years ago by ReassuredTiger98

Thanks! Tomorrow is great, I'll put the wheel here 🙂

  				
Posted 3 years ago by AgitatedDove14

ok, thanks!

  				
Posted 3 years ago by ReassuredTiger98

Hmm, maybe this is the issue:

Conda error: UnsatisfiableError: The following specifications were found to be incompatible with a past
explicit spec that is not an explicit spec in this operation (cudatoolkit):

  - pytorch~=1.8.0 -> cudatoolkit[version='>=10.1,<10.2|>=10.2,<10.3']

This makes no sense: conda is saying pytorch~=1.8.0 needs cudatoolkit 10.1 or 10.2 (>=10.1 and <10.3), but it actually needs cudatoolkit 11.1.
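To make the conflict concrete, here is a small sketch that evaluates the cudatoolkit constraint from the error message against a few versions. It is a simplified stand-in for conda's own matching (only numeric dotted versions, "," as AND and "|" as OR); the function names are made up for illustration.

```python
import re

# Comparison operators supported by this simplified checker.
_OPS = {
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    "==": lambda a, b: a == b,
    "!=": lambda a, b: a != b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
}


def parse_version(text, width=4):
    """Turn '11.1.1' into (11, 1, 1, 0) so tuples compare component-wise."""
    parts = [int(p) for p in text.strip().split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def _clause_ok(version, clause):
    """Check one clause such as '>=10.1' against a parsed version."""
    match = re.fullmatch(r"\s*(>=|<=|==|!=|>|<)\s*([\d.]+)\s*", clause)
    if match is None:
        raise ValueError(f"unsupported clause: {clause!r}")
    op, bound = match.groups()
    return _OPS[op](version, parse_version(bound))


def satisfies(version, spec):
    """Return True if version matches any '|' alternative of the spec."""
    parsed = parse_version(version)
    return any(
        all(_clause_ok(parsed, clause) for clause in alternative.split(","))
        for alternative in spec.split("|")
    )


# Constraint copied from the conda error above.
SPEC = ">=10.1,<10.2|>=10.2,<10.3"

if __name__ == "__main__":
    for candidate in ("10.1", "10.2.89", "11.1.1"):
        print(candidate, satisfies(candidate, SPEC))
```

With this constraint, 11.1.1 is rejected while 10.1 and 10.2.x pass, so pinning cudatoolkit 11.1 can never be satisfied together with the pytorch packages conda found here.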

  				
Posted 3 years ago by AgitatedDove14

Same error.

  				
Posted 3 years ago by ReassuredTiger98

I just started a task from this environment and it fails on the agent.

  				
Posted 3 years ago by ReassuredTiger98

It still shows the CPU version when I run conda list.
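A rough way to automate that check is to look at the build string in the pytorch row of the conda list output. The sketch below is an assumption-heavy illustration: the helper name is invented, and the sample rows are hypothetical examples of the build-string pattern (a "cpu" or "cuda" marker), not output from this thread.

```python
def pytorch_build_kind(conda_list_output):
    """Classify the pytorch row of `conda list` output as 'gpu' or 'cpu'.

    Returns None when there is no pytorch row and 'unknown' when the
    build string carries neither marker. Illustrative helper only.
    """
    for line in conda_list_output.splitlines():
        # Skip the commented header lines and blank lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        # Columns are: name, version, build, and optionally channel.
        if fields[0] == "pytorch" and len(fields) >= 3:
            build = fields[2].lower()
            if "cuda" in build:
                return "gpu"
            if "cpu" in build:
                return "cpu"
            return "unknown"
    return None


# Hypothetical sample rows, shown only to illustrate the format.
SAMPLE_CPU = """# Name      Version   Build          Channel
pytorch     1.8.0     py3.8_cpu_0    pytorch
"""
SAMPLE_GPU = """# Name      Version   Build                          Channel
pytorch     1.8.0     py3.8_cuda11.1_cudnn8.0.5_0    pytorch
"""

if __name__ == "__main__":
    print(pytorch_build_kind(SAMPLE_CPU))
    print(pytorch_build_kind(SAMPLE_GPU))
```

In practice the input would come from running conda list inside the environment the agent created and passing the captured text to the function.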

  				
Posted 3 years ago by ReassuredTiger98

And the one with the CPU version? Is it with "~=" or "="?

  				
Posted 3 years ago by AgitatedDove14

sure

  				
Posted 3 years ago by ReassuredTiger98

channels:
- defaults
- conda-forge
- pytorch
dependencies:
- cudatoolkit==11.1.1
- pytorch==1.8.0

This gives the CPU version.

  				
Posted 3 years ago by ReassuredTiger98

Or there should be an early error when trying to run conda-based tasks on pip agents.

  				
Posted 3 years ago by ReassuredTiger98

Thank you! 🙂

  				
Posted 3 years ago by ReassuredTiger98

In the logs from running with --foreground, I do not see any conda create command.

  				
Posted 3 years ago by ReassuredTiger98

How does clearml-agent create the conda environment?

  				
Posted 3 years ago by ReassuredTiger98

Yep, this installs the CPU version of PyTorch.

  				
Posted 3 years ago by ReassuredTiger98

Complete conda log

  				
Posted 3 years ago by ReassuredTiger98

Can you ping me when it is updated in None so I can update my installation?

  				
Posted 3 years ago by ReassuredTiger98

WTF?!

  				
Posted 3 years ago by AgitatedDove14

Ha?!

  				
Posted 3 years ago by AgitatedDove14

I do not have a global CUDA install on this machine. Everything except the driver is installed via conda.

  				
Posted 3 years ago by ReassuredTiger98

Perfect! 🙂

  				
Posted 3 years ago by ReassuredTiger98

You suggested this fix earlier, but I am not sure why it didn't work then.

  				
Posted 3 years ago by ReassuredTiger98

Yeaaa I got it working!

  				
Posted 3 years ago by ReassuredTiger98

It's always preferable to use conda_freeze: false.
That said, if you do use conda_freeze: true, it should also freeze the cudatoolkit, so it should have worked.
BTW, when you say it worked, is that with version 0.17.2 or with the hacked RC I sent?

  				
Posted 3 years ago by AgitatedDove14

I just tried the environment setup steps that clearml-agent performs, running them locally, but with my environment.yml instead of the one that clearml generates.

  				
Posted 3 years ago by ReassuredTiger98

(This is why we recommend using pip: it is stable, and clearml-agent takes care of the PyTorch/CUDA versions.)

  				
Posted 3 years ago by AgitatedDove14

This is the file that installs the GPU version.

  				
Posted 3 years ago by ReassuredTiger98

@ReassuredTiger98 thank you so much for testing it!

  				
Posted 3 years ago by AgitatedDove14

