Hello! Since Today I Get


Hello!
Since today I get `AssertionError: Torch not compiled with CUDA enabled` with PyTorch 1.8.
Tasks that I submitted to the queue yesterday no longer work either, even though they ran fine yesterday. Tasks based on PyTorch 1.7 work fine. Any idea what I could have done wrong?

  				
Posted by ReassuredTiger98, 3 years ago
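This assertion indicates that the PyTorch build inside the task's environment has no CUDA support at all, as opposed to a CUDA build that simply cannot see a GPU. A minimal sketch to tell the two cases apart, meant to be run inside the agent-created environment; it relies only on `torch.version.cuda` (which is `None` for CPU-only builds) and `torch.cuda.is_available()`, and the helper name `describe_torch_build` is hypothetical.

```python
import importlib.util


def describe_torch_build():
    """Return basic facts about the installed PyTorch build, or None if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    return {
        "version": torch.__version__,
        # torch.version.cuda is None for CPU-only builds
        "built_with_cuda": torch.version.cuda,
        # False if the build lacks CUDA support or no usable GPU/driver is found
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    info = describe_torch_build()
    if info is None:
        print("PyTorch is not installed in this environment")
    elif info["built_with_cuda"] is None:
        print(f"CPU-only PyTorch build: {info['version']}")
    else:
        print(f"PyTorch {info['version']} built against CUDA {info['built_with_cuda']}, "
              f"cuda_available={info['cuda_available']}")
```

A `built_with_cuda` of `None` points at the package that was installed, not at the GPU or driver.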


Answers 161

name: core
channels:
  - pytorch
  - anaconda
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1
  - _openmp_mutex=4.5
  - blas=1.0
  - bzip2=1.0.8
  - ca-certificates=2020.10.14
  - certifi=2020.6.20
  - cloudpickle=1.6.0
  - cudatoolkit=11.1.1
  - cycler=0.10.0
  - cytoolz=0.11.0
  - dask-core=2021.2.0
  - decorator=4.4.2
  - ffmpeg=4.3
  - freetype=2.10.4
  - gmp=6.2.1
  - gnutls=3.6.13
  - imageio=2.9.0
  - jpeg=9b
  - kiwisolver=1.3.1
  - lame=3.100
  - lcms2=2.11
  - ld_impl_linux-64=2.33.1
  - libedit=3.1.20191231
  - libffi=3.3
  - libgcc-ng=9.3.0
  - libgfortran-ng=7.3.0
  - libiconv=1.16
  - libpng=1.6.37
  - libstdcxx-ng=9.3.0
  - libtiff=4.1.0
  - libuv=1.41.0
  - llvm-openmp=11.0.1
  - lz4-c=1.9.3
  - matplotlib-base=3.3.4
  - mkl=2020.4
  - mkl-service=2.3.0
  - mkl_fft=1.3.0
  - mkl_random=1.2.0
  - ncurses=6.2
  - nettle=3.6
  - networkx=2.5
  - ninja=1.10.2
  - numpy=1.19.2
  - numpy-base=1.19.2
  - olefile=0.46
  - openh264=2.1.1
  - openssl=1.1.1j
  - pip=21.0.1
  - pyparsing=2.4.7
  - python=3.7.10
  - python-dateutil=2.8.1
  - python_abi=3.7
  - pytorch=1.8.0
  - pywavelets=1.1.1
  - readline=8.1
  - scikit-image=0.17.2
  - scipy=1.6.1
  - setuptools=52.0.0
  - six=1.15.0
  - sqlite=3.33.0
  - tifffile=2020.10.1
  - tk=8.6.10
  - toolz=0.11.1
  - torchaudio=0.8.0
  - torchvision=0.9.0
  - tornado=6.1
  - typing_extensions=3.7.4.3
  - wheel=0.36.2
  - xz=5.2.5
  - yaml=0.2.5
  - zlib=1.2.11
  - zstd=1.4.9
  - pip:
    - aiostream==0.4.2
    - attrs==20.3.0
    - clearml==0.17.4
    - dm-control==0.0.355168290
    - dm-env==1.4
    - furl==2.1.0
    - future==0.18.2
    - glfw==2.1.0
    - gym==0.18.0
    - humanfriendly==9.1
    - imageio-ffmpeg==0.4.3
    - jsonschema==3.2.0
    - labmaze==1.0.3
    - lxml==4.6.2
    - moviepy==1.0.3
    - orderedmultidict==1.0.1
    - pathlib2==2.3.5
    - pillow==7.2.0
    - proglog==0.1.9
    - psutil==5.8.0
    - pybullet==3.0.9
    - pygame==2.0.1
    - pyglet==1.5.0
    - pyjwt==2.0.1
    - pyrsistent==0.17.3
    - requests-file==1.5.1
    - tensorboard==2.4.1
    - tensorboardx==2.1

  				
Posted by ReassuredTiger98, 3 years ago
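The pins that matter for this issue in the environment file above are `cudatoolkit=11.1.1` and `pytorch=1.8.0`. A small sketch to pull the top-level conda pins out of such a file without a YAML library; it assumes the two-space `  - name=version` layout used above and deliberately skips the nested `pip:` list. The helper name `conda_pins` is hypothetical.

```python
import re

# Top-level conda pins look like "  - cudatoolkit=11.1.1"; pip entries are indented
# four spaces ("    - clearml==0.17.4") and channel entries have no "=", so neither matches.
PIN = re.compile(r"^  - ([A-Za-z0-9_.\-]+)=([^=\s]+)")


def conda_pins(env_yaml_text):
    """Map package name -> pinned version for the conda dependencies of an env file."""
    pins = {}
    for line in env_yaml_text.splitlines():
        match = PIN.match(line)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins
```

Applied to the file above (for example `conda_pins(open("environment.yml").read())`), it yields `11.1.1` for `cudatoolkit` and `1.8.0` for `pytorch`, which makes it easy to compare against what the agent actually installs.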

Sure, I'll try this.

  				
Posted by ReassuredTiger98, 3 years ago

No problem! I profit so much from clearml 🙂

  				
Posted by ReassuredTiger98, 3 years ago

This is the venv.

  				
Posted by ReassuredTiger98, 3 years ago

This is my environment, installed from the env file. Training works just fine here:

  				
Posted by ReassuredTiger98, 3 years ago

Interesting: this command fails (with an error similar to the one I posted above) with conda 4.7.12 but runs just fine with conda 4.9.2:

conda create --name test-pytorch python=3.8 cudatoolkit=11.1 -c conda-forge

  				
Posted by ReassuredTiger98, 3 years ago
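Since the same command behaves differently under conda 4.7.12 and 4.9.2, it helps to record which conda each agent machine runs. A minimal sketch that parses the `conda --version` output into a comparable tuple; treating 4.9.2 as the reference is an assumption based only on the two versions observed above, not a documented minimum, and the helper names are hypothetical.

```python
import re
import shutil
import subprocess

# Assumption: 4.9.2 is the version observed to work in this thread, used only as a
# "known good" reference point, not as an official minimum.
KNOWN_GOOD = (4, 9, 2)


def parse_conda_version(text):
    """Turn output like 'conda 4.9.2' into a comparable tuple (4, 9, 2)."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised conda version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def conda_is_at_least_known_good():
    """Return True/False for the conda on PATH, or None if conda is not found."""
    exe = shutil.which("conda")
    if exe is None:
        return None
    proc = subprocess.run([exe, "--version"], capture_output=True, text=True)
    # Some older conda releases print the version to stderr, so check both streams.
    return parse_conda_version(proc.stdout + proc.stderr) >= KNOWN_GOOD
```

Comparing tuples rather than strings keeps `4.10.x` correctly ordered after `4.9.x`.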

Okay, this is very close to what the agent is building.
Could you start a new conda env,
then install cudatoolkit=11.1,
then run:

conda env update -p <conda_env_path_here> --file the_env_yaml.yml

  				
Posted by AgitatedDove14, 3 years ago
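The three reproduction steps suggested above can be scripted so they run identically on every machine. A sketch under stated assumptions: it uses only the standard `conda create`, `conda install` and `conda env update` subcommands, assumes `conda-forge` as the cudatoolkit channel (as in the `conda create` test command earlier in the thread), and the helper names are hypothetical.

```python
import subprocess


def reproduction_commands(env_path, env_file, cuda_toolkit="11.1"):
    """Commands mirroring the suggested steps: new env, then cudatoolkit, then env update."""
    return [
        # 1. start a new, empty conda env at an explicit prefix
        ["conda", "create", "-y", "-p", env_path],
        # 2. install the CUDA toolkit first (assumption: conda-forge channel)
        ["conda", "install", "-y", "-p", env_path, f"cudatoolkit={cuda_toolkit}", "-c", "conda-forge"],
        # 3. apply the environment file on top of it
        ["conda", "env", "update", "-p", env_path, "--file", env_file],
    ]


def run_all(commands):
    """Run the commands in order, stopping at the first failure."""
    for cmd in commands:
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Usage would be `run_all(reproduction_commands("/path/to/test-env", "the_env_yaml.yml"))`; keeping command construction separate from execution makes it easy to print and compare the exact commands before running them.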

Could you send the env file?

  				
Posted by AgitatedDove14, 3 years ago

It says version 0.17.2.

  				
Posted by ReassuredTiger98, 3 years ago

The task already contains this

  				
Posted by ReassuredTiger98, 3 years ago

Yes that is exactly what I will make sure we change :)

  				
Posted by AgitatedDove14, 3 years ago

Yes, I think the difference is running conda install with arguments vs. installing from an env file...

  				
Posted by AgitatedDove14, 3 years ago

It is now looking for conflicts.

  				
Posted by ReassuredTiger98, 3 years ago

Yes, that is what I pasted here.

  				
Posted by ReassuredTiger98, 3 years ago

@ReassuredTiger98 in the UI, can you see it in the "installed packages" section under the Execution tab?

  				
Posted by AgitatedDove14, 3 years ago

So I just updated the env that clearml-agent created (the one where the CPU build of PyTorch is installed) with my local environment.yml, and now the correct version is installed. So most probably `/tmp/conda_envaz1ne897.yml` is the problem here.

  				
Posted by ReassuredTiger98, 3 years ago

So to further debug I need to somehow access /tmp/conda_envaz1ne897.yml

  				
Posted by ReassuredTiger98, 3 years ago
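The path `/tmp/conda_envaz1ne897.yml` looks like a temporary file with a random suffix. The thread does not say whether the agent deletes it once the install finishes, so it may only exist while the agent is building the environment. A minimal sketch that lists matching files in the system temp directory, newest first; the `conda_env*.yml` pattern is an assumption derived from that one file name, and the helper name is hypothetical.

```python
import glob
import os
import tempfile


def find_agent_env_files(tmp_dir=None):
    """List conda_env*.yml files in the temp directory, newest first."""
    tmp_dir = tmp_dir or tempfile.gettempdir()
    # Pattern assumed from the single observed name "conda_envaz1ne897.yml"
    paths = glob.glob(os.path.join(tmp_dir, "conda_env*.yml"))
    return sorted(paths, key=os.path.getmtime, reverse=True)
```

If this returns nothing after the task has failed, running it (or simply copying the file) while the agent is still resolving the environment is the fallback.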

And how is this generated?

Summary - installed python packages:
conda:
....

  				
Posted by ReassuredTiger98, 3 years ago

Sorry, env file for conda, the one you are using to install

  				
Posted by AgitatedDove14, 3 years ago

> Just tried: also works with 0.17.2

Great!

  				
Posted by AgitatedDove14, 3 years ago

Just tried: also works with 0.17.2

  				
Posted by ReassuredTiger98, 3 years ago

And this works fine.

  				
Posted by ReassuredTiger98, 3 years ago

Oh, the hacked one.

  				
Posted by ReassuredTiger98, 3 years ago

Type "help", "copyright", "credits" or "license" for more information.
>>> from clearml_agent.helper.gpu.gpustat import get_driver_cuda_version
>>> get_driver_cuda_version()
'110'

  				
Posted by ReassuredTiger98, 3 years ago
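Reading the helper's return value as the major and minor digits with the dot removed (an assumption about its format, not something stated in the thread), `'110'` means CUDA 11.0. That matches the cudatoolkit=11.0 the agent ended up installing, whereas conda reports the driver as 11.2 later in the thread. A small sketch of that conversion; the function name is hypothetical.

```python
def to_dotted_cuda_version(compact):
    """Convert a compact CUDA version such as '110' into '11.0'.

    Assumption: the last digit is the minor version and everything before it is the major.
    """
    compact = compact.strip()
    if not compact.isdigit() or len(compact) < 2:
        raise ValueError(f"unexpected CUDA version string: {compact!r}")
    return f"{int(compact[:-1])}.{int(compact[-1])}"
```

Under this reading, `'110'` becomes `11.0`, `'112'` becomes `11.2`, and `'92'` becomes `9.2`.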

What do you mean?

  				
Posted by ReassuredTiger98, 3 years ago

Send me the conda freeze (the `conda list` output):

# Name                    Version                   Build  Channel
...

  				
Posted by AgitatedDove14, 3 years ago
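The `# Name Version Build Channel` header is the column layout printed by `conda list` (for an agent environment, `conda list -p <conda_env_path_here>`). A minimal sketch that turns that output into a lookup table so the `pytorch` build string can be checked directly; the helper name is hypothetical, and an empty channel column is kept as an empty string.

```python
def parse_conda_list(text):
    """Map package name -> (version, build, channel) from `conda list` output."""
    packages = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header/comment lines
        fields = line.split()
        if len(fields) < 3:
            continue  # not a package row
        name, version, build = fields[:3]
        channel = fields[3] if len(fields) > 3 else ""
        packages[name] = (version, build, channel)
    return packages
```

Looking up `parse_conda_list(output)["pytorch"]` then shows at a glance whether the installed build is the CUDA one.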

Tried to install cudatoolkit==11.1 manually in this environment and got:

Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed                                                                                                                                                                         

UnsatisfiableError: The following specifications were found to be incompatible with each other:



Package xz conflicts for:
python=3.8 -> xz[version='>=5.2.4,<5.3.0a0|>=5.2.4,<6.0a0|>=5.2.5,<5.3.0a0|>=5.2.5,<6.0a0']
Package libstdcxx-ng conflicts for:
python=3.8 -> libstdcxx-ng[version='>=7.3.0|>=7.5.0|>=9.3.0']
cudatoolkit=11.1 -> libstdcxx-ng[version='>=9.3.0']
Package libgcc-ng conflicts for:
cudatoolkit=11.1 -> libgcc-ng[version='>=9.3.0']
python=3.8 -> libgcc-ng[version='>=7.3.0|>=7.5.0|>=9.3.0']
Package __glibc conflicts for:
cudatoolkit=11.1 -> __glibc[version='>=2.17,<3.0.a0']
Package libffi conflicts for:
python=3.8 -> libffi[version='>=3.2.1,<3.3.0a0|>=3.2.1,<3.3a0|>=3.3,<3.4.0a0']
Package ncurses conflicts for:
python=3.8 -> ncurses[version='>=6.1,<6.3.0a0|>=6.1,<7.0a0|>=6.2,<6.3.0a0|>=6.2,<7.0a0']
Package zlib conflicts for:
python=3.8 -> zlib[version='>=1.2.11,<1.3.0a0']
Package python_abi conflicts for:
python=3.8 -> python_abi[version='*|3.8.*',build=*_cp38]
Package sqlite conflicts for:
python=3.8 -> sqlite[version='>=3.30.0,<4.0a0|>=3.30.1,<4.0a0|>=3.31.1,<4.0a0|>=3.32.3,<4.0a0|>=3.33.0,<4.0a0|>=3.34.0,<4.0a0']
Package bzip2 conflicts for:
python=3.8 -> bzip2[version='>=1.0.8,<2.0a0']
Package readline conflicts for:
python=3.8 -> readline[version='>=7.0,<8.0a0|>=8.0,<9.0a0']
Package openssl conflicts for:
python=3.8 -> openssl[version='>=1.1.1a,<1.1.2a|>=1.1.1d,<1.1.2a|>=1.1.1e,<1.1.2a|>=1.1.1f,<1.1.2a|>=1.1.1g,<1.1.2a|>=1.1.1h,<1.1.2a|>=1.1.1i,<1.1.2a|>=1.1.1j,<1.1.2a']
Package tk conflicts for:
python=3.8 -> tk[version='>=8.6.10,<8.7.0a0|>=8.6.8,<8.7.0a0|>=8.6.9,<8.7.0a0']
Package pip conflicts for:
python=3.8 -> pip
Package ld_impl_linux-64 conflicts for:
python=3.8 -> ld_impl_linux-64[version='>=2.34']

The following specifications were found to be incompatible with your CUDA driver:

  - cudatoolkit=11.1 -> __cuda[version='>=11.1']

Your installed CUDA driver is: 11.2

  				
Posted by ReassuredTiger98, 3 years ago
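Taken at face value, the last constraint in that output should be satisfiable: cudatoolkit=11.1 asks for `__cuda>=11.1`, and the reported driver is 11.2. A numeric comparison (a plain string comparison would misorder versions such as 11.10 and 11.2) confirms this, which suggests the failure comes from the solver rather than the driver. That would fit the earlier observation that the same `conda create` command fails under conda 4.7.12 but works under 4.9.2. The helper names are hypothetical.

```python
def version_tuple(version):
    """'11.2' -> (11, 2), so versions compare numerically part by part."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed, minimum):
    """True if `installed` >= `minimum`, e.g. a driver version against a __cuda>= bound."""
    return version_tuple(installed) >= version_tuple(minimum)
```

With the values above, `meets_minimum("11.2", "11.1")` is `True`.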

🤞

  				
Posted by AgitatedDove14, 3 years ago

Okay, this seems correct:

pytorch=1.8.0=py3.7_cuda11.1_cudnn8.0.5_0

I can't find the difference between the two.
Give me a second, let me check whether I can reproduce it.

  				
Posted by AgitatedDove14, 3 years ago
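The build string is what distinguishes the CUDA package from the CPU one: `py3.7_cuda11.1_cudnn8.0.5_0` carries a `cuda11.1` segment. A small sketch that extracts it, assuming (not confirmed in the thread) that CPU builds simply lack a `cuda…` segment; the helper name is hypothetical.

```python
import re


def cuda_from_build(build):
    """Return the CUDA version embedded in a pytorch conda build string, or None."""
    # "cudnn" does not match because the pattern requires "cuda" followed by a digit
    match = re.search(r"cuda(\d+(?:\.\d+)?)", build)
    return match.group(1) if match else None
```

For the build above this returns `"11.1"`; a `None` result would indicate the CPU package that produces the original assertion.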

The problem is that clearml-agent installs cudatoolkit=11.0, but cudatoolkit=11.1 is needed. With 11.0, conda resolves the conflict by installing the CPU version of PyTorch. Setting agent.cuda_version=11.1 in clearml.conf makes the agent use the correct version, and everything installs fine.

  				
Posted 
	3 years ago

					More
				  		
  Report
		
					ReassuredTiger98
				
					0
					 × 1

