Hello! Since Today I Get

Answered

Hello!
Since today I get AssertionError: Torch not compiled with CUDA enabled for PyTorch 1.8.
Tasks that I submitted yesterday to the queue are also not working, even though they ran yesterday. PyTorch 1.7 based tasks work fine. Any idea what I could have done wrong?

  				
Posted 
	3 years ago

					More
				  		
  Report
		
					ReassuredTiger98
				
					0
					 × 1

Votes Newest

Answers 161

Where again does clearml place the venv?

Usually ~/.clearml/venvs-builds/<python version>/
Multiple agents will be venvs-builds.1 and so on

  				
Posted 
	3 years ago

					More
				  		
  Report
		
					AgitatedDove14
				
					0
					 × 1

Perfect! I have to thank you for helping me, not the other way around!

  				
Posted 
	3 years ago

					More
				  		
  Report
		
					ReassuredTiger98
				
					0
					 × 1

Let me check

  				
Posted 
	3 years ago

					More
				  		
  Report
		
					AgitatedDove14
				
					0
					 × 1

Uninstall the current clearml-agent and reinstall this wheel, I hacked it to have ==, let's see if that works

  				
Posted 
	3 years ago

					More
				  		
  Report
		
					AgitatedDove14
				
					0
					 × 1

Wtf? can you try with = (notice single not double)?

channels:
- defaults
- conda-forge
- pytorch
dependencies:
- cudatoolkit=11.1.1
- pytorch=1.8.0

  				
Posted 
	3 years ago

					More
				  		
  Report
		
					AgitatedDove14
				
					0
					 × 1

I can install pytorch just fine locally on the agent, when I do not use clearml(-agent)

My thinking is the issue might be on the env file we are passing to conda, I can't find any other diff.
BTW:
@<1523701868901961728:profile|ReassuredTiger98> Can I send a specific wheel with mode debug prints for you to check (basically it will print the conda env YAML it is using)?

  				
Posted 
	3 years ago

					More
				  		
  Report
		
					AgitatedDove14
				
					0
					 × 1

I will try again tomorrow. It s getting late! Thank you for helping so far!

  				
Posted 
	3 years ago

					More
				  		
  Report
		
					ReassuredTiger98
				
					0
					 × 1

Okay. And 110 means 11.1 and not 11.0?

  				
Posted 
	3 years ago

					More
				  		
  Report
		
					ReassuredTiger98
				
					0
					 × 1

I just wanna add: I can run this task on the same workstation with the same conda installation just fine.

  				
Posted 
	3 years ago

					More
				  		
  Report
		
					ReassuredTiger98
				
					0
					 × 1

But here is the funny thing:

channels:
- pytorch
- conda-forge
- defaults
dependencies:
- cudatoolkit=11.1.1
- pytorch=1.8.0

Installs GPU

  				
Posted 
	3 years ago

					More
				  		
  Report
		
					ReassuredTiger98
				
					0
					 × 1

Is ther a way to see the contents of /tmp/conda_envaz1ne897.yml ? Seems to be deleted after the task is finihsed

  				
Posted 
	3 years ago

					More
				  		
  Report
		
					ReassuredTiger98
				
					0
					 × 1

Could you send the full log please ?

  				
Posted 
	3 years ago

					More
				  		
  Report
		
					AgitatedDove14
				
					0
					 × 1

What's the difference between the two env files?

  				
Posted 
	3 years ago

					More
				  		
  Report
		
					AgitatedDove14
				
					0
					 × 1

Installed miniconda finally, now trying to run the task

  				
Posted 
	3 years ago

					More
				  		
  Report
		
					ReassuredTiger98
				
					0
					 × 1

Like this?

  				
Posted 
	3 years ago

					More
				  		
  Report
		
					ReassuredTiger98
				
					0
					 × 1

Same error 😕

  				
Posted 
	3 years ago

					More
				  		
  Report
		
					ReassuredTiger98
				
					0
					 × 1

I will try a minimal version now

  				
Posted 
	3 years ago

					More
				  		
  Report
		
					ReassuredTiger98
				
					0
					 × 1

I tried to run the task with detect_with_conda_freeze: false instead of true and got

Executing Conda: /home/tim/miniconda3/condabin/conda install -p /home/tim/.clearml/venvs-builds/3.8 -c defaults -c conda-forge -c pytorch 'pip<20.2' --quiet --json
Pass
Conda: Trying to install requirements:
['pytorch~=1.8.0']
Executing Conda: /home/tim/miniconda3/condabin/conda env update -p /home/tim/.clearml/venvs-builds/3.8 --file /tmp/conda_envh7rq4qmc.yml --quiet --json
Conda error: UnsatisfiableError: The following specifications were found to be incompatible with a past
explicit spec that is not an explicit spec in this operation (cudatoolkit):

  - pytorch~=1.8.0 -> cudatoolkit[version='>=10.1,<10.2|>=10.2,<10.3']

The following specifications were found to be incompatible with each other:



Package cudatoolkit conflicts for:
cudatoolkit=11.0
Conda: Installing requirements: step 2 - using pip:
['clearml==0.17.4', 'tensorboard==2.4.1', 'pytorch~=1.8.0']
Collecting tensorboard==2.4.1
  Using cached tensorboard-2.4.1-py3-none-any.whl (10.6 MB)
ERROR: Could not find a version that satisfies the requirement pytorch~=1.8.0 (from -r /tmp/cached-reqsubuv0zrf.txt (line 3)) (from versions: 0.1.2, 1.0.2)
ERROR: No matching distribution found for pytorch~=1.8.0 (from -r /tmp/cached-reqsubuv0zrf.txt (line 3))
Command 'source /home/tim/miniconda3/etc/profile.d/conda.sh && conda activate /home/tim/.clearml/venvs-builds/3.8 && pip install -r /tmp/cached-reqsubuv0zrf.txt' returned non-zero exit status 1.

  				
Posted 
	3 years ago

					More
				  		
  Report
		
					ReassuredTiger98
				
					0
					 × 1

And then?

  				
Posted 
	3 years ago

					More
				  		
  Report
		
					ReassuredTiger98
				
					0
					 × 1

Okay. And

110

means 11.1 and not 11.0? (edited)

110 means 11.0, the odd thing is, it actually installed 11.1, and from the pytorch website this is exactly how they suggest to install with conda...
Let me know if forcing the CUDA version changes anything

  				
Posted 
	3 years ago

					More
				  		
  Report
		
					AgitatedDove14
				
					0
					 × 1

name: core
channels:
  - pytorch
  - anaconda
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1
  - _openmp_mutex=4.5
  - blas=1.0
  - bzip2=1.0.8
  - ca-certificates=2020.10.14
  - certifi=2020.6.20
  - cloudpickle=1.6.0
  - cudatoolkit=11.1.1
  - cycler=0.10.0
  - cytoolz=0.11.0
  - dask-core=2021.2.0
  - decorator=4.4.2
  - ffmpeg=4.3
  - freetype=2.10.4
  - gmp=6.2.1
  - gnutls=3.6.13
  - imageio=2.9.0
  - jpeg=9b
  - kiwisolver=1.3.1
  - lame=3.100
  - lcms2=2.11
  - ld_impl_linux-64=2.33.1
  - libedit=3.1.20191231
  - libffi=3.3
  - libgcc-ng=9.3.0
  - libgfortran-ng=7.3.0
  - libiconv=1.16
  - libpng=1.6.37
  - libstdcxx-ng=9.3.0
  - libtiff=4.1.0
  - libuv=1.41.0
  - llvm-openmp=11.0.1
  - lz4-c=1.9.3
  - matplotlib-base=3.3.4
  - mkl=2020.4
  - mkl-service=2.3.0
  - mkl_fft=1.3.0
  - mkl_random=1.2.0
  - ncurses=6.2
  - nettle=3.6
  - networkx=2.5
  - ninja=1.10.2
  - numpy=1.19.2
  - numpy-base=1.19.2
  - olefile=0.46
  - openh264=2.1.1
  - openssl=1.1.1j
  - pip=21.0.1
  - pyparsing=2.4.7
  - python=3.7.10
  - python-dateutil=2.8.1
  - python_abi=3.7
  - pytorch=1.8.0
  - pywavelets=1.1.1
  - readline=8.1
  - scikit-image=0.17.2
  - scipy=1.6.1
  - setuptools=52.0.0
  - six=1.15.0
  - sqlite=3.33.0
  - tifffile=2020.10.1
  - tk=8.6.10
  - toolz=0.11.1
  - torchaudio=0.8.0
  - torchvision=0.9.0
  - tornado=6.1
  - typing_extensions=3.7.4.3
  - wheel=0.36.2
  - xz=5.2.5
  - yaml=0.2.5
  - zlib=1.2.11
  - zstd=1.4.9
  - pip:
    - aiostream==0.4.2
    - attrs==20.3.0
    - clearml==0.17.4
    - dm-control==0.0.355168290
    - dm-env==1.4
    - furl==2.1.0
    - future==0.18.2
    - glfw==2.1.0
    - gym==0.18.0
    - humanfriendly==9.1
    - imageio-ffmpeg==0.4.3
    - jsonschema==3.2.0
    - labmaze==1.0.3
    - lxml==4.6.2
    - moviepy==1.0.3
    - orderedmultidict==1.0.1
    - pathlib2==2.3.5
    - pillow==7.2.0
    - proglog==0.1.9
    - psutil==5.8.0
    - pybullet==3.0.9
    - pygame==2.0.1
    - pyglet==1.5.0
    - pyjwt==2.0.1
    - pyrsistent==0.17.3
    - requests-file==1.5.1
    - tensorboard==2.4.1
    - tensorboardx==2.1

  				
Posted 
	3 years ago

					More
				  		
  Report
		
					ReassuredTiger98
				
					0
					 × 1

Sure, I ll try this

  				
Posted 
	3 years ago

					More
				  		
  Report
		
					ReassuredTiger98
				
					0
					 × 1

This my environment installed from env file. Training works just fine here:

  				
Posted 
	3 years ago

					More
				  		
  Report
		
					ReassuredTiger98
				
					0
					 × 1

Okay this is very close to what the agent is building:
Could you start a new conda env,
then install cudatoolkit=11.1
then run:

conda env update -p <conda_env_path_here> --file the_env_yaml.yml

  				
Posted 
	3 years ago

					More
				  		
  Report
		
					AgitatedDove14
				
					0
					 × 1

Could you send the end file?

  				
Posted 
	3 years ago

					More
				  		
  Report
		
					AgitatedDove14
				
					0
					 × 1

Version 0.17.2 it says

  				
Posted 
	3 years ago

					More
				  		
  Report
		
					ReassuredTiger98
				
					0
					 × 1

The task already contains this

  				
Posted 
	3 years ago

					More
				  		
  Report
		
					ReassuredTiger98
				
					0
					 × 1

Yes I think the difference is running conda install with arguments vs conda install with env file...

  				
Posted 
	3 years ago

					More
				  		
  Report
		
					AgitatedDove14
				
					0
					 × 1

It is now looking for conflicts.

  				
Posted 
	3 years ago

					More
				  		
  Report
		
					ReassuredTiger98
				
					0
					 × 1

Yes, that is what I pasted here.

  				
Posted 
	3 years ago

					More
				  		
  Report
		
					ReassuredTiger98
				
					0
					 × 1

Show more results

Write your answer

18K Views

161 Answers

3 years ago

7 months ago