
Hello!
Since today I get AssertionError: Torch not compiled with CUDA enabled for PyTorch 1.8.
Tasks that I submitted yesterday to the queue are also not working, even though they ran yesterday. PyTorch 1.7 based tasks work fine. Any idea what I could have done wrong?
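
A quick way to confirm which kind of PyTorch build ended up in an environment is to ask torch itself. This is a minimal sketch, not something taken from the thread: the helper name `torch_cuda_report` is made up for illustration, and it only relies on `torch.__version__`, `torch.version.cuda` and `torch.cuda.is_available()`.

```python
def torch_cuda_report():
    """Return a small dict describing the installed PyTorch build, if any.

    Hypothetical helper for illustration; run it with the Python interpreter
    of the environment you want to inspect.
    """
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this interpreter at all.
        return {"installed": False}
    return {
        "installed": True,
        "torch_version": torch.__version__,
        # None for CPU-only builds, a string such as "11.1" for CUDA builds.
        "built_with_cuda": torch.version.cuda,
        # False on CPU-only builds, or when no usable GPU/driver is found.
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(torch_cuda_report())
```

If `built_with_cuda` comes back as `None`, the environment contains a CPU-only build, which matches the "Torch not compiled with CUDA enabled" assertion.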

  
  
Posted 4 years ago

Answers 161


@ReassuredTiger98 what do you have in the clearml.conf under "conda_channels"?
Is this it?
None

  
  
Posted 4 years ago

@ReassuredTiger98 if you use the latest RC I sent and run with --debug, the log will show the full content of /tmp/conda_envaz1ne897.yml.
Here it is, copied from your log. Do you want to check whether this one works?

channels:
- defaults
- conda-forge
- pytorch
dependencies:
- blas~=1.0
- bzip2~=1.0.8
- ca-certificates~=2020.10.14
- certifi~=2020.6.20
- cloudpickle~=1.6.0
- cudatoolkit~=11.1.1
- cycler~=0.10.0
- cytoolz~=0.11.0
- dask-core~=2021.2.0
- decorator~=4.4.2
- ffmpeg~=4.3
- freetype~=2.10.4
- gmp~=6.2.1
- gnutls~=3.6.13
- imageio~=2.9.0
- jpeg~=9b.0
- kiwisolver~=1.3.1
- lame~=3.100
- lcms2~=2.11
- ld_impl_linux-64~=2.33.1
- libedit~=3.1.20191231
- libffi~=3.3
- libgcc-ng~=9.3.0
- libgfortran-ng~=7.3.0
- libiconv~=1.16
- libpng~=1.6.37
- libstdcxx-ng~=9.3.0
- libtiff~=4.1.0
- libuv~=1.41.0
- llvm-openmp~=11.0.1
- lz4-c~=1.9.3
- matplotlib-base~=3.3.4
- mkl~=2020.4
- mkl-service~=2.3.0
- mkl_fft~=1.3.0
- mkl_random~=1.2.0
- ncurses~=6.2
- nettle~=3.6
- networkx~=2.5
- ninja~=1.10.2
- numpy~=1.19.2
- numpy-base~=1.19.2
- olefile~=0.46
- openh264~=2.1.1
- openssl~=1.1.1j
- pyparsing~=2.4.7
- python~=3.7.10
- python-dateutil~=2.8.1
- python_abi~=3.7
- pytorch~=1.8.0
- pywavelets~=1.1.1
- pyyaml~=5.3.1
- readline~=8.1
- scikit-image~=0.17.2
- scipy~=1.6.1
- setuptools~=52.0.0
- six~=1.15.0
- sqlite~=3.33.0
- tifffile~=2020.10.1
- tk~=8.6.10
- toolz~=0.11.1
- torchaudio~=0.8.0
- torchvision~=0.9.0
- tornado~=6.1
- typing_extensions~=3.7.4.3
- wheel~=0.36.2
- xz~=5.2.5
- yaml~=0.2.5
- zlib~=1.2.11
- zstd~=1.4.9
  
  
Posted 4 years ago

okay, I'll make sure we order it correctly

  
  
Posted 4 years ago

fyi: NVIDIA-SMI 460.56 Driver Version: 460.56 CUDA Version: 11.2

  
  
Posted 4 years ago

Sure, let's do that 🙂

  
  
Posted 4 years ago

Give me a minute

  
  
Posted 4 years ago

Ha?!

  
  
Posted 4 years ago

Hmm, maybe this is the issue:

Conda error: UnsatisfiableError: The following specifications were found to be incompatible with a past
explicit spec that is not an explicit spec in this operation (cudatoolkit):

  - pytorch~=1.8.0 -> cudatoolkit[version='>=10.1,<10.2|>=10.2,<10.3']

This makes no sense: conda says pytorch~=1.8.0 needs cudatoolkit 10.1 or 10.2 (>=10.1,<10.3), but it actually needs cudatoolkit 11.1.
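
To see why conda considers cudatoolkit 11.1.1 a conflict here, the constraint from the error message can be evaluated by hand. The sketch below is a deliberately simplified matcher written for this post, not conda's own solver logic: it only handles plain numeric versions, the operators `>=`, `<=`, `==`, `>`, `<`, `,` as "and", and `|` as "or". The function names are made up.

```python
import re

# Comparison operators supported by this simplified matcher.
OPS = {
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    "==": lambda a, b: a == b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
}


def parse_version(text):
    """Turn '11.1.1' into (11, 1, 1); only plain numeric versions are handled."""
    return tuple(int(part) for part in text.split("."))


def pad(a, b):
    """Pad the shorter tuple with zeros so (10, 2) compares like (10, 2, 0)."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def satisfies(version, spec):
    """Check a version against a spec such as '>=10.1,<10.2|>=10.2,<10.3'."""
    candidate = parse_version(version)
    for alternative in spec.split("|"):      # any alternative may match
        matched = True
        for constraint in alternative.split(","):  # all constraints must hold
            m = re.fullmatch(r"(>=|<=|==|>|<)(\d+(?:\.\d+)*)", constraint.strip())
            if m is None:
                raise ValueError(f"unsupported constraint: {constraint!r}")
            op, bound = m.groups()
            left, right = pad(candidate, parse_version(bound))
            if not OPS[op](left, right):
                matched = False
                break
        if matched:
            return True
    return False


SPEC = ">=10.1,<10.2|>=10.2,<10.3"  # copied from the conda error above
print(satisfies("11.1.1", SPEC))    # the pinned cudatoolkit
print(satisfies("10.2.89", SPEC))   # a 10.2 toolkit, for comparison
```

With this spec, 11.1.1 is rejected and a 10.2 toolkit is accepted, so the solver has apparently picked a pytorch 1.8.0 build made for CUDA 10.x rather than the 11.1 build.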

  
  
Posted 4 years ago

Yes that is exactly what I will make sure we change :)

  
  
Posted 4 years ago

And this works fine.

  
  
Posted 4 years ago

I just ran the environment setup steps that clearml-agent performs, locally, but with my environment.yml instead of the one that clearml generates.

  
  
Posted 4 years ago

Can you actually reproduce my problem when also using conda_freeze: true?

  
  
Posted 4 years ago

I get 110 but it should be 111

  
  
Posted 4 years ago

Yes, that is what I pasted here.

  
  
Posted 4 years ago

My driver says "CUDA Version: 11.2" (I am not even sure this is correct, since I do not remember installing CUDA on this machine, but idk) and there is no pytorch for 11.2, so maybe it falls back to CPU?

For some reason it detects CUDA 11.1 (I assume this is what you have installed; the CUDA version reported by the driver is the highest version it supports, not necessarily what you have installed).
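
The distinction between "what the driver supports" and "what is installed" can be made explicit by parsing the nvidia-smi header. This is only a sketch: the header string is the one posted in this thread, while `parse_smi_header` and `toolkit_supported` are hypothetical helper names.

```python
import re

# Header line as posted in this thread.
HEADER = "NVIDIA-SMI 460.56 Driver Version: 460.56 CUDA Version: 11.2"


def parse_smi_header(line):
    """Extract the driver version and the highest CUDA version it supports."""
    m = re.search(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", line)
    if m is None:
        return None
    return {"driver": m.group(1), "max_cuda": m.group(2)}


def toolkit_supported(toolkit_version, max_cuda):
    """True if a toolkit version does not exceed what the driver supports."""
    as_tuple = lambda v: tuple(int(p) for p in v.split(".")[:2])
    return as_tuple(toolkit_version) <= as_tuple(max_cuda)


info = parse_smi_header(HEADER)
print(info)
print(toolkit_supported("11.1", info["max_cuda"]))
```

So a driver that reports 11.2 can still run a PyTorch build made for cudatoolkit 11.1; there is no need for a matching 11.2 build.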

  
  
Posted 4 years ago

Let me check something

  
  
Posted 4 years ago

ca-certificates           2021.1.19            h06a4308_1  
certifi                   2020.12.5        py38h06a4308_0  
cudatoolkit               11.0.221             h6bb024c_0  
ld_impl_linux-64          2.33.1               h53a641e_7  
libedit                   3.1.20191231         h14c3975_1  
libffi                    3.3                  he6710b0_2  
libgcc-ng                 9.1.0                hdf63c60_0  
libstdcxx-ng              9.1.0                hdf63c60_0  
ncurses                   6.2                  he6710b0_1  
openssl                   1.1.1j               h27cfd23_0  
pip                       20.0.2                   py38_1    conda-forge
python                    3.8.8                hdb3f193_4  
readline                  8.1                  h27cfd23_0  
setuptools                52.0.0           py38h06a4308_0  
sqlite                    3.33.0               h62c20be_0  
tk                        8.6.10               hbc83047_0  
wheel                     0.36.2             pyhd3eb1b0_0  
xz                        5.2.5                h7b6447c_0  
zlib                      1.2.11               h7b6447c_3  
  
  
Posted 4 years ago

Quick question: Where again does clearml place the venv? I wanna take a look into it after the task has failed

  
  
Posted 4 years ago

Yes I think the difference is running conda install with arguments vs conda install with env file...

  
  
Posted 4 years ago

name: core
channels:
  - pytorch
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1
  - _openmp_mutex=4.5
  - blas=1.0
  - bzip2=1.0.8
  - ca-certificates=2020.12.5
  - certifi=2020.12.5
  - cudatoolkit=11.1.1
  - ffmpeg=4.3
  - freetype=2.10.4
  - gmp=6.2.1
  - gnutls=3.6.13
  - jpeg=9b
  - lame=3.100
  - lcms2=2.11
  - ld_impl_linux-64=2.33.1
  - libedit=3.1.20191231
  - libffi=3.3
  - libgcc-ng=9.3.0
  - libiconv=1.16
  - libpng=1.6.37
  - libstdcxx-ng=9.3.0
  - libtiff=4.1.0
  - libuv=1.41.0
  - llvm-openmp=11.0.1
  - lz4-c=1.9.3
  - mkl=2020.4
  - mkl-service=2.3.0
  - mkl_fft=1.3.0
  - mkl_random=1.2.0
  - ncurses=6.2
  - nettle=3.6
  - ninja=1.10.2
  - numpy=1.19.2
  - numpy-base=1.19.2
  - olefile=0.46
  - openh264=2.1.1
  - openssl=1.1.1j
  - pillow=8.1.2
  - pip=21.0.1
  - python=3.8.8
  - python_abi=3.8
  - pytorch=1.8.0
  - readline=8.1
  - setuptools=52.0.0
  - six=1.15.0
  - sqlite=3.33.0
  - tk=8.6.10
  - torchaudio=0.8.0
  - torchvision=0.9.0
  - typing_extensions=3.7.4.3
  - wheel=0.36.2
  - xz=5.2.5
  - zlib=1.2.11
  - zstd=1.4.9
  - pip:
    - attrs==20.3.0
    - clearml==0.17.4
    - furl==2.1.0
    - humanfriendly==9.1
    - jsonschema==3.2.0
    - orderedmultidict==1.0.1
    - pathlib2==2.3.5
    - psutil==5.8.0
    - pyjwt==2.0.1
    - pyrsistent==0.17.3
    - pyyaml==5.4.1
    - requests-file==1.5.1
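
One visible difference between this file and the generated one is the channel order: conda treats channels listed earlier as higher priority, so "defaults" first and "pytorch" first can resolve to different pytorch builds. The sketch below pulls the channel list out of an environment file with a tiny line-based reader (an assumption made to avoid a YAML dependency; it only understands the flat layout used in this thread). `read_channels` is a made-up name, and the two strings are shortened excerpts of the files posted here.

```python
def read_channels(env_text):
    """Return the channel names of a conda environment file, in listed order.

    Minimal line-based reader: it assumes a flat 'channels:' key followed by
    '- name' items, which is the layout of both files in this thread.
    """
    channels = []
    in_channels = False
    for raw in env_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("- "):
            if in_channels:
                channels.append(line[2:].strip())
        else:
            # Any non-item line is a key; only 'channels:' opens the list.
            in_channels = (line == "channels:")
    return channels


# Shortened excerpts of the two files posted in this thread.
GENERATED = """channels:
- defaults
- conda-forge
- pytorch
dependencies:
- cudatoolkit~=11.1.1
- pytorch~=1.8.0
"""

MINE = """name: core
channels:
  - pytorch
  - conda-forge
  - defaults
dependencies:
  - cudatoolkit=11.1.1
  - pytorch=1.8.0
  - pip:
    - clearml==0.17.4
"""

print(read_channels(GENERATED))
print(read_channels(MINE))
print(read_channels(GENERATED) == read_channels(MINE))
```

The two lists contain the same channels in reverse order, which is one candidate explanation for why the generated file and the hand-written one install different builds.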
  
  
Posted 4 years ago

It asks the driver or finds the CUDA dll/so.

  
  
Posted 4 years ago

No worries, gnight :)

  
  
Posted 4 years ago

This is the file which installs the GPU version

  
  
Posted 4 years ago

Do you know how I can make sure I do not have CUDA or a broken installation installed?

I don't think this is the case; it is quite specifically installing the CPU version.
BTW: after the agent fails it does not remove the venv, so you can get into it and check. According to the log, it is in: /home/tim/.clearml/venvs-builds/3.7
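
Once inside that venv, the output of `conda list` shows which pytorch build was installed. The sketch below classifies it from the build string. It assumes the naming convention used by packages on the pytorch channel, where the build string contains "cuda" or "cpu"; the sample lines are hypothetical and not taken from the actual log, and the helper names are made up.

```python
def parse_conda_list(text):
    """Parse `conda list` output into {name: {"version", "build", "channel"}}."""
    packages = {}
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and the header comments
        fields = line.split()
        if len(fields) < 3:
            continue
        name, version, build = fields[:3]
        # The channel column is empty for some packages.
        channel = fields[3] if len(fields) > 3 else None
        packages[name] = {"version": version, "build": build, "channel": channel}
    return packages


def pytorch_flavor(packages):
    """Classify the pytorch package as 'cuda', 'cpu', 'unknown' or 'missing'."""
    info = packages.get("pytorch")
    if info is None:
        return "missing"
    build = info["build"].lower()
    if "cuda" in build:
        return "cuda"
    if "cpu" in build:
        return "cpu"
    return "unknown"


# Hypothetical sample output, for illustration only.
SAMPLE = """# Name                    Version                   Build  Channel
cudatoolkit               11.0.221             h6bb024c_0
pytorch                   1.8.0               py3.7_cpu_0    pytorch
"""

print(pytorch_flavor(parse_conda_list(SAMPLE)))
```

Paste the real `conda list` output in place of `SAMPLE` to check the failed task's environment.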

  
  
Posted 4 years ago

It is now looking for conflicts.

  
  
Posted 4 years ago

conda 4.9.2

  
  
Posted 4 years ago

Same error.

  
  
Posted 4 years ago

No problem! I profit so much from clearml 🙂

  
  
Posted 4 years ago

@ReassuredTiger98 what are you getting with:

nvidia-smi

And here:

ls -la /usr/local/
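
The point of listing /usr/local/ is to spot system-wide CUDA toolkits. On Linux these usually live in directories named cuda or cuda-X.Y, though that convention is an assumption here and not something confirmed in this thread. A minimal sketch of the same check in Python, with a made-up helper name:

```python
from pathlib import Path


def find_cuda_dirs(root="/usr/local"):
    """List entries under `root` whose names start with 'cuda', sorted."""
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(p.name for p in base.iterdir() if p.name.startswith("cuda"))


if __name__ == "__main__":
    # An empty list suggests no toolkit is installed in the usual location.
    print(find_cuda_dirs())
```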
  
  
Posted 4 years ago

Complete conda log

  
  
Posted 4 years ago