Answered
Hello! Since Today I Get

Hello!
Since today I get "AssertionError: Torch not compiled with CUDA enabled" with PyTorch 1.8.
Tasks that I submitted to the queue yesterday, and that ran fine then, now fail as well. PyTorch 1.7 based tasks work fine. Any idea what I could have done wrong?
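
For anyone hitting the same error, a quick way to see which kind of PyTorch build ended up in an environment is to ask PyTorch itself. This is only a sketch: the helper name describe_torch_build is made up for illustration, and it has to be run with the Python interpreter of the environment you want to inspect.

```python
import importlib


def describe_torch_build():
    """Return a small dict describing the installed PyTorch build, if any.

    Hypothetical helper for illustration; not part of ClearML or PyTorch.
    """
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # PyTorch is not installed in this interpreter's environment.
        return {"installed": False}
    return {
        "installed": True,
        "version": torch.__version__,
        # None for CPU-only builds, a string such as "11.1" for CUDA builds.
        "built_with_cuda": torch.version.cuda,
        # False for CPU-only builds, or when no usable driver/GPU is found.
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(describe_torch_build())
```

If built_with_cuda comes back as None, the CPU-only package was installed, which would match the "Torch not compiled with CUDA enabled" assertion.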

  
  
Posted 4 years ago

Answers 161


Do you know how I can make sure I do not have CUDA installed, or a broken installation?

I don't think this is the case; it is quite specifically installing the CPU version.
BTW: after the agent fails, it will not remove the venv, so you can get into it and check. From the log, it will be in: /home/tim/.clearml/venvs-builds/3.7

  
  
Posted 4 years ago

It asks the driver, or finds the CUDA dll/so.

  
  
Posted 4 years ago

Okay. And 110 means 11.1 and not 11.0?

110 means 11.0. The odd thing is that it actually installed 11.1, and according to the PyTorch website this is exactly how they suggest installing with conda...
Let me know if forcing the CUDA version changes anything.
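
To make the "110 means 11.0" reading concrete, here is a minimal sketch of how such a compact value could be decoded. It assumes the last digit is the minor version and everything before it is the major version, which is consistent with the statement above. The function name decode_cuda_version is hypothetical.

```python
def decode_cuda_version(compact):
    """Turn a compact CUDA version such as "110" into "11.0".

    Assumption (for illustration): the last digit is the minor version,
    the digits before it are the major version.
    """
    digits = str(compact).strip()
    if not digits.isdigit() or len(digits) < 2:
        raise ValueError(f"not a compact CUDA version: {compact!r}")
    major, minor = int(digits[:-1]), digits[-1]
    return f"{major}.{minor}"


if __name__ == "__main__":
    for value in ("102", "110", "111"):
        print(value, "->", decode_cuda_version(value))
```

Under that assumption 110 decodes to 11.0 and 111 to 11.1.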

  
  
Posted 4 years ago

Quick question: Where again does clearml place the venv? I wanna take a look into it after the task has failed

  
  
Posted 4 years ago

ca-certificates           2021.1.19            h06a4308_1  
certifi                   2020.12.5        py38h06a4308_0  
cudatoolkit               11.0.221             h6bb024c_0  
ld_impl_linux-64          2.33.1               h53a641e_7  
libedit                   3.1.20191231         h14c3975_1  
libffi                    3.3                  he6710b0_2  
libgcc-ng                 9.1.0                hdf63c60_0  
libstdcxx-ng              9.1.0                hdf63c60_0  
ncurses                   6.2                  he6710b0_1  
openssl                   1.1.1j               h27cfd23_0  
pip                       20.0.2                   py38_1    conda-forge
python                    3.8.8                hdb3f193_4  
readline                  8.1                  h27cfd23_0  
setuptools                52.0.0           py38h06a4308_0  
sqlite                    3.33.0               h62c20be_0  
tk                        8.6.10               hbc83047_0  
wheel                     0.36.2             pyhd3eb1b0_0  
xz                        5.2.5                h7b6447c_0  
zlib                      1.2.11               h7b6447c_3  
  
  
Posted 4 years ago

Thanks!

  
  
Posted 4 years ago

Could you send the full log, please?

  
  
Posted 4 years ago

Could you test with 4.7.5?

  
  
Posted 4 years ago

conda 4.9.2

  
  
Posted 4 years ago

Same error.

  
  
Posted 4 years ago

Thanks @ReassuredTiger98
From the log, this is what conda is installing. It should have worked.

/tmp/conda_env1991w09m.yml:
channels:
- defaults
- conda-forge
- pytorch
dependencies:
- blas~=1.0
- bzip2~=1.0.8
- ca-certificates~=2020.10.14
- certifi~=2020.6.20
- cloudpickle~=1.6.0
- cudatoolkit~=11.1.1
- cycler~=0.10.0
- cytoolz~=0.11.0
- dask-core~=2021.2.0
- decorator~=4.4.2
- ffmpeg~=4.3
- freetype~=2.10.4
- gmp~=6.2.1
- gnutls~=3.6.13
- imageio~=2.9.0
- jpeg~=9b.0
- kiwisolver~=1.3.1
- lame~=3.100
- lcms2~=2.11
- ld_impl_linux-64~=2.33.1
- libedit~=3.1.20191231
- libffi~=3.3
- libgcc-ng~=9.3.0
- libgfortran-ng~=7.3.0
- libiconv~=1.16
- libpng~=1.6.37
- libstdcxx-ng~=9.3.0
- libtiff~=4.1.0
- libuv~=1.41.0
- llvm-openmp~=11.0.1
- lz4-c~=1.9.3
- matplotlib-base~=3.3.4
- mkl~=2020.4
- mkl-service~=2.3.0
- mkl_fft~=1.3.0
- mkl_random~=1.2.0
- ncurses~=6.2
- nettle~=3.6
- networkx~=2.5
- ninja~=1.10.2
- numpy~=1.19.2
- numpy-base~=1.19.2
- olefile~=0.46
- openh264~=2.1.1
- openssl~=1.1.1j
- pyparsing~=2.4.7
- python~=3.7.10
- python-dateutil~=2.8.1
- python_abi~=3.7
- pytorch~=1.8.0
- pywavelets~=1.1.1
- pyyaml~=5.3.1
- readline~=8.1
- scikit-image~=0.17.2
- scipy~=1.6.1
- setuptools~=52.0.0
- six~=1.15.0
- sqlite~=3.33.0
- tifffile~=2020.10.1
- tk~=8.6.10
- toolz~=0.11.1
- torchaudio~=0.8.0
- torchvision~=0.9.0
- tornado~=6.1
- typing_extensions~=3.7.4.3
- wheel~=0.36.2
- xz~=5.2.5
- yaml~=0.2.5
- zlib~=1.2.11
- zstd~=1.4.9
  
  
Posted 4 years ago

And then?

  
  
Posted 4 years ago

sure.

  
  
Posted 4 years ago

Okay this seems correct:

pytorch=1.8.0=py3.7_cuda11.1_cudnn8.0.5_0

I can't seem to find what the difference between the two is.
Give me a second, let me check if I can reproduce it somehow.
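
Spotting the difference between two long dependency lists by eye is error prone, so a small script can help. The sketch below is an illustration, not something the agent does: it parses lines like "- python~=3.7.10" or "  - python=3.8.8" and reports what differs. The function names are made up, and the sample lines are a few entries copied from the two environment files posted in this thread.

```python
import re

# Matches "name~=1.0", "name==1.0", "name=1.0" and "name=1.0=build",
# with an optional leading "- " as used in conda env files.
SPEC_RE = re.compile(r"^\s*-?\s*([A-Za-z0-9_.\-]+?)\s*(?:~=|==|=)\s*([^=\s]+)")


def parse_specs(lines):
    """Map package name -> version for every line that looks like a spec."""
    specs = {}
    for line in lines:
        match = SPEC_RE.match(line)
        if match:
            specs[match.group(1)] = match.group(2)
    return specs


def diff_specs(lines_a, lines_b):
    """Report packages missing on either side and packages whose version differs."""
    a, b = parse_specs(lines_a), parse_specs(lines_b)
    shared = sorted(set(a) & set(b))
    return {
        "only_in_a": sorted(set(a) - set(b)),
        "only_in_b": sorted(set(b) - set(a)),
        "changed": {name: (a[name], b[name]) for name in shared if a[name] != b[name]},
    }


if __name__ == "__main__":
    agent_env = ["- cudatoolkit~=11.1.1", "- python~=3.7.10", "- pytorch~=1.8.0", "- cloudpickle~=1.6.0"]
    manual_env = ["  - cudatoolkit=11.1.1", "  - python=3.8.8", "  - pytorch=1.8.0", "  - pillow=8.1.2"]
    print(diff_specs(agent_env, manual_env))
```

Note that this only compares names and versions. It ignores the operator ("~=" vs "=") and the channel order, both of which may also matter to the solver.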

  
  
Posted 4 years ago

name: core
channels:
  - pytorch
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1
  - _openmp_mutex=4.5
  - blas=1.0
  - bzip2=1.0.8
  - ca-certificates=2020.12.5
  - certifi=2020.12.5
  - cudatoolkit=11.1.1
  - ffmpeg=4.3
  - freetype=2.10.4
  - gmp=6.2.1
  - gnutls=3.6.13
  - jpeg=9b
  - lame=3.100
  - lcms2=2.11
  - ld_impl_linux-64=2.33.1
  - libedit=3.1.20191231
  - libffi=3.3
  - libgcc-ng=9.3.0
  - libiconv=1.16
  - libpng=1.6.37
  - libstdcxx-ng=9.3.0
  - libtiff=4.1.0
  - libuv=1.41.0
  - llvm-openmp=11.0.1
  - lz4-c=1.9.3
  - mkl=2020.4
  - mkl-service=2.3.0
  - mkl_fft=1.3.0
  - mkl_random=1.2.0
  - ncurses=6.2
  - nettle=3.6
  - ninja=1.10.2
  - numpy=1.19.2
  - numpy-base=1.19.2
  - olefile=0.46
  - openh264=2.1.1
  - openssl=1.1.1j
  - pillow=8.1.2
  - pip=21.0.1
  - python=3.8.8
  - python_abi=3.8
  - pytorch=1.8.0
  - readline=8.1
  - setuptools=52.0.0
  - six=1.15.0
  - sqlite=3.33.0
  - tk=8.6.10
  - torchaudio=0.8.0
  - torchvision=0.9.0
  - typing_extensions=3.7.4.3
  - wheel=0.36.2
  - xz=5.2.5
  - zlib=1.2.11
  - zstd=1.4.9
  - pip:
    - attrs==20.3.0
    - clearml==0.17.4
    - furl==2.1.0
    - humanfriendly==9.1
    - jsonschema==3.2.0
    - orderedmultidict==1.0.1
    - pathlib2==2.3.5
    - psutil==5.8.0
    - pyjwt==2.0.1
    - pyrsistent==0.17.3
    - pyyaml==5.4.1
    - requests-file==1.5.1
  
  
Posted 4 years ago

Can you ping me when it is updated in None so I can update my installation?

  
  
Posted 4 years ago

Yes, I think the difference is running conda install with arguments vs. conda install with an env file...

  
  
Posted 4 years ago

Will do!

  
  
Posted 4 years ago

So to further debug I need to somehow access /tmp/conda_envaz1ne897.yml

  
  
Posted 4 years ago

I tried "~=", "==" and "="

  
  
Posted 4 years ago

The task already contains this

  
  
Posted 4 years ago

What's the conda version you are using?

  
  
Posted 4 years ago

Now I get:

Collecting package metadata (repodata.json): done
Solving environment: - 
Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed

UnsatisfiableError: The following specifications were found to be incompatible with a past
explicit spec that is not an explicit spec in this operation (cudatoolkit):

  - pytorch==1.8.0 -> cudatoolkit[version='>=10.1,<10.2|>=10.2,<10.3']

The following specifications were found to be incompatible with each other:



Package cudatoolkit conflicts for:
cudatoolkit=11.0
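
To see why the solver gives up here, it can help to check the pinned cudatoolkit version against the constraint quoted in the error by hand. The sketch below uses a simplified reading of the constraint string ("|" separates alternatives, "," means all clauses must hold, versions are plain dotted numbers). Conda's real matching is richer than this, and the function names are made up for illustration.

```python
import operator

# Comparison operators, two-character symbols first so ">=" is tried before ">".
_OPERATORS = [
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    ("!=", operator.ne),
    (">", operator.gt),
    ("<", operator.lt),
]


def _as_tuple(version):
    """Convert "10.2" into (10, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def _clause_holds(version, clause):
    """Evaluate a single clause such as ">=10.1" against a version."""
    for symbol, compare in _OPERATORS:
        if clause.startswith(symbol):
            return compare(_as_tuple(version), _as_tuple(clause[len(symbol):]))
    raise ValueError(f"unsupported clause: {clause!r}")


def satisfies(version, constraint):
    """True if any "|" alternative has all of its "," clauses satisfied."""
    return any(
        all(_clause_holds(version, clause.strip()) for clause in alternative.split(","))
        for alternative in constraint.split("|")
    )


# Constraint copied from the error message above.
CONSTRAINT = ">=10.1,<10.2|>=10.2,<10.3"

if __name__ == "__main__":
    for candidate in ("10.1", "10.2", "11.0"):
        print(candidate, satisfies(candidate, CONSTRAINT))
```

With this reading, 10.1 and 10.2 pass and 11.0 does not, so the pytorch build the solver considered cannot coexist with the cudatoolkit=11.0 pin.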
  
  
Posted 4 years ago

Oh, the hacked one.

  
  
Posted 4 years ago

Like this?

  
  
Posted 4 years ago

Installed miniconda finally, now trying to run the task

  
  
Posted 4 years ago

Yeaaa I got it working!

  
  
Posted 4 years ago

Could you send the env file?

  
  
Posted 4 years ago

Hi @ReassuredTiger98
Could you send the full log? Also, what's the clearml-agent version?

  
  
Posted 4 years ago

Still shows CPU version when I run conda list
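
If you want to check this from a script instead of reading the conda list output by eye, something like the following could work. It is a sketch under assumptions: it expects the usual "name version build channel" columns, and it treats a build string containing "cuda" (as in py3.7_cuda11.1_cudnn8.0.5_0) as a GPU build and one containing "cpu" as a CPU build. The CPU build string in the example is an assumed format, and the function name is hypothetical.

```python
def classify_pytorch_build(conda_list_output):
    """Classify the pytorch row of `conda list` output.

    Returns "cuda", "cpu", "unknown" (row found, build string not recognised)
    or "missing" (no pytorch row at all).
    """
    for line in conda_list_output.splitlines():
        fields = line.split()
        # Expected columns: name, version, build, optional channel.
        if len(fields) >= 3 and fields[0] == "pytorch":
            build = fields[2]
            if "cuda" in build:
                return "cuda"
            if "cpu" in build:
                return "cpu"
            return "unknown"
    return "missing"


if __name__ == "__main__":
    gpu_row = "pytorch    1.8.0    py3.7_cuda11.1_cudnn8.0.5_0    pytorch"
    cpu_row = "pytorch    1.8.0    py3.8_cpu_0    pytorch"  # assumed CPU build string
    print(classify_pytorch_build(gpu_row))
    print(classify_pytorch_build(cpu_row))
```

The text to classify can be obtained by running conda list inside the environment in question and passing the output to the function.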

  
  
Posted 4 years ago
106K Views