Hello! Since Today I Get

Hello!
Since today I get AssertionError: Torch not compiled with CUDA enabled for PyTorch 1.8.
Tasks that I submitted yesterday to the queue are also not working, even though they ran yesterday. PyTorch 1.7 based tasks work fine. Any idea what I could have done wrong?
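
A quick way to confirm which PyTorch build actually ended up in the environment is to ask torch itself. This is only a minimal sketch: the helper name `torch_build_info` is made up for illustration, and it assumes it is run inside the environment the agent created.

```python
def torch_build_info():
    """Return basic facts about the installed PyTorch build, or None if torch is missing."""
    try:
        import torch
    except ImportError:
        return None
    return {
        "torch_version": torch.__version__,
        # None for CPU-only builds, a string such as "11.1" for CUDA builds.
        "built_with_cuda": torch.version.cuda,
        # False if the build is CPU-only or no usable GPU/driver is visible.
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(torch_build_info())
```

If `built_with_cuda` comes back as None, a CPU-only build was installed, which would be consistent with the "Torch not compiled with CUDA enabled" assertion.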

  
  
Posted 3 years ago

Answers 161


One question: Does clearml resolve the CUDA Version from driver or conda?

  
  
Posted 3 years ago

conda 4.9.2

  
  
Posted 3 years ago

I guess that has nothing to do with the diff version, right ?

  
  
Posted 3 years ago

One more thing: The cuda_version that clearml finds automatically is wrong.

  
  
Posted 3 years ago

# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
_libgcc_mutex=0.1=conda_forge
_openmp_mutex=4.5=1_llvm
absl-py=0.12.0=pypi_0
aiostream=0.4.2=pypi_0
attrs=20.3.0=pypi_0
blas=1.0=mkl
bzip2=1.0.8=h7b6447c_0
ca-certificates=2020.10.14=0
cached-property=1.5.2=pypi_0
cachetools=4.2.1=pypi_0
certifi=2020.6.20=py37_0
chardet=4.0.0=pypi_0
clearml=0.17.4=pypi_0
cloudpickle=1.6.0=py_0
cudatoolkit=11.1.1=h6406543_8
cycler=0.10.0=py37_0
cytoolz=0.11.0=py37h7b6447c_0
dask-core=2021.2.0=pyhd8ed1ab_0
decorator=4.4.2=py_0
dm-control=0.0.355168290=pypi_0
dm-env=1.4=pypi_0
dm-tree=0.1.5=pypi_0
ffmpeg=4.3=hf484d3e_0
freetype=2.10.4=h5ab3b9f_0
furl=2.1.0=pypi_0
future=0.18.2=pypi_0
glfw=2.1.0=pypi_0
gmp=6.2.1=h58526e2_0
gnutls=3.6.13=h85f3911_1
google-auth=1.27.1=pypi_0
google-auth-oauthlib=0.4.3=pypi_0
grpcio=1.36.1=pypi_0
gym=0.18.0=pypi_0
h5py=3.2.1=pypi_0
humanfriendly=9.1=pypi_0
idna=2.10=pypi_0
imageio=2.9.0=py_0
imageio-ffmpeg=0.4.3=pypi_0
importlib-metadata=3.7.2=pypi_0
jpeg=9b=habf39ab_1
jsonschema=3.2.0=pypi_0
kiwisolver=1.3.1=py37h2527ec5_1
labmaze=1.0.3=pypi_0
lame=3.100=h7b6447c_0
lcms2=2.11=h396b838_0
ld_impl_linux-64=2.33.1=h53a641e_7
libedit=3.1.20191231=h14c3975_1
libffi=3.3=he6710b0_2
libgcc-ng=9.3.0=h2828fa1_18
libgfortran-ng=7.3.0=hdf63c60_0
libgomp=9.3.0=h2828fa1_18
libiconv=1.16=h516909a_0
libpng=1.6.37=hbc83047_0
libstdcxx-ng=9.3.0=h6de172a_18
libtiff=4.1.0=h2733197_1
libuv=1.41.0=h7f98852_0
llvm-openmp=11.0.1=h4bd325d_0
lxml=4.6.2=pypi_0
lz4-c=1.9.3=h9c3ff4c_0
markdown=3.3.4=pypi_0
matplotlib-base=3.3.4=py37h0c9df89_0
mkl=2020.4=h726a3e6_304
mkl-service=2.3.0=py37h8f50634_2
mkl_fft=1.3.0=py37h902c9e0_1
mkl_random=1.2.0=py37h9fdb41a_1
moviepy=1.0.3=pypi_0
ncurses=6.2=he6710b0_1
nettle=3.6=he412f7d_0
networkx=2.5=py_0
ninja=1.10.2=h4bd325d_0
numpy=1.19.2=py37h54aff64_0
numpy-base=1.19.2=py37hfa32c7d_0
oauthlib=3.1.0=pypi_0
olefile=0.46=py37_0
openh264=2.1.1=h780b84a_0
openssl=1.1.1j=h7f98852_0
orderedmultidict=1.0.1=pypi_0
pathlib2=2.3.5=pypi_0
pillow=7.2.0=pypi_0
pip=21.0.1=pyhd8ed1ab_0
proglog=0.1.9=pypi_0
protobuf=3.15.5=pypi_0
psutil=5.8.0=pypi_0
pyasn1=0.4.8=pypi_0
pyasn1-modules=0.2.8=pypi_0
pybullet=3.0.9=pypi_0
pygame=2.0.1=pypi_0
pyglet=1.5.0=pypi_0
pyjwt=2.0.1=pypi_0
pyopengl=3.1.5=pypi_0
pyparsing=2.4.7=py_0
pyrsistent=0.17.3=pypi_0
python=3.7.10=hdb3f193_0
python-dateutil=2.8.1=py_0
python_abi=3.7=1_cp37m
pytorch=1.8.0=py3.7_cuda11.1_cudnn8.0.5_0
pywavelets=1.1.1=py37h7b6447c_2
pyyaml=5.3.1=py37h7b6447c_1
readline=8.1=h27cfd23_0
requests=2.25.1=pypi_0
requests-file=1.5.1=pypi_0
requests-oauthlib=1.3.0=pypi_0
rsa=4.7.2=pypi_0
scikit-image=0.17.2=py37hdf5156a_0
scipy=1.6.1=py37h91f5cce_0
setuptools=52.0.0=py37h06a4308_0
six=1.15.0=py_0
sqlite=3.33.0=h62c20be_0
tensorboard=2.4.1=pypi_0
tensorboard-plugin-wit=1.8.0=pypi_0
tensorboardx=2.1=pypi_0
tifffile=2020.10.1=py37hdd07704_2
tk=8.6.10=hbc83047_0
toolz=0.11.1=py_0
torchaudio=0.8.0=py37
torchvision=0.9.0=py37_cu111
tornado=6.1=py37h5e8e339_1
tqdm=4.59.0=pypi_0
typing_extensions=3.7.4.3=py_0
urllib3=1.26.3=pypi_0
werkzeug=1.0.1=pypi_0
wheel=0.36.2=pyhd3deb0d_0
xz=5.2.5=h7b6447c_0
yaml=0.2.5=h7b6447c_0
zipp=3.4.1=pypi_0
zlib=1.2.11=h7b6447c_3
zstd=1.4.9=ha95c52a_0
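
To check an export like the one above without reading every line, the `name=version=build` entries can be parsed and the pytorch build string inspected. This is a rough sketch, not something clearml does: the function names are hypothetical, and it assumes that CUDA builds carry "cuda" in the build string and CPU-only builds carry "cpu".

```python
def parse_conda_export(text):
    """Map package name -> (version, build) from `name=version=build` lines."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the comment header of the export file.
        if not line or line.startswith("#"):
            continue
        parts = line.split("=")
        if len(parts) < 2:
            continue
        build = parts[2] if len(parts) > 2 else ""
        packages[parts[0]] = (parts[1], build)
    return packages


def pytorch_build_kind(packages):
    """Return 'cuda', 'cpu', 'unknown', or 'missing' for the pytorch entry."""
    if "pytorch" not in packages:
        return "missing"
    build = packages["pytorch"][1].lower()
    if "cuda" in build:
        return "cuda"
    if "cpu" in build:
        return "cpu"
    return "unknown"


# A few lines copied from the export above.
sample = """
# platform: linux-64
cudatoolkit=11.1.1=h6406543_8
pytorch=1.8.0=py3.7_cuda11.1_cudnn8.0.5_0
torchvision=0.9.0=py37_cu111
"""

packages = parse_conda_export(sample)
print(pytorch_build_kind(packages), packages["cudatoolkit"][0])
```

Running the same check on the environment the agent builds should show whether pytorch is missing or was swapped for a CPU build.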
  
  
Posted 3 years ago

ca-certificates           2021.1.19            h06a4308_1  
certifi                   2020.12.5        py38h06a4308_0  
cudatoolkit               11.0.221             h6bb024c_0  
ld_impl_linux-64          2.33.1               h53a641e_7  
libedit                   3.1.20191231         h14c3975_1  
libffi                    3.3                  he6710b0_2  
libgcc-ng                 9.1.0                hdf63c60_0  
libstdcxx-ng              9.1.0                hdf63c60_0  
ncurses                   6.2                  he6710b0_1  
openssl                   1.1.1j               h27cfd23_0  
pip                       20.0.2                   py38_1    conda-forge
python                    3.8.8                hdb3f193_4  
readline                  8.1                  h27cfd23_0  
setuptools                52.0.0           py38h06a4308_0  
sqlite                    3.33.0               h62c20be_0  
tk                        8.6.10               hbc83047_0  
wheel                     0.36.2             pyhd3eb1b0_0  
xz                        5.2.5                h7b6447c_0  
zlib                      1.2.11               h7b6447c_3  
  
  
Posted 3 years ago

Hmm, you are correct
Which means this is some conda issue. Basically, when installing from an env file, conda is not resolving the correct pytorch version 😞
Not sure why... Could you try to upgrade conda ?

  
  
Posted 3 years ago

Does clearml resolve the CUDA Version from driver or conda?

Actually it starts with the default CUDA based on the host driver, but when it installs the conda env it takes it from the "installed packages" (i.e. the one you used to execute the code in the first place)

Regarding the link, I could not find the exact version, but this is close enough I guess:
None

  
  
Posted 3 years ago

Perfect, will try it. fyi: The conda_channels that I used are from clearml-agent init

  
  
Posted 3 years ago

But I do not have anything linked correctly, since I rely on conda installing cuda/cudnn for me

  
  
Posted 3 years ago

Can you actually reproduce my problem when also using conda_freeze: true ?

  
  
Posted 3 years ago

Could you test with 4.7.5 ?

  
  
Posted 3 years ago

Would it help you diagnose this problem if I ran conda env create --file=environment.yml and see whether it works?

  
  
Posted 3 years ago

So it should have detected 11.2...

  
  
Posted 3 years ago

I installed my local conda environment from an environment.yml without issues, so maybe clearml makes some changes that lead to conflicts, which finally lead to the cpu-version install.

  
  
Posted 3 years ago

Great, thanks!

  
  
Posted 3 years ago

fyi: NVIDIA-SMI 460.56 Driver Version: 460.56 CUDA Version: 11.2
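
For comparing this against what gets detected, the two versions can be pulled out of the nvidia-smi header line with a small parser. Just a sketch with a made-up function name; it assumes the header format shown above. Note that the "CUDA Version" nvidia-smi prints is the highest CUDA version the driver supports, not an installed toolkit.

```python
import re


def parse_nvidia_smi_header(line):
    """Extract (driver_version, cuda_version) from an nvidia-smi header line."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", line)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", line)
    return (
        driver.group(1) if driver else None,
        cuda.group(1) if cuda else None,
    )


header = "NVIDIA-SMI 460.56 Driver Version: 460.56 CUDA Version: 11.2"
print(parse_nvidia_smi_header(header))  # ('460.56', '11.2')
```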

  
  
Posted 3 years ago

It asks the driver or finds the CUDA dll/so

  
  
Posted 3 years ago

I mean the version which it bases the PyTorch installation on.

  
  
Posted 3 years ago

@<1523701868901961728:profile|ReassuredTiger98> what are you getting with:

nvidia-smi

And here:

ls -la /usr/local/
  
  
Posted 3 years ago

What's the conda version you are using ?

  
  
Posted 3 years ago

Sure, but I will try it tomorrow then.

  
  
Posted 3 years ago

Hi @<1523701868901961728:profile|ReassuredTiger98>
Could you send the full log ? Also what's the clearml-agent version?

  
  
Posted 3 years ago

'conda --version'

  
  
Posted 3 years ago

My driver says "CUDA Version: 11.2" (I am not even sure this is correct, since I do not remember installing code in this machine, but idk) and there is no pytorch for 11.2, so maybe it falls back to cpu?

  
  
Posted 3 years ago

I get 110 but it should be 111
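
The 110 and 111 here look like compact forms of 11.0 and 11.1, which would match the cudatoolkit 11.0.221 and 11.1.1 entries in the package listings in this thread. A small sketch of that mapping, assuming the compact form is simply major followed by minor (the helper name is hypothetical, not a clearml function):

```python
def compact_cuda_version(version):
    """Turn a dotted CUDA version such as '11.1.1' into the compact 'major+minor' form."""
    parts = str(version).split(".")
    major = int(parts[0])
    # Treat a bare major version (e.g. "11") as minor 0.
    minor = int(parts[1]) if len(parts) > 1 else 0
    return f"{major}{minor}"


print(compact_cuda_version("11.0.221"))  # 110
print(compact_cuda_version("11.1.1"))    # 111
```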

  
  
Posted 3 years ago

Thanks!

  
  
Posted 3 years ago

So only a short update for today: I have not yet started a run with conda 4.7.12.
But one question: Actually conda cannot be at fault here, right? I can install pytorch just fine locally on the agent when I do not use clearml(-agent)

  
  
Posted 3 years ago

Do you know how I can get this version?

  
  
Posted 3 years ago

No worries, gnight :)

  
  
Posted 3 years ago