Hello! Since Today I Get

Hello!
Since today I am getting "AssertionError: Torch not compiled with CUDA enabled" with PyTorch 1.8.
Tasks that I submitted to the queue yesterday are also failing now, even though they ran fine yesterday. Tasks based on PyTorch 1.7 work fine. Any idea what I could have done wrong?
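A quick way to narrow this down is to check which PyTorch build actually ended up in the task's environment. The "Torch not compiled with CUDA enabled" assertion is what a CPU-only PyTorch build raises when CUDA is requested. The following is a minimal diagnostic sketch (torch_cuda_report is a hypothetical helper, not part of ClearML). It relies only on torch.version.cuda, which is None for CPU-only builds, and torch.cuda.is_available(), and it does not fail if torch is missing.

```python
import importlib


def torch_cuda_report():
    """Summarize whether the installed PyTorch build can use CUDA.

    Hypothetical diagnostic helper; never raises if torch is not installed.
    """
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return {"torch": None, "built_with_cuda": None, "cuda_available": False}
    # torch.version.cuda is None for CPU-only builds, which is the case
    # that triggers "Torch not compiled with CUDA enabled".
    built_cuda = torch.version.cuda
    return {
        "torch": torch.__version__,
        "built_with_cuda": built_cuda,
        "cuda_available": bool(built_cuda) and torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(torch_cuda_report())
```

Running this inside the failing task's environment would show whether a CPU-only build was installed there, independently of what the driver supports.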

  
  
Posted 4 years ago


And this works fine.

  
  

But I do not have anything linked correctly, since I rely on conda installing CUDA/cuDNN for me.

  
  

I just tried running the environment setup steps that clearml-agent performs locally, but with my environment.yml instead of the one that ClearML generates.

  
  

Can you actually reproduce my problem when also using conda_freeze: true?

  
  

I get 110 but it should be 111

  
  

Yes, that is what I pasted here.

  
  

Sure, I'll try this.

  
  

My driver says "CUDA Version: 11.2" (I am not even sure this is correct, since I do not remember installing CUDA on this machine, but I don't know), and there is no PyTorch build for 11.2, so maybe it falls back to CPU?

For some reason it detects CUDA 11.1 (I assume this is what you have installed; the CUDA version reported by the driver is the highest version it supports, not necessarily the one you have installed).
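The distinction above can be made concrete: the "CUDA Version" in the nvidia-smi header is the newest CUDA version the driver supports, which is a different number from the toolkit version conda installs. A small sketch, assuming the header contains the text "CUDA Version: X.Y" (the helper names and the sample line are illustrative only). It extracts that value and does a plain tuple comparison against a toolkit version. This is a simplified rule of thumb that ignores CUDA 11's minor-version compatibility.

```python
import re


def driver_cuda_version(nvidia_smi_text):
    """Return (major, minor) from the 'CUDA Version: X.Y' field, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def within_driver_max(toolkit, driver_max):
    """Plain tuple comparison: is the toolkit version <= the driver's reported max?

    Simplification: ignores CUDA 11 minor-version compatibility.
    """
    return toolkit <= driver_max


# Illustrative header fragment, not real output from the machine in this thread.
header = "| NVIDIA-SMI ...   CUDA Version: 11.2     |"
driver_max = driver_cuda_version(header)
print(driver_max)                              # (11, 2)
print(within_driver_max((11, 1), driver_max))  # True
print(within_driver_max((11, 3), driver_max))  # False
```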

  
  

Let me check something

  
  

ca-certificates           2021.1.19            h06a4308_1  
certifi                   2020.12.5        py38h06a4308_0  
cudatoolkit               11.0.221             h6bb024c_0  
ld_impl_linux-64          2.33.1               h53a641e_7  
libedit                   3.1.20191231         h14c3975_1  
libffi                    3.3                  he6710b0_2  
libgcc-ng                 9.1.0                hdf63c60_0  
libstdcxx-ng              9.1.0                hdf63c60_0  
ncurses                   6.2                  he6710b0_1  
openssl                   1.1.1j               h27cfd23_0  
pip                       20.0.2                   py38_1    conda-forge
python                    3.8.8                hdb3f193_4  
readline                  8.1                  h27cfd23_0  
setuptools                52.0.0           py38h06a4308_0  
sqlite                    3.33.0               h62c20be_0  
tk                        8.6.10               hbc83047_0  
wheel                     0.36.2             pyhd3eb1b0_0  
xz                        5.2.5                h7b6447c_0  
zlib                      1.2.11               h7b6447c_3  
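To pull a single package version (for example cudatoolkit) out of a listing like the one above, the text output of conda list can be parsed by columns. A minimal sketch, assuming the default Name / Version / Build / Channel layout. The helper name conda_package_versions is hypothetical.

```python
def conda_package_versions(conda_list_output):
    """Map package name -> version from `conda list` text output.

    Assumes the default column layout: Name, Version, Build, [Channel].
    Comment lines (starting with '#') and blank lines are skipped.
    """
    versions = {}
    for line in conda_list_output.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2:
            versions[fields[0]] = fields[1]
    return versions


sample = """\
# Name                    Version                   Build  Channel
cudatoolkit               11.0.221             h6bb024c_0
python                    3.8.8                hdb3f193_4
pip                       20.0.2                   py38_1    conda-forge
"""
pkgs = conda_package_versions(sample)
print(pkgs["cudatoolkit"])  # 11.0.221
```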
  
  

Quick question: where does ClearML place the venv again? I want to take a look at it after the task has failed.

  
  

Yes, I think the difference is running conda install with package arguments vs. installing from an environment file...

  
  

name: core
channels:
  - pytorch
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1
  - _openmp_mutex=4.5
  - blas=1.0
  - bzip2=1.0.8
  - ca-certificates=2020.12.5
  - certifi=2020.12.5
  - cudatoolkit=11.1.1
  - ffmpeg=4.3
  - freetype=2.10.4
  - gmp=6.2.1
  - gnutls=3.6.13
  - jpeg=9b
  - lame=3.100
  - lcms2=2.11
  - ld_impl_linux-64=2.33.1
  - libedit=3.1.20191231
  - libffi=3.3
  - libgcc-ng=9.3.0
  - libiconv=1.16
  - libpng=1.6.37
  - libstdcxx-ng=9.3.0
  - libtiff=4.1.0
  - libuv=1.41.0
  - llvm-openmp=11.0.1
  - lz4-c=1.9.3
  - mkl=2020.4
  - mkl-service=2.3.0
  - mkl_fft=1.3.0
  - mkl_random=1.2.0
  - ncurses=6.2
  - nettle=3.6
  - ninja=1.10.2
  - numpy=1.19.2
  - numpy-base=1.19.2
  - olefile=0.46
  - openh264=2.1.1
  - openssl=1.1.1j
  - pillow=8.1.2
  - pip=21.0.1
  - python=3.8.8
  - python_abi=3.8
  - pytorch=1.8.0
  - readline=8.1
  - setuptools=52.0.0
  - six=1.15.0
  - sqlite=3.33.0
  - tk=8.6.10
  - torchaudio=0.8.0
  - torchvision=0.9.0
  - typing_extensions=3.7.4.3
  - wheel=0.36.2
  - xz=5.2.5
  - zlib=1.2.11
  - zstd=1.4.9
  - pip:
    - attrs==20.3.0
    - clearml==0.17.4
    - furl==2.1.0
    - humanfriendly==9.1
    - jsonschema==3.2.0
    - orderedmultidict==1.0.1
    - pathlib2==2.3.5
    - psutil==5.8.0
    - pyjwt==2.0.1
    - pyrsistent==0.17.3
    - pyyaml==5.4.1
    - requests-file==1.5.1
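The suspicion in this thread is that running conda install with explicit package arguments behaves differently from installing from an environment file like this one. To compare the two paths by hand, the conda dependencies of a file in this simple layout can be turned into a single conda install command line, with the pip section kept apart. This is only a sketch: a line-based parser (not a full YAML parser) that assumes two-space indentation and the flat channels/dependencies structure shown above. conda_install_args is a hypothetical name.

```python
def conda_install_args(env_yml_text):
    """Build a `conda install` argument list from a simple environment.yml.

    Assumes top-level `channels:` and `dependencies:` lists indented by two
    spaces, with pip packages nested deeper under `- pip:`. Pip packages are
    returned separately because conda cannot install them on the same line.
    """
    channels, conda_deps, pip_deps = [], [], []
    section, in_pip = None, False
    for raw in env_yml_text.splitlines():
        if not raw.strip():
            continue
        indent = len(raw) - len(raw.lstrip())
        item = raw.strip()
        if indent == 0:
            # Top-level key such as "name: core", "channels:", "dependencies:".
            section, in_pip = item.split(":", 1)[0], False
            continue
        if not item.startswith("- "):
            continue
        value = item[2:].strip()
        if section == "channels":
            channels.append(value)
        elif section == "dependencies":
            if value == "pip:":
                in_pip = True
            elif in_pip and indent > 2:  # nested under "- pip:"
                pip_deps.append(value)
            else:
                in_pip = False
                conda_deps.append(value)
    args = ["conda", "install", "-y"]
    for channel in channels:
        args += ["-c", channel]
    return args + conda_deps, pip_deps


sample = """\
name: core
channels:
  - pytorch
  - conda-forge
dependencies:
  - cudatoolkit=11.1.1
  - pytorch=1.8.0
  - pip:
    - clearml==0.17.4
"""
args, pip_deps = conda_install_args(sample)
print(" ".join(args))
# conda install -y -c pytorch -c conda-forge cudatoolkit=11.1.1 pytorch=1.8.0
print(pip_deps)  # ['clearml==0.17.4']
```

Installing from the file instead would be done with conda env create -f environment.yml; running both on a clean machine is one way to see whether they resolve cudatoolkit differently.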
  
  

It asks the driver, or finds the CUDA .dll/.so.
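For the driver route, the CUDA driver API exposes cuDriverGetVersion, which writes the supported CUDA version as an integer. The sketch below calls it through ctypes. The library names it tries (libcuda.so.1, libcuda.so, nvcuda.dll) are the usual ones but are an assumption about your system, and this is not ClearML's own detection code. It returns None when no driver library can be loaded or the call fails.

```python
import ctypes


def query_driver_cuda_version():
    """Ask the NVIDIA driver library for its CUDA version integer.

    Returns e.g. 11020 for CUDA 11.2, or None if libcuda cannot be loaded
    or the call does not succeed.
    """
    for name in ("libcuda.so.1", "libcuda.so", "nvcuda.dll"):
        try:
            lib = ctypes.CDLL(name)
            break
        except OSError:
            continue
    else:
        return None  # no driver library found on this machine
    version = ctypes.c_int(0)
    # CUresult cuDriverGetVersion(int *driverVersion); 0 means CUDA_SUCCESS.
    if lib.cuDriverGetVersion(ctypes.byref(version)) != 0:
        return None
    return version.value


if __name__ == "__main__":
    print(query_driver_cuda_version())
```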

  
  

I just started a task from this environment and it fails on the agent.

  
  

Okay, found it 🙂 It returns 11020 instead of 112.
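The integer returned by the CUDA version queries is encoded as 1000 * major + 10 * minor, so 11020 means CUDA 11.2. A sketch of converting it into the compact form discussed in this thread, assuming that form is simply the major version followed by the minor digit (under that assumption 11000 maps to 110 and 11010 to 111). Both helper names are hypothetical.

```python
def split_cuda_version(version_int):
    """Split a CUDA version integer (1000 * major + 10 * minor) into parts."""
    major = version_int // 1000
    minor = (version_int % 1000) // 10
    return major, minor


def compact_cuda_version(version_int):
    """Major version followed by the minor digit, e.g. 11020 -> '112'.

    Hypothetical helper: assumes the short form just concatenates the two.
    """
    major, minor = split_cuda_version(version_int)
    return f"{major}{minor}"


print(split_cuda_version(11020))    # (11, 2)
print(compact_cuda_version(11020))  # 112
print(compact_cuda_version(10020))  # 102
```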

  
  

Or there should be an early error when trying to run conda-based tasks on pip agents.

  
  

What do you mean?

  
  

No problem! I profit so much from clearml 🙂

  
  

One question: does ClearML resolve the CUDA version from the driver or from conda?

  
  

I do not have a global cuda install on this machine. Everything except for the driver is installed via conda.

  
  

Thanks! Tomorrow is great, I'll put the wheel here 🙂

  
  

@ReassuredTiger98 what are you getting with:

nvidia-smi

And here:

ls -la /usr/local/
  
  

It is now looking for conflicts.

  
  

Okay. And 110 means 11.1 and not 11.0?

  
  

Thank you! 🙂

  
  

Send me the conda freeze:

# Name                    Version                   Build  Channel
...
  
  

It's always preferred to use conda_freeze: false.
That said, if you do use conda_freeze: true, it should also freeze the cudatoolkit, so it should have worked.
BTW, when you say it worked, is that with version 0.17.2 or with the hacked RC I sent?

  
  

Just a short update for today: I have not yet started a run with conda 4.7.12.
But one question: conda cannot actually be at fault here, right? I can install PyTorch just fine locally on the agent machine when I do not use clearml(-agent).

  
  

BTW: why is agent.package_manager an agent attribute? IMO it does not make sense, because conda can install pip packages but pip cannot install conda packages, which can lead to installation failures, right?

  
  