Answered
Hello! Since Today I Get

Hello!
Since today I get "AssertionError: Torch not compiled with CUDA enabled" with PyTorch 1.8.
Tasks that I submitted to the queue yesterday also fail now, even though they ran fine yesterday. PyTorch 1.7-based tasks work fine. Any idea what I could have done wrong?

  
  
Posted 3 years ago

Answers 161


@<1523701868901961728:profile|ReassuredTiger98> in the UI, can you see it in the "installed packages" section under the Execution tab?

  
  
Posted 3 years ago

Mhhm, now conda env creation takes forever, probably because it is resolving conflicts. At least that is what happened when I tried to install my environment manually.

  
  
Posted 3 years ago

And this works fine.

  
  
Posted 3 years ago

channels:
- pytorch
- conda-forge
- defaults
dependencies:
- cudatoolkit~=11.1.1
- pytorch~=1.8.0

Works fine

  
  
Posted 3 years ago

Thanks @<1523701868901961728:profile|ReassuredTiger98>
From the log, this is what conda is installing; it should have worked.

/tmp/conda_env1991w09m.yml:
channels:
- defaults
- conda-forge
- pytorch
dependencies:
- blas~=1.0
- bzip2~=1.0.8
- ca-certificates~=2020.10.14
- certifi~=2020.6.20
- cloudpickle~=1.6.0
- cudatoolkit~=11.1.1
- cycler~=0.10.0
- cytoolz~=0.11.0
- dask-core~=2021.2.0
- decorator~=4.4.2
- ffmpeg~=4.3
- freetype~=2.10.4
- gmp~=6.2.1
- gnutls~=3.6.13
- imageio~=2.9.0
- jpeg~=9b.0
- kiwisolver~=1.3.1
- lame~=3.100
- lcms2~=2.11
- ld_impl_linux-64~=2.33.1
- libedit~=3.1.20191231
- libffi~=3.3
- libgcc-ng~=9.3.0
- libgfortran-ng~=7.3.0
- libiconv~=1.16
- libpng~=1.6.37
- libstdcxx-ng~=9.3.0
- libtiff~=4.1.0
- libuv~=1.41.0
- llvm-openmp~=11.0.1
- lz4-c~=1.9.3
- matplotlib-base~=3.3.4
- mkl~=2020.4
- mkl-service~=2.3.0
- mkl_fft~=1.3.0
- mkl_random~=1.2.0
- ncurses~=6.2
- nettle~=3.6
- networkx~=2.5
- ninja~=1.10.2
- numpy~=1.19.2
- numpy-base~=1.19.2
- olefile~=0.46
- openh264~=2.1.1
- openssl~=1.1.1j
- pyparsing~=2.4.7
- python~=3.7.10
- python-dateutil~=2.8.1
- python_abi~=3.7
- pytorch~=1.8.0
- pywavelets~=1.1.1
- pyyaml~=5.3.1
- readline~=8.1
- scikit-image~=0.17.2
- scipy~=1.6.1
- setuptools~=52.0.0
- six~=1.15.0
- sqlite~=3.33.0
- tifffile~=2020.10.1
- tk~=8.6.10
- toolz~=0.11.1
- torchaudio~=0.8.0
- torchvision~=0.9.0
- tornado~=6.1
- typing_extensions~=3.7.4.3
- wheel~=0.36.2
- xz~=5.2.5
- yaml~=0.2.5
- zlib~=1.2.11
- zstd~=1.4.9
  
  
Posted 3 years ago

Wtf? Can you try with = (notice single, not double)?

channels:
- defaults
- conda-forge
- pytorch
dependencies:
- cudatoolkit=11.1.1
- pytorch=1.8.0
  
  
Posted 3 years ago

The problem is that clearml installs cudatoolkit=11.0, but cudatoolkit=11.1 is needed. Setting agent.cuda_version=11.1 in clearml.conf makes it use the correct version, and the install works fine. With version 11.0, conda resolves the conflict by installing the CPU version of pytorch.
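
One way to spot this kind of mismatch early is to compare the cudatoolkit pin in the environment file the agent generates with the CUDA version you expect. The sketch below is only an illustration: the helper names are made up for this example, and it assumes the simple "- name~=version" line format seen in the logs in this thread rather than using a real YAML or conda parser.

```python
import re
from typing import Optional


def cudatoolkit_pin(env_text: str) -> Optional[str]:
    """Return the version pinned for cudatoolkit in a conda env spec, if any.

    Hypothetical helper: it only handles lines such as
    "- cudatoolkit~=11.1.1", "- cudatoolkit==11.1.1" or "- cudatoolkit=11.1.1".
    """
    for line in env_text.splitlines():
        match = re.match(r"\s*-\s*cudatoolkit\s*(?:~=|==|=)\s*([\d.]+)\s*$", line)
        if match:
            return match.group(1)
    return None


def same_major_minor(pinned: str, expected: str) -> bool:
    """True if both versions agree on major.minor, e.g. 11.1.1 vs 11.1."""
    return pinned.split(".")[:2] == expected.split(".")[:2]


# Environment spec in the same shape as the ones posted in this thread.
env_text = """\
channels:
- pytorch
- conda-forge
- defaults
dependencies:
- cudatoolkit~=11.1.1
- pytorch~=1.8.0
"""

pin = cudatoolkit_pin(env_text)
print(pin)
print(same_major_minor(pin, "11.1"))  # the version that is needed
print(same_major_minor(pin, "11.0"))  # the version that caused the CPU fallback
```

If the pin and the expected version disagree, that is a hint to check agent.cuda_version before blaming the conda solver.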

  
  
Posted 3 years ago

I tried "~=", "==" and "="

  
  
Posted 3 years ago

Do you know how I can make sure that I do not have CUDA, or a broken installation of it, installed?

  
  
Posted 3 years ago

Did not happen with conda 4.9.2

  
  
Posted 3 years ago

And how is

Summary - installed python packages: 
conda:
....

generated?

  
  
Posted 3 years ago

sure.

  
  
Posted 3 years ago

My driver says "CUDA Version: 11.2" (I am not even sure this is correct, since I do not remember installing code on this machine, but idk) and there is no pytorch for 11.2, so maybe it falls back to cpu?

For some reason it detects CUDA 11.1 (I assume this is what you have installed; the driver CUDA version is the highest version it will support, not necessarily what you have installed)
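
To make the distinction concrete: the "CUDA Version" in the nvidia-smi header is an upper bound set by the driver, so a toolkit at or below it should be usable. Here is a rough sketch, assuming the header format shown by the nvidia-smi output in this thread; the function names are invented for the example and are not part of clearml-agent.

```python
import re
from typing import Optional, Tuple


def driver_max_cuda(nvidia_smi_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the "CUDA Version: X.Y" field, if present."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def toolkit_supported(driver_max: Tuple[int, int], toolkit: Tuple[int, int]) -> bool:
    """A toolkit is usable when it is not newer than what the driver supports."""
    return toolkit <= driver_max


# Header line in the format nvidia-smi prints.
header = "| NVIDIA-SMI 460.56       Driver Version: 460.56       CUDA Version: 11.2     |"

driver = driver_max_cuda(header)
print(driver)
print(toolkit_supported(driver, (11, 1)))
```

So a driver reporting 11.2 does not by itself rule out a cudatoolkit 11.1 build of pytorch.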

  
  
Posted 3 years ago

Let me check something

  
  
Posted 3 years ago

Okay.

  
  
Posted 3 years ago

@<1523701868901961728:profile|ReassuredTiger98> what do you have in the clearml.conf under "conda_channels"?
Is this it?
None

  
  
Posted 3 years ago

Or there should be an early error when trying to run conda-based tasks on pip agents.

  
  
Posted 3 years ago

Same error.

  
  
Posted 3 years ago

drwxr-xr-x 10 root root 4096 Jul 31  2020 .
drwxr-xr-x 14 root root 4096 Jul 31  2020 ..
drwxr-xr-x  2 root root 4096 Feb  4 13:52 bin
drwxr-xr-x  2 root root 4096 Jul 31  2020 etc
drwxr-xr-x  2 root root 4096 Jul 31  2020 games
drwxr-xr-x  2 root root 4096 Jul 31  2020 include
drwxr-xr-x  4 root root 4096 Feb  3 13:40 lib
lrwxrwxrwx  1 root root    9 Dez 10 14:29 man -> share/man
drwxr-xr-x  2 root root 4096 Jul 31  2020 sbin
drwxr-xr-x  7 root root 4096 Jul 31  2020 share
drwxr-xr-x  2 root root 4096 Jul 31  2020 src
  
  
Posted 3 years ago

Still shows CPU version when I run conda list
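
For anyone wanting to automate this check, the build string in the conda list output usually tells the CPU and CUDA variants apart. The sketch below is a hedged illustration: the function name is hypothetical, and the two sample lines are written from the usual pytorch channel build naming (e.g. "py3.7_cpu_0" versus "py3.7_cuda11.1_cudnn8.0.5_0"), not copied from the logs in this thread.

```python
import re
from typing import Optional


def pytorch_build_variant(conda_list_output: str) -> Optional[str]:
    """Classify the installed pytorch build from `conda list` style text.

    Returns "cuda<version>", "cpu", "unknown", or None if pytorch is not listed.
    Assumes whitespace-separated columns: name, version, build, channel.
    """
    for line in conda_list_output.splitlines():
        if line.startswith("#"):
            continue  # skip the header/comment lines conda prints
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "pytorch":
            build = fields[2]
            cuda = re.search(r"cuda(\d+(?:\.\d+)*)", build)
            if cuda:
                return "cuda" + cuda.group(1)
            if "cpu" in build:
                return "cpu"
            return "unknown"
    return None


# Illustrative samples (assumed format, not taken from the thread).
cpu_listing = """\
# Name                    Version                   Build  Channel
numpy                     1.19.2           py37h54aff64_0
pytorch                   1.8.0               py3.7_cpu_0    pytorch
"""
gpu_listing = """\
# Name                    Version                   Build  Channel
pytorch                   1.8.0           py3.7_cuda11.1_cudnn8.0.5_0    pytorch
"""

print(pytorch_build_variant(cpu_listing))
print(pytorch_build_variant(gpu_listing))
```

The output of `conda list` from inside the task's environment can be passed to the function as a string.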

  
  
Posted 3 years ago

Thu Mar 11 17:52:45 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.56       Driver Version: 460.56       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3090    Off  | 00000000:01:00.0 Off |                  N/A |
| 61%   63C    P2   296W / 350W |   8318MiB / 24268MiB |     74%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 3090    Off  | 00000000:21:00.0 Off |                  N/A |
| 30%   29C    P8    20W / 350W |      1MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    133165    C+G   ...s-builds.1/3.7/bin/python     8314MiB |
+-----------------------------------------------------------------------------+
  
  
Posted 3 years ago

Complete conda log

  
  
Posted 3 years ago

btw: I also tested the clearml-agent running on a different machine and with python 3.8 and I get the same problems.

  
  
Posted 3 years ago

Will do!

  
  
Posted 3 years ago

Ha?!

  
  
Posted 3 years ago

How does clearml-agent create the conda environment?

  
  
Posted 3 years ago

@<1523701868901961728:profile|ReassuredTiger98> thank you so much for testing it!

  
  
Posted 3 years ago

Yep, this installs PyTorch CPU

  
  
Posted 3 years ago

Hi @<1523701868901961728:profile|ReassuredTiger98> when you get to it...
please download the wheel, then install it with

pip3 install -U clearml_agent-0.17.3rc0-py3-none-any.whl

Then run the daemon with the additional --debug argument, basically:

clearml-agent --debug daemon --foreground ...

Once the agent is running please send the Task's log from your console 🙂

  
  
Posted 3 years ago

Thank you! 🙂

  
  
Posted 3 years ago