Examples: query, "exact match", wildcard*, wild?ard, wild*rd
Fuzzy search: cake~ (finds cakes, bake)
Term boost: "red velvet"^4, chocolate^2
Field grouping: tags:(+work -"fun-stuff")
Escaping: Escape characters +-&|!(){}[]^"~*?:\ with \, e.g. \+
Range search: properties.timestamp:[1587729413488 TO *] (inclusive), properties.title:{A TO Z}(excluding A and Z)
Combinations: chocolate AND vanilla, chocolate OR vanilla, (chocolate OR vanilla) NOT "vanilla pudding"
Field search: properties.title:"The Title" AND text
Answered
Hello! Since Today I Get

Hello!
Since today I get AssertionError: Torch not compiled with CUDA enabled for PyTorch 1.8.
Tasks that I submitted yesterday to the queue are also not working, even though they ran yesterday. PyTorch 1.7 based tasks work fine. Any idea what I could have done wrong?

  
  
Posted 4 years ago
Votes Newest

Answers 161


So it should have detected 11.2...

  
  
Posted 4 years ago

It is now looking for conflicts.

  
  
Posted 4 years ago

btw: why is agent.package_manager and agent attribute. Imo it does not make sense because conda can install pip packages, but pip cannot install conda packages which can lead to install failures, right?

  
  
Posted 4 years ago

The problem is that clearml installs cudatoolkit=11.0 but cudatoolkit=11.1 is needed. By setting agent.cuda_version=11.1 in clearml.conf it uses the correct version and installs fine. With version 11.0 conda will resolve conflicts by installing pytorch cpu-version.

  
  
Posted 4 years ago

Okay. And 110 means 11.1 and not 11.0?

  
  
Posted 4 years ago

I will try again tomorrow. It s getting late! Thank you for helping so far!

  
  
Posted 4 years ago

==> 2021-03-11 12:50:38 <==
# cmd: /home/tim/miniconda3/condabin/conda create --yes --mkdir --prefix /home/tim/.clearml/venvs-builds/3.8 python=3.8
--
==> 2021-03-11 12:50:40 <==
# cmd: /home/tim/miniconda3/condabin/conda install -p /home/tim/.clearml/venvs-builds/3.8 -c defaults -c conda-forge -c pytorch cudatoolkit=11.0 --quiet --json
--
==> 2021-03-11 12:50:43 <==
# cmd: /home/tim/miniconda3/condabin/conda install -p /home/tim/.clearml/venvs-builds/3.8 -c defaults -c conda-forge -c pytorch pip<20.2 --quiet --json
--
==> 2021-03-11 12:51:17 <==
# cmd: /home/tim/miniconda3/bin/conda-env update -p /home/tim/.clearml/venvs-builds/3.8 --file /tmp/conda_envaz1ne897.yml --quiet --json
  
  
Posted 4 years ago

For now I can tell you that with conda_freeze: true it fails, but with conda_freeze: false it works!

  
  
Posted 4 years ago

What do you mean?

  
  
Posted 4 years ago

@<1523701868901961728:profile|ReassuredTiger98> what are you getting with:

nvidia-smi

And here:

ls -la /usr/local/
  
  
Posted 4 years ago

==> 2021-03-11 13:54:59 <==
# cmd: /home/tim/miniconda3/condabin/conda create --yes --mkdir --prefix /home/tim/.clearml/venvs-builds/3.8 python=3.8
# conda version: 4.9.2
+defaults/linux-64::_libgcc_mutex-0.1-main
+defaults/linux-64::ca-certificates-2021.1.19-h06a4308_1
+defaults/linux-64::certifi-2020.12.5-py38h06a4308_0
+defaults/linux-64::ld_impl_linux-64-2.33.1-h53a641e_7
+defaults/linux-64::libedit-3.1.20191231-h14c3975_1
+defaults/linux-64::libffi-3.3-he6710b0_2
+defaults/linux-64::libgcc-ng-9.1.0-hdf63c60_0
+defaults/linux-64::libstdcxx-ng-9.1.0-hdf63c60_0
+defaults/linux-64::ncurses-6.2-he6710b0_1
+defaults/linux-64::openssl-1.1.1j-h27cfd23_0
+defaults/linux-64::pip-21.0.1-py38h06a4308_0
+defaults/linux-64::python-3.8.8-hdb3f193_4
+defaults/linux-64::readline-8.1-h27cfd23_0
+defaults/linux-64::setuptools-52.0.0-py38h06a4308_0
+defaults/linux-64::sqlite-3.33.0-h62c20be_0
+defaults/linux-64::tk-8.6.10-hbc83047_0
+defaults/linux-64::xz-5.2.5-h7b6447c_0
+defaults/linux-64::zlib-1.2.11-h7b6447c_3
+defaults/noarch::wheel-0.36.2-pyhd3eb1b0_0
# update specs: ['python=3.8']
==> 2021-03-11 13:55:01 <==
# cmd: /home/tim/miniconda3/condabin/conda install -p /home/tim/.clearml/venvs-builds/3.8 -c defaults -c conda-forge -c pytorch cudatoolkit=11.1 --quiet --json
# conda version: 4.9.2
-defaults/linux-64::_libgcc_mutex-0.1-main
-defaults/linux-64::libgcc-ng-9.1.0-hdf63c60_0
-defaults/linux-64::libstdcxx-ng-9.1.0-hdf63c60_0
+conda-forge/linux-64::_libgcc_mutex-0.1-conda_forge
+conda-forge/linux-64::_openmp_mutex-4.5-1_gnu
+conda-forge/linux-64::cudatoolkit-11.1.1-h6406543_8
+conda-forge/linux-64::libgcc-ng-9.3.0-h2828fa1_18
+conda-forge/linux-64::libgomp-9.3.0-h2828fa1_18
+conda-forge/linux-64::libstdcxx-ng-9.3.0-h6de172a_18
# update specs: ['cudatoolkit=11.1']
==> 2021-03-11 13:55:06 <==
# cmd: /home/tim/miniconda3/condabin/conda install -p /home/tim/.clearml/venvs-builds/3.8 -c defaults -c conda-forge -c pytorch pip<20.2 --quiet --json
# conda version: 4.9.2
-defaults/linux-64::pip-21.0.1-py38h06a4308_0
+conda-forge/linux-64::pip-20.0.2-py38_1
# update specs: ["pip[version='<20.2']"]
==> 2021-03-11 13:55:21 <==
# cmd: /home/tim/miniconda3/bin/conda-env update -p /home/tim/.clearml/venvs-builds/3.8 --file /tmp/conda_envu_x8060h.yml --quiet --json
# conda version: 4.9.2
-conda-forge/linux-64::_openmp_mutex-4.5-1_gnu
-defaults/linux-64::ca-certificates-2021.1.19-h06a4308_1
+conda-forge/linux-64::_openmp_mutex-4.5-1_llvm
+conda-forge/linux-64::ffmpeg-4.3.1-h03821db_2
+conda-forge/linux-64::gnutls-3.6.13-h85f3911_1
+conda-forge/linux-64::libblas-3.9.0-8_mkl
+conda-forge/linux-64::libiconv-1.16-h516909a_0
+conda-forge/linux-64::libprotobuf-3.15.5-h780b84a_0
+conda-forge/linux-64::libuv-1.41.0-h7f98852_0
+conda-forge/linux-64::llvm-openmp-11.0.1-h4bd325d_0
+conda-forge/linux-64::mkl-2020.4-h726a3e6_304
+conda-forge/linux-64::mkl_random-1.2.0-py38hc5bc63f_1
+conda-forge/linux-64::nettle-3.6-he412f7d_0
+conda-forge/linux-64::openh264-2.1.1-h780b84a_0
+conda-forge/linux-64::python_abi-3.8-1_cp38
+conda-forge/linux-64::pytorch-1.8.0-cpu_py38he614459_0
+conda-forge/linux-64::sleef-3.5.1-h7f98852_1
+conda-forge/linux-64::zstd-1.4.9-ha95c52a_0
+defaults/linux-64::blas-1.0-mkl
+defaults/linux-64::bzip2-1.0.8-h7b6447c_0
+defaults/linux-64::ca-certificates-2020.12.8-h06a4308_1
+defaults/linux-64::cffi-1.14.5-py38h261ae71_0
+defaults/linux-64::freetype-2.10.4-h5ab3b9f_0
+defaults/linux-64::future-0.18.2-py38_1
+defaults/linux-64::gmp-6.2.1-h2531618_2
+defaults/linux-64::jpeg-9b-h024ee3a_2
+defaults/linux-64::lame-3.100-h7b6447c_0
+defaults/linux-64::lcms2-2.11-h396b838_0
+defaults/linux-64::libpng-1.6.37-hbc83047_0
+defaults/linux-64::libtiff-4.1.0-h2733197_1
+defaults/linux-64::lz4-c-1.9.3-h2531618_0
+defaults/linux-64::mkl-service-2.3.0-py38he904b0f_0
+defaults/linux-64::mkl_fft-1.3.0-py38h54f3939_0
+defaults/linux-64::ninja-1.10.2-py38hff7bd54_0
+defaults/linux-64::numpy-1.19.2-py38h54aff64_0
+defaults/linux-64::numpy-base-1.19.2-py38hfa32c7d_0
+defaults/linux-64::pillow-8.1.2-py38he98fc37_0
+defaults/linux-64::six-1.15.0-py38h06a4308_0
+defaults/linux-64::x264-1!152.20180806-h7b6447c_0
+defaults/noarch::olefile-0.46-py_0
+defaults/noarch::pycparser-2.20-py_2
+defaults/noarch::typing_extensions-3.7.4.3-pyha847dfd_0
+pytorch/linux-64::torchaudio-0.8.0-py38
+pytorch/linux-64::torchvision-0.9.0-py38_cu111
# update specs: ['ld_impl_linux-64~=2.33.1', 'libpng~=1.6.37', 'ninja~=1.10.2', 'pytorch~=1.8.0', 'zstd~=1.4.9', 'gmp~=6.2.1', 'python~=3.8.8', 'torchaudio~=0.8.0', 'libtiff~=4.1.0', 'mkl-service~=2.3.0', 'typing_extensions~=3.7.4.3', 'llvm-openmp~=11.0.1', 'python_abi~=3.8', 'readline~=8.1', 'jpeg~=9b.0', 'libedit~=3.1.20191231', 'mkl_random~=1.2.0', 'numpy~=1.19.2', 'openssl~=1.1.1j', 'pillow~=8.1.2', 'blas~=1.0', 'setuptools~=52.0.0', 'tk~=8.6.10', 'ffmpeg~=4.3', 'lz4-c~=1.9.3', 'xz~=5.2.5', 'ncurses~=6.2', 'lame~=3.100', 'libgcc-ng~=9.3.0', 'libffi~=3.3', 'six~=1.15.0', 'certifi~=2020.12.5', 'libuv~=1.41.0', 'gnutls~=3.6.13', 'torchvision~=0.9.0', 'sqlite~=3.33.0', 'libstdcxx-ng~=9.3.0', 'olefile~=0.46', 'openh264~=2.1.1', 'libiconv~=1.16', 'ca-certificates~=2020.12.5', 'cudatoolkit~=11.1.1', 'mkl_fft~=1.3.0', 'freetype~=2.10.4', 'numpy-base~=1.19.2', 'wheel~=0.36.2', 'mkl~=2020.4', 'nettle~=3.6', 'lcms2~=2.11', 'bzip2~=1.0.8', 'zlib~=1.2.11']
  
  
Posted 4 years ago

So only short update for today: I did not yet start a run with conda 4.7.12.
But one question: Actually conda can not be at fault here, right? I can install pytorch just fine locally on the agent, when I do not use clearml(-agent)

  
  
Posted 4 years ago

Perfect! I have to thank you for helping me, not the other way around!

  
  
Posted 4 years ago

(This is why we recommend using pip, because it is stable and clearml-agent takes care of pytorch/cuda verions)

  
  
Posted 4 years ago

One more thing: The cuda_version that clearml finds automatically is wrong.

  
  
Posted 4 years ago

Thank you! 🙂

  
  
Posted 4 years ago

Hurray conda.
Notice it does include cudatoolkit , but conda ignores it

cudatoolkit~=11.1.1

Can you test the same one only serach and replace ~= with == ?

  
  
Posted 4 years ago

By host you mean the machine on which the agent is running? How does clearml-agent find the cuda_version?

  
  
Posted 4 years ago

channels:
- pytorch
- conda-forge
- defaults
dependencies:
- cudatoolkit~=11.1.1
- pytorch~=1.8.0

Works fine

  
  
Posted 4 years ago

Or there should be an early error for trying to run conda based tasks on pip agents

  
  
Posted 4 years ago

Okay found it 🙂 it returns 11020 instead of 112

  
  
Posted 4 years ago

I do not have a global cuda install on this machine. Everything except for the driver is installed via conda.

  
  
Posted 4 years ago

No problem! I profit so much from clearml 🙂

  
  
Posted 4 years ago

Complete conda log

  
  
Posted 4 years ago

name: core
channels:
  - pytorch
  - anaconda
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1
  - _openmp_mutex=4.5
  - blas=1.0
  - bzip2=1.0.8
  - ca-certificates=2020.10.14
  - certifi=2020.6.20
  - cloudpickle=1.6.0
  - cudatoolkit=11.1.1
  - cycler=0.10.0
  - cytoolz=0.11.0
  - dask-core=2021.2.0
  - decorator=4.4.2
  - ffmpeg=4.3
  - freetype=2.10.4
  - gmp=6.2.1
  - gnutls=3.6.13
  - imageio=2.9.0
  - jpeg=9b
  - kiwisolver=1.3.1
  - lame=3.100
  - lcms2=2.11
  - ld_impl_linux-64=2.33.1
  - libedit=3.1.20191231
  - libffi=3.3
  - libgcc-ng=9.3.0
  - libgfortran-ng=7.3.0
  - libiconv=1.16
  - libpng=1.6.37
  - libstdcxx-ng=9.3.0
  - libtiff=4.1.0
  - libuv=1.41.0
  - llvm-openmp=11.0.1
  - lz4-c=1.9.3
  - matplotlib-base=3.3.4
  - mkl=2020.4
  - mkl-service=2.3.0
  - mkl_fft=1.3.0
  - mkl_random=1.2.0
  - ncurses=6.2
  - nettle=3.6
  - networkx=2.5
  - ninja=1.10.2
  - numpy=1.19.2
  - numpy-base=1.19.2
  - olefile=0.46
  - openh264=2.1.1
  - openssl=1.1.1j
  - pip=21.0.1
  - pyparsing=2.4.7
  - python=3.7.10
  - python-dateutil=2.8.1
  - python_abi=3.7
  - pytorch=1.8.0
  - pywavelets=1.1.1
  - readline=8.1
  - scikit-image=0.17.2
  - scipy=1.6.1
  - setuptools=52.0.0
  - six=1.15.0
  - sqlite=3.33.0
  - tifffile=2020.10.1
  - tk=8.6.10
  - toolz=0.11.1
  - torchaudio=0.8.0
  - torchvision=0.9.0
  - tornado=6.1
  - typing_extensions=3.7.4.3
  - wheel=0.36.2
  - xz=5.2.5
  - yaml=0.2.5
  - zlib=1.2.11
  - zstd=1.4.9
  - pip:
    - aiostream==0.4.2
    - attrs==20.3.0
    - clearml==0.17.4
    - dm-control==0.0.355168290
    - dm-env==1.4
    - furl==2.1.0
    - future==0.18.2
    - glfw==2.1.0
    - gym==0.18.0
    - humanfriendly==9.1
    - imageio-ffmpeg==0.4.3
    - jsonschema==3.2.0
    - labmaze==1.0.3
    - lxml==4.6.2
    - moviepy==1.0.3
    - orderedmultidict==1.0.1
    - pathlib2==2.3.5
    - pillow==7.2.0
    - proglog==0.1.9
    - psutil==5.8.0
    - pybullet==3.0.9
    - pygame==2.0.1
    - pyglet==1.5.0
    - pyjwt==2.0.1
    - pyrsistent==0.17.3
    - requests-file==1.5.1
    - tensorboard==2.4.1
    - tensorboardx==2.1
  
  
Posted 4 years ago

Thanks! Tomorrow is great, I'll put the wheel here 🙂

  
  
Posted 4 years ago

It's always preferred to use conda_freeze: false
That said, if you do use conda_freeze: true it should also freeze the cudatoolkit, so it should have worked.
BTW when you say it worked, is it 0.17.2 version or the hacked RC I sent ?

  
  
Posted 4 years ago

Perfect, will try it. fyi: The conda_channels that I used are from clearml-agent init

  
  
Posted 4 years ago

One question: Does clearml resolve the CUDA Version from driver or conda?

  
  
Posted 4 years ago

send me the conda freeze:

# Name                    Version                   Build  Channel
...
  
  
Posted 4 years ago
106K Views
161 Answers
4 years ago
one year ago
Tags