Examples: query, "exact match", wildcard*, wild?ard, wild*rd
Fuzzy search: cake~ (finds cakes, bake)
Term boost: "red velvet"^4, chocolate^2
Field grouping: tags:(+work -"fun-stuff")
Escaping: Escape characters +-&|!(){}[]^"~*?:\ with \, e.g. \+
Range search: properties.timestamp:[1587729413488 TO *] (inclusive), properties.title:{A TO Z}(excluding A and Z)
Combinations: chocolate AND vanilla, chocolate OR vanilla, (chocolate OR vanilla) NOT "vanilla pudding"
Field search: properties.title:"The Title" AND text
Answered
Hello! Since Today I Get

Hello!
Since today I get AssertionError: Torch not compiled with CUDA enabled for PyTorch 1.8.
Tasks that I submitted yesterday to the queue are also not working, even though they ran yesterday. PyTorch 1.7 based tasks work fine. Any idea what I could have done wrong?

  
  
Posted 4 years ago
Votes Newest

Answers 161


So it should have detected 11.2...

  
  
Posted 4 years ago

It is now looking for conflicts.

  
  
Posted 4 years ago

Just tested again. The ordering definitly matters.

  
  
Posted 4 years ago

Also tried conda version 4.7.12. Same problem.

  
  
Posted 4 years ago

btw: why is agent.package_manager and agent attribute. Imo it does not make sense because conda can install pip packages, but pip cannot install conda packages which can lead to install failures, right?

  
  
Posted 4 years ago

channels:
- defaults
- conda-forge
- pytorch
dependencies:
- cudatoolkit==11.1.1
- pytorch==1.8.0

Gives CPU version

  
  
Posted 4 years ago

But here is the funny thing:

channels:
- pytorch
- conda-forge
- defaults
dependencies:
- cudatoolkit=11.1.1
- pytorch=1.8.0

Installs GPU

  
  
Posted 4 years ago

The problem is that clearml installs cudatoolkit=11.0 but cudatoolkit=11.1 is needed. By setting agent.cuda_version=11.1 in clearml.conf it uses the correct version and installs fine. With version 11.0 conda will resolve conflicts by installing pytorch cpu-version.

  
  
Posted 4 years ago

Okay. And 110 means 11.1 and not 11.0?

  
  
Posted 4 years ago

This my environment installed from env file. Training works just fine here:

  
  
Posted 4 years ago

Installed miniconda finally, now trying to run the task

  
  
Posted 4 years ago

Quick question: Where again does clearml place the venv? I wanna take a look into it after the task has failed

  
  
Posted 4 years ago

Where again does clearml place the venv?

Usually ~/.clearml/venvs-builds/<python version>/
Multiple agents will be venvs-builds.1 and so on

  
  
Posted 4 years ago

I will try again tomorrow. It s getting late! Thank you for helping so far!

  
  
Posted 4 years ago

==> 2021-03-11 12:50:38 <==
# cmd: /home/tim/miniconda3/condabin/conda create --yes --mkdir --prefix /home/tim/.clearml/venvs-builds/3.8 python=3.8
--
==> 2021-03-11 12:50:40 <==
# cmd: /home/tim/miniconda3/condabin/conda install -p /home/tim/.clearml/venvs-builds/3.8 -c defaults -c conda-forge -c pytorch cudatoolkit=11.0 --quiet --json
--
==> 2021-03-11 12:50:43 <==
# cmd: /home/tim/miniconda3/condabin/conda install -p /home/tim/.clearml/venvs-builds/3.8 -c defaults -c conda-forge -c pytorch pip<20.2 --quiet --json
--
==> 2021-03-11 12:51:17 <==
# cmd: /home/tim/miniconda3/bin/conda-env update -p /home/tim/.clearml/venvs-builds/3.8 --file /tmp/conda_envaz1ne897.yml --quiet --json
  
  
Posted 4 years ago

For now I can tell you that with conda_freeze: true it fails, but with conda_freeze: false it works!

  
  
Posted 4 years ago

Still shows CPU version when I run conda list

  
  
Posted 4 years ago

And the one with the CPU version? is it with "~=" or "="?

  
  
Posted 4 years ago

What's the difference between the two env files?

  
  
Posted 4 years ago

What do you mean?

  
  
Posted 4 years ago

@<1523701868901961728:profile|ReassuredTiger98> what are you getting with:

nvidia-smi

And here:

ls -la /usr/local/
  
  
Posted 4 years ago

==> 2021-03-11 13:54:59 <==
# cmd: /home/tim/miniconda3/condabin/conda create --yes --mkdir --prefix /home/tim/.clearml/venvs-builds/3.8 python=3.8
# conda version: 4.9.2
+defaults/linux-64::_libgcc_mutex-0.1-main
+defaults/linux-64::ca-certificates-2021.1.19-h06a4308_1
+defaults/linux-64::certifi-2020.12.5-py38h06a4308_0
+defaults/linux-64::ld_impl_linux-64-2.33.1-h53a641e_7
+defaults/linux-64::libedit-3.1.20191231-h14c3975_1
+defaults/linux-64::libffi-3.3-he6710b0_2
+defaults/linux-64::libgcc-ng-9.1.0-hdf63c60_0
+defaults/linux-64::libstdcxx-ng-9.1.0-hdf63c60_0
+defaults/linux-64::ncurses-6.2-he6710b0_1
+defaults/linux-64::openssl-1.1.1j-h27cfd23_0
+defaults/linux-64::pip-21.0.1-py38h06a4308_0
+defaults/linux-64::python-3.8.8-hdb3f193_4
+defaults/linux-64::readline-8.1-h27cfd23_0
+defaults/linux-64::setuptools-52.0.0-py38h06a4308_0
+defaults/linux-64::sqlite-3.33.0-h62c20be_0
+defaults/linux-64::tk-8.6.10-hbc83047_0
+defaults/linux-64::xz-5.2.5-h7b6447c_0
+defaults/linux-64::zlib-1.2.11-h7b6447c_3
+defaults/noarch::wheel-0.36.2-pyhd3eb1b0_0
# update specs: ['python=3.8']
==> 2021-03-11 13:55:01 <==
# cmd: /home/tim/miniconda3/condabin/conda install -p /home/tim/.clearml/venvs-builds/3.8 -c defaults -c conda-forge -c pytorch cudatoolkit=11.1 --quiet --json
# conda version: 4.9.2
-defaults/linux-64::_libgcc_mutex-0.1-main
-defaults/linux-64::libgcc-ng-9.1.0-hdf63c60_0
-defaults/linux-64::libstdcxx-ng-9.1.0-hdf63c60_0
+conda-forge/linux-64::_libgcc_mutex-0.1-conda_forge
+conda-forge/linux-64::_openmp_mutex-4.5-1_gnu
+conda-forge/linux-64::cudatoolkit-11.1.1-h6406543_8
+conda-forge/linux-64::libgcc-ng-9.3.0-h2828fa1_18
+conda-forge/linux-64::libgomp-9.3.0-h2828fa1_18
+conda-forge/linux-64::libstdcxx-ng-9.3.0-h6de172a_18
# update specs: ['cudatoolkit=11.1']
==> 2021-03-11 13:55:06 <==
# cmd: /home/tim/miniconda3/condabin/conda install -p /home/tim/.clearml/venvs-builds/3.8 -c defaults -c conda-forge -c pytorch pip<20.2 --quiet --json
# conda version: 4.9.2
-defaults/linux-64::pip-21.0.1-py38h06a4308_0
+conda-forge/linux-64::pip-20.0.2-py38_1
# update specs: ["pip[version='<20.2']"]
==> 2021-03-11 13:55:21 <==
# cmd: /home/tim/miniconda3/bin/conda-env update -p /home/tim/.clearml/venvs-builds/3.8 --file /tmp/conda_envu_x8060h.yml --quiet --json
# conda version: 4.9.2
-conda-forge/linux-64::_openmp_mutex-4.5-1_gnu
-defaults/linux-64::ca-certificates-2021.1.19-h06a4308_1
+conda-forge/linux-64::_openmp_mutex-4.5-1_llvm
+conda-forge/linux-64::ffmpeg-4.3.1-h03821db_2
+conda-forge/linux-64::gnutls-3.6.13-h85f3911_1
+conda-forge/linux-64::libblas-3.9.0-8_mkl
+conda-forge/linux-64::libiconv-1.16-h516909a_0
+conda-forge/linux-64::libprotobuf-3.15.5-h780b84a_0
+conda-forge/linux-64::libuv-1.41.0-h7f98852_0
+conda-forge/linux-64::llvm-openmp-11.0.1-h4bd325d_0
+conda-forge/linux-64::mkl-2020.4-h726a3e6_304
+conda-forge/linux-64::mkl_random-1.2.0-py38hc5bc63f_1
+conda-forge/linux-64::nettle-3.6-he412f7d_0
+conda-forge/linux-64::openh264-2.1.1-h780b84a_0
+conda-forge/linux-64::python_abi-3.8-1_cp38
+conda-forge/linux-64::pytorch-1.8.0-cpu_py38he614459_0
+conda-forge/linux-64::sleef-3.5.1-h7f98852_1
+conda-forge/linux-64::zstd-1.4.9-ha95c52a_0
+defaults/linux-64::blas-1.0-mkl
+defaults/linux-64::bzip2-1.0.8-h7b6447c_0
+defaults/linux-64::ca-certificates-2020.12.8-h06a4308_1
+defaults/linux-64::cffi-1.14.5-py38h261ae71_0
+defaults/linux-64::freetype-2.10.4-h5ab3b9f_0
+defaults/linux-64::future-0.18.2-py38_1
+defaults/linux-64::gmp-6.2.1-h2531618_2
+defaults/linux-64::jpeg-9b-h024ee3a_2
+defaults/linux-64::lame-3.100-h7b6447c_0
+defaults/linux-64::lcms2-2.11-h396b838_0
+defaults/linux-64::libpng-1.6.37-hbc83047_0
+defaults/linux-64::libtiff-4.1.0-h2733197_1
+defaults/linux-64::lz4-c-1.9.3-h2531618_0
+defaults/linux-64::mkl-service-2.3.0-py38he904b0f_0
+defaults/linux-64::mkl_fft-1.3.0-py38h54f3939_0
+defaults/linux-64::ninja-1.10.2-py38hff7bd54_0
+defaults/linux-64::numpy-1.19.2-py38h54aff64_0
+defaults/linux-64::numpy-base-1.19.2-py38hfa32c7d_0
+defaults/linux-64::pillow-8.1.2-py38he98fc37_0
+defaults/linux-64::six-1.15.0-py38h06a4308_0
+defaults/linux-64::x264-1!152.20180806-h7b6447c_0
+defaults/noarch::olefile-0.46-py_0
+defaults/noarch::pycparser-2.20-py_2
+defaults/noarch::typing_extensions-3.7.4.3-pyha847dfd_0
+pytorch/linux-64::torchaudio-0.8.0-py38
+pytorch/linux-64::torchvision-0.9.0-py38_cu111
# update specs: ['ld_impl_linux-64~=2.33.1', 'libpng~=1.6.37', 'ninja~=1.10.2', 'pytorch~=1.8.0', 'zstd~=1.4.9', 'gmp~=6.2.1', 'python~=3.8.8', 'torchaudio~=0.8.0', 'libtiff~=4.1.0', 'mkl-service~=2.3.0', 'typing_extensions~=3.7.4.3', 'llvm-openmp~=11.0.1', 'python_abi~=3.8', 'readline~=8.1', 'jpeg~=9b.0', 'libedit~=3.1.20191231', 'mkl_random~=1.2.0', 'numpy~=1.19.2', 'openssl~=1.1.1j', 'pillow~=8.1.2', 'blas~=1.0', 'setuptools~=52.0.0', 'tk~=8.6.10', 'ffmpeg~=4.3', 'lz4-c~=1.9.3', 'xz~=5.2.5', 'ncurses~=6.2', 'lame~=3.100', 'libgcc-ng~=9.3.0', 'libffi~=3.3', 'six~=1.15.0', 'certifi~=2020.12.5', 'libuv~=1.41.0', 'gnutls~=3.6.13', 'torchvision~=0.9.0', 'sqlite~=3.33.0', 'libstdcxx-ng~=9.3.0', 'olefile~=0.46', 'openh264~=2.1.1', 'libiconv~=1.16', 'ca-certificates~=2020.12.5', 'cudatoolkit~=11.1.1', 'mkl_fft~=1.3.0', 'freetype~=2.10.4', 'numpy-base~=1.19.2', 'wheel~=0.36.2', 'mkl~=2020.4', 'nettle~=3.6', 'lcms2~=2.11', 'bzip2~=1.0.8', 'zlib~=1.2.11']
  
  
Posted 4 years ago

So only short update for today: I did not yet start a run with conda 4.7.12.
But one question: Actually conda can not be at fault here, right? I can install pytorch just fine locally on the agent, when I do not use clearml(-agent)

  
  
Posted 4 years ago

Perfect! I have to thank you for helping me, not the other way around!

  
  
Posted 4 years ago

Now I get:

ollecting package metadata (repodata.json): done
Solving environment: - 
Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed                                                                                                                                                                         
                                                                                                                                                                               
UnsatisfiableError: The following specifications were found to be incompatible with a past                                                                                     
explicit spec that is not an explicit spec in this operation (cudatoolkit):

  - pytorch==1.8.0 -> cudatoolkit[version='>=10.1,<10.2|>=10.2,<10.3']

The following specifications were found to be incompatible with each other:



Package cudatoolkit conflicts for:
cudatoolkit=11.0
  
  
Posted 4 years ago

(This is why we recommend using pip, because it is stable and clearml-agent takes care of pytorch/cuda verions)

  
  
Posted 4 years ago

One more thing: The cuda_version that clearml finds automatically is wrong.

  
  
Posted 4 years ago

This is the venc

  
  
Posted 4 years ago

I installed my local conda environment from an environment.yml without issues, so maybe clearml makes some changes that leads to conflicts which finally leads to the cpu-version install.

  
  
Posted 4 years ago

Thank you! 🙂

  
  
Posted 4 years ago
106K Views
161 Answers
4 years ago
one year ago
Tags