Answered
Hello! Since Today I Get

Hello!
Since today I get AssertionError: Torch not compiled with CUDA enabled for PyTorch 1.8.
Tasks that I submitted yesterday to the queue are also not working, even though they ran yesterday. PyTorch 1.7 based tasks work fine. Any idea what I could have done wrong?

  
  
Posted 4 years ago

Answers 161


One more thing: The cuda_version that clearml finds automatically is wrong.

  
  
Posted 4 years ago

Is there a way to see the contents of /tmp/conda_envaz1ne897.yml? It seems to be deleted after the task is finished.

  
  
Posted 4 years ago

I installed my local conda environment from an environment.yml without issues, so maybe clearml makes some changes that lead to conflicts, which in turn lead to the CPU-version install.

  
  
Posted 4 years ago

Where again does clearml place the venv?

Usually ~/.clearml/venvs-builds/<python version>/
With multiple agents, the directories will be venvs-builds.1 and so on

  
  
Posted 4 years ago

thanks!

  
  
Posted 4 years ago

Damn, okay I'll make sure we fix the order.
Could you verify that ~= works as intended (if the order is correct)?

  
  
Posted 4 years ago

Oh, the hacked one.

  
  
Posted 4 years ago

Wtf? Can you try with = (notice: single, not double)?

channels:
- defaults
- conda-forge
- pytorch
dependencies:
- cudatoolkit=11.1.1
- pytorch=1.8.0
  
  
Posted 4 years ago

Okay this is very close to what the agent is building:
Could you start a new conda env,
then install cudatoolkit=11.1
then run:

conda env update -p <conda_env_path_here> --file the_env_yaml.yml
  
  
Posted 4 years ago

Like this?

  
  
Posted 4 years ago

Nvm, I took a look at conda history and there I see it

  
  
Posted 4 years ago

'conda --version'

  
  
Posted 4 years ago

Type "help", "copyright", "credits" or "license" for more information.
>>> from clearml_agent.helper.gpu.gpustat import get_driver_cuda_version
>>> get_driver_cuda_version()
'110'
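
For comparison with the 11.2 that nvidia-smi reports in this thread, here is a minimal sketch that turns the compact string into a (major, minor) pair. It assumes the last digit is the minor version and the rest is the major version; that reading is a guess based on the output above, not documented behaviour. The helper names are hypothetical and not part of clearml_agent.

```python
from typing import Tuple


def parse_compact_cuda_version(compact: str) -> Tuple[int, int]:
    """Split a compact string such as '110' into (major, minor).

    Assumption: the last digit is the minor version, the rest is the major.
    """
    if not compact.isdigit() or len(compact) < 2:
        raise ValueError(f"unexpected compact CUDA version: {compact!r}")
    return int(compact[:-1]), int(compact[-1])


def parse_dotted_cuda_version(dotted: str) -> Tuple[int, int]:
    """Parse a dotted version such as '11.2' (the nvidia-smi style)."""
    major, minor = dotted.split(".")[:2]
    return int(major), int(minor)


detected = parse_compact_cuda_version("110")   # value returned above
driver = parse_dotted_cuda_version("11.2")     # value shown by nvidia-smi
if detected != driver:
    print(f"agent detected CUDA {detected}, driver reports {driver}")
```

If the two pairs differ, as they do here, the detected value is worth comparing against the cudatoolkit version the agent ends up installing.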
  
  
Posted 4 years ago

From the logs when run with --foreground, I do not see any conda create command.

  
  
Posted 4 years ago

name: core
channels:
  - pytorch
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1
  - _openmp_mutex=4.5
  - blas=1.0
  - bzip2=1.0.8
  - ca-certificates=2020.12.5
  - certifi=2020.12.5
  - cudatoolkit=11.1.1
  - ffmpeg=4.3
  - freetype=2.10.4
  - gmp=6.2.1
  - gnutls=3.6.13
  - jpeg=9b
  - lame=3.100
  - lcms2=2.11
  - ld_impl_linux-64=2.33.1
  - libedit=3.1.20191231
  - libffi=3.3
  - libgcc-ng=9.3.0
  - libiconv=1.16
  - libpng=1.6.37
  - libstdcxx-ng=9.3.0
  - libtiff=4.1.0
  - libuv=1.41.0
  - llvm-openmp=11.0.1
  - lz4-c=1.9.3
  - mkl=2020.4
  - mkl-service=2.3.0
  - mkl_fft=1.3.0
  - mkl_random=1.2.0
  - ncurses=6.2
  - nettle=3.6
  - ninja=1.10.2
  - numpy=1.19.2
  - numpy-base=1.19.2
  - olefile=0.46
  - openh264=2.1.1
  - openssl=1.1.1j
  - pillow=8.1.2
  - pip=21.0.1
  - python=3.8.8
  - python_abi=3.8
  - pytorch=1.8.0
  - readline=8.1
  - setuptools=52.0.0
  - six=1.15.0
  - sqlite=3.33.0
  - tk=8.6.10
  - torchaudio=0.8.0
  - torchvision=0.9.0
  - typing_extensions=3.7.4.3
  - wheel=0.36.2
  - xz=5.2.5
  - zlib=1.2.11
  - zstd=1.4.9
  - pip:
    - attrs==20.3.0
    - clearml==0.17.4
    - furl==2.1.0
    - humanfriendly==9.1
    - jsonschema==3.2.0
    - orderedmultidict==1.0.1
    - pathlib2==2.3.5
    - psutil==5.8.0
    - pyjwt==2.0.1
    - pyrsistent==0.17.3
    - pyyaml==5.4.1
    - requests-file==1.5.1
  
  
Posted 4 years ago

Yea, give me a minute.

  
  
Posted 4 years ago

@<1523701868901961728:profile|ReassuredTiger98> it works on my machine 😞

  
  
Posted 4 years ago

I will try a minimal version now

  
  
Posted 4 years ago

==> 2021-03-11 12:50:38 <==
# cmd: /home/tim/miniconda3/condabin/conda create --yes --mkdir --prefix /home/tim/.clearml/venvs-builds/3.8 python=3.8
--
==> 2021-03-11 12:50:40 <==
# cmd: /home/tim/miniconda3/condabin/conda install -p /home/tim/.clearml/venvs-builds/3.8 -c defaults -c conda-forge -c pytorch cudatoolkit=11.0 --quiet --json
--
==> 2021-03-11 12:50:43 <==
# cmd: /home/tim/miniconda3/condabin/conda install -p /home/tim/.clearml/venvs-builds/3.8 -c defaults -c conda-forge -c pytorch pip<20.2 --quiet --json
--
==> 2021-03-11 12:51:17 <==
# cmd: /home/tim/miniconda3/bin/conda-env update -p /home/tim/.clearml/venvs-builds/3.8 --file /tmp/conda_envaz1ne897.yml --quiet --json
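
In case it helps others reading along: the history above comes from the conda-meta/history file inside the environment. A rough sketch for pulling out the commands and the version pins from such a file could look like the following. The function names are made up for illustration, and the sample text is a shortened copy of the lines above.

```python
import re
from typing import List

SAMPLE_HISTORY = """\
==> 2021-03-11 12:50:38 <==
# cmd: /home/tim/miniconda3/condabin/conda create --yes --mkdir --prefix /home/tim/.clearml/venvs-builds/3.8 python=3.8
==> 2021-03-11 12:50:40 <==
# cmd: /home/tim/miniconda3/condabin/conda install -p /home/tim/.clearml/venvs-builds/3.8 -c defaults -c conda-forge -c pytorch cudatoolkit=11.0 --quiet --json
"""


def extract_commands(history_text: str) -> List[str]:
    """Return the command lines recorded after each '# cmd:' marker."""
    prefix = "# cmd:"
    return [
        line[len(prefix):].strip()
        for line in history_text.splitlines()
        if line.startswith(prefix)
    ]


def find_pins(commands: List[str], package: str) -> List[str]:
    """Return every version pinned for `package` as 'package=version'."""
    pattern = re.compile(r"(?<![\w-])" + re.escape(package) + r"=([\w.]+)")
    pins: List[str] = []
    for command in commands:
        pins.extend(pattern.findall(command))
    return pins


commands = extract_commands(SAMPLE_HISTORY)
print(find_pins(commands, "cudatoolkit"))
```

On the history above this shows that the agent installed cudatoolkit=11.0 before applying the YAML file, while the environment file pins cudatoolkit=11.1.1.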
  
  
Posted 4 years ago

Installed miniconda finally, now trying to run the task

  
  
Posted 4 years ago

Can you ping me when it is updated in None so I can update my installation?

  
  
Posted 4 years ago

Same error 😕

  
  
Posted 4 years ago

@<1523701868901961728:profile|ReassuredTiger98> what are you getting with:

nvidia-smi

And here:

ls -la /usr/local/
  
  
Posted 4 years ago

Installs CPU

  
  
Posted 4 years ago

Maybe the ~= is breaking the conda "magic" version resolver

  
  
Posted 4 years ago

sure.

  
  
Posted 4 years ago

Thu Mar 11 17:52:45 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.56       Driver Version: 460.56       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3090    Off  | 00000000:01:00.0 Off |                  N/A |
| 61%   63C    P2   296W / 350W |   8318MiB / 24268MiB |     74%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 3090    Off  | 00000000:21:00.0 Off |                  N/A |
| 30%   29C    P8    20W / 350W |      1MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    133165    C+G   ...s-builds.1/3.7/bin/python     8314MiB |
+-----------------------------------------------------------------------------+
  
  
Posted 4 years ago

Okay this seems correct:

pytorch=1.8.0=py3.7_cuda11.1_cudnn8.0.5_0

I can't seem to find what's the diff between the two.
Give me a second let me check if I can reproduce it somehow.
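
As a side note, the build string is what tells a CUDA build apart from a CPU one. A small sketch for checking a name=version=build spec, assuming CUDA builds contain "cuda<version>" and CPU builds contain "cpu" in the build string (an assumption about the naming convention, not something conda guarantees; classify_build is a hypothetical helper):

```python
import re


def classify_build(spec: str) -> str:
    """Classify a conda 'name=version=build' spec as a CUDA or CPU build.

    Assumption: CUDA builds carry 'cuda<version>' in the build string and
    CPU builds carry 'cpu'. Anything else is reported as 'unknown'.
    """
    parts = spec.split("=")
    if len(parts) != 3:
        raise ValueError(f"expected name=version=build, got {spec!r}")
    build = parts[2]
    match = re.search(r"cuda(\d+(?:\.\d+)?)", build)
    if match:
        return f"cuda {match.group(1)}"
    if "cpu" in build:
        return "cpu"
    return "unknown"


print(classify_build("pytorch=1.8.0=py3.7_cuda11.1_cudnn8.0.5_0"))
```

Running the same check on the pytorch line from conda list in the failing environment would show whether the CPU variant was picked.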

  
  
Posted 4 years ago

Just tried: also works with 0.17.2

Great!

  
  
Posted 4 years ago

Just tried: also works with 0.17.2

  
  
Posted 4 years ago