Answered
Hello! Since Today I Get

Hello!
Since today I get AssertionError: Torch not compiled with CUDA enabled for PyTorch 1.8.
Tasks that I submitted yesterday to the queue are also not working, even though they ran yesterday. PyTorch 1.7 based tasks work fine. Any idea what I could have done wrong?
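
A quick way to confirm which kind of build ended up in the environment is to ask torch itself. This is only a minimal sketch: `describe_torch_build` is a hypothetical helper name, and it relies on `torch.version.cuda` being `None` for CPU-only builds.

```python
def describe_torch_build():
    """Report whether the importable torch build was compiled with CUDA."""
    try:
        import torch
    except ImportError:
        # torch is not installed in this interpreter at all
        return {
            "installed": False,
            "version": None,
            "built_with_cuda": None,
            "cuda_available": False,
        }
    return {
        "installed": True,
        "version": torch.__version__,
        # None on CPU-only builds, otherwise a string such as "11.1"
        "built_with_cuda": torch.version.cuda,
        # True only if a CUDA build also finds a usable driver and device
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(describe_torch_build())
```

If `built_with_cuda` comes back as `None`, the environment holds a CPU-only build, which matches the "Torch not compiled with CUDA enabled" assertion.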

  
  
Posted 4 years ago

Answers 161


btw: why is agent.package_manager an agent attribute? Imo it does not make sense, because conda can install pip packages but pip cannot install conda packages, which can lead to install failures, right?

  
  
Posted 4 years ago

I tried to run the task with detect_with_conda_freeze: false instead of true and got

Executing Conda: /home/tim/miniconda3/condabin/conda install -p /home/tim/.clearml/venvs-builds/3.8 -c defaults -c conda-forge -c pytorch 'pip<20.2' --quiet --json
Pass
Conda: Trying to install requirements:
['pytorch~=1.8.0']
Executing Conda: /home/tim/miniconda3/condabin/conda env update -p /home/tim/.clearml/venvs-builds/3.8 --file /tmp/conda_envh7rq4qmc.yml --quiet --json
Conda error: UnsatisfiableError: The following specifications were found to be incompatible with a past
explicit spec that is not an explicit spec in this operation (cudatoolkit):

  - pytorch~=1.8.0 -> cudatoolkit[version='>=10.1,<10.2|>=10.2,<10.3']

The following specifications were found to be incompatible with each other:



Package cudatoolkit conflicts for:
cudatoolkit=11.0
Conda: Installing requirements: step 2 - using pip:
['clearml==0.17.4', 'tensorboard==2.4.1', 'pytorch~=1.8.0']
Collecting tensorboard==2.4.1
  Using cached tensorboard-2.4.1-py3-none-any.whl (10.6 MB)
ERROR: Could not find a version that satisfies the requirement pytorch~=1.8.0 (from -r /tmp/cached-reqsubuv0zrf.txt (line 3)) (from versions: 0.1.2, 1.0.2)
ERROR: No matching distribution found for pytorch~=1.8.0 (from -r /tmp/cached-reqsubuv0zrf.txt (line 3))
Command 'source /home/tim/miniconda3/etc/profile.d/conda.sh && conda activate /home/tim/.clearml/venvs-builds/3.8 && pip install -r /tmp/cached-reqsubuv0zrf.txt' returned non-zero exit status 1.
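
Side note on the pip step in the log above: on PyPI the PyTorch distribution is published as `torch`, while `pytorch` is the conda package name, which is why pip only finds the placeholder versions 0.1.2 and 1.0.2. A rough sketch of renaming such requirements before handing them to pip follows. `CONDA_TO_PIP_NAMES` and `to_pip_requirement` are hypothetical names for illustration, not something clearml-agent is known to expose.

```python
import re

# Hypothetical mapping from conda package names to PyPI distribution names.
CONDA_TO_PIP_NAMES = {"pytorch": "torch"}


def to_pip_requirement(requirement):
    """Rename the package part of a requirement line, keeping the version spec."""
    match = re.match(r"[A-Za-z0-9_.\-]+", requirement)
    if not match:
        return requirement  # nothing that looks like a package name
    name = match.group(0)
    pip_name = CONDA_TO_PIP_NAMES.get(name.lower(), name)
    return pip_name + requirement[match.end():]


if __name__ == "__main__":
    for line in ["clearml==0.17.4", "tensorboard==2.4.1", "pytorch~=1.8.0"]:
        print(to_pip_requirement(line))
```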
  
  
Posted 4 years ago

I do not have a global cuda install on this machine. Everything except for the driver is installed via conda.

  
  
Posted 4 years ago

okay, I'll make sure we order it correctly

  
  
Posted 4 years ago

Okay. And 110 means 11.1 and not 11.0? (edited)

110 means 11.0, the odd thing is, it actually installed 11.1, and from the pytorch website this is exactly how they suggest to install with conda...
Let me know if forcing the CUDA version changes anything
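
To make the encoding concrete, here is a small sketch of reading the compact value. It assumes the last digit is the minor version and everything before it is the major version, which is consistent with 110 meaning 11.0. `decode_cuda_version` is a hypothetical helper, not part of clearml-agent.

```python
def decode_cuda_version(compact):
    """Split a compact CUDA version such as 110 into (major, minor)."""
    digits = str(compact).strip()
    if not digits.isdigit() or len(digits) < 2:
        raise ValueError(f"unexpected compact CUDA version: {compact!r}")
    # Assumption: the last digit is the minor version, the rest is the major.
    return int(digits[:-1]), int(digits[-1])


if __name__ == "__main__":
    for value in ("110", "111", "102"):
        major, minor = decode_cuda_version(value)
        print(f"{value} -> {major}.{minor}")
```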

  
  
Posted 4 years ago

By host you mean the machine on which the agent is running? How does clearml-agent find the cuda_version?

  
  
Posted 4 years ago

Okay this seems correct:

pytorch=1.8.0=py3.7_cuda11.1_cudnn8.0.5_0

I can't seem to find what's the diff between the two.
Give me a second let me check if I can reproduce it somehow.
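
For comparing the two env files, the conda build string is the part to look at. The sketch below pulls the CUDA version out of a spec like the one above; `cuda_from_build_string` is a hypothetical helper, and the `py3.7_cpu_0` example is an assumption about how CPU-only builds are usually labeled.

```python
import re


def cuda_from_build_string(spec):
    """Return the CUDA version embedded in a conda build string, or None."""
    # Matches e.g. "cuda11.1" inside "py3.7_cuda11.1_cudnn8.0.5_0".
    match = re.search(r"cuda(\d+(?:\.\d+)*)", spec)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(cuda_from_build_string("pytorch=1.8.0=py3.7_cuda11.1_cudnn8.0.5_0"))
    # Assumed CPU-only build label, for contrast:
    print(cuda_from_build_string("pytorch=1.8.0=py3.7_cpu_0"))
```

A spec without a `cuda` marker in its build string is a strong hint that conda resolved a CPU-only package.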

  
  
Posted 4 years ago

I'll try it one more time just to make sure.

  
  
Posted 4 years ago

Thank you! 🙂

  
  
Posted 4 years ago

I just wanna add: I can run this task on the same workstation with the same conda installation just fine.

  
  
Posted 4 years ago

# Python 3.7.10 (default, Feb 26 2021, 18:47:35)  [GCC 7.3.0]

aiostream==0.4.2
attrs==20.3.0
clearml==0.17.4
dm-control==0.0.355168290
dm-env==1.4
furl==2.1.0
future==0.18.2
glfw==2.1.0
gym==0.18.0
humanfriendly==9.1
imageio-ffmpeg==0.4.3
jsonschema==3.2.0
labmaze==1.0.3
lxml==4.6.2
moviepy==1.0.3
orderedmultidict==1.0.1
pathlib2==2.3.5
pillow==7.2.0
proglog==0.1.9
psutil==5.8.0
pybullet==3.0.9
pygame==2.0.1
pyglet==1.5.0
pyjwt==2.0.1
pyrsistent==0.17.3
requests-file==1.5.1
tensorboard==2.4.1
tensorboardx==2.1

# Conda Packages

blas==1.0
bzip2==1.0.8
ca-certificates==2020.10.14
certifi==2020.6.20
cloudpickle==1.6.0
cudatoolkit==11.1.1
cycler==0.10.0
cytoolz==0.11.0
dask-core==2021.2.0
decorator==4.4.2
ffmpeg==4.3
freetype==2.10.4
gmp==6.2.1
gnutls==3.6.13
imageio==2.9.0
jpeg==9b
kiwisolver==1.3.1
lame==3.100
lcms2==2.11
ld_impl_linux-64==2.33.1
libedit==3.1.20191231
libffi==3.3
libgcc-ng==9.3.0
libgfortran-ng==7.3.0
libiconv==1.16
libpng==1.6.37
libstdcxx-ng==9.3.0
libtiff==4.1.0
libuv==1.41.0
llvm-openmp==11.0.1
lz4-c==1.9.3
matplotlib-base==3.3.4
mkl==2020.4
mkl-service==2.3.0
mkl_fft==1.3.0
mkl_random==1.2.0
ncurses==6.2
nettle==3.6
networkx==2.5
ninja==1.10.2
numpy==1.19.2
numpy-base==1.19.2
olefile==0.46
openh264==2.1.1
openssl==1.1.1j
pip==21.0.1
pyparsing==2.4.7
python==3.7.10
python-dateutil==2.8.1
python_abi==3.7
torch==1.8.0
pywavelets==1.1.1
pyyaml==5.3.1
readline==8.1
scikit-image==0.17.2
scipy==1.6.1
setuptools==52.0.0
six==1.15.0
sqlite==3.33.0
tifffile==2020.10.1
tk==8.6.10
toolz==0.11.1
torchaudio==0.8.0
torchvision==0.9.0
tornado==6.1
typing_extensions==3.7.4.3
wheel==0.36.2
xz==5.2.5
yaml==0.2.5
zlib==1.2.11
zstd==1.4.9
  
  
Posted 4 years ago

But I do not have anything linked correctly since I rely on conda installing cuda/cudnn for me

From the log it installed:
cudatoolkit==11.1.1
based on the CUDA it found on the host machine: agent.cuda_version = 110
But for some reason it installed the pytorch from the conda "pytorch" repo without the cuda support.

  
  
Posted 4 years ago

But I do not have anything linked correctly since I rely on conda installing cuda/cudnn for me

  
  
Posted 4 years ago

fyi: NVIDIA-SMI 460.56 Driver Version: 460.56 CUDA Version: 11.2

  
  
Posted 4 years ago

Okay. And 110 means 11.1 and not 11.0?

  
  
Posted 4 years ago

Yea, give me a minute.

  
  
Posted 4 years ago

sure

  
  
Posted 4 years ago

Hmm, you are correct
Which means this is some conda issue: when installing from an env file, conda is not resolving the correct pytorch version 😞
Not sure why... Could you try to upgrade conda?

  
  
Posted 4 years ago

@<1523701868901961728:profile|ReassuredTiger98> in the UI can you see it in the "installed packages" section under the Execution Tab ?

  
  
Posted 4 years ago

Would it help you diagnose this problem if I ran conda env create --file=environment.yml and see whether it works?

  
  
Posted 4 years ago

Hi @<1523701868901961728:profile|ReassuredTiger98>
Could you send the full log ? Also what's the clearml-agent version?

  
  
Posted 4 years ago

(This is why we recommend using pip, because it is stable and clearml-agent takes care of pytorch/cuda versions)

  
  
Posted 4 years ago

What's the difference between the two env files?

  
  
Posted 4 years ago

Installs CPU

  
  
Posted 4 years ago

Ha?!

  
  
Posted 4 years ago

Hurray conda.
Notice it does include cudatoolkit , but conda ignores it

cudatoolkit~=11.1.1

Can you test the same one, only search and replace ~= with == ?
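
The search and replace can be done by hand, or with a few lines of Python. This is only a sketch: `pin_exact` is a hypothetical helper and the sample text is a made-up fragment built from the package specs mentioned in this thread, not the actual env file.

```python
def pin_exact(env_text):
    """Replace compatible-release specs (~=) with exact pins (==)."""
    return env_text.replace("~=", "==")


if __name__ == "__main__":
    # Made-up fragment in the style of a conda environment file.
    sample = "dependencies:\n- cudatoolkit~=11.1.1\n- pytorch~=1.8.0\n"
    print(pin_exact(sample))
```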

  
  
Posted 4 years ago

And how is

Summary - installed python packages: 
conda:
....

generated?

  
  
Posted 4 years ago

Just tried: also works with 0.17.2

  
  
Posted 4 years ago

Yep, this installs PyTorch CPU

  
  
Posted 4 years ago

Oh, the hacked one.

  
  
Posted 4 years ago