
Hello!
Since today I get AssertionError: Torch not compiled with CUDA enabled for PyTorch 1.8.
Tasks that I submitted yesterday to the queue are also not working, even though they ran yesterday. PyTorch 1.7 based tasks work fine. Any idea what I could have done wrong?
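One quick way to narrow down this error is to ask PyTorch itself whether the installed build was compiled with CUDA. The snippet below is only a diagnostic sketch: `describe_torch_build` is a hypothetical helper name, and it assumes it is run with the Python interpreter of the environment the agent created.

```python
def describe_torch_build():
    """Report whether the installed PyTorch build has CUDA support."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this interpreter at all.
        return {"installed": False}
    return {
        "installed": True,
        "version": torch.__version__,
        # None for CPU-only builds, otherwise a string such as "11.1".
        "built_with_cuda": torch.version.cuda,
        # True only if a CUDA build also finds a usable driver and GPU.
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(describe_torch_build())
```

If `built_with_cuda` is `None`, a CPU-only package was installed, which is what the "Torch not compiled with CUDA enabled" assertion indicates.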

  
  
Posted 4 years ago

Answers 161


And the one with the CPU version? Is it with "~=" or "="?

  
  
Posted 4 years ago

Oh, the hacked one.

  
  
Posted 4 years ago

btw: I also tested the clearml-agent running on a different machine and with python 3.8 and I get the same problems.

  
  
Posted 4 years ago

==> 2021-03-11 12:50:38 <==
# cmd: /home/tim/miniconda3/condabin/conda create --yes --mkdir --prefix /home/tim/.clearml/venvs-builds/3.8 python=3.8
--
==> 2021-03-11 12:50:40 <==
# cmd: /home/tim/miniconda3/condabin/conda install -p /home/tim/.clearml/venvs-builds/3.8 -c defaults -c conda-forge -c pytorch cudatoolkit=11.0 --quiet --json
--
==> 2021-03-11 12:50:43 <==
# cmd: /home/tim/miniconda3/condabin/conda install -p /home/tim/.clearml/venvs-builds/3.8 -c defaults -c conda-forge -c pytorch pip<20.2 --quiet --json
--
==> 2021-03-11 12:51:17 <==
# cmd: /home/tim/miniconda3/bin/conda-env update -p /home/tim/.clearml/venvs-builds/3.8 --file /tmp/conda_envaz1ne897.yml --quiet --json
  
  
Posted 4 years ago

@<1523701868901961728:profile|ReassuredTiger98> it works on my machine 😞

  
  
Posted 4 years ago

@<1523701868901961728:profile|ReassuredTiger98> what are you getting with:

nvidia-smi

And here:

ls -la /usr/local/
  
  
Posted 4 years ago

The problem is that clearml installs cudatoolkit=11.0 but cudatoolkit=11.1 is needed.
You suggested this fix earlier, but I am not sure why it didn't work then.

Hmm, could you test with clearml-agent 0.17.2, to make sure this actually solves the problem?

  
  
Posted 4 years ago

Do you know how I can get this version?

  
  
Posted 4 years ago

Hi @<1523701868901961728:profile|ReassuredTiger98> when you get to it...
please download the wheel, then install it with

pip3 install -U clearml_agent-0.17.3rc0-py3-none-any.whl

Then run the daemon with the additional --debug argument, basically:

clearml-agent --debug daemon --foreground ...

Once the agent is running please send the Task's log from your console 🙂

  
  
Posted 4 years ago

Okay found it 🙂 it returns 11020 instead of 112
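For context, the CUDA runtime and driver APIs report the version as a single integer, 1000 * major + 10 * minor, so 11020 stands for CUDA 11.2. The sketch below shows how such an integer maps to a compact tag like 112. It is an illustration under that encoding only, not the clearml-agent implementation; both function names are hypothetical, and the compact form is assumed to be the major digits followed by a single minor digit.

```python
def cuda_int_to_compact(version: int) -> str:
    """Convert an integer such as 11020 into a compact tag such as '112'."""
    major, remainder = divmod(version, 1000)  # 11020 -> (11, 20)
    minor = remainder // 10                   # 20 -> 2
    return f"{major}{minor}"


def compact_to_dotted(tag: str) -> str:
    """Convert a compact tag such as '112' into '11.2'.

    Assumes the last digit is the minor version.
    """
    return f"{tag[:-1]}.{tag[-1]}"


print(cuda_int_to_compact(11020))  # 112
print(compact_to_dotted("112"))    # 11.2
print(compact_to_dotted("110"))    # 11.0
```

Under this assumed convention, a reported value of 110 would correspond to CUDA 11.0.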

  
  
Posted 4 years ago

Did not happen with conda 4.9.2

  
  
Posted 4 years ago

You suggested this fix earlier, but I am not sure why it didn't work then.

  
  
Posted 4 years ago

And then?

  
  
Posted 4 years ago

thanks!

  
  
Posted 4 years ago

Thu Mar 11 17:52:45 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.56       Driver Version: 460.56       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3090    Off  | 00000000:01:00.0 Off |                  N/A |
| 61%   63C    P2   296W / 350W |   8318MiB / 24268MiB |     74%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 3090    Off  | 00000000:21:00.0 Off |                  N/A |
| 30%   29C    P8    20W / 350W |      1MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    133165    C+G   ...s-builds.1/3.7/bin/python     8314MiB |
+-----------------------------------------------------------------------------+
  
  
Posted 4 years ago

I mean the version which it bases the PyTorch installation on.

  
  
Posted 4 years ago

I just tried the environment setup steps that clearml-agent performs, but locally and with my own environment.yml instead of the one that clearml generates.

  
  
Posted 4 years ago

Let me check

  
  
Posted 4 years ago

channels:
- pytorch
- conda-forge
- defaults
dependencies:
- cudatoolkit~=11.1.1
- pytorch~=1.8.0

Works fine

  
  
Posted 4 years ago

Yea, will do so in 30min

  
  
Posted 4 years ago

And this works fine.

  
  
Posted 4 years ago

Okay. And 110 means 11.1 and not 11.0?

  
  
Posted 4 years ago

Type "help", "copyright", "credits" or "license" for more information.
>>> from clearml_agent.helper.gpu.gpustat import get_driver_cuda_version
>>> get_driver_cuda_version()
'110'
  
  
Posted 4 years ago

sure

  
  
Posted 4 years ago

WTF?!

  
  
Posted 4 years ago

Hurray conda.
Notice that it does include cudatoolkit, but conda ignores it:

cudatoolkit~=11.1.1

Can you test the same one, only search and replace ~= with ==?
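To make the difference between the two operators concrete: under PEP 440 (the rules pip follows), ~= is the "compatible release" operator, so ~=11.1.1 means ">=11.1.1 and ==11.1.*", while == pins one exact version. The sketch below only spells out that expansion; `expand_compatible_release` is a hypothetical helper, and conda uses its own match-spec parsing, so it may not treat ~= the same way.

```python
def expand_compatible_release(spec: str) -> str:
    """Expand 'name~=X.Y.Z' into the equivalent PEP 440 range.

    Example: 'cudatoolkit~=11.1.1' -> 'cudatoolkit>=11.1.1,==11.1.*'
    """
    name, version = spec.split("~=")
    parts = version.split(".")
    if len(parts) < 2:
        # PEP 440 does not allow ~= with a single version component.
        raise ValueError("~= needs at least two version components")
    prefix = ".".join(parts[:-1])  # drop the last component
    return f"{name}>={version},=={prefix}.*"


print(expand_compatible_release("cudatoolkit~=11.1.1"))
print(expand_compatible_release("pytorch~=1.8.0"))
```

Replacing ~= with == removes the range entirely, which is a simple way to check whether the solver is mishandling the compatible-release form.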

  
  
Posted 4 years ago

Just a short update for today: I have not yet started a run with conda 4.7.12.
But one question: conda cannot actually be at fault here, right? I can install PyTorch just fine locally on the agent machine when I do not use clearml(-agent).

  
  
Posted 4 years ago

Upgrade back?

  
  
Posted 4 years ago

I'll try it one more time just to make sure.

  
  
Posted 4 years ago

I do not have a global cuda install on this machine. Everything except for the driver is installed via conda.

  
  
Posted 4 years ago