
Hello!
Since today I get AssertionError: Torch not compiled with CUDA enabled for PyTorch 1.8.
Tasks that I submitted yesterday to the queue are also not working, even though they ran yesterday. PyTorch 1.7 based tasks work fine. Any idea what I could have done wrong?

  
  
Posted 4 years ago

Answers 161


It's always preferred to use conda_freeze: false
That said, if you do use conda_freeze: true it should also freeze the cudatoolkit, so it should have worked.
BTW, when you say it worked, was that with version 0.17.2 or with the hacked RC I sent?

  
  
Posted 4 years ago

drwxr-xr-x 10 root root 4096 Jul 31  2020 .
drwxr-xr-x 14 root root 4096 Jul 31  2020 ..
drwxr-xr-x  2 root root 4096 Feb  4 13:52 bin
drwxr-xr-x  2 root root 4096 Jul 31  2020 etc
drwxr-xr-x  2 root root 4096 Jul 31  2020 games
drwxr-xr-x  2 root root 4096 Jul 31  2020 include
drwxr-xr-x  4 root root 4096 Feb  3 13:40 lib
lrwxrwxrwx  1 root root    9 Dez 10 14:29 man -> share/man
drwxr-xr-x  2 root root 4096 Jul 31  2020 sbin
drwxr-xr-x  7 root root 4096 Jul 31  2020 share
drwxr-xr-x  2 root root 4096 Jul 31  2020 src
  
  
Posted 4 years ago

(This is why we recommend using pip: it is stable, and clearml-agent takes care of the pytorch/cuda versions.)

  
  
Posted 4 years ago

Do you know how I can make sure I do not have CUDA or a broken installation installed?
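
One way to check this from inside the environment is to ask PyTorch itself. The snippet below is only a minimal sketch: the function name `describe_torch_build` is made up for illustration, and it assumes it is run with the same Python interpreter the task uses.

```python
import importlib


def describe_torch_build():
    """Report whether the importable torch is a CUDA build or CPU-only.

    Hypothetical helper for illustration; returns a plain dict.
    """
    try:
        torch = importlib.import_module("torch")
    except (ImportError, OSError):
        # torch is missing or its shared libraries could not be loaded
        return {"installed": False}
    return {
        "installed": True,
        "torch_version": torch.__version__,
        # None for CPU-only builds, a version string such as "11.1" otherwise
        "built_with_cuda": torch.version.cuda,
        # False for CPU-only builds, or when no usable driver/GPU is visible
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(describe_torch_build())
```

If `built_with_cuda` is `None`, the installed package is a CPU-only build, which would match the "Torch not compiled with CUDA enabled" error regardless of the driver.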

  
  
Posted 4 years ago

btw: I also tested the clearml-agent running on a different machine and with python 3.8 and I get the same problems.

  
  
Posted 4 years ago

Give me a minute

  
  
Posted 4 years ago

conda env update -p .clearml/venvs-builds/3.8 ./environment.yml

with environment.yml

name: clearml
channels:
  - pytorch
  - anaconda
  - conda-forge
  - defaults
dependencies:
  - pytorch==1.8.0
  
  
Posted 4 years ago

Ha?!

  
  
Posted 4 years ago

Hi @<1523701868901961728:profile|ReassuredTiger98>
Could you send the full log? Also, what's the clearml-agent version?

  
  
Posted 4 years ago

Quick question: Where again does clearml place the venv? I wanna take a look into it after the task has failed

  
  
Posted 4 years ago

Could you send the end file?

  
  
Posted 4 years ago

I just started a task from this environment and it fails on the agent.

  
  
Posted 4 years ago

fyi: NVIDIA-SMI 460.56 Driver Version: 460.56 CUDA Version: 11.2

  
  
Posted 4 years ago

What's the difference between the two env files?

  
  
Posted 4 years ago

Hi @<1523701868901961728:profile|ReassuredTiger98>
This should have worked. It seems conda is not fetching the correct pytorch build (even though the conda env contains the cuda version they specify).
Let's try something: reset the Task, then edit the "Installed packages" and add:

cudatoolkit==11.1.1

Then try again.
Let's see what we get.
(My guess is that conda forgets it has just installed cudatoolkit and assumes the env is for CPU.)

  
  
Posted 4 years ago

Could you try to do:

CUDA_VERSION="11.1" clearml-agent ...
  
  
Posted 4 years ago

Could you test with 4.7.5 ?

  
  
Posted 4 years ago

Version 0.17.2 it says

  
  
Posted 4 years ago

Will do!

  
  
Posted 4 years ago

Mhhm, now conda env creation takes forever since it probably resolves conflicts. At least that is what is happening when I tried to manually install my environment

  
  
Posted 4 years ago

I do not have a global cuda install on this machine. Everything except for the driver is installed via conda.

  
  
Posted 4 years ago

channels:
- defaults
- conda-forge
- pytorch
dependencies:
- cudatoolkit==11.1.1
- pytorch==1.8.0

Gives CPU version
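
To tell which variant conda actually picked, the build string shown by `conda list pytorch` can be inspected. The sketch below is an assumption-laden illustration: `classify_pytorch_build` is a hypothetical helper, and the example build strings (`py3.8_cuda11.1_cudnn8.0.5_0`, `py3.8_cpu_0`) reflect the typical naming of pytorch channel packages, not output taken from this thread.

```python
import re


def classify_pytorch_build(build):
    """Classify a conda build string as a CUDA or CPU build.

    Hypothetical helper; relies on the build string containing
    either "cuda<version>" or "cpu", which is an assumption.
    """
    build = build.lower()
    match = re.search(r"cuda(\d+(?:\.\d+)*)", build)
    if match:
        # e.g. "py3.8_cuda11.1_cudnn8.0.5_0" -> ("cuda", "11.1")
        return ("cuda", match.group(1))
    if "cpu" in build:
        # e.g. "py3.8_cpu_0" -> ("cpu", None)
        return ("cpu", None)
    return ("unknown", None)


if __name__ == "__main__":
    for example in ("py3.8_cuda11.1_cudnn8.0.5_0", "py3.8_cpu_0"):
        print(example, "->", classify_pytorch_build(example))
```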

  
  
Posted 4 years ago

I just wanna add: I can run this task on the same workstation with the same conda installation just fine.

  
  
Posted 4 years ago

end file?

  
  
Posted 4 years ago

The task already contains this

  
  
Posted 4 years ago

Would it help you diagnose this problem if I ran conda env create --file=environment.yml and checked whether it works?

  
  
Posted 4 years ago

No worries, gnight :)

  
  
Posted 4 years ago

I get 110 but it should be 111

  
  
Posted 4 years ago

Complete conda log

  
  
Posted 4 years ago

So I just updated the env that clearml-agent created (the one where the CPU build of pytorch is installed) with my local environment.yml, and now the correct version is installed. So most probably `/tmp/conda_envaz1ne897.yml` is the problem here.

  
  
Posted 4 years ago