Okay, found it 🙂 it returns 11020 instead of 112
Type "help", "copyright", "credits" or "license" for more information.
>>> from clearml_agent.helper.gpu.gpustat import get_driver_cuda_version
>>> get_driver_cuda_version()
'110'
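For context: NVML encodes the driver's CUDA version as major*1000 + minor*10, so 11.2 comes back as the integer 11020; naively taking the first three digits of that gives "110", which would explain the wrong result. A minimal sketch of the correct conversion (hypothetical helper, not the actual clearml-agent code):

# hypothetical helper, not the clearml-agent implementation
def nvml_cuda_version_to_compact(raw: int) -> str:
    # NVML encodes CUDA 11.2 as 11020 (major*1000 + minor*10)
    major, minor = raw // 1000, (raw % 1000) // 10
    return f"{major}{minor}"

print(nvml_cuda_version_to_compact(11020))  # -> '112', not '110'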
I do not have a global cuda install on this machine. Everything except for the driver is installed via conda.
drwxr-xr-x 10 root root 4096 Jul 31 2020 .
drwxr-xr-x 14 root root 4096 Jul 31 2020 ..
drwxr-xr-x 2 root root 4096 Feb 4 13:52 bin
drwxr-xr-x 2 root root 4096 Jul 31 2020 etc
drwxr-xr-x 2 root root 4096 Jul 31 2020 games
drwxr-xr-x 2 root root 4096 Jul 31 2020 include
drwxr-xr-x 4 root root 4096 Feb 3 13:40 lib
lrwxrwxrwx 1 root root 9 Dec 10 14:29 man -> share/man
drwxr-xr-x 2 root root 4096 Jul 31 2020 sbin
drwxr-xr-x 7 root root 4096 Jul 31 2020 share
drwxr-xr-x 2 root root 4096 Jul 31 2020 src
Thu Mar 11 17:52:45 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.56 Driver Version: 460.56 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 3090 Off | 00000000:01:00.0 Off | N/A |
| 61% 63C P2 296W / 350W | 8318MiB / 24268MiB | 74% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 GeForce RTX 3090 Off | 00000000:21:00.0 Off | N/A |
| 30% 29C P8 20W / 350W | 1MiB / 24268MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 133165 C+G ...s-builds.1/3.7/bin/python 8314MiB |
+-----------------------------------------------------------------------------+
@<1523701868901961728:profile|ReassuredTiger98> what are you getting with:
nvidia-smi
And here:
ls -la /usr/local/
Or there should be an early error for trying to run conda-based tasks on pip agents
btw: why is agent.package_manager an agent attribute? Imo it does not make sense, because conda can install pip packages, but pip cannot install conda packages, which can lead to install failures, right?
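A hedged sketch of the kind of early check meant here (hypothetical names, not actual clearml-agent code):

def assert_agent_can_run_task(agent_package_manager: str, task_needs_conda: bool) -> None:
    # fail fast: a conda agent can still install pip packages,
    # but a pip agent cannot install conda packages
    if task_needs_conda and agent_package_manager != "conda":
        raise RuntimeError(
            "Task declares conda packages, but agent.package_manager.type is "
            f"'{agent_package_manager}'"
        )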
One more thing: The cuda_version that clearml finds automatically is wrong.
Yes that is exactly what I will make sure we change :)
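In the meantime the detected value can be pinned manually; the agent section of clearml.conf has a cuda_version key (also settable via the CUDA_VERSION environment variable). A sketch, assuming CUDA 11.2:

agent {
    # override auto-detection of the CUDA version
    cuda_version: 11.2
}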
Perfect, will try it. FYI: the conda_channels that I used are from clearml-agent init
Well, in that case, just change the order; it should solve it (I'll make sure we have that as the default):
conda_channels: ["pytorch", "conda-forge", "defaults", ]
It should solve the issue 🙂
conda_channels: ["defaults", "conda-forge", "pytorch", ]
@<1523701868901961728:profile|ReassuredTiger98> what do you have in the clearml.conf under "conda_channels" ?
Is this it?
None
Can you ping me when it is updated in None so I can update my installation?
No problem! I benefit so much from clearml 🙂
channels:
- pytorch
- conda-forge
- defaults
dependencies:
- cudatoolkit~=11.1.1
- pytorch~=1.8.0
Works fine
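For anyone reading along: ~= is the "compatible release" operator, so cudatoolkit~=11.1.1 allows 11.1.x versions at or above 11.1.1 but not 11.2. A quick way to check the PEP 440 semantics (conda's matcher follows the same compatible-release idea), as an illustration only:

from packaging.specifiers import SpecifierSet

spec = SpecifierSet("~=11.1.1")  # compatible release: >=11.1.1, ==11.1.*
print("11.1.2" in spec)  # True
print("11.2.0" in spec)  # False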
@<1523701868901961728:profile|ReassuredTiger98> thank you so much for testing it!
Damn, okay I'll make sure we fix the order.
Could you verify the ~= works as intended (if the order is correct)?
Just tested again. The ordering definitely matters.
okay, I'll make sure we order it correctly
I'll try it one more time just to make sure