Thanks for your reply.
Environment of host system:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.06              Driver Version: 545.29.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090        Off | 00000000:01:00.0 Off |                  N/A |
| 33%   52C    P0             115W / 350W |      4MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090        Off | 00000000:03:00.0 Off |                  N/A |
| 30%   49C    P0             108W / 350W |      4MiB / 24576MiB |      5%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.6 LTS
Release: 18.04
Codename: bionic
I installed cuda-toolkit-12.3 in the fractional GPU container by following this url. After the setup, I ran:
nsys profile nvidia-smi (or any other CUDA program)
It says:
Segmentation fault (core dumped)
Let me know if you need any other information. Thanks again.
- Pull the Docker image that ClearML provides (clearml/fractional-gpu:u22-cu12.3-4gb)
- Run
docker run -it --gpus all --ipc=host --pid=host clearml/fractional-gpu:u22-cu12.3-8gb bash
- Install the CUDA toolkit
wget
mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget
dpkg -i cuda-repo-ubuntu2204-12-3-local_12.3.0-545.23.06-1_amd64.deb
cp /var/cuda-repo-ubuntu2204-12-3-local/cuda-*-keyring.gpg /usr/share/keyrings/
apt-get update
apt-get -y install cuda-toolkit-12-3
- Run
nsys profile nvidia-smi
Additionally, my container was clearml/fractional-gpu:u22-cu12.3-4gb (other containers based on cu12.3 show the same error).
"Segmentation fault (core dumped)" was all I got.
I ran the nsys profiler inside the container.
What command did you run? What were you trying to do? What was the setup?
Can you provide the log, though? Where did you get the error?
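Since the only visible output so far is "Segmentation fault (core dumped)", one way to produce a fuller log is to wrap the failing command so that its output and exit status are written to a file. The following is only a rough sketch, to be run inside the container. The `run_logged` helper and the `nsys_segfault_report.txt` file name are made up for illustration, and the gdb step assumes gdb is installed there, which may not be the case. A process killed by a segmentation fault normally shows exit status 139 in bash (128 + signal 11). Note that nsys may start child processes, so the gdb backtrace is not guaranteed to show the process that crashed.

```shell
#!/usr/bin/env bash
# Sketch: collect diagnostics around the failing `nsys profile nvidia-smi` run.
# LOG is a hypothetical output path, not something from the original report.
LOG="${LOG:-nsys_segfault_report.txt}"

run_logged() {
    # Record the command line, its combined stdout/stderr, and its exit status.
    echo "\$ $*" >> "$LOG"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" >> "$LOG" 2>&1
        echo "[exit status: $?]" >> "$LOG"
    else
        echo "[skipped: $1 not found]" >> "$LOG"
    fi
}

: > "$LOG"                              # start with an empty report file
run_logged nvidia-smi                   # driver and GPUs as seen from the container
run_logged nsys --version               # which nsys build the toolkit installed
run_logged nsys status --environment    # nsys's own check of the profiling environment
run_logged nsys profile nvidia-smi      # the failing command; 139 means SIGSEGV

# Optional backtrace, only attempted when nsys exists (gdb is an assumption).
if command -v nsys >/dev/null 2>&1; then
    run_logged gdb --batch -ex run -ex bt --args nsys profile nvidia-smi
fi

echo "report written to $LOG"
```

Attaching the resulting file to a reply would cover the command, the setup, and where the error occurred in one place.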