Unanswered
Hi All, I Was Trying To Use Clearml-Task To Run A Custom Docker(With Poetry To Install All The Python Dependencies And Activated The Environment) Using Clearml Gpu, But It Seems Like Clearml Always Create A Virtual Environment And Run The Python Script Fr
@<1523701205467926528:profile|AgitatedDove14> I'm trying to run Clearml GPU compute(RTX 3080) with pytorch-lightning but keep getting CUDA error. Is there any specific CUDA/Ubuntu/torch/python version required? I tried several different version but can't make it work
FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04 as telos_algorithms
File "/code/.venv/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1013, in _run_stage
with isolate_rng():
File "/.pyenv/versions/3.10.9/lib/python3.10/contextlib.py", line 135, in __enter__
return next(self.gen)
File "/code/.venv/lib/python3.10/site-packages/lightning/pytorch/utilities/seed.py", line 42, in isolate_rng
states = _collect_rng_states(include_cuda)
File "/code/.venv/lib/python3.10/site-packages/lightning/fabric/utilities/seed.py", line 115, in _collect_rng_states
states["torch.cuda"] = torch.cuda.get_rng_state_all()
File "/code/.venv/lib/python3.10/site-packages/torch/cuda/random.py", line 39, in get_rng_state_all
results.append(get_rng_state(i))
File "/code/.venv/lib/python3.10/site-packages/torch/cuda/random.py", line 22, in get_rng_state
_lazy_init()
File "/code/.venv/lib/python3.10/site-packages/torch/cuda/__init__.py", line 247, in _lazy_init
torch._C._cuda_init()
RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
148 Views
0
Answers
one year ago
one year ago