This is what the instance state looks like, as logged by ClearML:
Relatedly, I just noticed that the GPU is not starting. This was in the logs:
2022-04-07 20:59:54.464854: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
Do we need to use a specific instance image with CUDA preinstalled, or does ClearML take care of it?
Hi CloudySwallow27, regarding "Process terminated by user": are you running hyperparameter optimization?
Regarding CUDA: yes, you need CUDA installed (or run the task from a Docker image with CUDA). ClearML doesn't handle the CUDA installation, since that happens at the driver level.
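If it helps, a quick way to check whether the machine sees a GPU at all before launching a task is something like the sketch below. It only assumes the `nvidia-smi` tool that ships with the NVIDIA driver; `detect_gpus` is a made-up helper name for illustration, not part of ClearML or TensorFlow.

```python
import shutil
import subprocess


def detect_gpus():
    """Return the GPU names reported by nvidia-smi, or [] if none are visible."""
    # nvidia-smi is installed with the NVIDIA driver; if it is missing,
    # the driver is most likely not installed on this machine or image.
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=30,
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        # The tool exists but failed (for example, no device attached).
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = detect_gpus()
    if gpus:
        print("GPUs visible to the driver:", ", ".join(gpus))
    else:
        print("No GPU visible; check the driver/CUDA install or the Docker image.")
```

An empty list here means the problem is at the driver or image level, so nothing in the training code itself will fix the `CUDA_ERROR_NO_DEVICE` error.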
Thanks! I installed CUDA/cuDNN on the image and now the GPU is being utilized.