But the process is still hanging, and not proceeding to actually running the clearml task
I suggest running it in docker mode with a docker image that already has cuda installed
OK, then just try the docker image I suggested 🙂
This has been resolved now! Thank you for your help @<1523701070390366208:profile|CostlyOstrich36>
I can install on the server with this command
to achieve running both the agent and the deployment on the same machine, adding --network=host to the run arguments solved it!
What I dont understand is how to tell clearml to install this version of pytorch and torchvision, with cu118
CUDA is the driver itself. The agent doesn't install CUDA but installs a compatible torch assuming that CUDA is properly installed.
It means that there is an issue with the drivers. I suggest trying this docker image - nvcr.io/nvidia/pytorch:23.04-py3
Hi @<1523701070390366208:profile|CostlyOstrich36> I am not specifying a version 🙂