docker="nvidia/cuda:11.8.0-base-ubuntu20.04"
Collecting pip<20.2
Using cached pip-20.1.1-py2.py3-none-any.whl (1.5 MB)
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 20.0.2
Not uninstalling pip at /usr/lib/python3/dist-packages, outside environment /usr
Can't uninstall 'pip'. No files were found to uninstall.
ERROR: This container was built for NVIDIA Driver Release 530.30 or later, but
version 460.32.03 was detected and compatibility mode is UNAVAILABLE.
[[System has unsupported display driver / cuda driver combination (CUDA_ERROR_SYSTEM_DRIVER_MISMATCH) cuInit()=803]]
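The driver mismatch reported above can be spotted automatically when scanning agent logs. Below is a minimal Python sketch, assuming the message keeps the wording shown in this log; the names `DRIVER_MSG`, `parse_version`, and `check_driver_message` are hypothetical and not part of ClearML or any NVIDIA tooling.

```python
import re

# Pattern based only on the wording of the error in the log above.
# The names here are hypothetical helpers, not part of any library.
DRIVER_MSG = re.compile(
    r"built for NVIDIA Driver Release (?P<required>[\d.]+) or later, but\s+"
    r"version (?P<detected>[\d.]+) was detected"
)


def parse_version(text):
    """Turn a dotted version such as '460.32.03' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def check_driver_message(log_text):
    """Return required/detected driver versions found in log_text, or None."""
    match = DRIVER_MSG.search(log_text)
    if match is None:
        return None
    required = parse_version(match.group("required"))
    detected = parse_version(match.group("detected"))
    return {
        "required": required,
        "detected": detected,
        # Tuples compare element by element, so (460, 32, 3) < (530, 30).
        "too_old": detected < required,
    }
```

If `too_old` is true, the host driver is older than the one the container image expects, so either the driver needs upgrading or an image built for an older CUDA release is a better fit.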
I think it tries to get the latest one. Are you using the agent in docker mode? You can also control this via clearml.conf with agent.cuda_version.
I am running the agent with clearml-agent daemon --queue training
Solved that by setting docker_args=["--privileged", "--network=host"]
But the process is still hanging and never proceeds to actually run the ClearML task.
Hi @RattyBluewhale45, what version of PyTorch are you specifying?
Hi @CostlyOstrich36, I am not specifying a version 🙂