Hi CostlyOstrich36 , thank you for answering so quick. I think that s not how it works because if this was true, one would have to always match local machine to servers. Afaik clearml finds the correct PyTorch Version, but I was not sure how (custom vs pip does it)
I am wondering cause when used in docker mode, the docker container may have a CUDA Version that is different from the host version. However, ClearML seems to use the host version instead of the docker container's version, which is a problem sometimes.
I used the wrong docker container. The docker container I used had version 11.4. Interestingly, the override from clearml.conf and CUDA_VERSION Env variable did not work there.
With the correct docker container everything works fine. Shame on me.
I have to correct myself, I do not even have CUDA installed. Only the driver and everything CUDA-related is provided by the docker container. This works with a container that has CUDA 11.4, but now I have one with 11.6 (latest nvidia pytorch docker).
However, even after changing the clearml.conf and overriding with CUDA_VERSION, the clearml-agent prints on the docker container
agent.cuda_version = 114 ! (Other changes to the clearml.conf on the agent are reflected in the docker, so only the CUDA version has an issue).