I guess there aren't many tensorflowers running agents around here if this wasn't brought up already
That is because my own machine has CUDA 10.2 (not the docker, the machine the agent is on)
No, that has nothing to do with it; CUDA is inside the container. I'm referring to this image: https://allegroai-trains.slack.com/archives/CTK20V944/p1593440299094400?thread_ts=1593437149.089400&cid=CTK20V944
Assuming this is the output from your code running inside the docker, it points to CUDA version 10.2
Am I missing something?
Okay, I'll make sure we change the default image to the runtime flavor of nvidia/cuda
By the way, on inspection, the CUDA version in the output of nvidia-smi
matches the driver installed on the host, not the container. Look at the image below
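A rough way to compare the two numbers from inside the container, as a sketch only: the header of nvidia-smi reflects the host driver, while the official nvidia/cuda images set a CUDA_VERSION environment variable for the toolkit they ship. The function names and the idea of comparing only major.minor are my own assumptions for illustration, not anything from the agent.

```python
import os
import re
import subprocess


def driver_cuda_version(smi_output):
    """Extract the 'CUDA Version: X.Y' field from nvidia-smi's header text."""
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", smi_output)
    return match.group(1) if match else None


def container_cuda_version(env=None):
    """Return major.minor of CUDA_VERSION (set by the nvidia/cuda images), or None."""
    env = os.environ if env is None else env
    full = env.get("CUDA_VERSION")  # e.g. "10.1.243"
    if not full:
        return None
    return ".".join(full.split(".")[:2])


if __name__ == "__main__":
    try:
        # nvidia-smi may be absent if the container was started without GPU access
        smi = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
    except OSError:
        smi = ""
    print("driver (host):", driver_cuda_version(smi))
    print("toolkit (container):", container_cuda_version())
```

If the two values differ, that alone is not an error; it only shows that nvidia-smi is not telling you which toolkit the image contains.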
But I'm naive enough to believe that 10.2 is compatible with 10.1 as it is a minor upgrade
But remember, it also didn't work with the default one (nvidia/cuda)
I really don't know. As you can see in my last screenshot, I've configured my base image to be 10.1
replace the base-docker-image and it should work fine 🙂
https://hub.docker.com/layers/nvidia/cuda/10.1-cudnn7-runtime-ubuntu18.04/images/sha256-963696628c9a0d27e9e5c11c5a588698ea22eeaf138cc9bff5368c189ff79968?context=explore
The docker image is missing cuDNN, which is a must for TF to work 🙂
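A quick sanity check you could run inside the container before launching TF, as a minimal sketch: it only asks the dynamic loader whether a cuDNN shared library can be found and loaded. The helper name cudnn_available is hypothetical, and a False result is a hint rather than proof, since library search paths vary between images.

```python
import ctypes
import ctypes.util


def cudnn_available():
    """Return True if a cuDNN shared library can be located and loaded."""
    name = ctypes.util.find_library("cudnn")
    if name is None:
        return False
    try:
        ctypes.CDLL(name)  # loading fails if dependencies are missing
    except OSError:
        return False
    return True


if __name__ == "__main__":
    print("cuDNN found:", cudnn_available())
```

This does not confirm that the cuDNN version matches what your TF build expects; it only catches the case where the library is absent altogether.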
and the machine I have is 10.2.
I also tried nvidia/cuda:10.2-base-ubuntu18.04 which is the latest
We might need to change the default base docker image, but I remember it was there... Let me check again
Thanks very much
Now something else is failing, but I'm pretty sure it's on my side now... So have a good day and see you in the next question 😄
This is odd because the screen grab points to CUDA 10.2...
Hmmm could you attach the entire log?
Remove any info that you feel is too sensitive :)