Answered

I have set up an agent on a GPU machine, spun up the daemon in Docker mode, and explicitly specified a GPU for it to use. The image is okay: I verified that by running docker run ... nvidia-smi and confirmed that it produces valid output.

When I launch a task from an environment without a GPU and use task.execute_remotely to run it on the agent, the agent pulls the task, but the (TensorFlow) code does not run on the GPU. Could this be because the environment calling task.execute_remotely has the CPU version of TF installed, so trains replicates that environment and installs the same CPU variant instead of the GPU one? Or could something else be wrong here?
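One way to test the CPU-vs-GPU hypothesis is to look at the TensorFlow line in the requirements the agent will install. The sketch below is a hypothetical helper (not part of trains) that classifies a single requirement line. It relies on the packaging history as I understand it: before TF 2.1 the plain `tensorflow` wheel was CPU-only and GPU support came from `tensorflow-gpu`, while from 2.1 on the plain Linux wheel is GPU-capable. It only handles bare names and `name==version` pins.

```python
import re


def classify_tf_requirement(requirement: str) -> str:
    """Hypothetical helper: classify one pip requirement line for TensorFlow.

    Returns "gpu", "cpu", or "not-tensorflow". Only bare names and
    "name==version" pins are handled; other specifiers are out of scope.
    """
    name, _, version = requirement.strip().partition("==")
    name = name.strip().lower().replace("_", "-")
    if name == "tensorflow-gpu":
        return "gpu"
    if name == "tensorflow-cpu":
        return "cpu"
    if name != "tensorflow":
        return "not-tensorflow"
    match = re.match(r"(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        # Unpinned: assume pip resolves a recent (2.1+) release, GPU-capable on Linux
        return "gpu"
    major, minor = int(match.group(1)), int(match.group(2) or 0)
    # Before TF 2.1 the plain "tensorflow" wheel was CPU-only
    return "gpu" if (major, minor) >= (2, 1) else "cpu"


for line in ["tensorflow==1.15.0", "tensorflow-gpu==1.15.0", "tensorflow==2.2.0"]:
    print(line, "->", classify_tf_requirement(line))
```

If this reports "cpu" for the pinned version, the agent would install a CPU-only build regardless of the GPU on the machine; if it reports "gpu", the cause lies elsewhere.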

  
  
Posted 4 years ago

Answers 32


image

  
  
Posted 4 years ago

nvidia/cuda:10.1-base-ubuntu18.04

  
  
Posted 4 years ago

(I changed it in the settings)

  
  
Posted 4 years ago

This is odd, because the screenshot points to CUDA 10.2...

  
  
Posted 4 years ago

And the machine I have has 10.2.

I also tried nvidia/cuda:10.2-base-ubuntu18.04, which is the latest.

  
  
Posted 4 years ago

that is because my own machine has 10.2 (not the docker, the machine the agent is on)

  
  
Posted 4 years ago

image

  
  
Posted 4 years ago

"that is because my own machine has 10.2 (not the docker, the machine the agent is on)"

No, that has nothing to do with it; CUDA lives inside the container. I'm referring to this image: https://allegroai-trains.slack.com/archives/CTK20V944/p1593440299094400?thread_ts=1593437149.089400&cid=CTK20V944
Assuming this is the output from your code running inside the Docker container, it points to CUDA version 10.2.
Am I missing something?

  
  
Posted 4 years ago

I really don't know. As you can see in my last screenshot, I've configured my base image to use CUDA 10.1.

  
  
Posted 4 years ago

Hmm, could you attach the entire log?
Remove any info that you feel is too sensitive :)

  
  
Posted 4 years ago

On it

  
  
Posted 4 years ago

Here you go

  
  
Posted 4 years ago

By the way, just from inspecting it: the CUDA version in the nvidia-smi output matches the driver installed on the host, not the container. Look at the image below.
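A minimal sketch of that distinction: the "CUDA Version" in the nvidia-smi header comes from the host driver, while the toolkit shipped in the image is reported separately. The helper names and the sample header line are illustrative, and the sketch assumes that nvidia/cuda images export a `CUDA_VERSION` environment variable (e.g. "10.1.243"); outside such a container it simply returns None.

```python
import os
import re
from typing import Optional


def driver_cuda_from_smi(smi_output: str) -> Optional[str]:
    """Extract the 'CUDA Version' from nvidia-smi output.

    This is the newest CUDA version the *host driver* supports,
    not the toolkit installed inside the container.
    """
    match = re.search(r"CUDA Version:\s*([\d.]+)", smi_output)
    return match.group(1) if match else None


def container_cuda() -> Optional[str]:
    # Assumption: nvidia/cuda images set CUDA_VERSION for the bundled toolkit
    return os.environ.get("CUDA_VERSION")


sample = "| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |"
print("host driver supports CUDA", driver_cuda_from_smi(sample))
print("container toolkit CUDA", container_cuda())
```

Run inside the agent's container, the two values can legitimately differ, e.g. 10.2 from the driver and 10.1 from the image.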

  
  
Posted 4 years ago

But I'm naive enough to believe that 10.2 is compatible with 10.1, since it is a minor upgrade.
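On the driver side this intuition broadly holds: NVIDIA drivers are backward compatible with older toolkits, so a driver that reports CUDA 10.2 is expected to run a container built on the 10.1 toolkit. The sketch below encodes only that simplified rule (it ignores forward-compatibility packages and later minor-version compatibility schemes), and says nothing about other libraries such as cuDNN that TensorFlow also needs. The function names are illustrative.

```python
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    # "10.1.243" -> (10, 1); assumes numeric, dot-separated versions
    parts = version.split(".")
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (int(parts[0]), minor)


def driver_can_run(driver_cuda: str, toolkit_cuda: str) -> bool:
    """Simplified rule: a driver whose nvidia-smi header reports `driver_cuda`
    can run CUDA toolkits up to and including that major.minor version."""
    return major_minor(toolkit_cuda) <= major_minor(driver_cuda)


print(driver_can_run("10.2", "10.1.243"))  # True
print(driver_can_run("10.1", "10.2"))      # False
```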

  
  
Posted 4 years ago

LOL

  
  
Posted 4 years ago

😅

  
  
Posted 4 years ago

O_O

  
  
Posted 4 years ago

replace the base-docker-image and it should work fine 🙂

  
  
Posted 4 years ago

Let's try

  
  
Posted 4 years ago

But remember, it didn't work with the default one (nvidia/cuda) either.

  
  
Posted 4 years ago

We might need to change the default base docker image, but I remember it was there... Let me check again

  
  
Posted 4 years ago

🤞

  
  
Posted 4 years ago

Well, cuDNN is actually missing from the base image...
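A quick way to confirm this from inside the agent's container is to ask the dynamic loader whether libcudnn is visible. The sketch uses Python's `ctypes.util.find_library`, which on Linux consults the ldconfig cache. The tag helper is hypothetical and assumes the nvidia/cuda naming pattern of that period, where cuDNN came in `-cudnnN-` variants such as `10.1-cudnn7-runtime-ubuntu18.04` rather than in the `-base` tags; check Docker Hub for the tags that actually exist.

```python
import ctypes.util
from typing import Optional


def find_cudnn() -> Optional[str]:
    """Return the cuDNN shared library name the loader can find, or None."""
    return ctypes.util.find_library("cudnn")


def cudnn_image_tag(cuda: str, cudnn: str, flavor: str = "runtime",
                    distro: str = "ubuntu18.04") -> str:
    # Hypothetical helper; assumed pattern "<cuda>-cudnn<N>-<flavor>-<distro>"
    return f"nvidia/cuda:{cuda}-cudnn{cudnn}-{flavor}-{distro}"


lib = find_cudnn()
if lib:
    print("cuDNN found:", lib)
else:
    print("cuDNN not found; consider an image such as", cudnn_image_tag("10.1", "7"))
```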

  
  
Posted 4 years ago

Let's see if this is really the issue

  
  
Posted 4 years ago

😄

  
  
Posted 4 years ago

It works!

  
  
Posted 4 years ago

That was the issue then

  
  
Posted 4 years ago

I guess not many tensorflowers running agents around here if this wasn't brought up already

  
  
Posted 4 years ago

glad I managed to help back in some way

  
  
Posted 4 years ago
21K Views
4 years ago
7 months ago