That gave me
Running in Docker mode (v19.03 and above) - using default docker image: nvidia/cuda running python3
Building Task 94jfk2479851047c18f1fa60c1364b871 inside docker: ubuntu:18.04
Starting docker build
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
ERRO[0000] error waiting for container: context canceled
Isn't it overkill to run a whole ubuntu 18.04 just to run a dead simple controller task?
In theory yes. In practice you will be using the same docker image for all the services, and they will never interfere with one another. You also have the option to do more sophisticated stuff, like mapping the file-server data for a clean-up service (should be out in a few days :)), so it's a balance. Also remember that, relatively speaking, docker containers are quite lightweight; this is not like running a VM per service...
Does what you suggested here (https://github.com/allegroai/trains-agent/issues/18#issuecomment-634551232) also apply to containers used by the services queue?
trains-agent build --docker nvidia/cuda --id myTaskId --target base_env_services
It's building a GPU-enabled docker...
You might want a different container, or to specify --cpu-only
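For context, the "could not select device driver" error typically shows up when Docker is asked for GPU access on a host without the NVIDIA container runtime. A minimal sketch of picking the flag automatically, assuming that finding `nvidia-smi` on the host is a good enough GPU probe (the helper name `agent_gpu_flag` is hypothetical, not part of trains-agent):

```shell
#!/bin/sh
# Hypothetical helper: print "--cpu-only" when the probe command is missing.
# The probe defaults to nvidia-smi (an assumption); pass another name to override it.
agent_gpu_flag() {
    probe="${1:-nvidia-smi}"
    if command -v "$probe" >/dev/null 2>&1; then
        echo ""            # GPU tooling found: no extra flag
    else
        echo "--cpu-only"  # no GPU tooling: ask the agent to stay on CPU
    fi
}

# Example (not executed here):
#   trains-agent daemon --services-mode --queue services --docker ubuntu:18.04 $(agent_gpu_flag)
```

This only checks the host tooling, not whether Docker itself is configured for GPUs, so treat it as a first-pass guess.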
Alright, thanks for the answer! Seems legit then 🙂
My bad, alpine is so light it doesn't have bash
JitteryCoyote63 What did you have in mind?
Alright, so the steps would be:
trains-agent build --docker nvidia/cuda --id myTaskId --target base_env_services
That would create a base docker image, base_env_services. Then how should I ensure that trains-agent uses that base image for the services queue? My guess is:
trains-agent daemon --services-mode --detached --queue services --create-queue --docker base_env_services --cpu-only
Would that work?
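The two steps above can be kept together in a small dry-run helper. This is only a sketch: the function name `services_setup_cmds` is made up, it uses just the flags quoted in this thread, and it prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the build and daemon commands
# for a given task id and target image name.
services_setup_cmds() {
    task_id="$1"
    target="$2"
    # Step 1: build a base image from the task's environment
    echo "trains-agent build --docker nvidia/cuda --id $task_id --target $target"
    # Step 2: start the services-mode agent on top of that image
    echo "trains-agent daemon --services-mode --detached --queue services --create-queue --docker $target --cpu-only"
}

# Example:
#   services_setup_cmds myTaskId base_env_services
```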
It's the safest way to run multiple processes and make sure they are cleaned up afterwards ...
Thanks! Corrected both, now it's building
Does what you suggested here >
Yes, it is basically the same underlying mechanism, only instead of 1-to-1 it's 1-to-many
It will automatically switch to docker mode
btw, I tried with alpine instead of ubuntu:18.04 and got:
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
df20fa9351a1: Pulling fs layer
df20fa9351a1: Verifying Checksum
df20fa9351a1: Download complete
df20fa9351a1: Pull complete
Digest: sha256:185518070891758909c9f839cf4ca393ee977ac378609f700f60a771a2dfe321
Status: Downloaded newer image for alpine:latest
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "exec: "bash": executable file not found in $PATH": unknown.
time="2020-06-09T13:24:55Z" level=error msg="error waiting for container: context canceled"
DONE: Running task 'ae788658786043ef8d3c758815a43eb8', exit status 127
Any idea?
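The log above says the container could not exec bash, which the stock alpine image does not ship. A rough way to check an image before pointing the agent at it, assuming the image at least provides `sh` (the function name `image_has_bash` is made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given image has bash on its PATH.
# It overrides the entrypoint with sh, so the image must at least provide sh.
image_has_bash() {
    docker run --rm --entrypoint sh "$1" -c 'command -v bash' >/dev/null 2>&1
}

# Example (needs a working docker daemon, so it is left commented out):
#   image_has_bash alpine || echo "alpine has no bash, pick another base image"
```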
ubuntu:18.04 is actually 64 MB, I can live with that 😛
Yes that would work 🙂
You can also put it in the docker-compose file, see TRAINS_AGENT_DEFAULT_BASE_DOCKER