RattySeagull0

3 Questions, 21 Answers

Active since 10 January 2023

Last activity one year ago

Badges 1

21 × Eureka!

0 Votes 7 Answers 1K Views

Hi everyone, I'm trying to execute trains-agent in docker mode with conda as the package manager. Is it supported? I tried to work with nvidia/cuda:10.0-runtime-ubuntu18.04 and got the error "trains_agent: error: error: package manager "conda" selected, but '...

clearml

4 years ago

0 Votes 17 Answers 1K Views

Hi everyone, I am trying to use docker mode for trains-agent, but it seems to have a problem with multiple GPUs. This is my trains-agent command: trains-agent daemon --gpus 0,1 --queue dual_gpu --docker --foreground and it gets the error: Doc...

clearml

4 years ago

0 Votes 17 Answers 1K Views

Hi everyone, I tried to launch experiments using conda with different CUDA versions. I tried to comment out these fields in the trains.conf file on the remote machine: #cuda_version: 10.1 #cudnn_version: 7.0 but it seems that when I comment them out (like a...

clearml

4 years ago

0 Hi everyone, I am trying to use docker mode for trains-agent, but it seems to have a problem with multiple GPUs. This is my trains-agent command: trains-agent daemon --gpus 0,1 --queue dual_gpu --docker --foreground and it gets the error: Doc...

Maybe it's possible to overcome this by setting NVIDIA_VISIBLE_DEVICES somehow and then using --gpus all?

4 years ago

You are right, I only have 2 GPUs right now, so basically I can launch with --gpus all and it will work,
but I want to create the scripts for longer-term use (deploying on larger machines with more GPUs).

docker:
Client: Docker Engine - Community
Version: 19.03.6
API version: 1.40
Go version: go1.12.16
Git commit: 369ce74a3c
Built: Thu Feb 13 01:27:49 2020
OS/Arch: linux/amd64
Experimental: false

Server: Docker Engine - Community
Engine:
V...

4 years ago

0 Hi everyone, I tried to launch experiments using conda with different CUDA versions. I tried to comment out these fields in the trains.conf file on the remote machine: #cuda_version: 10.1 #cudnn_version: 7.0 but it seems that when I comment them out (like a...

When my system was "clean" I installed CUDA 10.1 (I never installed CUDA 10.2), I hope I'm not mistaken.

4 years ago

Yes, when I run docker directly:
docker run --gpus '"device=0,1"' nvidia/cuda:10.1-base nvidia-smi

it works, but when I run it through trains as WackyRabbit7 suggested (with the same quotes):
trains-agent daemon --gpus '"device=0,1"' --queue dual_gpu --docker --foreground

it gives this error:
invalid argument "device="device=0,1"" for "--gpus" flag: parse error on line 1, column 7: bare " in non-quoted-field

4 years ago
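The text 'bare " in non-quoted-field' in the error above is the ErrBareQuote message of Go's encoding/csv package, which suggests the docker CLI reads the --gpus value as one CSV record before interpreting it. The Go sketch below assumes exactly that and reproduces only that CSV step (not docker's full validation) for the three values that came up here: "device=0,1" including its double quotes (what the shell hands docker for '"device=0,1"'), the unquoted device=0,1, and the doubly wrapped device="device=0,1" produced via trains-agent. splitGPUOpt is a hypothetical name used only for this illustration.

```go
package main

import (
	"encoding/csv"
	"errors"
	"fmt"
	"strings"
)

// splitGPUOpt reads a --gpus value as a single CSV record. That docker does
// this first is an assumption based on the error text, not confirmed here.
func splitGPUOpt(value string) ([]string, error) {
	r := csv.NewReader(strings.NewReader(value))
	return r.Read()
}

func main() {
	values := []string{
		`"device=0,1"`,        // what a plain docker run receives from --gpus '"device=0,1"'
		`device=0,1`,          // unquoted: the comma starts a second field
		`device="device=0,1"`, // the value built via trains-agent from '"device=0,1"'
	}
	for _, v := range values {
		fields, err := splitGPUOpt(v)
		if errors.Is(err, csv.ErrBareQuote) {
			fmt.Printf("%-22s -> bare quote error: %v\n", v, err)
			continue
		}
		if err != nil {
			fmt.Printf("%-22s -> error: %v\n", v, err)
			continue
		}
		fmt.Printf("%-22s -> %d field(s): %q\n", v, len(fields), fields)
	}
}
```

With the quoted value the comma stays inside one field; the unquoted value splits into device=0 and 1, and the doubly wrapped value fails with the same bare-quote error reported above.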

The cudatoolkit version inside the experiment is 10.1, but trains tries to work with 10.2, probably for the same reason that 10.2 is displayed in nvidia-smi.

4 years ago

Is it something I can configure from the call to Task.init? (My goal is to not have to change it manually.)

4 years ago

I haven't used it so far, but I will start 🙂

4 years ago

Is the Docker flow better supported than conda? Is there a guide for the configuration required for Docker?

4 years ago

Got it, thanks!
Is it possible to use different Docker images (containing different CUDA versions) in different experiments,
or do I have to open different queues for that (or something like that)?

4 years ago

I can give it a shot (I'm using conda now). What is the overhead of moving to Docker, given that I don't have hands-on Docker experience?

4 years ago

Weird, I will try to find out why that is.

4 years ago

This is the error:
Running Docker:

Executing: ('docker', 'run', '-t', '--gpus', 'device=0,1', '-e', 'TRAINS_WORKER_ID=lv-beast:gpu0,1', '-v', '/home/lv-beast/.git-credentials:/root/.git-credentials', '-v', '/home/lv-beast/.gitconfig:/root/.gitconfig', '-v', '/tmp/.trains_agent.li48l7ii.cfg:/root/trains.conf', '-v', '/tmp/trains_agent.ssh.uv6dxcw7:/root/.ssh', '-v', '/home/lv-beast/.trains/apt-cache.2:/var/cache/apt/archives', '-v', '/home/lv-beast/.trains/pip-cache:/root/.cache/pip', '-v', '/...

4 years ago

Hi TimelyPenguin76,
you are right, it says CUDA version 10.2 (even though I only installed CUDA 10.1, weird).
Do you know why it's 10.2?
And do you know why trains relies on that (instead of looking at the Python environment of the executed script)?

4 years ago
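On the 10.2 question: the "CUDA Version" shown in the nvidia-smi header is the newest CUDA version the installed driver supports, not the toolkit that was installed, so it can read 10.2 on a machine that only has CUDA 10.1. If the agent takes its CUDA version from that field (as suspected above; not verified here), it would see 10.2. The Go sketch below only shows how that field can be read from nvidia-smi output; driverCUDAVersion is a hypothetical helper, and the header string is a placeholder rather than real output.

```go
package main

import (
	"fmt"
	"regexp"
)

// cudaVersionRe matches the "CUDA Version: X.Y" field of the nvidia-smi
// header. That field reflects driver support, not the installed toolkit.
var cudaVersionRe = regexp.MustCompile(`CUDA Version:\s*([0-9]+\.[0-9]+)`)

// driverCUDAVersion returns the driver-reported CUDA version found in
// nvidia-smi output, or "" when the field is not present.
func driverCUDAVersion(nvidiaSMIOutput string) string {
	m := cudaVersionRe.FindStringSubmatch(nvidiaSMIOutput)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	// Placeholder header line; real output depends on the installed driver.
	header := "| NVIDIA-SMI ...   Driver Version: ...   CUDA Version: 10.2     |"
	fmt.Println(driverCUDAVersion(header))
}
```

A toolkit check (for example, cudatoolkit inside the conda environment) would have to come from a different source than this header.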

Thanks for the help!
I tried now:
trains-agent daemon --gpus "0,1" --queue dual_gpu --docker --foreground

but I get the same error when I run a training task

4 years ago

What do you mean by change?

4 years ago

0 Hi everyone, I'm trying to execute trains-agent in docker mode with conda as the package manager. Is it supported? I tried to work with nvidia/cuda:10.0-runtime-ubuntu18.04 and got the error "trains_agent: error: error: package manager "conda" selected, but '...

Thanks AgitatedDove14, I will try to use Docker with pip as the package manager and see if it solves my issues.

4 years ago

Yes, I specifically want Python 3.7. I will try to get another Docker image with Python 3.7 somehow.

4 years ago

I use this Docker image: nvidia/cuda:10.0-runtime-ubuntu18.04. I'm a Docker noob so far, so I will try to search. I assumed it installed python3.6 because that appears in trains.conf.
Do you know if it just comes with python3.6?

4 years ago

I did, and it set up the Docker container with Python 3.6 (I think because agent.default_python is 3.6 by default).
Is it possible to change this parameter when I create the experiment? (I want to work with Python 3.7.)

4 years ago

When I launch this:
(trains-agent) lv-beast@lv-beast:~/dev/MachineLearning/scripts/cmd_launcer$ docker run --gpus '"device=0,1"' nvidia/cuda:10.1-base nvidia-smi
it works, so maybe it's an issue with how trains passes the device to the docker run command?

4 years ago
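Following the guess above that the problem is in how the GPU list reaches docker run: the Executing lines in this thread show trains-agent calling docker with an argument tuple rather than a shell string, so whatever follows --gpus appears to reach docker verbatim, with no shell to strip quotes. The hypothetical Go helpers below (gpusFlagValue and dockerGPUArgs are illustrative names, not trains-agent code) sketch one way to build that value so it matches what the shell hands docker in the working command: the device=... pair wrapped in literal double quotes. It is a sketch under that assumption, not the agent's actual fix.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// gpusFlagValue turns a GPU list such as "0,1" into the value placed after
// --gpus. Wrapping the whole device=... pair in double quotes keeps the
// comma inside a single CSV field.
func gpusFlagValue(gpuList string) string {
	if gpuList == "all" {
		return "all"
	}
	return `"device=` + gpuList + `"`
}

// dockerGPUArgs builds the argv fragment for docker run. Each element is
// passed as-is (no shell), so the quotes must be part of the string itself.
func dockerGPUArgs(gpuList string) []string {
	return []string{"--gpus", gpusFlagValue(gpuList)}
}

func main() {
	args := dockerGPUArgs("0,1")
	fmt.Println(args) // [--gpus "device=0,1"]

	// Sanity check: reading the value back as one CSV record gives one field.
	fields, err := csv.NewReader(strings.NewReader(args[1])).Read()
	fmt.Println(fields, err) // [device=0,1] <nil>
}
```

Keeping the quoting inside the helper means callers can pass the same plain "0,1" list that trains-agent daemon --gpus already accepts.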

WackyRabbit7, thanks for the suggestions.
The first suggestion (without the quotes) gets the same result.
The second produces:
invalid argument "device="device=0,1"" for "--gpus" flag: parse error on line 1, column 7: bare " in non-quoted-field
(this is the executed command)
Executing: ('docker', 'run', '-t', '--gpus', 'device="device=0,1"', '-e', 'TRAINS_WORKER_ID=lv-beast:gpu"device=0,1"', '-v', '/home/lv-beast/.git-credentials:/root/.git-credentials', '-v', '/home/lv-beast/.gitconfig:/roo...

4 years ago