MysteriousBee56 when you run the trains-agent with --foreground, before it starts the docker it prints the full command line. Could you send it please?
I can't figure out where the extra ' came from...
Also could you send the trains.conf file?
(feel free to redact any confidential information)
How can I change the api_server from localhost to my machine's IP? I couldn't figure it out. Sorry.
Hmmm.
Could you change the api_server: http://localhost:8008 to your host IP?
For example: api_server: http://192.168.1.11:8008
AgitatedDove14 /usr/bin/python3
/usr/local/bin/trains-agent
ohh right, my bad: docker run -t --rm nvidia/cuda:10.1-base-ubuntu18.04 bash -c "echo 'Binary::apt::APT::Keep-Downloaded-Packages \"true\";' > /etc/apt/apt.conf.d/docker-clean && apt-get update && apt-get install -y git python3-pip && pip install trains-agent && echo done"
/usr/bin/python3: No module named trains-agent
Because of the error, I thought I had run the first command, but I had actually run the edited version. It gives this error (that's why it is taking time 😞)
MysteriousBee56 Okay, let's try this one: docker run -t --rm nvidia/cuda:10.1-base-ubuntu18.04 bash -c "echo 'Binary::apt::APT::Keep-Downloaded-Packages \"true\";' > /etc/apt/apt.conf.d/docker-clean && apt-get update && apt-get install -y git python3-pip && python3 -m pip install trains-agent && echo done"
It worked when I changed python3 -m trains-agent --help to trains-agent --help
Yey! MysteriousBee56 kudos for not giving up!
I'll make sure we report those errors, because this debug process should have been much shorter 🙂
MysteriousBee56 that is very strange, but it definitely explains it. Kudos on debugging it!!!
MysteriousBee56 that is so weird ... last one, I promise 🙂 docker run -t --rm nvidia/cuda:10.1-base-ubuntu18.04 bash -c "echo 'Binary::apt::APT::Keep-Downloaded-Packages \"true\";' > /etc/apt/apt.conf.d/docker-clean && apt-get update && apt-get install -y git python3-pip && python3 -m pip install trains-agent && echo \$(which python3) && echo \$(which trains-agent)"
Okay, now let's try (EDIT): docker run -t --rm nvidia/cuda:10.1-base-ubuntu18.04 bash -c "echo 'Binary::apt::APT::Keep-Downloaded-Packages \"true\";' > /etc/apt/apt.conf.d/docker-clean && apt-get update && apt-get install -y git python3-pip && python3 -m pip install trains-agent && python3 -m trains-agent --help"
AgitatedDove14 Yes, that's what I am saying.
BTW, we figured out that the ' belongs to the echo
yep, when seeing the full command it is apparent
MysteriousBee56 and please this one: "when you run the trains-agent with --foreground, before it starts the docker it prints the full command line"
I'm checking now to see where the extra ' could come from
AgitatedDove14 I might have found something to fix the issue, but I am not sure. In the trains-agent worker.py script log, the command is written like this: python3 -u -m trains_agent execute --disable-monitoring --id 9fe6d610a2b946379255b0fc25b5f9fd')
So at the end there is an extra '. When I run this command in my local environment by writing python3 -u -m trains_agent execute --disable-monitoring --id 9fe6d610a2b946379255b0fc25b5f9fd
it works and runs the code. However, if I write python3 -u -m trains_agent execute --disable-monitoring --id 9fe6d610a2b946379255b0fc25b5f9fd'
it waits for a string. However, I could not find where the ' comes from.
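The "waits for a string" behaviour above is what a shell does when a quote is opened and never closed. A minimal sketch of how to spot that in a logged command, using only Python's standard shlex module (the helper name has_unbalanced_quote is made up for illustration and is not part of trains-agent):

```python
import shlex


def has_unbalanced_quote(command: str) -> bool:
    """Return True if POSIX-style parsing finds a quote that is never closed."""
    try:
        shlex.split(command)
    except ValueError:
        # shlex raises ValueError when a quotation is left open
        return True
    return False


# The command exactly as quoted in the thread, without and with the stray quote
good = "python3 -u -m trains_agent execute --disable-monitoring --id 9fe6d610a2b946379255b0fc25b5f9fd"
bad = good + "'"

print(has_unbalanced_quote(good))  # False
print(has_unbalanced_quote(bad))   # True
```

This only confirms that the trailing ' is what makes the shell wait; it does not say where in the agent the quote gets added.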
MysteriousBee56 not a different port, just not with "localhost" but with your machine's IP
Okay that might explain the issue...
MysteriousBee56 so what you are saying is that python3 -m trains-agent --help does NOT work, but trains-agent --help does work?
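A likely explanation, offered as a sketch rather than a confirmed diagnosis: python3 -m expects an importable module name, and the worker log quoted in this thread calls the module trains_agent (underscore), while trains-agent (hyphen) is the name of the installed command. The helper below is hypothetical. It only asks the current interpreter whether it can locate a top-level module with a given name:

```python
import importlib.util


def runnable_with_dash_m(name: str) -> bool:
    """Return True if this interpreter can locate a top-level module called `name`."""
    return importlib.util.find_spec(name) is not None


# The result for "trains_agent" depends on whether the package is installed here
for candidate in ("trains-agent", "trains_agent"):
    print(candidate, runnable_with_dash_m(candidate))
```

If that is the cause, python3 -m trains_agent --help (with an underscore) would be the form to try inside the container.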
At the end, there is an error about "pip"
chown: cannot access '/root/.cache/pip': No such file or directory
It gives this error
I suspect it's the localhost - and the trains-agent is trying too hard to access the port, but for some reason does not report an error ...
Btw, we figured out that the ' belongs to the echo. So there is no problem with that one.
MysteriousBee56 Edit in your ~/trains.conf:
api_server: http://localhost:8008
to
api_server: http://192.168.1.11:8008
and obviously the same for web & files
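For anyone who prefers to script that edit, here is a minimal sketch that swaps localhost for a host IP while keeping each port. The function name point_to_host is hypothetical. The web_server and files_server keys and their ports in the sample are assumptions for illustration, since only the api_server line appears in this thread:

```python
import re


def point_to_host(conf_text: str, host_ip: str) -> str:
    """Replace http(s)://localhost with http(s)://<host_ip>, leaving ports untouched."""
    pattern = r'(https?://)localhost(?=[:/"\s]|$)'
    return re.sub(pattern, lambda m: m.group(1) + host_ip, conf_text)


# Sample text only; the web/files entries are assumed, not copied from a real trains.conf
sample = (
    "api_server: http://localhost:8008\n"
    "web_server: http://localhost:8080\n"
    "files_server: http://localhost:8081\n"
)

print(point_to_host(sample, "192.168.1.11"))
```

Keep a backup of ~/trains.conf before writing the result back.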
I'll make sure we fix the trains-agent to output an error message instead of trying to silently keep accessing the API server
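Until such an error message exists, a quick manual check can tell whether the API server port is reachable at all. This is a standalone sketch using the standard socket module, not how trains-agent itself performs the check. The name api_reachable is made up, and the address is the example IP from this thread:

```python
import socket


def api_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts
        return False


if __name__ == "__main__":
    host, port = "192.168.1.11", 8008  # example values from the thread
    if not api_reachable(host, port):
        print(f"Cannot reach the API server at {host}:{port}")
```

Running it from inside the container is the useful case, because localhost there refers to the container and not to the host machine.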
Getting your machine IP:
Just run: ifconfig | grep 'inet addr:'
Then you should see a bunch of lines; pick the one that does not start with 127 or 172
Then, to verify, run: ping <my_ip_here>
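The "pick the one that does not start with 127 or 172" rule can be written down as a small filter. This is only a sketch of that rule of thumb: pick_host_ip is a hypothetical helper, 172.17.0.1 is an assumed Docker bridge address used as sample input, and on some networks a 172.x address is the real LAN address, in which case the rule would skip it:

```python
def pick_host_ip(addresses):
    """Return the first address that does not start with 127. or 172., else None."""
    for addr in addresses:
        if not addr.startswith(("127.", "172.")):
            return addr
    return None


# Sample input: loopback, an assumed Docker bridge address, and the thread's example IP
print(pick_host_ip(["127.0.0.1", "172.17.0.1", "192.168.1.11"]))  # 192.168.1.11
```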
I was just able to reproduce with "localhost"