It worked when I changed python3 -m trains-agent --help
to trains-agent --help
I suspect it's the localhost - and the trains-agent is trying too hard to access the port, but for some reason does not report an error ...
AgitatedDove14 I might have found something that fixes the issue, but I am not sure. In the trains-agent worker.py log, the command is written like this: python3 -u -m trains_agent execute --disable-monitoring --id 9fe6d610a2b946379255b0fc25b5f9fd')
so there is an extra ' at the end. When I run this command in my local environment as python3 -u -m trains_agent execute --disable-monitoring --id 9fe6d610a2b946379255b0fc25b5f9fd
it works and runs the code. However, if I write python3 -u -m trains_agent execute --disable-monitoring --id 9fe6d610a2b946379255b0fc25b5f9fd'
the shell waits for more input (the closing quote). I could not find where the '
comes from.
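The hang with the trailing ' is ordinary shell quoting: an unmatched single quote opens a string, and the shell keeps waiting for it to be closed. A minimal sketch that checks a logged command for this, using Python's shlex (which applies POSIX-like quoting rules); the helper name has_balanced_quotes is made up for illustration and is not part of trains-agent:

```python
import shlex

def has_balanced_quotes(command: str) -> bool:
    """Return True if the command tokenizes cleanly, False if a quote is left open."""
    try:
        shlex.split(command)
        return True
    except ValueError:
        # shlex raises ValueError for an unterminated quote
        return False

# Command copied from the worker.py log quoted above
CMD = "python3 -u -m trains_agent execute --disable-monitoring --id 9fe6d610a2b946379255b0fc25b5f9fd"

print(has_balanced_quotes(CMD))        # True
print(has_balanced_quotes(CMD + "'"))  # False
```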
BTW, we figured out that the ' belongs to the echo
yep, when seeing the full command it is apparent
How can I change the api_server from localhost to my machine's IP? I couldn't figure it out. Sorry.
Btw, we figured out that the ' belongs to the echo. So there is no problem with that one.
ohh right, my bad: docker run -t --rm nvidia/cuda:10.1-base-ubuntu18.04 bash -c "echo 'Binary::apt::APT::Keep-Downloaded-Packages \"true\";' > /etc/apt/apt.conf.d/docker-clean && apt-get update && apt-get install -y git python3-pip && pip install trains-agent && echo done"
I was just able to reproduce with "localhost"
AgitatedDove14 Yes, that's what I am saying.
MysteriousBee56 Edit in your ~/trains.conf: api_server:
http://localhost:8008
to api_server:
http://192.168.1.11:8008
and obviously the same for web & files
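Inside a container, localhost refers to the container itself rather than the host machine, which is why the address has to be replaced. A rough sketch of that substitution follows; only the api_server URL and the IP 192.168.1.11 come from this thread, while the web_server and files_server ports are assumed defaults and the helper name replace_localhost is hypothetical:

```python
from urllib.parse import urlsplit, urlunsplit

def replace_localhost(url: str, host_ip: str) -> str:
    """Swap a localhost/127.0.0.1 host for host_ip, keeping scheme, port and path."""
    parts = urlsplit(url)
    if parts.hostname not in ("localhost", "127.0.0.1"):
        return url  # already points at a non-local address
    netloc = host_ip if parts.port is None else f"{host_ip}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))

# api_server is taken from the thread; the other two ports are assumptions.
servers = {
    "api_server": "http://localhost:8008",
    "web_server": "http://localhost:8080",
    "files_server": "http://localhost:8081",
}
for key, url in servers.items():
    print(f"{key}: {replace_localhost(url, '192.168.1.11')}")
```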
I'll make sure we fix the trains-agent to output an error message instead of trying to silently keep accessing the API server
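Until such an error message exists, a quick reachability check can be run by hand. This is only a sketch of the idea (one TCP connection attempt with a timeout, then a clear message); it is not how trains-agent is implemented, and check_api_server is a made-up name:

```python
import socket
from urllib.parse import urlsplit

def check_api_server(url: str, timeout: float = 3.0):
    """Try one TCP connection to the server in url; return (ok, message)."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True, f"reachable: {parts.hostname}:{port}"
    except OSError as exc:
        # Covers refused connections, timeouts and name-resolution failures
        return False, f"cannot reach {parts.hostname}:{port}: {exc}"

ok, message = check_api_server("http://localhost:8008", timeout=1.0)
print(message)
```

Running this from inside the container would show whether the configured address is reachable from there at all.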
Getting your machine IP:
just run: ifconfig | grep 'inet addr:'
Then you should see a bunch of lines, pick the one that does not start with 127 or 172
Then to verify, run: ping <my_ip_here>
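The "not 127 or 172" rule can be written down as a small filter. A minimal sketch, assuming the addresses have already been copied out of the ifconfig output; the sample list is invented, and pick_host_ip is a hypothetical helper:

```python
import ipaddress

def pick_host_ip(addresses):
    """Return the first IPv4 address that is neither loopback (127.*) nor 172.*."""
    for text in addresses:
        ip = ipaddress.ip_address(text)
        if ip.version != 4 or ip.is_loopback:
            continue
        if text.startswith("172."):
            # Rule of thumb from the thread; 172.* is typically a Docker bridge
            continue
        return text
    return None

# Made-up sample of addresses a Docker host might list
print(pick_host_ip(["127.0.0.1", "172.17.0.1", "192.168.1.11"]))  # 192.168.1.11
```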
MysteriousBee56 and please this one: "when you run the trains-agent
with --foreground , before it starts the docker it prints the full command line"
MysteriousBee56 Okay, let's try this one: docker run -t --rm nvidia/cuda:10.1-base-ubuntu18.04 bash -c "echo 'Binary::apt::APT::Keep-Downloaded-Packages \"true\";' > /etc/apt/apt.conf.d/docker-clean && apt-get update && apt-get install -y git python3-pip && python3 -m pip install trains-agent && echo done"
Okay that might explain the issue...
MysteriousBee56 so what you are saying is python3 -m trains-agent --help
does NOT work
but trains-agent --help
does work?
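A likely explanation, going by the log line elsewhere in this thread that invokes python3 -u -m trains_agent: the importable module is spelled with an underscore, while trains-agent with a hyphen is the name of the pip package and of the console script. The -m switch looks up a module name, so the hyphenated form is not found. A small sketch to check which spelling can be located in a given environment; runnable_with_dash_m is a made-up helper name:

```python
import importlib.util

def runnable_with_dash_m(module_name: str) -> bool:
    """Return True if Python can locate a module with this exact name."""
    return importlib.util.find_spec(module_name) is not None

# The second result depends on whether trains-agent is installed here.
for name in ("trains-agent", "trains_agent"):
    print(name, runnable_with_dash_m(name))
```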
Hmmm.
could you change the api_server:
http://localhost:8008 to your host IP?
for example: api_server:
http://192.168.1.11:8008
MysteriousBee56 that is so weird ... last one, I promise 🙂 docker run -t --rm nvidia/cuda:10.1-base-ubuntu18.04 bash -c "echo 'Binary::apt::APT::Keep-Downloaded-Packages \"true\";' > /etc/apt/apt.conf.d/docker-clean && apt-get update && apt-get install -y git python3-pip && python3 -m pip install trains-agent && echo \$(which python3) && echo \$(which trains-agent)"
MysteriousBee56 when you run the trains-agent
with --foreground , before it starts the docker it prints the full command line, could you send it please?
I can't figure out where the extra ' came from...
Also could you send the trains.conf file?
(feel free to redact any confidential information)
At the end, there is an error about "pip"
I'm checking now to see where the extra ' could come from
/usr/bin/python3: No module named trains-agent
Because of the error, I thought I had run the first command, but I had actually run the edited version. It gives this error (that's why it takes time 😞)
Yey! MysteriousBee56 kudos on sticking with it!
I'll make sure we report those errors, because this debug process should have been much shorter 🙂
Okay now let's try: EDIT docker run -t --rm nvidia/cuda:10.1-base-ubuntu18.04 bash -c "echo 'Binary::apt::APT::Keep-Downloaded-Packages \"true\";' > /etc/apt/apt.conf.d/docker-clean && apt-get update && apt-get install -y git python3-pip && python3 -m pip install trains-agent && python3 -m trains-agent --help"
AgitatedDove14 /usr/bin/python3
/usr/local/bin/trains-agent
MysteriousBee56 not a different port, just not with "localhost" but with your machine's IP
MysteriousBee56 that is very strange, but it definitely explains it, kudos on debugging it !!!
chown: cannot access '/root/.cache/pip': No such file or directory
It gives this error