Hi, I'm trying to install a new server; this is a fresh Ubuntu 18.04 install. When I try to run the docker-compose up command I get error messages like this one:

Answered

Hi, I'm trying to install a new server; this is a fresh Ubuntu 18.04 install. When I try to run the docker-compose up command I get error messages like this one:

```
requests.exceptions.ConnectionError: HTTPConnectionPool(host='elasticsearch', port=9200): Max retries exceeded with url: /_template/events_plot (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f9897136c50>: Failed to establish a new connection: [Errno 111] Connection refused',))
```

Any tips?
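"[Errno 111] Connection refused" means nothing was accepting connections on elasticsearch:9200 when the client tried to reach it, which usually points to the Elasticsearch container being down or still starting. A minimal sketch for checking this from Python, assuming you run it somewhere the hostname elasticsearch resolves (for example, inside a container on the same docker-compose network); wait_for_port is a hypothetical helper, not part of trains-server:

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=2.0):
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass.

    Returns True once the port accepts a connection, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError on failure: ConnectionRefusedError,
            # socket.timeout, or socket.gaierror if the name does not resolve.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("elasticsearch", 9200, timeout=30) returns False if Elasticsearch never came up, which is the situation the error above describes.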

  				
Posted by CourageousLizard33, 5 years ago


Answers 27

OK, what solved it was increasing the RAM of the VM. Do you specify minimum requirements anywhere?

  				
Posted by CourageousLizard33, 5 years ago

There is a funny issue with Trains. One of the great features, in our book, is that you pick up TensorBoard logs automatically, but you group them in the opposite direction, i.e. if I have:

  				
Posted by CourageousLizard33, 5 years ago

Probably less secure though :)

  				
Posted by AgitatedDove14, 5 years ago

In TensorBoard there will be a tr section, but with a separate graph for top1 and loss. On your system they go into the same graph, and since loss and train accuracy usually have very different value ranges, it is impossible to see the loss graph without starting to manipulate it.

  				
Posted by CourageousLizard33, 5 years ago

CourageousLizard33 VM?! I thought we were talking about a fresh install on Ubuntu 18.04?!
Is the Ubuntu in a VM? If so, I'm pretty sure 8GB will do, maybe less, but I haven't checked.
How much did you end up giving it?

  				
Posted by AgitatedDove14, 5 years ago

Hmm CourageousLizard33, it seems you stumbled on a weird bug.
This piece of code only tries to get the username of the current UID. Since you are running inside a docker, you probably set the UID from the environment, but there is no "actual" user with that UID in /etc/passwd, so it cannot be resolved.
I'm attaching a quick fix; please let me know if it solves the problem.
I'd like to make sure we have it in the next RC as soon as possible.
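The crash happens because getpass.getuser() first looks at the LOGNAME, USER, LNAME and USERNAME environment variables and only then falls back to pwd.getpwuid(os.getuid()), which raises KeyError on Python 3.6 when the UID (10001 here) has no /etc/passwd entry. The fix attached in this thread is not reproduced here; below is a minimal sketch of the same idea, using a hypothetical helper named safe_getuser that falls back to the numeric UID instead of crashing. Given that lookup order, passing something like -e USER=<name> to docker run should also sidestep the lookup, though that is an untested suggestion.

```python
import getpass
import os


def safe_getuser(default=None):
    """Return a user name even when the current UID has no /etc/passwd entry."""
    try:
        # getpass.getuser() checks LOGNAME, USER, LNAME and USERNAME first and
        # only then calls pwd.getpwuid(os.getuid()).
        return getpass.getuser()
    except (KeyError, OSError):
        # Python 3.6 raises KeyError for an unknown UID (newer versions raise
        # OSError). Fall back to a caller-supplied default or the numeric UID.
        if default is not None:
            return default
        return "uid-{}".format(os.getuid())
```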

  				
Posted by AgitatedDove14, 5 years ago

CourageousLizard33 Are you using docker-compose to set up the trains-server?

  				
Posted by AgitatedDove14, 5 years ago

CourageousLizard33 so you have a Linux server running an Ubuntu VM with Docker inside?
I would imagine that you could just run the docker on the host machine, no?
BTW, I think 8GB is a good recommendation for a VM; it's reasonable enough to start with. I'll make sure we add it to the docs.
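To check a Linux VM against the 8GB suggested here before starting the stack, one option is reading MemTotal from /proc/meminfo. A minimal sketch; MemTotal excludes memory the kernel reserves at boot, so a VM assigned exactly 8 GB reports somewhat less, which is why the sketch allows an arbitrary 10% slack (an assumption, not a documented threshold):

```python
RECOMMENDED_GIB = 8  # figure suggested in this thread for a VM
SLACK = 0.9          # hypothetical tolerance for kernel-reserved memory


def parse_mem_total_kib(meminfo_text):
    """Extract MemTotal (in KiB) from the contents of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Line format: "MemTotal:       16318412 kB"
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def has_recommended_ram(mem_total_kib, recommended_gib=RECOMMENDED_GIB, slack=SLACK):
    """True if the machine has roughly `recommended_gib` GiB of RAM or more."""
    return mem_total_kib >= recommended_gib * 1024 * 1024 * slack
```

On the VM itself you would pass open("/proc/meminfo").read() to parse_mem_total_kib and feed the result to has_recommended_ram.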

  				
Posted by AgitatedDove14, 5 years ago

Yes, it is a VM running Ubuntu 18.04. I ended up giving it 8 GB, which seemed to solve the issue. Pretty common to run servers on VMs these days... :)

  				
Posted by CourageousLizard33, 5 years ago

CourageousLizard33 specifically, section (4) is the issue (and it applies to any Elasticsearch docker, nothing specific to trains-server):

```
echo "vm.max_map_count=262144" > /tmp/99-trains.conf
sudo mv /tmp/99-trains.conf /etc/sysctl.d/99-trains.conf
sudo sysctl -w vm.max_map_count=262144
sudo service docker restart
```

Did you try the above, and are you still getting the same error?
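To confirm the setting actually took effect (the sysctl -w call changes the live value, while the file in /etc/sysctl.d makes it persist across reboots), you can read the current value from /proc/sys/vm/max_map_count. A minimal sketch; the 262144 threshold is the value used in the commands above, and the function names are hypothetical:

```python
from pathlib import Path

REQUIRED_MAX_MAP_COUNT = 262144  # value written by the sysctl commands above


def read_max_map_count(path="/proc/sys/vm/max_map_count"):
    """Read the kernel's current vm.max_map_count (Linux only)."""
    return int(Path(path).read_text().strip())


def max_map_count_ok(current, required=REQUIRED_MAX_MAP_COUNT):
    """Elasticsearch expects vm.max_map_count to be at least `required`."""
    return current >= required
```

On the host, max_map_count_ok(read_max_map_count()) should be True before you start docker-compose up.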

  				
Posted by AgitatedDove14, 5 years ago

Thanks! That's great. Also, can I somehow make sure that, no matter what, results are not uploaded to the public demo server?

  				
Posted by CourageousLizard33, 5 years ago

Hi, yes, it's a docker on a VM.

  				
Posted by CourageousLizard33, 5 years ago

SteadyFox10 I suspect you are correct 🙂
CourageousLizard33 see also section (4) here:
https://github.com/allegroai/trains-server/blob/master/docs/install_linux_mac.md#launching-the-trains-server-docker-in-linux-or-macos

  				
Posted by AgitatedDove14, 5 years ago

Where are you seeing this message?

  				
Posted by AgitatedDove14, 5 years ago

I did, but I will try again.

  				
Posted by CourageousLizard33, 5 years ago

:) Yes, on your gateway/firewall map http://demoapi.trains.allegro.ai to 127.0.0.1. That's always good practice ;)
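One way to verify such a block from the machine running your training code is to check what the demo API host resolves to. A minimal sketch using only the standard library (demo_server_blocked is a hypothetical helper); it treats a name that does not resolve, or resolves only to loopback addresses, as blocked. It checks DNS only, so a firewall rule that drops traffic by IP would not show up here:

```python
import ipaddress
import socket

DEMO_API_HOST = "demoapi.trains.allegro.ai"  # host from the URL above


def demo_server_blocked(host=DEMO_API_HOST):
    """Return True if `host` does not resolve or resolves only to loopback."""
    try:
        infos = socket.getaddrinfo(host, 80, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return True  # the name does not resolve at all
    # info[4] is the sockaddr tuple; its first element is the address string.
    # Strip any IPv6 zone suffix (e.g. "%eth0") before parsing it.
    addresses = {info[4][0].split("%")[0] for info in infos}
    return all(ipaddress.ip_address(addr).is_loopback for addr in addresses)
```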

  				
Posted by AgitatedDove14, 5 years ago

```
  File "/opt/conda/lib/python3.6/site-packages/trains/task.py", line 277, in init
    not auto_connect_frameworks.get('detect_repository', True)) else True
  File "/opt/conda/lib/python3.6/site-packages/trains/task.py", line 1163, in _create_dev_task
    log_to_backend=True,
  File "/opt/conda/lib/python3.6/site-packages/trains/task.py", line 111, in __init__
    super(Task, self).__init__(**kwargs)
  File "/opt/conda/lib/python3.6/site-packages/trains/backend_interface/task/task.py", line 108, in __init__
    self.id = self._auto_generate(project_name=project_name, task_name=task_name, task_type=task_type)
  File "/opt/conda/lib/python3.6/site-packages/trains/backend_interface/task/task.py", line 251, in _auto_generate
    created_msg = make_message('Auto-generated at %(time)s by %(user)s@%(host)s')
  File "/opt/conda/lib/python3.6/site-packages/trains/backend_interface/util.py", line 28, in make_message
    user=getpass.getuser(),
  File "/opt/conda/lib/python3.6/getpass.py", line 169, in getuser
    return pwd.getpwuid(os.getuid())[0]
KeyError: 'getpwuid(): uid not found: 10001'
```

  				
Posted by CourageousLizard33, 5 years ago

I have actually already tried following those instructions, after a fresh install of the OS.

  				
Posted by CourageousLizard33, 5 years ago

You need to change a setting on your host machine to make Elasticsearch work.

  				
Posted by SteadyFox10, 5 years ago

No, we have a VMware server, and we run a bunch of servers on it. While I have your attention, I'm running into a new issue: most of our training sessions run from inside a docker. When I try to run such a training session, I get an error about the user:

  				
Posted by CourageousLizard33, 5 years ago

It worked! It took me a while to get the docker "user" to pick up trains.conf...

  				
Posted by CourageousLizard33, 5 years ago

Thanks.

  				
Posted by CourageousLizard33, 5 years ago

Which changes do I need to make to get Elasticsearch to work?

  				
Posted by CourageousLizard33, 5 years ago

CourageousLizard33 if the two series are on the same graph, just click on a series in the legend to enable/disable it, and the scale will adjust automatically.
Regarding grouping, this is a feature that can be turned off. The idea is that we split the tag into title/series, so if you have the same prefix, the TF scalars are grouped on the same graph; otherwise they go on a different title graph. That said, you can force it to have a series per graph like in TB. Makes sense?
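To make the title/series split concrete, below is a hypothetical model (not Trains' actual code) of grouping scalar tags into graphs. It assumes a tag is split at its first "/" into a graph title and a series name, so TensorBoard-style tags like tr/top1 and tr/loss land on one graph; the exact split rule Trains uses is not stated in this thread. Turning splitting off gives one series per graph, as in TensorBoard:

```python
from collections import defaultdict


def group_tags(tags, split_title_series=True, sep="/"):
    """Map each graph title to the list of series plotted on it.

    With splitting on, "tr/top1" and "tr/loss" share the graph "tr".
    With splitting off, every tag gets its own graph (TensorBoard-like).
    """
    graphs = defaultdict(list)
    for tag in tags:
        if split_title_series and sep in tag:
            title, series = tag.split(sep, 1)  # assumed: split at first separator
        else:
            title, series = tag, tag
        graphs[title].append(series)
    return dict(graphs)
```

For example, group_tags(["tr/top1", "tr/loss"]) returns {"tr": ["top1", "loss"]}, which is exactly the shared-scale situation described above, while split_title_series=False keeps loss on its own graph.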

  				
Posted 
	5 years ago

					More
				  		
  Report
		
					AgitatedDove14
				
					0
					 × 1

I need to run the docker with my UID, which is 10001, but the docker does not know about or have that user. Why does it need it? To find trains.conf? Is there any way to pass it manually?
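By default trains reads trains.conf from the user's home directory. As far as I know it also honors a TRAINS_CONFIG_FILE environment variable pointing at an explicit path, but treat that variable name as an assumption and check it against your trains version. Inside a container whose UID has no passwd entry, "home" should come from the HOME environment variable, so setting HOME (or the override) with docker run -e and mounting the file with -v gives the lookup something to find. A hypothetical sketch of that resolution order, not the library's implementation:

```python
import os
from pathlib import Path


def resolve_trains_conf(environ=None):
    """Hypothetical model of locating trains.conf: an explicit override first,
    then $HOME/trains.conf. Not the library's actual implementation."""
    env = os.environ if environ is None else environ
    override = env.get("TRAINS_CONFIG_FILE")  # assumed override variable
    if override:
        return Path(override)
    home = env.get("HOME")
    if not home:
        # A UID missing from /etc/passwd has no home directory to fall back on.
        raise RuntimeError("HOME is not set; cannot locate ~/trains.conf")
    return Path(home) / "trains.conf"
```

For example, with HOME=/home/trainer set in the container, the sketch resolves to /home/trainer/trains.conf, so that is where the file would need to be mounted.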

  				
Posted by CourageousLizard33, 5 years ago

https://github.com/allegroai/trains/blob/master/docs/trains.conf#L47

  				
Posted by AgitatedDove14, 5 years ago

logger.log_metric('tr.top1', to_python_float(prec1))

  				
Posted by CourageousLizard33, 5 years ago


2K Views
