There is a funny issue with trains. One of the great features, in our book, is the fact that you pick up TensorBoard logs automatically, but you group them in the opposite direction, i.e. if I have:
It is a VM running Ubuntu 18.04, yes i ended up giving it 8 GB which seemed to solve the issue. Pretty common to run servers on VMs these days ... :)
Thanks! That's great. Also, can I somehow make sure that, no matter what, results are not uploaded to the public demo server?
Hmm CourageousLizard33 seems you stumbled on a weird bug,
This piece of code only tries to get the username of the current UID. Since you are running inside a docker and probably set the UID through the environment, there is no "actual" user with that UID in /etc/passwd, and so it cannot resolve it.
I'm attaching a quick fix, please let me know if it solved the problem.
I'd like to make sure we have it in the next RC as soon as possible.
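For illustration only, here is a minimal sketch of the kind of fallback that avoids this failure. It is not the actual fix attached above; `safe_getuser` and the `uid-<number>` placeholder format are hypothetical names chosen for this example.

```python
import getpass
import os


def safe_getuser(prefix="uid"):
    """Return the current username, or a UID-based placeholder.

    Hypothetical helper: getpass.getuser() fails when the process UID
    has no entry in /etc/passwd (e.g. `docker run --user 10001`), so we
    fall back to a string built from the numeric UID.
    """
    try:
        return getpass.getuser()
    except Exception:
        # The exception type differs across Python versions
        # (KeyError on 3.6), so catch broadly here.
        uid = getattr(os, "getuid", lambda: "unknown")()
        return "{}-{}".format(prefix, uid)
```

Note that `getpass.getuser()` checks the `LOGNAME`, `USER`, `LNAME` and `USERNAME` environment variables before it looks at the password database, so setting one of them inside the container may also work around the error.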
i have actually already tried to follow those instructions, after a fresh install of the OS
You need to change a setting on your host machine to get Elasticsearch working.
Which changes do I need to make to get Elasticsearch to work?
  File "/opt/conda/lib/python3.6/site-packages/trains/task.py", line 277, in init
    not auto_connect_frameworks.get('detect_repository', True)) else True
  File "/opt/conda/lib/python3.6/site-packages/trains/task.py", line 1163, in _create_dev_task
    log_to_backend=True,
  File "/opt/conda/lib/python3.6/site-packages/trains/task.py", line 111, in __init__
    super(Task, self).__init__(**kwargs)
  File "/opt/conda/lib/python3.6/site-packages/trains/backend_interface/task/task.py", line 108, in __init__
    self.id = self._auto_generate(project_name=project_name, task_name=task_name, task_type=task_type)
  File "/opt/conda/lib/python3.6/site-packages/trains/backend_interface/task/task.py", line 251, in _auto_generate
    created_msg = make_message('Auto-generated at %(time)s by %(user)s@%(host)s')
  File "/opt/conda/lib/python3.6/site-packages/trains/backend_interface/util.py", line 28, in make_message
    user=getpass.getuser(),
  File "/opt/conda/lib/python3.6/getpass.py", line 169, in getuser
    return pwd.getpwuid(os.getuid())[0]
KeyError: 'getpwuid(): uid not found: 10001'
no, we have a vmware server, on it we run a bunch of servers. While I have your attention, I'm running into a new issue, most of our training sessions run from inside a docker. When i try to run such a training session, i get an error about the user:
CourageousLizard33 VM?! I thought we are talking fresh install on ubuntu 18.04?!
Is the Ubuntu in a VM? If so, I'm pretty sure 8GB will do, maybe less, but I haven't checked.
How much did you end up giving it?
It worked ! took me a while to get the docker "user" to pick up trains.conf ...
CourageousLizard33 if the two series are on the same graph, just click on the series in the legend, you can enable/disable it, and the scale will adjust automatically.
Regarding grouping, this is a feature that can be turned off. The idea is that we split the tag into title/series, so if tags share the same prefix the TF scalars are grouped on the same graph; otherwise they end up on different graphs, each under its own title. That said, you can force it to have a series per graph, like in TB. Makes sense?
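To make the title/series idea concrete, here is a rough sketch of the two grouping behaviors. It is an illustration, not the trains implementation: the function names are hypothetical, and both the `.` separator and the `tr.loss` tag are assumptions based on the `tr.top1` example in this thread.

```python
from collections import defaultdict


def split_tag(tag, sep="."):
    """Split a scalar tag into (title, series) at the first separator.

    A tag without the separator is used as both title and series.
    """
    title, found, series = tag.partition(sep)
    return (title, series) if found else (tag, tag)


def group_by_title(tags, sep="."):
    """Grouped mode: tags sharing a prefix land on the same graph."""
    graphs = defaultdict(list)
    for tag in tags:
        title, series = split_tag(tag, sep)
        graphs[title].append(series)
    return dict(graphs)


def one_graph_per_series(tags):
    """Ungrouped mode: every tag gets its own graph."""
    return {tag: [tag] for tag in tags}
```

With the tags `tr.top1` and `tr.loss`, `group_by_title` puts both series on a single `tr` graph, while `one_graph_per_series` keeps them on two separate graphs.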
there will be a tr group, but with a separate graph for top1 and loss. On your system they go into the same graph. Since loss and train accuracy usually have very different value ranges, this makes it impossible to see the loss graph without starting to manipulate it.
SteadyFox10 I suspect you are correct 🙂
CourageousLizard33 see also section (4) here:
https://github.com/allegroai/trains-server/blob/master/docs/install_linux_mac.md#launching-the-trains-server-docker-in-linux-or-macos
:) yes on your gateway/firewall set http://demoapi.trains.allegro.ai to 127.0.0.1 . That's always good practice ;)
CourageousLizard33 Are you using the docker-compose to setup the trains-server?
logger.log_metric('tr.top1', to_python_float(prec1))
i need to run the docker with my uid which is 10001 but the docker does not know or have the user, why does it need it ? to find the trains.conf ? is there any way to pass it manually ?
OK what solved it is increasing the RAM of the VM, do you specify minimum requirements anywhere ?
CourageousLizard33 specifically section (4) is the issue (and it's related to any elastic docker, nothing specific to trains-server):
echo "vm.max_map_count=262144" > /tmp/99-trains.conf
sudo mv /tmp/99-trains.conf /etc/sysctl.d/99-trains.conf
sudo sysctl -w vm.max_map_count=262144
sudo service docker restart
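If you want to verify that the setting took effect, a small check along these lines should do. This is a sketch with hypothetical helper names; it assumes a Linux host, where the current value is exposed at /proc/sys/vm/max_map_count.

```python
REQUIRED_MAX_MAP_COUNT = 262144  # value used in the commands above


def read_max_map_count(path="/proc/sys/vm/max_map_count"):
    """Read the kernel's current vm.max_map_count (Linux only)."""
    with open(path) as handle:
        return int(handle.read().strip())


def max_map_count_ok(current, required=REQUIRED_MAX_MAP_COUNT):
    """Return True if the current value meets the required minimum."""
    return current >= required
```

On the host, `max_map_count_ok(read_max_map_count())` should return True after running the sysctl commands.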
Did you try the above, and you are still getting the same error ?
CourageousLizard33 so you have a Linux server running Ubuntu VM with Docker inside?
I would imagine that you could just run the docker on the host machine, no?
BTW, I think 8GB is a good recommendation for a VM; it's reasonable enough to start with. I'll make sure we add it to the docs.