Since my servers have a shared file system, the init process tells me that the configuration file already exists. Can I tell it to place it in another location? GrumpyPenguin23
I've sorted this out. All I needed was to add them to a queue so they would be visible.
Something else: if I want to designate only some of the GPUs of a worker, how can I do that?
Check the examples on the GitHub page, I think this is what you are looking for 🙂
https://github.com/allegroai/trains-agent#running-the-trains-agent
Hmm, I've changed my trains-server config to use a file in a different location, and successfully set up the trains-agent on the second server. But I don't see any new worker created. Why is that?
SmarmySeaurchin8 just so that I don't miss anything.
One machine, two trains-agents, each one connected to a different trains-server, correct?
From the trains-agent --help:
trains-agent --config-file /home/user/my_trains_server1.conf daemon
trains-agent --config-file /home/user/my_trains_server2.conf daemon
Furthermore, let's say I have 6 GPUs on a machine, and I'd like trains to treat this machine as 2 workers (GPUs 0-2 and 3-5). Is there a way to do that?
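For the 6-GPU split, one way to picture it is one trains-agent daemon per GPU group. A rough sketch that only builds the two command lines, assuming the daemon's `--gpus` and `--queue` options as shown in the trains-agent README; the queue name `default` and the helper `agent_command` are made up for illustration:

```python
def agent_command(gpus, queue):
    """Build the argv for one trains-agent daemon limited to the given GPU indices."""
    gpu_arg = ",".join(str(g) for g in gpus)  # e.g. "0,1,2"
    return ["trains-agent", "daemon", "--gpus", gpu_arg, "--queue", queue]

# Two workers on one 6-GPU machine: GPUs 0-2 and GPUs 3-5.
# "default" is an assumed queue name; use whichever queue you created.
commands = [
    agent_command(range(0, 3), "default"),
    agent_command(range(3, 6), "default"),
]

for cmd in commands:
    print(" ".join(cmd))
```

Each printed line would be run as its own daemon, so the machine should show up as two workers once both are attached to a queue.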
Welcome! The machines are the ones you install and run the trains-agent daemon on, and creating the queues can be done via the trains-agent CLI or the webapp UI.
Hey, I've gotten this message:
TRAINS Task: overwriting (reusing) task id=24ac52461b2d4cfa9e672d9cd817962c
And I'm not sure why it's reusing the task instead of creating a new task id. The configuration was different, although the same Python file was run. Do you have any idea?
We are here if you need further help 🙂
Actually, two machines with a shared filesystem.
Hi SmarmySeaurchin8, you can point to any configuration file by setting the environment variable: TRAINS_CONFIG_FILE=/home/user/my_trains.conf
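If the agent is launched from a script rather than a shell, the same variable can be set per process. A minimal sketch, assuming only the `TRAINS_CONFIG_FILE` variable mentioned above; the helper `env_with_config` is hypothetical and the launch line is left commented out because it needs trains-agent installed:

```python
import os

def env_with_config(config_path, base_env=None):
    """Return a copy of the environment with TRAINS_CONFIG_FILE pointing at config_path."""
    env = dict(os.environ if base_env is None else base_env)
    env["TRAINS_CONFIG_FILE"] = config_path
    return env

env = env_with_config("/home/user/my_trains.conf")

# On a machine with trains-agent installed, pass the environment to the process, e.g.:
# import subprocess
# subprocess.run(["trains-agent", "daemon"], env=env)
print(env["TRAINS_CONFIG_FILE"])
```

Copying the environment keeps the setting local to that one process, which may help when two servers on a shared filesystem each need a different config file.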