Let me update you about this in a couple of minutes
Oof, what if all I have to set is a project name? (Which could be a non-existing project as well)
After a brutal sudo reboot, the agent is not up anymore
It uses the API credentials generated by the trains dashboard
Yep. Works as you said.
Or do you mean it tries to apply the uncommitted changes from the experiment that already ran? If that's the case, why did the new experiment fail when the previous experiment ran successfully?
Hey, I've gotten this message:
TRAINS Task: overwriting (reusing) task id=24ac52461b2d4cfa9e672d9cd817962c
And I'm not sure why it's reusing the task instead of creating a new task id; the configuration was different, although the same Python file was run. Have you got any idea?
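For reference, my understanding is that Task.init reuses the previous task id when the project and task name match an earlier, unpublished run; here's a minimal sketch, assuming reuse_last_task_id works the way I think it does (the project and task names are just placeholders):

```python
from trains import Task

# Minimal sketch: force a brand-new task instead of overwriting the last one.
# Passing reuse_last_task_id=False is what I understand disables the
# "overwriting (reusing) task id=..." behaviour; names below are placeholders.
task = Task.init(
    project_name="my_project",     # placeholder project name
    task_name="my_experiment",     # placeholder task name
    reuse_last_task_id=False,      # always create a new task id
)
```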
Since my servers have a shared file system, the init process tells me that the configuration file already exists. Can I tell it to place it in another location? GrumpyPenguin23
Never mind, you can find it in the apiserver.conf
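For anyone else hitting this on a shared file system: my understanding is that, on the SDK side, you can also point trains at a per-host configuration file through the TRAINS_CONFIG_FILE environment variable. A minimal sketch, assuming that variable is honoured (the path is a placeholder):

```python
import os

# Point the SDK at a per-host config file before importing Task, so each
# server on the shared file system keeps its own configuration.
# TRAINS_CONFIG_FILE is, to my knowledge, the variable trains checks;
# the path below is only a placeholder.
os.environ["TRAINS_CONFIG_FILE"] = "/data/configs/trains-host-a.conf"

from trains import Task

task = Task.init(project_name="my_project", task_name="per_host_config_check")
```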
Furthermore, let's say I have 6 GPUs on a machine and I'd like trains to treat this machine as 2 workers (GPUs 0-2 and 3-5). Is there a way to do that?
I've sorted this out. All I needed was to add them to a queue so they would be visible.
Found it in the init docs 🙂
I aborted the task because of a bug on my side
Changing the mountpoint for the agent is not possible
Well, the original task is run with my user
To be exact, it's a trains-agent task that creates another trains-agent task in a new subprocess
let me try
It's important to say that this happens when I have more than about 4 workers and I run the trains-agent daemon --stop
With fewer than 4 workers it works well