MysteriousBee56, the agent is not running on the "server"; it's running on its own machine.
The server just reflects the fact that the agent is up.
To actually take it down you need to SSH (or connect to that machine) and stop the actual trains-agent process.
What exactly is the scenario you had in mind?
MysteriousBee56 what do you mean by "delete a worker"?
Stop the agent running remotely?
Yes, I mean removing the agent from the server
DilapidatedDucks58 no don't say that, you are wonderful 😉
trains-agent --gpus 0 --queue my_queue -d
should create a worker
Then you can do
trains-agent --gpus 1 --queue my_queue -d which will create
our GPUs are 48GB, so it's quite wasteful to only run one job per GPU
yeah, I'm aware of that, I would have to make sure they don't fail with the infamous CUDA out of memory error, but still
that's right, I have 4 GPUs and 4 workers. but what if I want to run two jobs simultaneously on the same GPU?
the weird part is that the old job continues running when I recreate the worker and enqueue the new job
well okay, it's probably not that weird considering that worker just runs the container
Ohhhh, okay, as long as you know they might fail on memory...
You mean why you have two processes?
TRAINS_WORKER_NAME=first_agent trains-agent --gpus 0
TRAINS_WORKER_NAME=second_agent trains-agent --gpus 0
We should probably have a section on that (i.e. running two agents on the same GPU, then explain how to use it)
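For reference, a rough sketch of the "several agents on one GPU" setup from the two commands above. It only combines what appears in this thread (TRAINS_WORKER_NAME, --gpus, --queue, -d); the helper name start_agents_on_gpu, the DRY_RUN switch and the gpuN_agentM naming scheme are made up for illustration, and the exact command form may differ between trains-agent versions. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: start COUNT agents on the same GPU, each with a distinct worker name.
# start_agents_on_gpu and DRY_RUN are hypothetical, not part of trains-agent.
# With DRY_RUN=1 (the default) the commands are printed, not executed.
start_agents_on_gpu() {
  local gpu="$1" queue="$2" count="$3"
  local i=1 name
  while [ "$i" -le "$count" ]; do
    # Worker names must differ, otherwise the agents cannot be told apart
    name="gpu${gpu}_agent${i}"
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "TRAINS_WORKER_NAME=${name} trains-agent --gpus ${gpu} --queue ${queue} -d"
    else
      # Command form copied from this thread; adjust to your installed version
      TRAINS_WORKER_NAME="${name}" trains-agent --gpus "${gpu}" --queue "${queue}" -d
    fi
    i=$((i + 1))
  done
}

# Example: preview two agents on GPU 0 pulling from my_queue
# start_agents_on_gpu 0 my_queue 2
```

Keep in mind that nothing here limits memory per job, so two jobs sharing one GPU can still hit CUDA out of memory.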
thanks! I need to read all parts of documentation really carefully =) for some reason, couldn't find this section
is it in documentation somewhere?
let me check
not sure what is the "right way" 🙂
But I do
pkill -f "trains-agent --gpus 0" This will kill the process that was started as "trains-agent --gpus 0". Notice that it matches on the command pattern, so it has to match the way you executed the agent. You can check it with
ps -Af | grep trains-agent
I think this one is on us, I don't think a search would have led you to the correct answer ...
I'll try to make sure they add something regarding the configuration 🙂
another stupid question - what is the proper way to delete a worker? so far I've been using pgrep to find the relevant PID 😃
AgitatedDove14 Is it possible to delete a specific worker? I mean, I have 10 workers and I want to delete one of them.
Oops, you misunderstood me. I just want to remove a specific agent. For example, I had 3 agents on the same queue with different worker names. If I remove them by applying what you said in this thread, all of them will be removed. However, I just want to remove one of them.
Ohh now I get it...
Wait a couple of hours, 0.16 is out today with trains-agent --stop flag 🙂
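Until then, a rough Linux-only sketch for the "remove just one of them" case: since the worker name is passed as an environment variable, it does not show up in the command line that pkill -f matches, but it can usually be read from /proc/<pid>/environ. This assumes the agent process still carries TRAINS_WORKER_NAME in its environment and that you run it as the same user (or root); find_agent_pids is a made-up helper name, not something trains-agent provides.

```shell
#!/usr/bin/env bash
# Sketch (Linux only): print PIDs whose environment has TRAINS_WORKER_NAME=<worker>
# and whose command line contains <pattern> (default: trains-agent).
# find_agent_pids is a hypothetical helper, not part of trains-agent.
find_agent_pids() {
  local worker="$1" pattern="${2:-trains-agent}" env_file pid
  for env_file in /proc/[0-9]*/environ; do
    pid="${env_file#/proc/}"
    pid="${pid%/environ}"
    # Skip this shell and any process whose environment we cannot read
    [ "$pid" = "$$" ] && continue
    [ -r "$env_file" ] || continue
    # environ and cmdline are NUL-separated; convert NULs before matching
    tr '\0' '\n' 2>/dev/null < "$env_file" \
      | grep -qxF "TRAINS_WORKER_NAME=${worker}" || continue
    tr '\0' ' ' 2>/dev/null < "/proc/${pid}/cmdline" \
      | grep -qF -- "$pattern" || continue
    echo "$pid"
  done
}

# Usage (commented out so nothing gets killed by accident):
# find_agent_pids second_agent            # look at the PIDs first
# find_agent_pids second_agent | xargs -r kill
```

Printing the PIDs before piping them to kill is deliberate: check them with ps first, the same way you would check a pkill pattern.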