I'm probably stupid, but how do I specify a worker name? Use case: I want to create two workers using the same GPU, but the new worker just overwrites the old one.

  
  
Posted 4 years ago

Answers 25


I think this one is on us, I don't think a search would have led you to the correct answer ...
I'll try to make sure they add something regarding the configuration 🙂

  
  
Posted 4 years ago

We should probably have a section on that (i.e. running two agents on the same GPU, then explain how to use it)

  
  
Posted 4 years ago

perfect!

  
  
Posted 4 years ago

🙂

  
  
Posted 4 years ago

not sure what is the "right way" 🙂
But I do:
pkill -f "trains-agent --gpus 0"
This will kill a process that was started as "trains-agent --gpus 0". Notice that it matches the command-line pattern, so it has to match the way you executed the agent. You can check it with:
ps -Af | grep trains-agent
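If several agents are running and only one should go, a slightly more cautious variant is to list the matching PIDs first and then kill just the one you mean. This is only a rough sketch: `match_pids` is a made-up helper name (not part of trains-agent), and it does a plain substring match, whereas `pkill -f` treats the pattern as a regular expression.

```shell
# match_pids PATTERN
# Reads "PID COMMAND..." lines on stdin and prints the PIDs whose command
# line contains PATTERN as a plain substring.
# match_pids is a hypothetical helper, not part of trains-agent.
match_pids() {
    # Pass the pattern through the environment so this awk process does not
    # carry it on its own command line and show up as a match.
    PATTERN="$1" awk '{
        pid = $1
        sub(/^[ \t]*[0-9]+[ \t]+/, "")   # drop the PID column
        if (index($0, ENVIRON["PATTERN"]) > 0) print pid
    }'
}

# Usage (review the PIDs before killing anything):
#   ps -A -o pid= -o args= | match_pids "trains-agent --gpus 0"
#   kill <pid>
```

As with pkill, the pattern has to match the way the agent was actually started.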

  
  
Posted 4 years ago

DilapidatedDucks58 no don't say that, you are wonderful 😉

trains-agent --gpus 0 --queue my_queue -d
should create a worker machine:gpu0
Then you can do trains-agent --gpus 1 --queue my_queue -d which will create machine:gpu1

  
  
Posted 4 years ago

thanks! I need to read all parts of documentation really carefully =) for some reason, couldn't find this section

  
  
Posted 4 years ago

the weird part is that the old job continues running when I recreate the worker and enqueue the new job

  
  
Posted 4 years ago

TRAINS_WORKER_NAME=first_agent trains-agent --gpus 0
and
TRAINS_WORKER_NAME=second_agent trains-agent --gpus 0
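For anyone wanting to script this, here is a minimal sketch that wraps the two commands above. `start_named_agent` and the `DRY_RUN` switch are made up for illustration; by default it only prints what it would run, and running the agent in the background with `&` is an assumption on my side, not something the commands above do.

```shell
# start_named_agent NAME GPU
# Hypothetical wrapper: starts (or, by default, just prints) an agent with an
# explicit worker name, so two agents can share one GPU without one
# overwriting the other in the workers list.
start_named_agent() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Default: show the command only.
        echo "would run: TRAINS_WORKER_NAME=$1 trains-agent --gpus $2"
    else
        # Assumption: run in the background so the next agent can start too.
        TRAINS_WORKER_NAME="$1" trains-agent --gpus "$2" &
    fi
}

start_named_agent first_agent 0
start_named_agent second_agent 0
```

Set DRY_RUN=0 to actually launch them. Both agents share GPU 0, so the jobs they pick up also share its memory.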

  
  
Posted 4 years ago

Ohhhh, okay, as long as you know they might fail on memory...

  
  
Posted 4 years ago

Oops, you misunderstood me. I just want to remove a specific agent. For example, I had 3 agents on the same queue with different worker names. If I remove them by applying what you said in this thread, all of them will be removed. However, I just want to remove one of them.

  
  
Posted 4 years ago

our GPUs are 48GB, so it's quite wasteful to only run one job per GPU
yeah, I'm aware of that, I would have to make sure they don't fail with the infamous CUDA out of memory error, but still

  
  
Posted 4 years ago

AgitatedDove14 Is it possible to delete a specific worker? I mean, I have 10 workers and I want to delete one of them.

  
  
Posted 4 years ago

Ohh now I get it...
Wait a couple of hours, 0.16 is out today with trains-agent --stop flag 🙂

  
  
Posted 4 years ago

let me check

  
  
Posted 4 years ago

another stupid question - what is the proper way to delete a worker? so far I've been using pgrep to find the relevant PID 😃

  
  
Posted 4 years ago

that's right, I have 4 GPUs and 4 workers. but what if I want to run two jobs simultaneously on the same GPU?

  
  
Posted 4 years ago

well okay, it's probably not that weird considering that worker just runs the container

  
  
Posted 4 years ago

You mean why you have two processes?

  
  
Posted 4 years ago

is it in documentation somewhere?

  
  
Posted 4 years ago

MysteriousBee56, the agent is not running on the "server", it's running on its own machine.
The server just reflects the fact that the agent is up.
To actually take it down you need to SSH (or connect to that machine) and stop the actual trains-agent process.
What is exactly the scenario you had in mind?

  
  
Posted 4 years ago

Yes, I mean removing agent from the server

  
  
Posted 4 years ago

MysteriousBee56 what do you mean by "delete a worker"?
Stop the agent running remotely?

  
  
Posted 4 years ago