Can I assume you're running the agent (in daemon mode) on the same machine where you're running the clearml-agent daemon --stop command?
Please advise on how to remove a worker
If you killed all processes directly, there can't be any workers on that machine. It means that these two workers are running somewhere else...
Can you try upgrading to the latest agent version? pip install -U clearml-agent
Also, can you verify that you still have the clearml-agent process running (using top / htop)?
we reinstalled the clearml-agent
$ clearml-agent --version
CLEARML-AGENT version 1.2.3
running top | grep clearml
we can see the agent running
running clearml-agent list
we can see 2 workers
before running clearml-agent daemon --stop
We updated the clearml.conf and set worker_id and worker_name to the relevant name/id that we can see from clearml-agent list
and we get:
Could not find a running clearml-agent instance with worker_name=<clearml_worker_name> worker_id=<clearml_worker_id:0>
As we understand it, --stop without any IDs should stop all the workers.
waited 10 min
running top | grep clearml
we can see the clearml-agent running
running clearml-agent list
we can see the 2 workers
When you stop a daemon service, it will stop reporting to the server. There's a timeout of 10min, after which a daemon will not be displayed in the server
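To make the timeout behaviour above concrete, here is a minimal Python sketch of how a listing could hide workers that have stopped reporting. It is only an illustration, not ClearML's actual implementation: `visible_workers`, `REPORT_TIMEOUT`, the worker ids, and the timestamps are all hypothetical, and the 10-minute value is simply the one quoted in the message above.

```python
from datetime import datetime, timedelta

# Assumption: 10 minutes, taken from the message above (not from ClearML's code).
REPORT_TIMEOUT = timedelta(minutes=10)


def visible_workers(last_report, now):
    """Return the ids of workers whose last report falls within the timeout."""
    return sorted(
        worker_id
        for worker_id, reported_at in last_report.items()
        if now - reported_at <= REPORT_TIMEOUT
    )


# Hypothetical data: one worker still reporting, one stopped 25 minutes ago.
now = datetime(2022, 1, 1, 12, 0, 0)
last_report = {
    "host-a:0": now - timedelta(minutes=2),
    "host-b:0": now - timedelta(minutes=25),
}
print(visible_workers(last_report, now))
```

Under this model, a worker that is still listed well after the timeout would suggest that some process is still reporting under that id, possibly from another machine.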
Strange
I ran clearml-agent daemon --stop and after 10 min I ran clearml-agent list and I still see a worker
OutrageousSheep60, what version of ClearML-Agent are you using?
yes - and removed from clearml-agent list
Do you have any other workers running?
Hi OutrageousSheep60 , do you mean to make it disappear from the UI?
Also, what version are you on?
of what?
by the way, if you stop a daemon in an orderly way, it should remove itself, I think...
Can you try with blank worker_id/worker_name in your clearml.conf (basically how it was before)?
You can force kill the agent using kill -9 <process_id>, but clearml-agent daemon --stop should work.
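For the force-kill route, a minimal Python sketch of locating the agent processes first, assuming a Linux machine where `ps -eo pid,args` is available. `find_agent_pids` and `kill_agents` are hypothetical helpers, the sample `ps` output is made up, and matching on the substring `clearml-agent` in the command line is an assumption about how the process appears.

```python
import os
import signal
import subprocess


def find_agent_pids(ps_output, needle="clearml-agent"):
    """Parse `ps -eo pid,args` output; return PIDs whose command line contains needle."""
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and parts[0].isdigit() and needle in parts[1]:
            pids.append(int(parts[0]))
    return pids


def kill_agents():
    """Send SIGKILL (the equivalent of kill -9) to every matching process."""
    result = subprocess.run(
        ["ps", "-eo", "pid,args"], capture_output=True, text=True, check=True
    )
    for pid in find_agent_pids(result.stdout):
        if pid != os.getpid():  # never signal this script itself
            os.kill(pid, signal.SIGKILL)


# Made-up sample output, used only to show the parsing step.
sample = """  PID COMMAND
  101 /usr/bin/python3 /usr/local/bin/clearml-agent daemon --queue default
  202 bash
"""
print(find_agent_pids(sample))
```

`kill_agents()` is deliberately not called here. A force kill gives the agent no chance to shut down in an orderly way, so the worker entry may stay listed until the server-side timeout passes.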
Also, can you verify that one of the daemons is the clearml-services daemon? This one should be running from inside a docker on your server machine (I'm guessing you're self hosting - correct?).
updated the clearml.conf with empty worker_id/name
ran clearml-agent daemon --stop
ran top | grep clearm
Killed the pids
ran clearml-agent list
still both of the workers are listed
not sure I understand
running clearml-agent list
I get
`
workers:
- company:
    id: d1bd92...1e52b
    name: clearml
  id: clearml-server-...wdh:0
  ip: x.x.x.x
... `
Did you wait ~10-15 mins for it to time out?
Well, it seems that we have a similar issue: https://github.com/allegroai/clearml-agent/issues/86
Currently we are just creating a new worker on a separate queue.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.4 LTS
Release: 20.04
Codename: focal