Also, what version are you on?
of what?
not sure I understand
running clearml-agent list
I get
`
workers:
- company:
    id: d1bd92...1e52b
    name: clearml
  id: clearml-server-...wdh:0
  ip: x.x.x.x
  ...
`
updated the clearml.conf
with empty worker_id/name, ran
clearml-agent daemon --stop
top | grep clearm
Killed the PIDs, ran
clearml-agent list
still both of the workers are listed
Strange
I ran clearml-agent daemon --stop
and after 10 min I ran clearml-agent list
and I still see a worker
OutrageousSheep60, what version of ClearML-Agent
are you using?
Can I assume you're running the agent (in daemon mode) on the same machine that you're running the clearml-agent daemon --stop
command?
by the way, if you stop a daemon in an orderly way, it should remove itself, I think...
Do you have any other workers running?
Hi OutrageousSheep60, do you mean to make it disappear from the UI?
Did you wait ~10-15 mins for it to time out?
When you stop a daemon service, it will stop reporting to the server. There's a timeout of 10min, after which a daemon will not be displayed in the server
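To make the timeout behaviour concrete, here is a minimal sketch in Python. It is not ClearML's server code: the function name `visible_workers`, the `last_report` mapping and the worker IDs are hypothetical, and the only value taken from this thread is the 10 min figure.

```python
from datetime import datetime, timedelta

# Assumed value, taken only from the "10 min" figure mentioned in this thread.
REPORT_TIMEOUT = timedelta(minutes=10)


def visible_workers(last_report, now):
    """Return the IDs of workers whose last report is within the timeout window."""
    return sorted(
        worker_id
        for worker_id, seen in last_report.items()
        if now - seen <= REPORT_TIMEOUT
    )


# Hypothetical data: one worker still reporting, one stopped 11 minutes ago.
now = datetime(2022, 1, 1, 12, 0, 0)
last_report = {
    "worker-a:0": now - timedelta(minutes=2),
    "worker-b:0": now - timedelta(minutes=11),
}
print(visible_workers(last_report, now))
```

Under this simplified model, a worker that is still listed well after the timeout is probably still reporting from some running process.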
we reinstalled the clearml-agent
$ clearml-agent --version
CLEARML-AGENT version 1.2.3
running top | grep clearml
we can see the agent running
running clearml-agent list
we can see 2 workers
before running clearml-agent daemon --stop
We updated the clearml.conf and updated the worker_id
and worker_name
with the relevant name/id that we can see from clearml-agent list
and we get: Could not find a running clearml-agent instance with worker_name=<clearml_worker_name> worker_id=<clearml_worker_id:0>
As we understand it, --stop
without any IDs should stop all the workers.
waited 10 min, running top | grep clearml
we can see the clearml-agent running
running clearml-agent list
we can see the 2 workers
yes - and removed from clearml-agent list
Well, it seems that we have an issue similar to https://github.com/allegroai/clearml-agent/issues/86
currently we are just creating a new worker and on a separate queue
If you killed all processes directly, there can't be any workers on that machine. It means that these two workers are running somewhere else...
Distributor ID: Ubuntu
Description: Ubuntu 20.04.4 LTS
Release: 20.04
Codename: focal
Can you try with blank worker_id/worker_name in your clearml.conf
(basically how it was before)?
You can force kill the agent using kill -9 <process_id>
but clearml-agent daemon --stop should work.
Also, can you verify that one of the daemons is the clearml-services daemon? This one should be running from inside a docker on your server machine (I'm guessing you're self hosting - correct?).
Also, can you verify that you still have the clearml-agent process running? top
/ htop
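If eyeballing top / htop is error-prone, the same check can be scripted. The sketch below is only an illustration: the helper names `find_agent_pids` and `list_agent_pids` are made up here, and it assumes a Linux `ps` that accepts `-eo pid,args` (as on the Ubuntu machine mentioned in this thread).

```python
import subprocess


def find_agent_pids(ps_output, needle="clearml-agent"):
    """Parse `ps -eo pid,args` style text and return PIDs whose command line contains needle."""
    pids = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        # The header row ("PID COMMAND") is skipped because "PID" is not a number.
        if len(parts) == 2 and parts[0].isdigit() and needle in parts[1]:
            pids.append(int(parts[0]))
    return pids


def list_agent_pids():
    """Run ps on this machine and return the PIDs of matching processes."""
    result = subprocess.run(
        ["ps", "-eo", "pid,args"], capture_output=True, text=True, check=True
    )
    return find_agent_pids(result.stdout)


# Hypothetical ps output, used here instead of a live process table.
sample = (
    "  PID COMMAND\n"
    "  101 /usr/bin/python3 /usr/local/bin/clearml-agent daemon --queue default\n"
    "  202 bash\n"
)
print(find_agent_pids(sample))
```

Calling `list_agent_pids()` on the agent machine returns the live PIDs; an empty list there, while clearml-agent list still shows workers, would suggest those workers are running elsewhere.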
Can you try upgrading to the latest agent version? pip install -U clearml-agent
Please advise on how to remove a worker