Still trying to understand what this default worker is.
clearml.conf and reinstall
then running
clearml-agent list
gives the following error:
Using built-in ClearML default key/secret
clearml_agent: ERROR: Could not find host server definition (missing
~/clearml.conf or Environment CLEARML_API_HOST)
To get started with ClearML: setup your own
clearml-server, or create a free account at
Then, after putting clearml.conf back and running
clearml-agent list
we get:
- company:
- id: 74794fe91f70452eb7149c34cc39315a
How was this worker started? BTW, the API credentials
are those of a specific user (and not a user named tests).
Further investigation showed that the worker was created with a dedicated
CLEARML_HOST_IP - so running the
clearml-agent daemon --stop
didn't kill it (but it did appear in the
But once we added the CLEARML_HOST_IP
CLEARML_HOST_IP=X.X.X.X clearml-agent daemon --stop
it finally killed it
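For anyone hitting the same thing, here is a rough sketch of scripting that stop call from Python. The helper names (`build_stop_command`, `build_env`, `stop_agent`) are made up for illustration and are not part of clearml-agent; the only things taken from this thread are the `clearml-agent daemon --stop` command and the `CLEARML_HOST_IP` variable.

```python
import os
import subprocess


def build_stop_command(worker_id=None):
    """Build the argv for `clearml-agent daemon --stop` (hypothetical helper).

    If worker_id is given, it is appended as in the thread's
    `clearml-agent daemon --stop <worker_id>` form.
    """
    cmd = ["clearml-agent", "daemon", "--stop"]
    if worker_id:
        cmd.append(worker_id)
    return cmd


def build_env(host_ip, base_env=None):
    """Copy the environment and set CLEARML_HOST_IP to the IP the worker was started with."""
    env = dict(os.environ if base_env is None else base_env)
    env["CLEARML_HOST_IP"] = host_ip
    return env


def stop_agent(host_ip, worker_id=None):
    """Run the stop command with CLEARML_HOST_IP set and return its exit code."""
    result = subprocess.run(
        build_stop_command(worker_id),
        env=build_env(host_ip),
        check=False,  # let the caller decide what a non-zero exit code means
    )
    return result.returncode
```

Calling `stop_agent("X.X.X.X")` with the real IP should be equivalent to the one-liner above, assuming `clearml-agent` is on the PATH.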
I think I have a lead.
Looking at the list of workers from
clearml-agent list e.g. https://clearml.slack.com/archives/CTK20V944/p1657174280006479?thread_ts=1657117193.653579&cid=CTK20V944
is there a way to find the
in the above example the
clearml-server-agent-group-cpu-agent-5df4476cfc-j54gh:0 but I'm not able to stop this worker using the command
clearml-agent daemon --stop
since this orphan worker has no corresponding
Well, it seems that we have an issue similar to https://github.com/allegroai/clearml-agent/issues/86
We are not able to reference this orphan worker. It does not show up with
ps -ef | grep clearml-agent
but still appears with
and we are not able to stop it with
clearml-agent daemon --stop clearml-server-agent-group-cpu-agent-5df4476cfc-j54gh:0
Could not find a running clearml-agent instance with worker_name=clearml-server-agent-group-cpu-agent-5df4476cfc-j54gh:0 worker_id=clearml-server-agent-group-cpu-agent-5df4476cfc-j54gh:0
However, if we create a different
worker, we are able to use it and clone the repo, e.g.
CLEARML_WORKER_NAME=my_worker CLEARML_WORKER_ID=my_worker clearml-agent daemon --detached --queue my_queue
Is this running as the same Linux user you used when checking the git ssh clone on that machine?
The only thing that could account for this issue is that the agent is somehow not getting the right info from the ~/.ssh folder.
Question: if we change
clearml.conf, do we need to stop and start the daemon?
so running the command
clearml-agent -d list returns the https://clearml.slack.com/archives/CTK20V944/p1657174280006479?thread_ts=1657117193.653579&cid=CTK20V944
Hi SweetBadger76,
Well, apparently I was mistaken.
I still have a ghost worker that I'm not able to remove (I had 2 workers on the same queue, which caused my confusion).
I can see it in the UI and when I run
And although I'm stopping the worker specifically with
clearml-agent daemon --stop <worker_id>
I'm getting
Could not find a running clearml-agent instance with worker_name=<worker_id> worker_id=<worker_id>
The worker name is part of the key, so
worker_d1bd92a3b039400cbafc60a7a5b1e52b___tests___clearml-server-agent-group-cpu-agent-5df4476cfc-j54gh:0 means the worker name in this case is
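To make the key layout concrete, here is a small sketch that splits such a key into its parts. It assumes the layout is `worker_<company id>___<user>___<worker name>`, which is only inferred from the single key quoted above and is not a documented format; `parse_worker_key` is a hypothetical helper, not a ClearML function.

```python
def parse_worker_key(key):
    """Split a worker key into (company_id, user, worker_name).

    Assumed layout (inferred from one example, not documented):
        worker_<company id>___<user>___<worker name>
    """
    prefix = "worker_"
    if not key.startswith(prefix):
        raise ValueError("key does not start with 'worker_': %r" % key)

    # Split on the first two '___' separators only, so a worker name that
    # itself contains '___' is kept in one piece.
    parts = key[len(prefix):].split("___", 2)
    if len(parts) != 3:
        raise ValueError("expected three '___'-separated fields: %r" % key)

    company_id, user, worker_name = parts
    return company_id, user, worker_name


key = (
    "worker_d1bd92a3b039400cbafc60a7a5b1e52b___tests___"
    "clearml-server-agent-group-cpu-agent-5df4476cfc-j54gh:0"
)
company_id, user, worker_name = parse_worker_key(key)
print(company_id, user, worker_name)
```

If that assumption holds, the last field is the value to pass to `clearml-agent daemon --stop`.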
I am not sure I get you here.
When pip installing clearml-agent, it doesn't fire up any agent. The procedure is that after installing the package, if there isn't any config file, you run
clearml-agent init and enter the credentials, which are stored in clearml.conf. If there is a conf file, you simply edit it and manually enter the credentials. So I don't understand what you mean by "remove it".