Answered
Hi We Are Getting The Following Error When We Are Trying To Run A Task On Our On Premis

Hi
We are getting the following error when we try to run a task on our on-premises clearml-agent (version 1.3.0):

cloning: git@github.com:XXXX/sample.repo.git
Host key verification failed.
fatal: Could not read from remote repository.
Please make sure you have the correct access rights and the repository exists.

To verify that the SSH key is valid, we removed the SSH key from GitHub and ran

git clone git@github.com:XXXX/sample.repo.git

and, as expected, we got an error:

Cloning into 'sample.repo'...
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

We then added the SSH key back to GitHub and successfully managed to clone the repo by running

git clone git@github.com:XXXX/sample.repo.git

Our clearml.conf file has the following configuration:

git_user=""
git_pass=""
git_host=""
force_git_ssh_protocol: true

Running clearml-agent list returns a worker.

Any suggestions would be appreciated

  
  
Posted 2 years ago

Answers 30


hi OutrageousSheep60
sounds like the agent is in reality ... dead. It sounds logical, because you cannot see it using ps
however, it would be worth checking whether you can still see it in the UI

  
  
Posted 2 years ago

i am not sure i get you here.
when you pip install clearml-agent, it doesn't start any agent. the procedure is: after installing the package, if there isn't a config file, you run clearml-agent init and enter the credentials, which are then stored in clearml.conf. If there is a conf file, you simply edit it and enter the credentials manually. so i don't understand what you mean by "remove it"

  
  
Posted 2 years ago

hey OutrageousSheep60
what about the process? there must be a clearml-agent process running somewhere, and that is why it can continue reporting to the server

  
  
Posted 2 years ago

Thx for your reply

  
  
Posted 2 years ago

agree -
we understand now that the worker is the default worker that is installed after running

pip install clearml-agent

Is it possible to remove it? All tasks that use this worker don't have the correct credentials.

  
  
Posted 2 years ago

Sorry -
After updating the repo I can see that the newest chart is 4.1.1
SweetBadger76 should I update to this version?

  
  
Posted 2 years ago

Here is the screenshot - we deleted all the workers, except for the one that we couldn't

  
  
Posted 2 years ago

Question - if we change the

clearml.conf

do we need to stop and start the daemon?

yes

  
  
Posted 2 years ago

not sure i understand
we are running the daemon in a detached mode

clearml-agent daemon --queue <execution_queue_to_pull_from> --detached

  
  
Posted 2 years ago

btw can you screenshot your clearml-agent list and UI please ?

  
  
Posted 2 years ago

and the command you're using to run the agent?

  
  
Posted 2 years ago

clearml-3.5.0

  
  
Posted 2 years ago

JuicyFox94

  
  
Posted 2 years ago

Hi SweetBadger76 -
Am I misunderstanding how this tests worker runs?

  
  
Posted 2 years ago

The worker name is part of the key, so worker_d1bd92a3b039400cbafc60a7a5b1e52b___tests___clearml-server-agent-group-cpu-agent-5df4476cfc-j54gh:0 means the worker name in this case is clearml-server-agent-group-cpu-agent-5df4476cfc-j54gh:0
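
As a rough illustration of the key layout described above, the sketch below splits a worker key on the `___` separator. The layout `worker_<company id>___<user id>___<worker name>` is inferred only from the single key quoted in this thread, and `parse_worker_key` is a hypothetical helper, not part of ClearML.

```python
from typing import NamedTuple


class WorkerKey(NamedTuple):
    company_id: str
    user_id: str
    worker_name: str


def parse_worker_key(key: str) -> WorkerKey:
    """Split a key of the assumed form worker_<company>___<user>___<name>."""
    prefix = "worker_"
    if not key.startswith(prefix):
        raise ValueError(f"unexpected worker key: {key!r}")
    # Split at most twice so a worker name that contains '___' stays intact.
    parts = key[len(prefix):].split("___", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected worker key: {key!r}")
    return WorkerKey(*parts)


# The key quoted in this thread.
key = (
    "worker_d1bd92a3b039400cbafc60a7a5b1e52b___tests___"
    "clearml-server-agent-group-cpu-agent-5df4476cfc-j54gh:0"
)
parsed = parse_worker_key(key)
print(parsed.worker_name)
```

Under this assumed layout, the middle part (`tests`) is the user the worker registered with, which matches the `user` field in the `clearml-agent list` output quoted in this thread.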

  
  
Posted 2 years ago

how are you deploying your server?

  
  
Posted 2 years ago

Sorry - I'm a Helm newbie
when running

helm search repo clearml --versions

I can't see version 3.6.2 - the highest is 3.5.0
This is the repo that we used to get the helm chart

helm repo add allegroai

What am I missing?

  
  
Posted 2 years ago

Hi SweetBadger76
Further investigation showed that the worker was created with a dedicated CLEARML_HOST_IP - so running the

clearml-agent daemon --stop

didn't kill it (although it did appear in clearml-agent list). But once we added CLEARML_HOST_IP:

CLEARML_HOST_IP=X.X.X.X clearml-agent daemon --stop

it finally killed it
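
For anyone scripting this, here is a minimal sketch of the same invocation from Python. It only assembles the command and environment reported above (`CLEARML_HOST_IP` plus `clearml-agent daemon --stop`); `build_stop_invocation` is a hypothetical helper, and `X.X.X.X` is the placeholder from this post, to be replaced with the IP the worker was started with.

```python
import os
from typing import Dict, List, Mapping, Optional, Tuple


def build_stop_invocation(
    host_ip: str, base_env: Optional[Mapping[str, str]] = None
) -> Tuple[List[str], Dict[str, str]]:
    """Return the stop command and an environment with CLEARML_HOST_IP set."""
    # Copy the environment so the caller's mapping is never modified.
    env = dict(os.environ if base_env is None else base_env)
    env["CLEARML_HOST_IP"] = host_ip
    return ["clearml-agent", "daemon", "--stop"], env


cmd, env = build_stop_invocation("X.X.X.X", base_env={"PATH": "/usr/bin"})
print(" ".join(cmd))

# To actually run it (requires clearml-agent on PATH):
#   import subprocess
#   subprocess.run(cmd, env=env, check=True)
```

The command is returned rather than executed so it can be inspected first; the same environment the daemon was started with appears to be needed for `--stop` to find it.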

  
  
Posted 2 years ago

OutrageousSheep60 it looks to me this agent is part of the server's deployment

  
  
Posted 2 years ago

Well, it seems that we have an issue similar to https://github.com/allegroai/clearml-agent/issues/86
We are not able to reference this orphan worker: it does not show up with

ps -ef | grep clearml-agent

but it still appears with clearml-agent list, and we are not able to stop it with

clearml-agent daemon --stop clearml-server-agent-group-cpu-agent-5df4476cfc-j54gh:0

We get:

Could not find a running clearml-agent instance with worker_name=clearml-server-agent-group-cpu-agent-5df4476cfc-j54gh:0 worker_id=clearml-server-agent-group-cpu-agent-5df4476cfc-j54gh:0
However - if we create a different worker we are able to use it and clone the repo. e.g.
CLEARML_WORKER_NAME=my_worker CLEARML_WORKER_ID=my_worker clearml-agent daemon --detached --queue my_queue

  
  
Posted 2 years ago

can you try again after having upgraded to 3.6.2 ?

  
  
Posted 2 years ago

is this running from the same linux user on which you checked the git ssh clone on that machine? The only thing that could account for this issue is somehow the agent is not getting the right info from the ~/.ssh folder
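
The two error messages quoted in this thread usually come from different stages of the SSH connection, which may help narrow this down. The sketch below maps each message to its usual OpenSSH meaning; `diagnose_git_ssh_error` is a hypothetical helper, and the hints describe general OpenSSH behavior rather than anything specific to clearml-agent.

```python
def diagnose_git_ssh_error(stderr: str) -> str:
    """Map a git-over-SSH error message to the stage that most likely failed."""
    text = stderr.lower()
    if "host key verification failed" in text:
        # The client could not verify the server: its host key is missing
        # from, or does not match, the known_hosts file of the user running ssh.
        return "host-key"
    if "permission denied (publickey)" in text:
        # The server rejected the client: no offered key is authorized.
        return "client-key"
    return "unknown"


agent_error = "Host key verification failed. fatal: Could not read from remote repository."
manual_error = "git@github.com: Permission denied (publickey)."

print(diagnose_git_ssh_error(agent_error))
print(diagnose_git_ssh_error(manual_error))
```

If this reading is right, the agent's failure concerns the known_hosts entry for github.com under the user running the agent, not the key registered on GitHub, so removing and re-adding the key on GitHub would not reproduce it.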

  
  
Posted 2 years ago

is this running from the same linux user on which you checked the git ssh clone on that machine?

yes

The only thing that could account for this issue is somehow the agent is not getting the right info from the ~/.ssh folder

maybe -
Question - if we change the clearml.conf do we need to stop and start the daemon?

  
  
Posted 2 years ago

latest version? only the clearml chart?

  
  
Posted 2 years ago

Still trying to understand what this default worker is.
I removed clearml.conf and reinstalled clearml-agent.
Then running

clearml-agent list

gives the following error:

Using built-in ClearML default key/secret

clearml_agent: ERROR: Could not find host server definition (missing ~/clearml.conf or Environment CLEARML_API_HOST)
To get started with ClearML: setup your own clearml-server, or create a free account at and run clearml-agent init

Then, after restoring clearml.conf and running clearml-agent list, we get:

- company:
    id: d1bd92a3b039400cbafc60a7a5b1e52b
    name: clearml
  id: clearml-server-agent-group-cpu-agent-5df4476cfc-j54gh:0
  ip: 10.124.0.4
  key: worker_d1bd92a3b039400cbafc60a7a5b1e52b___tests___clearml-server-agent-group-cpu-agent-5df4476cfc-j54gh:0
  last_activity_time: '2022-07-13T09:37:31.718067+00:00'
  last_report_time: '2022-07-13T09:37:31.718067+00:00'
  queues:
  - id: 74794fe91f70452eb7149c34cc39315a
    name: default
    num_tasks: 0
  register_time: '2022-07-01T23:39:00.733133+00:00'
  register_timeout: 600
  tags: []
  user:
    id: tests
    name: tests

How was this worker started? BTW - the API credentials in clearml.conf belong to a specific user (not a user named tests).

  
  
Posted 2 years ago

Hi SweetBadger76 ,
Well - apparently I was mistaken.
I still have a ghost worker that I'm not able to remove (I had 2 workers on the same queue, which caused my confusion).
I can see it in the UI and when I run clearml-agent list
And although I'm stopping the worker specifically with

clearml-agent daemon --stop <worker_id>

I'm getting

Could not find a running clearml-agent instance with worker_name=<worker_id> worker_id=<worker_id>

  
  
Posted 2 years ago

I think I have a lead.
looking at list of workers from clearml-agent list e.g. https://clearml.slack.com/archives/CTK20V944/p1657174280006479?thread_ts=1657117193.653579&cid=CTK20V944
is there a way to find the worker_name ?
in the above example the worker_id is clearml-server-agent-group-cpu-agent-5df4476cfc-j54gh:0 but I'm not able to stop this worker using the command

clearml-agent daemon --stop

since this orphan worker has no corresponding clearml.conf

  
  
Posted 2 years ago

Yeah, that's what I was looking for 🙂

  
  
Posted 2 years ago