Hi We Are Getting The Following Error When We Are Trying To Run A Task On Our On Premis

Hi,
We are getting the following error when we try to run a task on our on-premises clearml-agent (version 1.3.0):

cloning: git@github.com:XXXX/sample.repo.git
Host key verification failed.
fatal: Could not read from remote repository.
Please make sure you have the correct access rights and the repository exists.

To verify that the SSH key is valid, we removed the SSH key from GitHub and ran

git clone git@github.com:XXXX/sample.repo.git

and, as expected, we got an error:

Cloning into 'sample.repo'...
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

We then added the SSH key back to GitHub and successfully managed to clone the repo by running git clone git@github.com:XXXX/sample.repo.git.

Our clearml.conf file has the following configuration:

git_user=""
git_pass=""
git_host=""
force_git_ssh_protocol: true

Running clearml-agent list returns a worker.

Any suggestions would be appreciated.
  
  
Posted one year ago

Answers 30


Is this running as the same Linux user you used to check the git SSH clone on that machine?

Yes.

The only thing that could account for this issue is that the agent is somehow not getting the right info from the ~/.ssh folder.

Maybe. Question: if we change clearml.conf, do we need to stop and restart the daemon?

  
  
Posted one year ago

Is this running as the same Linux user you used to check the git SSH clone on that machine? The only thing that could account for this issue is that the agent is somehow not getting the right info from the ~/.ssh folder.
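Not part of the original thread: a minimal Python sketch for checking, as the same Linux user the agent runs under, whether ~/.ssh/known_hosts has an entry for github.com. ssh reports "Host key verification failed." when it cannot verify the remote host key, for example when there is no matching known_hosts entry and it cannot prompt, which is different from the "Permission denied (publickey)" error seen when the key itself is missing. The helper names are hypothetical, and the sketch deliberately ignores wildcard patterns, negations, non-default ports and @cert-authority/@revoked lines.

```python
import base64
import hashlib
import hmac
import os
from typing import Iterable, Optional


def entry_matches(host: str, hosts_field: str) -> bool:
    """Check one known_hosts host field (plain or hashed) against `host`."""
    if hosts_field.startswith("|1|"):
        # Hashed entry: |1|base64(salt)|base64(HMAC-SHA1(salt, hostname))
        parts = hosts_field.split("|")
        if len(parts) != 4:
            return False
        salt = base64.b64decode(parts[2])
        expected = base64.b64decode(parts[3])
        digest = hmac.new(salt, host.encode(), hashlib.sha1).digest()
        return hmac.compare_digest(digest, expected)
    # Plain entry: comma-separated host names / addresses
    return host in hosts_field.split(",")


def lines_know_host(host: str, lines: Iterable[str]) -> bool:
    """Return True if any known_hosts line lists `host`."""
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, comments and marker lines (not handled here).
        if not line or line.startswith(("#", "@")):
            continue
        if entry_matches(host, line.split()[0]):
            return True
    return False


def known_hosts_has(host: str, path: Optional[str] = None) -> bool:
    """Check the current user's known_hosts file (by default) for `host`."""
    path = path or os.path.expanduser("~/.ssh/known_hosts")
    if not os.path.isfile(path):
        return False
    with open(path, encoding="utf-8") as fh:
        return lines_know_host(host, fh)


if __name__ == "__main__":
    import getpass

    print(f"user={getpass.getuser()} github.com known: {known_hosts_has('github.com')}")
```

Running it once from your shell and once from the context the agent actually runs in shows whether both see the same ~/.ssh folder.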

  
  
Posted one year ago

And what command are you using to run the agent?

  
  
Posted one year ago

Yeah, that's what I was looking for 🙂

  
  
Posted one year ago

Well, it seems that we have an issue similar to https://github.com/allegroai/clearml-agent/issues/86
We are not able to reference this orphan worker: it does not show up with ps -ef | grep clearml-agent, but it still appears in clearml-agent list, and we are not able to stop it with
clearml-agent daemon --stop clearml-server-agent-group-cpu-agent-5df4476cfc-j54gh:0
We get:
Could not find a running clearml-agent instance with worker_name=clearml-server-agent-group-cpu-agent-5df4476cfc-j54gh:0 worker_id=clearml-server-agent-group-cpu-agent-5df4476cfc-j54gh:0
However, if we create a different worker, we are able to use it and clone the repo, e.g.
CLEARML_WORKER_NAME=my_worker CLEARML_WORKER_ID=my_worker clearml-agent daemon --detached --queue my_queue
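Not from the thread: a small Python sketch of the ps -ef | grep clearml-agent check mentioned above, which filters out the grep process itself. The function name is hypothetical, and matching on the string "clearml-agent" is the same heuristic as the grep; an empty result only means nothing on this machine matches, not that no agent is reporting from somewhere else.

```python
import os
import subprocess
from typing import List, Tuple


def agent_processes(ps_ef_output: str) -> List[Tuple[int, str]]:
    """Return (pid, command) pairs for clearml-agent processes in `ps -ef` output."""
    found = []
    for line in ps_ef_output.splitlines()[1:]:  # skip the UID PID PPID ... header
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces)
        cols = line.split(None, 7)
        if len(cols) < 8:
            continue
        pid, cmd = int(cols[1]), cols[7]
        program = os.path.basename(cmd.split()[0])
        # Keep agent processes, drop the "grep clearml-agent" line itself.
        if "clearml-agent" in cmd and program != "grep":
            found.append((pid, cmd))
    return found


if __name__ == "__main__":
    out = subprocess.run(["ps", "-ef"], capture_output=True, text=True, check=True).stdout
    for pid, cmd in agent_processes(out):
        print(pid, cmd)
```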

  
  
Posted one year ago

I think I have a lead.
Looking at the list of workers from clearml-agent list, e.g. https://clearml.slack.com/archives/CTK20V944/p1657174280006479?thread_ts=1657117193.653579&cid=CTK20V944
is there a way to find the worker_name?
In the above example the worker_id is clearml-server-agent-group-cpu-agent-5df4476cfc-j54gh:0, but I'm not able to stop this worker using the command

clearml-agent daemon --stop

since this orphan worker has no corresponding clearml.conf.

  
  
Posted one year ago

Hi SweetBadger76,
Well, apparently I was mistaken.
I still have a ghost worker that I'm not able to remove (I had two workers on the same queue, which caused my confusion).
I can see it in the UI and when I run clearml-agent list.
And although I'm stopping the worker specifically with
clearml-agent daemon --stop <worker_id>
I'm getting
Could not find a running clearml-agent instance with worker_name=<worker_id> worker_id=<worker_id>

  
  
Posted one year ago

Hi OutrageousSheep60,
it sounds like the agent is in reality... dead. That makes sense, since you cannot see it using ps.
However, it would be worth checking whether you can still see it in the UI.

  
  
Posted one year ago

Hi SweetBadger76,
Further investigation showed that the worker was created with a dedicated CLEARML_HOST_IP, so running

clearml-agent daemon --stop

didn't kill it (although it did appear in clearml-agent list). But once we added the CLEARML_HOST_IP:

CLEARML_HOST_IP=X.X.X.X clearml-agent daemon --stop

it finally killed it.
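A minimal Python sketch of the stop command above, for anyone scripting it. The helper names are hypothetical; it only sets CLEARML_HOST_IP in the child process environment and runs the same clearml-agent daemon --stop invocation from the post. It assumes, as the thread found, that the IP has to match the one the worker was started with.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional, Tuple


def stop_command(
    host_ip: Optional[str] = None, base_env: Optional[Mapping[str, str]] = None
) -> Tuple[List[str], Dict[str, str]]:
    """Build the argv and environment for `clearml-agent daemon --stop`."""
    env = dict(os.environ if base_env is None else base_env)
    if host_ip:
        # Same effect as prefixing the shell command with CLEARML_HOST_IP=X.X.X.X
        env["CLEARML_HOST_IP"] = host_ip
    return ["clearml-agent", "daemon", "--stop"], env


def stop_agent(host_ip: Optional[str] = None) -> int:
    """Run the stop command and return its exit code."""
    cmd, env = stop_command(host_ip)
    return subprocess.run(cmd, env=env).returncode
```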

  
  
Posted one year ago

The worker name is part of the key, so worker_d1bd92a3b039400cbafc60a7a5b1e52b___tests___clearml-server-agent-group-cpu-agent-5df4476cfc-j54gh:0 means the worker name in this case is clearml-server-agent-group-cpu-agent-5df4476cfc-j54gh:0
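A small Python sketch for pulling the worker name out of such a key. The layout worker_<company id>___<user id>___<worker id> is an assumption inferred from the clearml-agent list output quoted in this thread (company id d1bd92a3b039400cbafc60a7a5b1e52b, user id tests), not a documented format; the names below are hypothetical.

```python
from typing import NamedTuple


class WorkerKey(NamedTuple):
    company_id: str
    user_id: str
    worker_id: str


def parse_worker_key(key: str) -> WorkerKey:
    """Split a key like worker_<company>___<user>___<worker> into its parts."""
    prefix = "worker_"
    if not key.startswith(prefix):
        raise ValueError(f"not a worker key: {key!r}")
    parts = key[len(prefix):].split("___", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected worker key layout: {key!r}")
    return WorkerKey(*parts)


key = ("worker_d1bd92a3b039400cbafc60a7a5b1e52b___tests___"
       "clearml-server-agent-group-cpu-agent-5df4476cfc-j54gh:0")
print(parse_worker_key(key).worker_id)
```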

  
  
Posted one year ago

Hey OutrageousSheep60,
what about the process? There must be a clearml-agent process running somewhere; that is why it can keep reporting to the server.

  
  
Posted one year ago

Thanks for your reply.

  
  
Posted one year ago

I'm still trying to understand what this default worker is.
I removed clearml.conf and reinstalled clearml-agent, then ran
clearml-agent list
and got the following error:

Using built-in ClearML default key/secret

clearml_agent: ERROR: Could not find host server definition (missing ~/clearml.conf or Environment CLEARML_API_HOST)
To get started with ClearML: setup your own clearml-server, or create a free account at and run clearml-agent init

Then, after restoring clearml.conf and running clearml-agent list, we get:

- company:
    id: d1bd92a3b039400cbafc60a7a5b1e52b
    name: clearml
  id: clearml-server-agent-group-cpu-agent-5df4476cfc-j54gh:0
  ip: 10.124.0.4
  key: worker_d1bd92a3b039400cbafc60a7a5b1e52b___tests___clearml-server-agent-group-cpu-agent-5df4476cfc-j54gh:0
  last_activity_time: '2022-07-13T09:37:31.718067+00:00'
  last_report_time: '2022-07-13T09:37:31.718067+00:00'
  queues:
  - id: 74794fe91f70452eb7149c34cc39315a
    name: default
    num_tasks: 0
  register_time: '2022-07-01T23:39:00.733133+00:00'
  register_timeout: 600
  tags: []
  user:
    id: tests
    name: tests

How was this worker started? BTW, the API credentials in our clearml.conf belong to a specific user (not a user named tests).
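Not from the thread: a Python sketch for reading the timestamps in that listing. It assumes (our reading, not confirmed in this thread) that register_timeout is the number of seconds after which the server drops a worker that has stopped reporting. Under that assumption, a worker that keeps appearing with a last_report_time that advances between two listings is still being reported by some live process, which fits the point that an agent must be running somewhere. Function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def seconds_since_report(last_report_time: str, now: Optional[datetime] = None) -> float:
    """Seconds elapsed since an ISO-8601 last_report_time (with UTC offset)."""
    reported = datetime.fromisoformat(last_report_time)
    now = now or datetime.now(timezone.utc)
    return (now - reported).total_seconds()


def looks_stale(last_report_time: str, register_timeout: int,
                now: Optional[datetime] = None) -> bool:
    """True if the worker has been silent for longer than register_timeout."""
    return seconds_since_report(last_report_time, now) > register_timeout


if __name__ == "__main__":
    # Values copied from the clearml-agent list output above.
    print(looks_stale("2022-07-13T09:37:31.718067+00:00", 600))
```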
  
  
Posted one year ago

I am not sure I follow you here.
Installing clearml-agent with pip doesn't start any agent. The procedure is: after installing the package, if there isn't a config file, you run clearml-agent init and enter the credentials, which are stored in clearml.conf. If there is a conf file, you simply edit it and enter the credentials manually. So I don't understand what you mean by "remove it".
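A minimal Python sketch of the check implied by the error message quoted earlier in the thread ("missing ~/clearml.conf or Environment CLEARML_API_HOST"): the agent needs either that file in the home directory or that environment variable. The function name is hypothetical, and the sketch only covers those two sources; other configuration mechanisms the agent may support are not modeled.

```python
import os
from typing import Mapping, Optional


def has_server_definition(env: Optional[Mapping[str, str]] = None,
                          home: Optional[str] = None) -> bool:
    """True if CLEARML_API_HOST is set or <home>/clearml.conf exists."""
    env = os.environ if env is None else env
    home = home or os.path.expanduser("~")
    if env.get("CLEARML_API_HOST"):
        return True
    return os.path.isfile(os.path.join(home, "clearml.conf"))


if __name__ == "__main__":
    print("server definition found:", has_server_definition())
```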

  
  
Posted one year ago

OutrageousSheep60, it looks to me like this agent is part of the server's deployment.

  
  
Posted one year ago

Agreed.
We understand now that the worker is the default worker that is installed after running
pip install clearml-agent
Is it possible to remove it? All tasks that use this worker don't have the correct credentials.

  
  
Posted one year ago

Here is the screenshot: we deleted all the workers except for the one that we couldn't.

  
  
Posted one year ago

Hi SweetBadger76,
am I misunderstanding how this tests worker runs?

  
  
Posted one year ago

BTW, can you share a screenshot of your clearml-agent list output and the UI, please?

  
  
Posted one year ago

How are you deploying your server?

  
  
Posted one year ago

Sorry, I'm a Helm newbie.
When running
helm search repo clearml --versions
I can't see version 3.6.2; the highest is 3.5.0.
This is the repo that we used to get the Helm chart:
helm repo add allegroai
What am I missing?

  
  
Posted one year ago

Sorry,
after updating the repo I can see that the newest chart is 4.1.1.
SweetBadger76, should I update to this version?

  
  
Posted one year ago

Can you try again after upgrading to 3.6.2?

  
  
Posted one year ago

Latest version? Only the clearml chart?

  
  
Posted one year ago

clearml-3.5.0

  
  
Posted one year ago

JuicyFox94

  
  
Posted one year ago

Not sure I understand.
We are running the daemon in detached mode:

clearml-agent daemon --queue <execution_queue_to_pull_from> --detached

  
  
Posted one year ago

Question: if we change clearml.conf, do we need to stop and restart the daemon?

Yes.

  
  
Posted one year ago
595 Views