Examples: query, "exact match", wildcard*, wild?ard, wild*rd
Fuzzy search: cake~ (finds cakes, bake)
Term boost: "red velvet"^4, chocolate^2
Field grouping: tags:(+work -"fun-stuff")
Escaping: Escape characters +-&|!(){}[]^"~*?:\ with \, e.g. \+
Range search: properties.timestamp:[1587729413488 TO *] (inclusive), properties.title:{A TO Z}(excluding A and Z)
Combinations: chocolate AND vanilla, chocolate OR vanilla, (chocolate OR vanilla) NOT "vanilla pudding"
Field search: properties.title:"The Title" AND text
Answered
Hey, Guys! I Have A Problem. I Launched Clearml Server And Trying To Run A Worker On Another Machine. When I Run

Hey, guys! I have a problem. I launched ClearML Server and trying to run a worker on another machine. When I run clearml-agent init it can't verify credentials and create a clearml.conf file. However clearml-init command successfully accepted my creds and created clearml.conf file. Then when I try to run clearml-agent daemon -d it just sits there, no output and it doesn't appear in workers sections in web UI.

How do I make my worker run properly? Can I see daemon output logs somewhere?
Any help is much appreciated. Thank you!

  
  
Posted one year ago
Votes Newest

Answers 39


But from what you're saying it seems like the agent simply cannot communicate with the server and what you see is simply the agent waiting indefinitely

  
  
Posted one year ago

CostlyOstrich36 Any thoughts?

  
  
Posted one year ago

clearml-agent daemon --foreground

  
  
Posted one year ago

BoredBat47 , can you add the logs?

  
  
Posted one year ago

Can you try running clearml-agent --debug daemon --foreground ?

  
  
Posted one year ago

@<1523701070390366208:profile|CostlyOstrich36>
Should I leave as is or fill the values in docker-compose for agent-services? I set it to localhost since agent-services is running together with other clearml containers on one machine. Not sure why do you have to fill those values.
CLEARML_HOST_IP: "<my_clearml_server_ip>"
CLEARML_WEB_HOST: " None "
CLEARML_API_HOST: " None "
CLEARML_FILES_HOST: " None "

  
  
Posted one year ago

Actually the agent will use the default values for the agent section if you have a clearml.init file - what do you get if you run the agent like that?

  
  
Posted one year ago

What command did you use?

  
  
Posted one year ago

@<1523701087100473344:profile|SuccessfulKoala55>
I managed to create clearml.conf file with clearml-agent init after fixing proxy problem. And now trying to run daemon with this conf file. I suspect something is missing from it since request validator fails with missing attribute

  
  
Posted one year ago

@<1523701087100473344:profile|SuccessfulKoala55> I provided following env vars:
CLEARML_HOST_IP: "<my_ip>"
CLEARML_WEB_HOST: " http://<my_ip>:8080 "
CLEARML_API_HOST: " http://<my_ip>:8008 "
CLEARML_FILES_HOST: " http://<my_ip>:8081 "
CLEARML_API_ACCESS_KEY: <my_access_key>
CLEARML_API_SECRET_KEY: <my_secret_key>
also I changed IP in entrypoint from apiserver:8008 to <my_ip>:8008

Yes, I run both commands from the same place — dedicated user on my worker machine. Is clearml-init also has to connect to the ClearML server to successfully finish?

  
  
Posted one year ago

Sorry for bothering but I am really lost, I think I exhausted all my options. I really have no clue what is going on.

  
  
Posted one year ago

Console output of clearml-agent init with no clearml.conf:
...
ClearML Hosts configuration:
Web App: None
API: None
File Store: None

Verifying credentials ...
Error: could not verify credentials: key=ak secret=sk
...
Console output of clearml-agent daemon --foreground with clearml.conf created by clearml-init is missing. No output.
...

  
  
Posted one year ago

I looked through agent-services logs and found new error I haven't seen before:
clearml_agent: ERROR: Connection Error: it seems *api_server* is misconfigured. Is this the ClearML API server http://<my_ip>:8008 ?

  
  
Posted one year ago

Also, previous problem was in incorrect proxy configuration on agent machine

  
  
Posted one year ago

The terminal hangs on the command

  
  
Posted one year ago

Console output of clearml-agent daemon --foreground ?

  
  
Posted one year ago

@<1526734383564722176:profile|BoredBat47> the agent-services is probably not configured (it needs key and secret to the clearml server to be configured in the docker-compose)

  
  
Posted one year ago

BoredBat47 what did you provide in the docker-compose to the services agent?
Also, you said that clearml-init worked but clearml-agent init did not - did you run both from the same place?

  
  
Posted one year ago

The strange thing also is that I see that the credentials are being used in web UI: last used timestamp is updated constantly to present time. So apparently daemon is trying to do something but can't launch properly all the way

  
  
Posted one year ago

Hi BoredBat47 , use the --foreground tag to see the logs 🙂

  
  
Posted one year ago

~/.local/bin/clearml-agent daemon --foreground

  
  
Posted one year ago

but without -d

  
  
Posted one year ago

% Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 100k 100 100k 0 0 10236 0 0:00:10 0:00:10 --:--:-- 21354 Warning: Transient problem: HTTP error Will retry in 10 seconds. 10 retries Warning: left. 100 100k 100 100k 0 0 10237 0 0:00:10 0:00:10 --:--:-- 21345 Warning: Transient problem: HTTP error Will retry in 10 seconds. 9 retries Warning: left. 100 100k 100 100k 0 0 10238 0 0:00:10 0:00:10 --:--:-- 21345 Warning: Transient problem: HTTP error Will retry in 10 seconds. 8 retries Warning: left. 100 100k 100 100k 0 0 10237 0 0:00:10 0:00:10 --:--:-- 26965 Warning: Transient problem: HTTP error Will retry in 10 seconds. 7 retries Warning: left. 100 100k 100 100k 0 0 10237 0 0:00:10 0:00:10 --:--:-- 26958 Warning: Transient problem: HTTP error Will retry in 10 seconds. 6 retries Warning: left. 100 100k 100 100k 0 0 10236 0 0:00:10 0:00:10 --:--:-- 26951 Warning: Transient problem: HTTP error Will retry in 10 seconds. 5 retries Warning: left. 100 100k 100 100k 0 0 10236 0 0:00:10 0:00:10 --:--:-- 26958 Warning: Transient problem: HTTP error Will retry in 10 seconds. 4 retries Warning: left. 100 100k 100 100k 0 0 10235 0 0:00:10 0:00:10 --:--:-- 26951 Warning: Transient problem: HTTP error Will retry in 10 seconds. 3 retries Warning: left. 100 100k 100 100k 0 0 10237 0 0:00:10 0:00:10 --:--:-- 26965 Warning: Transient problem: HTTP error Will retry in 10 seconds. 2 retries Warning: left. 100 100k 100 100k 0 0 10237 0 0:00:10 0:00:10 --:--:-- 26965 Warning: Transient problem: HTTP error Will retry in 10 seconds. 1 retries Warning: left. 100 100k 100 100k 0 0 10237 0 0:00:10 0:00:10 --:--:-- 26965 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead:

  
  
Posted one year ago

Also services agent is not related to regular agent executions

  
  
Posted one year ago

CostlyOstrich36 Seems like on my server agent-services container is missing. It's not running. Could it be the issue?

  
  
Posted one year ago

I think so, yes

  
  
Posted one year ago

Can you please attached the console output again?

  
  
Posted one year ago

Hi, sorry for the delay 😞

  
  
Posted one year ago

clearml 1.9.0
clearml-agent 1.5.1
NAME="Ubuntu"
VERSION="18.04.6 LTS (Bionic Beaver)"

  
  
Posted one year ago

@<1523701087100473344:profile|SuccessfulKoala55>
When I run clearml-agent init I don't have a file prior to this. I tried running agent daemon with clearml.conf created by clearml-init but that doesn't work since it has no agent section, right? I know I can add it myself but I think clearml-agent init should function too

  
  
Posted one year ago
20K Views
39 Answers
one year ago
one year ago
Tags
Similar posts