
Hey, guys! I have a problem. I launched ClearML Server and am trying to run a worker on another machine. When I run clearml-agent init, it can't verify the credentials or create a clearml.conf file. However, the clearml-init command successfully accepted my credentials and created a clearml.conf file. Then, when I run clearml-agent daemon -d, it just sits there: there is no output, and the worker doesn't appear in the Workers section of the web UI.

How do I make my worker run properly? Can I see daemon output logs somewhere?
Any help is much appreciated. Thank you!

  
  
Posted 2 years ago

Answers 39


What versions of clearml and clearml-agent are you using, and on what OS? Can you add the command line you're running for the agent?

  
  
Posted 2 years ago

Hi BoredBat47, use the --foreground flag to see the logs 🙂

  
  
Posted 2 years ago

BoredBat47, what did you provide for the services agent in the docker-compose?
Also, you said that clearml-init worked but clearml-agent init did not. Did you run both from the same place?

  
  
Posted 2 years ago

SuccessfulKoala55
I managed to create the clearml.conf file with clearml-agent init after fixing a proxy problem, and I'm now trying to run the daemon with this conf file. I suspect something is missing from it, since the request validator fails with a missing attribute.

  
  
Posted 2 years ago

CostlyOstrich36 Yep, it seems that was the case: I had not provided the API credentials in the docker-compose. I've added them, but now agent-services just keeps restarting. I looked into the container logs and it seems to be a proxy error. Why is this container trying to connect somewhere?

  
  
Posted 2 years ago

CostlyOstrich36 Any thoughts?

  
  
Posted 2 years ago

The terminal hangs on the command

  
  
Posted 2 years ago

clearml-agent daemon --foreground

  
  
Posted 2 years ago

SuccessfulKoala55
When I run clearml-agent init, I don't have a clearml.conf file beforehand. I tried running the agent daemon with the clearml.conf created by clearml-init, but that doesn't work since it has no agent section, right? I know I can add it myself, but I think clearml-agent init should work too.

  
  
Posted 2 years ago

do you have this file in your home folder?

  
  
Posted 2 years ago

Actually, the agent will use the default values for the agent section if you have a clearml.conf file. What do you get if you run the agent like that?

  
  
Posted 2 years ago

Does clearml-init also have to connect to the ClearML server to finish successfully?

Yes, it verifies the credentials in the same way, and creates a clearml.conf file when done
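As a rough illustration of what "verifying the credentials" can involve, here is a minimal standard-library Python sketch. It assumes (this is not confirmed anywhere in this thread) that the check amounts to calling the API server's `auth.login` endpoint with the access key and secret key as HTTP basic-auth credentials; the helper names `basic_auth_header` and `verify_credentials` are hypothetical, not part of clearml or clearml-agent.

```python
import base64
import urllib.request


def basic_auth_header(access_key: str, secret_key: str) -> str:
    """Encode an access/secret key pair as an HTTP basic-auth header value."""
    raw = f"{access_key}:{secret_key}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def verify_credentials(api_host: str, access_key: str, secret_key: str,
                       timeout: float = 10.0) -> bool:
    """Return True if the API server accepts the key pair at auth.login.

    Assumption: the server exposes an `auth.login` endpoint that answers
    HTTP 200 for valid basic-auth credentials. Rejected credentials, an
    unreachable server, or a malformed host (e.g. the literal "None")
    all yield False.
    """
    url = api_host.strip().rstrip("/") + "/auth.login"
    try:
        # Request() raises ValueError for a host without a URL scheme.
        request = urllib.request.Request(
            url, headers={"Authorization": basic_auth_header(access_key, secret_key)}
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError):
        # URLError and HTTPError (e.g. 401) are OSError subclasses.
        return False
```

For example, `verify_credentials("http://<my_ip>:8008", "<my_access_key>", "<my_secret_key>")` with real values substituted. A host of "None", as in the clearml-agent init output further down this thread, fails before any network request is made.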

  
  
Posted 2 years ago

CostlyOstrich36 It seems the agent-services container is missing on my server; it's not running. Could that be the issue?

  
  
Posted 2 years ago

It behaves as I mentioned before: the terminal jumps to a new line and sits there with no output, and nothing happens in the console. But in the UI, the "Last used" timestamp keeps updating.

  
  
Posted 2 years ago

Sorry, I forgot to mention that I used the command with the --foreground flag. The result is the same: the terminal just sits at a new line, with no logs and no worker in the UI.

  
  
Posted 2 years ago

I looked through the agent-services logs and found a new error I hadn't seen before:
clearml_agent: ERROR: Connection Error: it seems *api_server* is misconfigured. Is this the ClearML API server http://<my_ip>:8008 ?

  
  
Posted 2 years ago

SuccessfulKoala55, I provided the following env vars:
CLEARML_HOST_IP: "<my_ip>"
CLEARML_WEB_HOST: " http://<my_ip>:8080 "
CLEARML_API_HOST: " http://<my_ip>:8008 "
CLEARML_FILES_HOST: " http://<my_ip>:8081 "
CLEARML_API_ACCESS_KEY: <my_access_key>
CLEARML_API_SECRET_KEY: <my_secret_key>
I also changed the IP in the entrypoint from apiserver:8008 to <my_ip>:8008.

Yes, I ran both commands from the same place: a dedicated user on my worker machine. Does clearml-init also have to connect to the ClearML server to finish successfully?

  
  
Posted 2 years ago

BoredBat47, the agent-services container is probably not configured (it needs a key and secret for the ClearML server to be configured in the docker-compose).
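One way to double-check those settings is a small sanity check over the container's environment. The sketch below is hypothetical (`check_agent_services_env` is not a ClearML tool); the variable names are the ones quoted in this thread. It flags missing or empty values, surrounding whitespace (treated as suspicious, not as a confirmed cause), host values that are not http(s) URLs (such as "None"), and hosts pointing at localhost, which inside a Docker container normally refers to the container itself rather than the machine running it.

```python
from urllib.parse import urlparse

# Variable names as quoted in this thread's docker-compose settings.
REQUIRED_VARS = (
    "CLEARML_API_HOST",
    "CLEARML_WEB_HOST",
    "CLEARML_FILES_HOST",
    "CLEARML_API_ACCESS_KEY",
    "CLEARML_API_SECRET_KEY",
)
HOST_VARS = ("CLEARML_API_HOST", "CLEARML_WEB_HOST", "CLEARML_FILES_HOST")
LOOPBACK_NAMES = {"localhost", "127.0.0.1", "::1"}


def check_agent_services_env(env: dict) -> list:
    """Return human-readable problems found in an environment mapping."""
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name)
        if value is None or not value.strip():
            problems.append(f"{name} is missing or empty")
        elif value != value.strip():
            problems.append(f"{name} has leading/trailing whitespace")
    for name in HOST_VARS:
        value = (env.get(name) or "").strip()
        if not value:
            continue  # already reported as missing above
        parsed = urlparse(value)
        if parsed.scheme not in ("http", "https"):
            problems.append(f"{name} is not an http(s) URL: {value!r}")
        elif parsed.hostname in LOOPBACK_NAMES:
            problems.append(
                f"{name} points at {parsed.hostname}, "
                "which inside a container is the container itself"
            )
    return problems


if __name__ == "__main__":
    import os

    for problem in check_agent_services_env(dict(os.environ)):
        print(problem)
```

Running it inside the agent-services container (for instance via docker exec) would check the values the container actually received, rather than what the docker-compose file is believed to contain.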

  
  
Posted 2 years ago

But from what you're saying, it seems the agent simply cannot communicate with the server, and what you see is the agent waiting indefinitely.
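To tell "cannot reach the server" apart from other failures, a quick reachability probe run from the worker machine can help. This is a minimal sketch; it assumes the API server answers on a `debug.ping` endpoint, which is an assumption here and not something stated in this thread. `ping_api_server` is a hypothetical helper. Note that urllib honours the `http_proxy` / `no_proxy` environment variables, which matters given the proxy errors reported above: if the probe fails through the proxy but works with the server's IP listed in `no_proxy`, the proxy is the likely culprit.

```python
import urllib.error
import urllib.request


def ping_api_server(api_host: str, timeout: float = 5.0):
    """Probe the (assumed) debug.ping endpoint; return (ok, detail)."""
    url = api_host.strip().rstrip("/") + "/debug.ping"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status == 200, body
    except urllib.error.HTTPError as exc:
        # The server (or a proxy in between) answered, but with an error status.
        return False, f"HTTP {exc.code} from {url}"
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # DNS failure, refused connection, timeout, or a malformed host.
        return False, f"could not reach {url}: {exc}"
```

For example, `ping_api_server("http://<my_ip>:8008")` with the real IP substituted. A False result with "could not reach" points at networking or proxy settings rather than at the credentials.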

  
  
Posted 2 years ago

What command did you use?

  
  
Posted 2 years ago

Console output of clearml-agent init with no clearml.conf:
...
ClearML Hosts configuration:
Web App: None
API: None
File Store: None

Verifying credentials ...
Error: could not verify credentials: key=ak secret=sk
...
Console output of clearml-agent daemon --foreground with the clearml.conf created by clearml-init: none. There is no output at all.
...

  
  
Posted 2 years ago

Console output of clearml-agent daemon --foreground ?

  
  
Posted 2 years ago

CostlyOstrich36
Should I leave these as they are or fill in the values in the docker-compose for agent-services? I set them to localhost, since agent-services runs together with the other ClearML containers on one machine. I'm not sure why you have to fill in those values.
CLEARML_HOST_IP: "<my_clearml_server_ip>"
CLEARML_WEB_HOST: " None "
CLEARML_API_HOST: " None "
CLEARML_FILES_HOST: " None "

  
  
Posted 2 years ago

Another strange thing: I can see in the web UI that the credentials are being used, since their "last used" timestamp keeps updating to the present time. So apparently the daemon is trying to do something but can't fully launch.

  
  
Posted 2 years ago

Hi, sorry for the delay 😞

  
  
Posted 2 years ago

clearml 1.9.0
clearml-agent 1.5.1
NAME="Ubuntu"
VERSION="18.04.6 LTS (Bionic Beaver)"

  
  
Posted 2 years ago

Sorry to bother you, but I am really lost; I think I have exhausted all my options. I really have no clue what is going on.

  
  
Posted 2 years ago

I think so, yes

  
  
Posted 2 years ago

% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 100k 100 100k 0 0 10236 0 0:00:10 0:00:10 --:--:-- 21354
Warning: Transient problem: HTTP error Will retry in 10 seconds. 10 retries
Warning: left.
100 100k 100 100k 0 0 10237 0 0:00:10 0:00:10 --:--:-- 21345
Warning: Transient problem: HTTP error Will retry in 10 seconds. 9 retries
Warning: left.
100 100k 100 100k 0 0 10238 0 0:00:10 0:00:10 --:--:-- 21345
Warning: Transient problem: HTTP error Will retry in 10 seconds. 8 retries
Warning: left.
100 100k 100 100k 0 0 10237 0 0:00:10 0:00:10 --:--:-- 26965
Warning: Transient problem: HTTP error Will retry in 10 seconds. 7 retries
Warning: left.
100 100k 100 100k 0 0 10237 0 0:00:10 0:00:10 --:--:-- 26958
Warning: Transient problem: HTTP error Will retry in 10 seconds. 6 retries
Warning: left.
100 100k 100 100k 0 0 10236 0 0:00:10 0:00:10 --:--:-- 26951
Warning: Transient problem: HTTP error Will retry in 10 seconds. 5 retries
Warning: left.
100 100k 100 100k 0 0 10236 0 0:00:10 0:00:10 --:--:-- 26958
Warning: Transient problem: HTTP error Will retry in 10 seconds. 4 retries
Warning: left.
100 100k 100 100k 0 0 10235 0 0:00:10 0:00:10 --:--:-- 26951
Warning: Transient problem: HTTP error Will retry in 10 seconds. 3 retries
Warning: left.
100 100k 100 100k 0 0 10237 0 0:00:10 0:00:10 --:--:-- 26965
Warning: Transient problem: HTTP error Will retry in 10 seconds. 2 retries
Warning: left.
100 100k 100 100k 0 0 10237 0 0:00:10 0:00:10 --:--:-- 26965
Warning: Transient problem: HTTP error Will retry in 10 seconds. 1 retries
Warning: left.
100 100k 100 100k 0 0 10237 0 0:00:10 0:00:10 --:--:-- 26965
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead:

  
  
Posted 2 years ago

Also, the services agent is not related to regular agent executions.

  
  
Posted 2 years ago