Answered

Hey, guys! I have a problem. I launched ClearML Server and am trying to run a worker on another machine. When I run clearml-agent init, it can't verify the credentials or create a clearml.conf file. However, the clearml-init command successfully accepted my creds and created a clearml.conf file. Then, when I try to run clearml-agent daemon -d, it just sits there with no output, and the worker doesn't appear in the workers section of the web UI.

How do I make my worker run properly? Can I see daemon output logs somewhere?
Any help is much appreciated. Thank you!

  
  
Posted 8 months ago

Answers 39


@SuccessfulKoala55
When I run clearml-agent init I don't have a file prior to this. I tried running agent daemon with clearml.conf created by clearml-init but that doesn't work since it has no agent section, right? I know I can add it myself but I think clearml-agent init should function too

  
  
Posted 8 months ago

Also, the previous problem was caused by an incorrect proxy configuration on the agent machine

  
  
Posted 8 months ago

@SuccessfulKoala55
I managed to create a clearml.conf file with clearml-agent init after fixing the proxy problem. Now I'm trying to run the daemon with this conf file. I suspect something is missing from it, since the request validator fails with a missing attribute

  
  
Posted 8 months ago

It works like I mentioned before: the terminal jumps to a new line and sits there, with no output after that and nothing happening in the console. But if you go to the UI, you see that "Last used" is updating

  
  
Posted 8 months ago

But from what you're saying it seems like the agent simply cannot communicate with the server and what you see is simply the agent waiting indefinitely
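One way to test the "cannot communicate" theory from the worker machine is a plain TCP probe of the server ports. The sketch below is only an illustration: the ports 8080, 8008 and 8081 are the ones quoted elsewhere in this thread, and the helper names `port_is_open` and `check_server` are made up for this example. Note that a direct TCP probe does not go through an HTTP proxy, so it cannot confirm or rule out a proxy misconfiguration.

```python
import socket

# Ports quoted in this thread: web UI 8080, API server 8008, file server 8081.
# Adjust them if your deployment differs (assumption, not confirmed here).
CLEARML_PORTS = {"web": 8080, "api": 8008, "files": 8081}


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_server(host: str, ports: dict = CLEARML_PORTS, timeout: float = 3.0) -> dict:
    """Probe every named port on host and return {name: reachable}."""
    return {name: port_is_open(host, port, timeout) for name, port in ports.items()}
```

Calling `check_server("<my_clearml_server_ip>")` on the agent machine returns a dict such as `{"web": ..., "api": ..., "files": ...}`; a `False` for `api` would be consistent with the agent waiting indefinitely.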

  
  
Posted 8 months ago

% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 100k 100 100k 0 0 10236 0 0:00:10 0:00:10 --:--:-- 21354
Warning: Transient problem: HTTP error Will retry in 10 seconds. 10 retries
Warning: left.
100 100k 100 100k 0 0 10237 0 0:00:10 0:00:10 --:--:-- 21345
Warning: Transient problem: HTTP error Will retry in 10 seconds. 9 retries
Warning: left.
100 100k 100 100k 0 0 10238 0 0:00:10 0:00:10 --:--:-- 21345
Warning: Transient problem: HTTP error Will retry in 10 seconds. 8 retries
Warning: left.
100 100k 100 100k 0 0 10237 0 0:00:10 0:00:10 --:--:-- 26965
Warning: Transient problem: HTTP error Will retry in 10 seconds. 7 retries
Warning: left.
100 100k 100 100k 0 0 10237 0 0:00:10 0:00:10 --:--:-- 26958
Warning: Transient problem: HTTP error Will retry in 10 seconds. 6 retries
Warning: left.
100 100k 100 100k 0 0 10236 0 0:00:10 0:00:10 --:--:-- 26951
Warning: Transient problem: HTTP error Will retry in 10 seconds. 5 retries
Warning: left.
100 100k 100 100k 0 0 10236 0 0:00:10 0:00:10 --:--:-- 26958
Warning: Transient problem: HTTP error Will retry in 10 seconds. 4 retries
Warning: left.
100 100k 100 100k 0 0 10235 0 0:00:10 0:00:10 --:--:-- 26951
Warning: Transient problem: HTTP error Will retry in 10 seconds. 3 retries
Warning: left.
100 100k 100 100k 0 0 10237 0 0:00:10 0:00:10 --:--:-- 26965
Warning: Transient problem: HTTP error Will retry in 10 seconds. 2 retries
Warning: left.
100 100k 100 100k 0 0 10237 0 0:00:10 0:00:10 --:--:-- 26965
Warning: Transient problem: HTTP error Will retry in 10 seconds. 1 retries
Warning: left.
100 100k 100 100k 0 0 10237 0 0:00:10 0:00:10 --:--:-- 26965
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead:

  
  
Posted 8 months ago

@SuccessfulKoala55
So, I ran it with debug and got this stack trace:
type_checker=validator.TYPE_CHECKER.redefine_many({
AttributeError: type object 'Draft4Validator' has no attribute 'TYPE_CHECKER'
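This error usually points at the installed jsonschema package rather than at clearml.conf: to my knowledge, validator classes only gained the `TYPE_CHECKER` attribute in jsonschema 3.0, so an older copy in the agent's Python environment would fail this way. That is an assumption, not something confirmed in this thread. A minimal sketch for checking it, run with the same interpreter as clearml-agent (the function names are made up for this example, and `importlib.metadata` needs Python 3.8 or later):

```python
import importlib
from importlib import metadata


def validator_has_type_checker(validator_cls) -> bool:
    """Return True if the validator class exposes TYPE_CHECKER."""
    return hasattr(validator_cls, "TYPE_CHECKER")


def diagnose_jsonschema() -> str:
    """Describe the installed jsonschema and whether it has TYPE_CHECKER."""
    try:
        jsonschema = importlib.import_module("jsonschema")
    except ImportError:
        return "jsonschema is not installed in this environment"
    try:
        version = metadata.version("jsonschema")
    except metadata.PackageNotFoundError:
        # Importable but without package metadata (e.g. a vendored copy).
        version = "unknown"
    ok = validator_has_type_checker(jsonschema.Draft4Validator)
    status = "has" if ok else "is missing"
    return f"jsonschema {version}: Draft4Validator {status} TYPE_CHECKER"
```

If `print(diagnose_jsonschema())` reports that the attribute is missing, upgrading jsonschema in that environment is the first thing to try.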

  
  
Posted 8 months ago

@CostlyOstrich36
What does agent-services do on startup? It seems like something is preventing it from working properly. I already added a command to the entrypoint to configure pip.conf, since we have to use a trusted mirror to download Python packages. I also managed to connect a local agent to the ClearML server by using the 127.0.0.1 host in the credentials. Still no luck with the remote agent

  
  
Posted 8 months ago

but without -d

  
  
Posted 8 months ago

Can you please attach the console output again?

  
  
Posted 8 months ago

BoredBat47 what did you provide in the docker-compose to the services agent?
Also, you said that clearml-init worked but clearml-agent init did not - did you run both from the same place?

  
  
Posted 8 months ago

Sorry for bothering but I am really lost, I think I exhausted all my options. I really have no clue what is going on.

  
  
Posted 8 months ago

clearml 1.9.0
clearml-agent 1.5.1
NAME="Ubuntu"
VERSION="18.04.6 LTS (Bionic Beaver)"

  
  
Posted 8 months ago

@CostlyOstrich36
Should I leave these as is or fill in the values in docker-compose for agent-services? I set it to localhost since agent-services is running together with the other ClearML containers on one machine. I'm not sure why you have to fill in those values.
CLEARML_HOST_IP: "<my_clearml_server_ip>"
CLEARML_WEB_HOST: " None "
CLEARML_API_HOST: " None "
CLEARML_FILES_HOST: " None "

  
  
Posted 8 months ago

Console output of clearml-agent daemon --foreground ?

  
  
Posted 8 months ago

Does clearml-init also have to connect to the ClearML server to finish successfully?

Yes, it verifies the credentials in the same way, and creates a clearml.conf file when done

  
  
Posted 8 months ago

Console output of clearml-agent init with no clearml.conf:
...
ClearML Hosts configuration:
Web App: None
API: None
File Store: None

Verifying credentials ...
Error: could not verify credentials: key=ak secret=sk
...
Console output of clearml-agent daemon --foreground with clearml.conf created by clearml-init is missing. No output.
...

  
  
Posted 8 months ago

@BoredBat47 the agent-services is probably not configured (it needs the key and secret for the ClearML server to be configured in the docker-compose)

  
  
Posted 8 months ago

Sorry, forgot to mention. I used the command with the --foreground flag. It is the same. The terminal just sits at a new line, no logs, no worker in the UI

  
  
Posted 8 months ago

Hi, sorry for the delay 😞

  
  
Posted 8 months ago

The other strange thing is that I can see the credentials being used in the web UI: the "last used" timestamp is constantly updated to the present time. So apparently the daemon is trying to do something but can't launch all the way

  
  
Posted 8 months ago

What version of clearml and clearml-agent are you using, what OS? Can you add the line you're running for the agent?

  
  
Posted 8 months ago

clearml-agent daemon --foreground

  
  
Posted 8 months ago

I think so, yes

  
  
Posted 8 months ago

BoredBat47 , can you add the logs?

  
  
Posted 8 months ago

Can you try running clearml-agent --debug daemon --foreground ?

  
  
Posted 8 months ago

The terminal hangs on the command

  
  
Posted 8 months ago

@SuccessfulKoala55 I provided the following env vars:
CLEARML_HOST_IP: "<my_ip>"
CLEARML_WEB_HOST: " http://<my_ip>:8080 "
CLEARML_API_HOST: " http://<my_ip>:8008 "
CLEARML_FILES_HOST: " http://<my_ip>:8081 "
CLEARML_API_ACCESS_KEY: <my_access_key>
CLEARML_API_SECRET_KEY: <my_secret_key>
I also changed the host in the entrypoint from apiserver:8008 to <my_ip>:8008

Yes, I run both commands from the same place, as a dedicated user on my worker machine. Does clearml-init also have to connect to the ClearML server to finish successfully?
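
The host values quoted in this thread show spaces inside the quotes and, in places, a literal None. That may just be how the forum rendered the links, but it is cheap to rule out before starting the agent. A small sketch, with the hypothetical helper `host_url_problems` listing what looks wrong with a single value:

```python
from urllib.parse import urlsplit


def host_url_problems(value: str) -> list:
    """Return a list of human-readable problems found in one host URL value."""
    problems = []
    if value != value.strip():
        problems.append("leading or trailing whitespace")
    cleaned = value.strip()
    if cleaned in ("", "None"):
        # Nothing usable was configured for this host.
        problems.append("no URL set")
        return problems
    parts = urlsplit(cleaned)
    if parts.scheme not in ("http", "https"):
        problems.append("missing http:// or https:// scheme")
    if not parts.hostname:
        problems.append("missing host")
    return problems
```

For example, `host_url_problems(" None ")` reports both the stray whitespace and the missing URL, while a clean value such as `"http://192.0.2.10:8008"` (a documentation-only address) yields an empty list.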

  
  
Posted 8 months ago

CostlyOstrich36 Any thoughts?

  
  
Posted 8 months ago

CostlyOstrich36 Am I right that I should also provide these URLs in the agent-services section of the docker-compose file?
CLEARML_HOST_IP: ${CLEARML_HOST_IP:-}
CLEARML_WEB_HOST: ${CLEARML_WEB_HOST:-}
CLEARML_API_HOST: http://apiserver:8008

  
  
Posted 8 months ago