Answered

Hello All!
I am new to ClearML, and it has been an amazing experience using it so far!

I have been trying to add one of my in-house Ubuntu servers as a node using clearml-agent. I set up my credentials using clearml-agent init; it verifies the credentials and works well.

When I start the agent using: clearml-agent daemon --queue <queue name> --detached --cpu-only --docker --force-current-version everything goes smoothly.

The problem occurs when I try to queue a task to this node. The task starts by pulling a Docker image (the default NVIDIA Docker image), and while it is building, it fails with:

cp: -r not specified; omitting directory '/tmp/clearml.conf'
Using built-in ClearML default key/secret
clearml_agent: ERROR: Could not find host server definition (missing `~/clearml.conf` or Environment CLEARML_API_HOST)
To get started with ClearML: setup your own `clearml-server`, or create a free account at 
 and run `clearml-agent init`

I do have the file ~/clearml.conf at the exact location where it is looking for it. I am not sure why it still errors out. Any idea where I may be going wrong?

  
  
Posted one year ago

Answers 11


That is correct (and I am not sure why it is doing that).

However, immediately after those lines, it tries to find the default conf file:

Using built-in ClearML default key/secret
clearml_agent: ERROR: Could not find host server definition (missing `~/clearml.conf` or Environment CLEARML_API_HOST)

That file is present at /home/<username>/clearml.conf, i.e. ~/clearml.conf on Linux. I even tried defining the CLEARML_API_HOST environment variable, but it fails with the same error.

  
  
Posted one year ago

@<1631102000244461568:profile|DespicableHippopotamus75> , this line from the log:

cp: -r not specified; omitting directory '/tmp/clearml.conf'

Basically implies that this script line (which is part of the setup) failed due to /tmp/clearml.conf being a directory and not a file:

cp /tmp/clearml.conf ~/default_clearml.conf

Since this is a volume mount mounting a file (as part of the docker run command started by the agent chhedaserver:cpu:0):

-v /tmp/.clearml_agent.r2ua8u1y.cfg:/tmp/clearml.conf

I can only assume there's some issue with the /tmp/.clearml_agent.r2ua8u1y.cfg file generated by the agent prior to mounting the file - docker mounting a file as a directory usually means the file is no longer there - might it have been deleted for some reason?
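If it helps, here is a minimal sketch for checking this on the host. It splits the bind-mount spec from the docker run command and reports whether the host side is a regular file, a directory, or missing. The helper names are made up for illustration, and the path is simply the one from the log above; the agent generates a new temporary file name on each run, so you would need to copy the actual path from your own docker run line. It also assumes a plain host:container spec without extra options such as :ro.

```python
import os
from typing import Tuple


def split_bind_mount(spec: str) -> Tuple[str, str]:
    """Split a 'host_path:container_path' spec (hypothetical helper, no ':ro' etc.)."""
    host_path, container_path = spec.split(":", 1)
    return host_path, container_path


def classify_host_path(path: str) -> str:
    """Return 'file', 'directory', or 'missing' for the host side of a mount."""
    if os.path.isfile(path):
        return "file"
    if os.path.isdir(path):
        return "directory"
    return "missing"


if __name__ == "__main__":
    # Spec copied from the log in this thread; replace it with the one from your run.
    spec = "/tmp/.clearml_agent.r2ua8u1y.cfg:/tmp/clearml.conf"
    host_path, container_path = split_bind_mount(spec)
    print(f"{host_path} -> {container_path}: {classify_host_path(host_path)}")
```

If this reports "directory" or "missing" for the host path, that would be consistent with the cp error above: when the source of a -v bind mount does not exist, Docker creates a directory at that path, and the container then sees /tmp/clearml.conf as a directory.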

  
  
Posted one year ago

@<1523701087100473344:profile|SuccessfulKoala55> sure seems to be a problem with the set-up somewhere.

I tried on a pristine GPU machine, and it worked well there.

  
  
Posted one year ago

Hi @<1631102000244461568:profile|DespicableHippopotamus75> , can you share the task's log?

  
  
Posted one year ago

For the most part, the log contains a dump of docker pull statements and Python installations. Then it fails because it is not able to establish a connection with the ClearML server.

@<1523701087100473344:profile|SuccessfulKoala55>

  
  
Posted one year ago

When my queue name is CPU_Queue, which is what I pass to the clearml-agent daemon call in the --queue argument.

  
  
Posted one year ago

Sure, here you go:

  
  
Posted one year ago

Thank you for all your help! 😄

  
  
Posted one year ago

Also, I found it weird when I tried to get the status of the agent using clearml-agent daemon --status

It says:

No uptime/downtime configurations found
 
  - Listening to queue 'default'
  
  
Posted one year ago

I am running into a similar issue. Does anyone have a solution for this?

  
  
Posted 11 months ago

As far as I can remember, this is a Docker volume mount issue.

  
  
Posted 11 months ago