Also, I found it weird when I tried to get the status of the agent using clearml-agent daemon --status
It says:
No uptime/downtime configurations found
- Listening to queue 'default'
For the most part the logs contain the dump of docker pull statements and Python installations. And then it fails because it is not able to establish a connection with the ClearML server.
@<1523701087100473344:profile|SuccessfulKoala55>
When my Queue name is CPU_Queue
which is what I pass to the clearml-agent daemon call in --queue
argument
As far as I can remember, this is a Docker volume mount issue
Hi @<1631102000244461568:profile|DespicableHippopotamus75> , can you share the task's log?
Thank you for all your help! 😄
That is correct (and I am not sure why it is doing that).
However, immediately after those lines, it tries to find the default conf file:
Using built-in ClearML default key/secret
clearml_agent: ERROR: Could not find host server definition (missing `~/clearml.conf` or Environment CLEARML_API_HOST)
Which is present at /home/<username>/clearml.conf or ~/clearml.conf
on Linux. I even tried to define the CLEARML_API_HOST env variable, but it fails with the same error.
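To double check what the failing process actually sees, a small script like the one below could be run in the same environment as the task (inside the container, if the agent runs in Docker mode). It is only a sketch: `diagnose_clearml_config` is a hypothetical helper, not part of clearml-agent, and it only looks at the two things named in the error message (`~/clearml.conf` and `CLEARML_API_HOST`).

```python
import os
from pathlib import Path


def diagnose_clearml_config(home=None, env=None):
    """Hypothetical helper: report the config sources named in the error.

    home -- directory to treat as the home directory (defaults to Path.home())
    env  -- mapping to read CLEARML_API_HOST from (defaults to os.environ)
    """
    home = Path(home) if home is not None else Path.home()
    env = os.environ if env is None else env
    conf = home / "clearml.conf"
    return {
        "conf_path": str(conf),
        # A usable config must be a regular file, not a directory.
        "conf_is_file": conf.is_file(),
        "conf_is_dir": conf.is_dir(),
        # None means the variable is not set for this process.
        "api_host": env.get("CLEARML_API_HOST"),
    }


if __name__ == "__main__":
    for key, value in diagnose_clearml_config().items():
        print(f"{key}: {value}")
```

If `conf_is_file` is False or `api_host` is None here while both look fine in my own shell, that would suggest the process that fails is running with a different home directory or environment than the one I checked.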
@<1523701087100473344:profile|SuccessfulKoala55> sure seems to be a problem with the set-up somewhere.
I tried on a pristine GPU machine, it worked well there
@<1631102000244461568:profile|DespicableHippopotamus75> , this line from the log:
cp: -r not specified; omitting directory '/tmp/clearml.conf'
Basically implies that this script line (which is part of the setup) failed due to /tmp/clearml.conf
being a directory and not a file:
cp /tmp/clearml.conf ~/default_clearml.conf
Since this is a volume mount mounting a file (as part of the docker run command started by the agent chhedaserver:cpu:0):
-v /tmp/.clearml_agent.r2ua8u1y.cfg:/tmp/clearml.conf
I can only assume there's some issue with the /tmp/.clearml_agent.r2ua8u1y.cfg
file generated by the agent prior to mounting the file - docker mounting a file as a directory usually means the file is no longer there - might it have been deleted for some reason?
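One way to test that assumption is to check the host side of the `-v` mount on the agent machine, ideally while the task is starting. The sketch below is illustrative only: `classify_mount_source` and `host_side` are hypothetical helpers (not part of clearml-agent or Docker), and it assumes a Linux-style `host:container` bind spec like the one in the log.

```python
import os


def host_side(bind_spec):
    """Return the host path from a 'host:container[:mode]' bind spec.

    Assumes Linux-style paths, i.e. no ':' inside the host path itself.
    """
    return bind_spec.split(":")[0]


def classify_mount_source(path):
    """Classify a host path as 'file', 'directory', or 'missing'."""
    if os.path.isfile(path):
        return "file"
    if os.path.isdir(path):
        return "directory"
    return "missing"


if __name__ == "__main__":
    # Bind spec copied from the docker run command in the log.
    spec = "/tmp/.clearml_agent.r2ua8u1y.cfg:/tmp/clearml.conf"
    src = host_side(spec)
    print(src, "->", classify_mount_source(src))
```

A result of "missing" or "directory" for the host path would be consistent with the `cp` error above, since a bind mount given with `-v` whose host path does not exist typically ends up as an empty directory inside the container. A result of "file" would point the investigation elsewhere.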
I am running into a similar issue. Does anyone have a solution for this?