Just FYI, we updated the agent-services docker image a week ago (along with the docker-compose.yml), so you might want to try it out.
Is it the same error again? Can you send the agent-services logs?
SuccessfulKoala55 grrrrr it keeps happening, I have no idea what's wrong
Problem solved. I had to replace /opt/trains/data/fileserver with /opt/clearml/data/fileserver in the Agent configuration, and replace trains with clearml in Requirements.
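If several tasks need the same two changes, they can be scripted. Below is a minimal Python sketch, assuming the requirements are plain requirement lines and the agent configuration is plain text; migrate_requirements and migrate_agent_config are hypothetical helpers, not part of trains or clearml. Only a package named exactly trains is renamed, so trains-agent is left alone. A pinned trains version may not exist for clearml, so any pin is worth double-checking after the rename.

```python
import re

# Paths taken from the thread; adjust if your install uses other locations.
OLD_FILESERVER = "/opt/trains/data/fileserver"
NEW_FILESERVER = "/opt/clearml/data/fileserver"

# Matches a requirement whose package name is exactly "trains"
# (optionally followed by a version spec, extras or a marker), so that
# lines such as "trains-agent" are not touched.
_TRAINS_REQ = re.compile(r"^(\s*)trains(?=\s*(?:[<>=!~;\[]|$))", re.IGNORECASE)


def migrate_requirements(lines):
    """Rename the `trains` package to `clearml` in a list of requirement lines."""
    return [_TRAINS_REQ.sub(r"\1clearml", line) for line in lines]


def migrate_agent_config(text):
    """Point the fileserver path at the /opt/clearml location."""
    return text.replace(OLD_FILESERVER, NEW_FILESERVER)
```

For example, migrate_requirements(["trains>=0.16", "trains-agent"]) gives ["clearml>=0.16", "trains-agent"].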
do you have any idea why the cleanup task keeps failing, then? (it used to work before the update)
Yeah, this looks like it did finally succeed in connecting to the apiserver...
WARNING: You are using pip version 20.1.1; however, version 20.3.3 is available.
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
trains_agent: ERROR: Connection Error: it seems *api_server* is misconfigured. Is this the TRAINS API server http://apiserver:8008 ?
http://OUR_IP:8081 http://OUR_IP:8080
http://apiserver:8008
WARNING: You are using pip version 20.1.1; however, version 20.3.3 is available.
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
trains_agent: ERROR: Connection Error: it seems *api_server* is misconfigured. Is this the TRAINS API server http://apiserver:8008 ?
http://OUR_IP:8081 http://OUR_IP:8080
http://apiserver:8008
WARNING: You are using pip version 20.1.1; however, version 20.3.3 is available.
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
Failed creating temporary copy of ~/.ssh for git credential
DilapidatedDucks58 I tested again, and I'm not sure there's a problem at all. I think what you saw might be a few errors thrown while the agent-services is waiting for the apiserver to start. I just tried a fresh install, and the agent-services did appear in the Workers & Queues page. If you can, I'd like to see the full output of docker logs trains-agent-services.
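If the errors really are just the agent-services starting before the apiserver, one workaround is to poll the apiserver before starting anything that depends on it. Below is a minimal Python sketch, assuming the apiserver URL is http://apiserver:8008 as in the logs; wait_for_http is a hypothetical helper, not part of trains-agent or ClearML. Any HTTP answer, even an error status, is treated as "up", which is the same thing a plain curl shows.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll `url` until the server answers or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True  # 2xx/3xx response: the server is up
        except urllib.error.HTTPError:
            return True  # 4xx/5xx still means something answered on that port
        except OSError:
            pass  # connection refused, DNS failure or timeout: not up yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, wait_for_http("http://apiserver:8008", timeout=120) returns True once the apiserver answers, and False if it is still unreachable after roughly two minutes.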
nice, thanks! I'll check if it solves the issue first thing tomorrow in the morning
Hi DilapidatedDucks58,
So it seems there's an issue where the agent-services can't resolve http://apiserver from within its container. Changing the default TRAINS_API_HOST in the docker-compose to http://localhost:8008 does the trick; we'll update the docker-compose.
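To see which API host actually resolves from inside a container, the candidate hostnames can be checked one by one. Below is a minimal Python sketch, assuming the two candidates discussed here (http://apiserver:8008 and http://localhost:8008); first_resolvable is a hypothetical helper, not part of trains-agent or ClearML.

```python
import socket
from urllib.parse import urlparse


def first_resolvable(urls):
    """Return the first URL whose hostname resolves from here, else None."""
    for url in urls:
        host = urlparse(url).hostname
        if not host:
            continue  # empty or malformed URL, e.g. an unset host variable
        try:
            socket.getaddrinfo(host, None)
        except socket.gaierror:
            continue  # the name cannot be resolved from this container
        return url
    return None
```

Note that this only checks name resolution, which is the failure described here; it does not prove that the apiserver is actually listening on that port.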
Also, it seems I forgot that by default the agent does not require special credentials and can use the default built-in credentials that exist in the server 🤠. This is fine as long as you keep your server closed to the world (if you open it, you'll obviously want to change all default secrets and credentials anyway).
can you share the entire output of the agent?
It looks like it has no host definition
nope, the old cleanup task fails with trains_agent: ERROR: Could not find task id=e7725856e9a04271aab846d77d6f7d66 (for host: )
Exception: 'Tasks' object has no attribute 'id'
weirdly enough, curl http://apiserver:8008 from inside the container works
yeah, we did. let me check if explicitly setting credentials helps
Also, did you use the agent-services before upgrading? (I'm trying to understand if there's a regression of some sort.)
Currently no, you need to set them to a set of credentials you created
well, the server wouldn't work without them?
I assume you've configured the TRAINS_API_ACCESS_KEY and TRAINS_API_SECRET_KEY env vars?
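To check which of these variables are actually set inside the agent-services container, a small Python sketch like the one below could help. missing_trains_env is a hypothetical helper; the variable names are the ones mentioned in this thread, and TRAINS_API_HOST is included because the error output shows an empty host ("for host: "). Whether the keys are strictly required depends on the setup, as discussed in this thread.

```python
import os

# Variable names taken from the thread (hypothetical list, adjust as needed).
REQUIRED_VARS = ("TRAINS_API_HOST", "TRAINS_API_ACCESS_KEY", "TRAINS_API_SECRET_KEY")


def missing_trains_env(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Running print(missing_trains_env()) inside the container lists every variable that still needs a value; an empty list means all three are set.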
Are you using the default docker-compose, or an AMI?
Hi DilapidatedDucks58,
That's strange, since it's an internal docker-compose address... Let me try to reproduce. Do you have any specific changes in your setup?