SuccessfulKoala55 grrrrr it keeps happening, I have no idea what's wrong
well, the server wouldn't work without them?
Nope, the old cleanup task fails with trains_agent: ERROR: Could not find task id=e7725856e9a04271aab846d77d6f7d66 (for host: )
Exception: 'Tasks' object has no attribute 'id'
Weirdly enough, curl http://apiserver:8008 from inside the container works.
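For anyone who wants to repeat that check from Python inside the container, here is a minimal sketch of the same reachability test as the curl above. The function name is_reachable is hypothetical. It treats any HTTP response (even an error status such as 404) as "reachable", and it deliberately bypasses HTTP proxies so it tests the direct route, roughly like curl --noproxy '*'.

```python
import urllib.error
import urllib.request

# Opener with an empty proxy map: ignores http_proxy/https_proxy env vars,
# so we test the direct network route to the host.
_DIRECT_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at `url` (hypothetical helper)."""
    try:
        with _DIRECT_OPENER.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just with an error status: it is reachable.
        return True
    except OSError:
        # URLError (DNS failure, connection refused) and timeouts are OSError subclasses.
        return False


if __name__ == "__main__":
    print(is_reachable("http://apiserver:8008"))
```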
The problem is solved. I had to change /opt/trains/data/fileserver to /opt/clearml/data/fileserver in the agent configuration, and replace trains with clearml in the requirements.
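A minimal sketch of that migration as a script, assuming the agent configuration and the requirements are available as plain text. The helper names are hypothetical. Only a requirements line whose package is exactly trains is rewritten (trains-agent is left alone), and any version pin on it is dropped, on the assumption that an old trains version number is not necessarily a valid clearml version.

```python
import re

OLD_FILESERVER = "/opt/trains/data/fileserver"
NEW_FILESERVER = "/opt/clearml/data/fileserver"

# A requirements line whose package is exactly "trains" (not "trains-agent"),
# optionally followed by a version specifier such as "==0.16.4".
_TRAINS_REQ = re.compile(r"^trains\s*(?:[<>=!~].*)?$")


def migrate_config(text: str) -> str:
    """Point the fileserver path at the clearml data directory (hypothetical helper)."""
    return text.replace(OLD_FILESERVER, NEW_FILESERVER)


def migrate_requirements(text: str) -> str:
    """Replace the `trains` requirement with an unpinned `clearml` one (hypothetical helper)."""
    out = []
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # preserve the original line ending
        if _TRAINS_REQ.match(body.strip()):
            out.append("clearml" + ending)
        else:
            out.append(line)
    return "".join(out)
```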
I assume you've configured the TRAINS_API_ACCESS_KEY and TRAINS_API_SECRET_KEY env vars?
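A quick sketch for checking, from inside the container, whether those two variables are actually set. The helper name missing_credentials is hypothetical; it only inspects the environment and does not validate the keys against the server.

```python
import os
from typing import List, Mapping, Optional

REQUIRED_VARS = ("TRAINS_API_ACCESS_KEY", "TRAINS_API_SECRET_KEY")


def missing_credentials(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required credential variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    print(missing_credentials() or "both credential variables are set")
```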
WARNING: You are using pip version 20.1.1; however, version 20.3.3 is available.
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
trains_agent: ERROR: Connection Error: it seems *api_server* is misconfigured. Is this the TRAINS API server http://apiserver:8008 ?
http://OUR_IP:8081 http://OUR_IP:8080
http://apiserver:8008
WARNING: You are using pip version 20.1.1; however, version 20.3.3 is available.
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
Failed creating temporary copy of ~/.ssh for git credential
It looks like it has no host definition
Is it the same error again? Can you send the agent services logs?
Hi DilapidatedDucks58,
That's strange, since it's an internal docker-compose address... Let me try to reproduce it. Have you made any specific changes to your setup?
can you share the entire output of the agent?
Hi DilapidatedDucks58,
So it seems there's an issue where the agent-services can't resolve http://apiserver from within its container. Changing the default TRAINS_API_HOST in the docker-compose to http://localhost:8008 does the trick; we'll update the docker-compose.
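A minimal sketch for checking whether a given API host name resolves from wherever the code runs, which is the failure described above for http://apiserver. The function name host_resolves is hypothetical; it only tests name resolution (via socket.getaddrinfo), not whether the server actually answers.

```python
import socket
from urllib.parse import urlsplit


def host_resolves(url: str) -> bool:
    """Return True if the host part of `url` can be resolved to an address."""
    host = urlsplit(url).hostname
    if host is None:
        # Not a URL with a host component (e.g. missing "http://").
        return False
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        # Name resolution failed, e.g. "apiserver" unknown outside the compose network.
        return False


if __name__ == "__main__":
    for url in ("http://apiserver:8008", "http://localhost:8008"):
        print(url, host_resolves(url))
```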
Also, I forgot that by default the agent does not require special credentials; it can use the default built-in credentials that exist in the server 🤠. This is fine as long as you keep your server closed to the world (if you open it up, you'll obviously want to change all default secrets and credentials anyway).
Yeah, this looks like it did finally succeed connecting to the apiserver...
Nice, thanks! I'll check whether it solves the issue first thing tomorrow morning.
Just FYI, we updated the agent-services docker image a week ago (along with the docker-compose.yml), so you might want to try that out.
Currently no, you need to set them to a set of credentials you created
Are you using the default docker-compose, or an AMI?
DilapidatedDucks58 I tested again, and I'm not sure there's a problem at all. I think what you saw might be a few errors thrown while the agent-services is waiting for the apiserver to start. I just tried a fresh install and the agent-services did appear in the Workers & Queues page. If you can, I'd like to see the full output of docker logs trains-agent-services
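Those start-up errors are what you'd expect if the agent polls before the apiserver is up. A minimal sketch of such a wait-until-ready loop, assuming a caller-supplied check function (for example, one that requests http://apiserver:8008). The name wait_until_ready and its parameters are hypothetical; this is not the agent's actual retry logic.

```python
import time
from typing import Callable


def wait_until_ready(
    check: Callable[[], bool],
    timeout: float = 60.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call `check` until it returns True or `timeout` seconds pass.

    Returns True if the service became ready, False on timeout.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)  # back off before the next attempt
```

Errors logged during the first few failed checks are harmless as long as the loop eventually returns True.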
Do you have any idea why the cleanup task keeps failing, then? (It used to work before the update.)
Yeah, we did. Let me check whether explicitly setting the credentials helps.
Also, did you use the agent-services before upgrading? (I'm trying to understand whether there's a regression of some sort.)