I think the issue is that pod-to-pod communication can't resolve my Route 53 DNS records.
I verified in the pod YAML that it is set correctly.
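One quick way to test that theory from inside a pod is to try resolving each hostname directly. This is only a rough sketch using the standard library; the `check_dns` helper and its `resolver` parameter are names I made up for illustration, and you would pass your own hostnames.

```python
import socket
from typing import Callable, Dict, Iterable


def check_dns(hostnames: Iterable[str],
              resolver: Callable[..., list] = socket.getaddrinfo) -> Dict[str, bool]:
    """Map each hostname to True if it resolves, False if the lookup fails."""
    results = {}
    for name in hostnames:
        try:
            # getaddrinfo raises socket.gaierror when the name cannot be resolved
            resolver(name, None)
            results[name] = True
        except socket.gaierror:
            results[name] = False
    return results
```

Running `check_dns([...])` with both the external (Route 53) name and the in-cluster service name from a shell in the pod should show which of the two the pod can actually resolve.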
I used the values from the dashboard/configuration/api keys
BoredHedgehog47 can you provide some logs, this is odd..
So it's not just the file server, it's every server.
Yep, that fixed it: using references like clearml-webserver.clearml.svc.cluster.local:80
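For reference, that address follows the usual Kubernetes pattern `<service>.<namespace>.svc.<cluster domain>:<port>`. Here is a tiny sketch that builds such URLs; the `service_url` helper is hypothetical, and `cluster.local` is only the common default cluster domain, so yours may differ.

```python
def service_url(service: str, namespace: str, port: int,
                scheme: str = "http", cluster_domain: str = "cluster.local") -> str:
    """Build the in-cluster URL for a Kubernetes Service."""
    return f"{scheme}://{service}.{namespace}.svc.{cluster_domain}:{port}"


# The webserver reference from the message above
print(service_url("clearml-webserver", "clearml", 80))
# prints: http://clearml-webserver.clearml.svc.cluster.local:80
```

The same helper can be pointed at the other services once you know their Service names and ports (`kubectl get svc -n clearml` lists them).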
Then it tries to curl the files API and gets a 405
I think if I use the local service URL this problem is fixed
` * Serving Flask app 'fileserver' (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
[2022-09-08 13:24:25,822] [8] [WARNING] [werkzeug]  * Running on all addresses.
   WARNING: This is a development server. Do not use it in a production deployment. `
I can see this log message in the nginx controller: "GET / HTTP/1.1" 405 178 "-" "curl/7.79.1" 95 0.003 [clearml-clearml-fileserver-8081] [] 10.36.1.61:8081 178 0.004 405 b4f5caf7665ffa1e8823a195ae41ec26
That is the init container log from the k8s glue agent.
I just opened a shell in the API server pod and tried to curl my files URL, and the curl just hangs. No response.
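In case it helps to read that line: assuming the controller uses the default ingress-nginx access log format, the fields after the request are status, body bytes, referer, user agent, request length, request time, upstream name, alternative upstream name, upstream address, upstream response length, upstream response time, upstream status, and request ID. A rough sketch that splits them out (the regex and the `parse_ingress_tail` name are mine, not part of any tool):

```python
import re

# Field order assumes the default ingress-nginx log format,
# starting at the quoted request line.
LOG_TAIL = re.compile(
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<body_bytes>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)" '
    r'(?P<request_length>\d+) (?P<request_time>[\d.]+) '
    r'\[(?P<upstream_name>[^\]]*)\] \[(?P<alt_upstream_name>[^\]]*)\] '
    r'(?P<upstream_addr>\S+) (?P<upstream_length>\S+) '
    r'(?P<upstream_time>\S+) (?P<upstream_status>\S+) (?P<request_id>\S+)'
)


def parse_ingress_tail(line):
    """Return the named fields as a dict, or None if the line does not match."""
    match = LOG_TAIL.search(line)
    return match.groupdict() if match else None


line = ('"GET / HTTP/1.1" 405 178 "-" "curl/7.79.1" 95 0.003 '
        '[clearml-clearml-fileserver-8081] [] 10.36.1.61:8081 178 0.004 405 '
        'b4f5caf7665ffa1e8823a195ae41ec26')
fields = parse_ingress_tail(line)
print(fields["upstream_name"], fields["upstream_addr"], fields["upstream_status"])
# prints: clearml-clearml-fileserver-8081 10.36.1.61:8081 405
```

Read that way, the request was routed to the fileserver upstream at 10.36.1.61:8081 and the 405 is the upstream's own status, so the ingress itself did forward the call.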
I don't see any requests
This points to configuration, specifically maybe it is directed to a different server?!
These are the logs from the fileserver pod
curl --insecure -sw %{http_code}
-o /dev/null
init-k8s-glue waiting for apiserver
init-k8s-glue + [ 000 -ne 200 ]
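A note on that last line: when curl prints `000` for `%{http_code}`, it never received an HTTP response at all (DNS failure, refused connection, timeout), so the init container keeps looping. Below is a rough Python equivalent of that wait loop, as a sketch only. `http_status` and `wait_for_200` are hypothetical helpers, not the actual init script, and returning 0 for "no response" is my own convention standing in for curl's `000`.

```python
import ssl
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code, or 0 if no HTTP response was received."""
    # Like curl --insecure: skip TLS certificate verification.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # DNS failure, refused connection, timeout, ...


def wait_for_200(probe: Callable[[], int], attempts: int = 30, delay: float = 5.0) -> bool:
    """Call probe() until it returns 200 or the attempts run out."""
    for _ in range(attempts):
        if probe() == 200:
            return True
        time.sleep(delay)
    return False
```

For example, `wait_for_200(lambda: http_status(API_URL))`, where `API_URL` is whatever API server address the agent is configured with. If `http_status` keeps returning 0 from inside the pod, the address is unreachable or unresolvable from there, which matches the hanging curl.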