We have deployed our own ClearML server in Azure. We have two separate addresses, one for the API server and one for the web server, both serving on port 443.
In the local PC config file we have something like:
api {
# Notice: 'host' is the api server (default port 8008), not the web server.
api_server:
web_server:
credentials {"access_key": "REDACTED", "secret_key": "REDACTED"}
}
There is no issue accessing the web server, and no issue with Task creation and tracking.
The issue: after training for some time, we start to see errors like this:
2023-08-18 17:29:00,055 - clearml.metrics - WARNING - Failed uploading to
(HTTPSConnectionPool(host='clearml-web.REDACTED.azurecontainerapps.io', port=8081): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f75dc3e2f50>, 'Connection to clearml-web.REDACTED.azurecontainerapps.io timed out. (connect timeout=300.0)')))
It looks like it is trying to connect on port 8081, which is the wrong port! Is there any chance of a bug where the port is hardcoded here?
Using clearml==1.12.2
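To double-check which endpoint the failing upload is actually targeting, here is a minimal sketch that pulls the host and port out of the warning text and compares the port with the one we expect. It is not part of ClearML: `target_of`, `POOL_RE` and `EXPECTED_PORT` are names made up for illustration, and the message is a shortened copy of the warning above with the host redacted in the same way.

```python
import re

# Hypothetical helper, not a ClearML API: matches the host/port pair that
# urllib3 prints in an HTTPSConnectionPool error message.
POOL_RE = re.compile(r"HTTPSConnectionPool\(host='([^']+)', port=(\d+)\)")


def target_of(message):
    """Return (host, port) named in the message, or None if no pool is mentioned."""
    match = POOL_RE.search(message)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


# Shortened copy of the warning from the training log.
warning = (
    "Failed uploading to (HTTPSConnectionPool("
    "host='clearml-web.REDACTED.azurecontainerapps.io', port=8081): "
    "Max retries exceeded with url: /"
)

EXPECTED_PORT = 443  # assumption: both of our servers are served on 443

host, port = target_of(warning)
status = "ok" if port == EXPECTED_PORT else "unexpected port"
print(host, port, status)
```

Running this over the full training log (one call per warning line) would show whether every failed upload goes to the web server address on port 8081, or whether other endpoints are affected too.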