Also, I think getting the logs from the restarting Docker container will help.
Hi @<1523701087100473344:profile|SuccessfulKoala55>, this seems to be related to some permission setting, as I am getting permission-denied errors even though I ran the chown command as described in the tutorial.
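For reference, the chown step from the Linux install guide is the recursive one below (this assumes the default /opt/clearml host layout; adjust if the data directories live elsewhere):

# assumption: default /opt/clearml layout from the ClearML Server Linux install docs
sudo chown -R 1000:1000 /opt/clearml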
Here are some snippets from the logs, and here is also the diff between the original docker-compose and the modified one -> None
ELASTIC
/usr/share/elasticsearch/bin/elasticsearch-env: line 158: cannot create temp file for here-document: Permission denied
/usr/share/elasticsearch/bin/elasticsearch-env: line 158: cannot create temp file for here-document: Permission denied
Exception in thread "main" java.nio.file.AccessDeniedException: /tmp/elasticsearch-17994849695337179062
at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:90)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
at java.base/sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:397)
at java.base/java.nio.file.Files.createDirectory(Files.java:700)
at java.base/java.nio.file.TempFileHelper.create(TempFileHelper.java:134)
at java.base/java.nio.file.TempFileHelper.createTempDirectory(TempFileHelper.java:171)
at java.base/java.nio.file.Files.createTempDirectory(Files.java:1017)
at org.elasticsearch.tools.launchers.Launchers.createTempDirectory(Launchers.java:55)
at org.elasticsearch.tools.launchers.TempDirectory.main(TempDirectory.java:43)
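The AccessDeniedException shows Elasticsearch failing to create its temp directory under /tmp inside the container, so the problem is not limited to the mounted data path. A quick sanity check of the host-side ownership and of any compose overrides, assuming the default /opt/clearml layout (the Elasticsearch image runs as uid/gid 1000):

# assumption: default /opt/clearml layout and compose file location
ls -ln /opt/clearml/data /opt/clearml/logs                   # numeric owner should be 1000:1000
sudo chown -R 1000:1000 /opt/clearml                         # re-apply the install guide's ownership
docker compose -f /opt/clearml/docker-compose.yml config     # look for added user:/volumes: entries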
WEBSERVER
10.169.8.21 - - [15/Nov/2023:07:55:08 +0000] "POST /xmlrpc.php HTTP/1.1" 405 157 "-" "-"
2023/11/15 07:55:08 [error] 11#11: *44 connect() failed (111: Connection refused) while connecting to upstream, client: 10.169.8.21, server: _, request: "GET /api/getServices?name[]=$() HTTP/1.1", upstream: "", host: "10.169.8.196:8080"
2023/11/15 07:55:08 [error] 11#11: *44 open() "/usr/share/nginx/html/50x.html" failed (2: No such file or directory), client: 10.169.8.21, server: _, request: "GET /api/getServices?name[]=$() HTTP/1.1", upstream: "", host: "10.169.8.196:8080"
APISERVER
[2023-11-16 14:06:08,791] [9] [INFO] [clearml.redis_manager] Using override redis host redis
[2023-11-16 14:06:08,792] [9] [INFO] [clearml.redis_manager] Using override redis port 6379
[2023-11-16 14:06:08,804] [9] [INFO] [clearml.es_factory] Using override elastic host elasticsearch
[2023-11-16 14:06:08,806] [9] [INFO] [clearml.es_factory] Using override elastic port 9200
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/redis/connection.py", line 707, in connect
sock = self.retry.call_with_retry(
File "/usr/local/lib/python3.9/site-packages/redis/retry.py", line 46, in call_with_retry
return do()
File "/usr/local/lib/python3.9/site-packages/redis/connection.py", line 708, in <lambda>
lambda: self._connect(), lambda error: self.disconnect(error)
File "/usr/local/lib/python3.9/site-packages/redis/connection.py", line 974, in _connect
for res in socket.getaddrinfo(
File "/usr/local/lib/python3.9/socket.py", line 954, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -3] Temporary failure in name resolution
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/local/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/opt/clearml/apiserver/server.py", line 6, in <module>
from apiserver.server_init.app_sequence import AppSequence
File "/opt/clearml/apiserver/server_init/app_sequence.py", line 9, in <module>
from apiserver.bll.queue.queue_metrics import MetricsRefresher
File "/opt/clearml/apiserver/bll/queue/__init__.py", line 1, in <module>
from .queue_bll import QueueBLL
File "/opt/clearml/apiserver/bll/queue/queue_bll.py", line 12, in <module>
from apiserver.bll.queue.queue_metrics import QueueMetrics
File "/opt/clearml/apiserver/bll/queue/queue_metrics.py", line 22, in <module>
redis = redman.connection("apiserver")
File "/opt/clearml/apiserver/redis_manager.py", line 80, in connection
obj.get("health")
File "/usr/local/lib/python3.9/site-packages/redis/commands/core.py", line 1816, in get
return self.execute_command("GET", name)
File "/usr/local/lib/python3.9/site-packages/redis/client.py", line 1266, in execute_command
conn = self.connection or pool.get_connection(command_name, **options)
File "/usr/local/lib/python3.9/site-packages/redis/connection.py", line 1461, in get_connection
connection.connect()
File "/usr/local/lib/python3.9/site-packages/redis/connection.py", line 713, in connect
raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error -3 connecting to redis:6379. Temporary failure in name resolution.
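The apiserver traceback is a downstream symptom: it cannot resolve the redis hostname because the redis container itself never comes up (see the REDIS section below). A quick way to see which containers are actually up versus restarting, assuming the stock service names and compose file path:

# assumption: compose file at /opt/clearml/docker-compose.yml with the stock service names
docker compose -f /opt/clearml/docker-compose.yml ps
docker compose -f /opt/clearml/docker-compose.yml logs --tail=50 redis mongo elasticsearch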
FILESERVER
[2023-11-15 07:55:23,884] [8] [ERROR] [werkzeug] 10.169.8.21 - - [15/Nov/2023 07:55:23] code 400, message Bad request syntax ('\x16\x03\x03\x01\x8f\x01\x00\x01\x8b\x03\x03\x91\x15(\x9dLÃ,½lv·¦Ü<Y¬ÉÄ')
[2023-11-15 07:55:24,474] [8] [ERROR] [werkzeug] 10.169.8.21 - - [15/Nov/2023 07:55:24] code 400, message Bad request version ("X£¢m[L¬ÚW¹\x81\x95üåc½MoÍ´\x9fF\x00f\x13\x02\x13\x01À,À+À0À/\x00\x9f\x00£\x00\x9e\x00¢À$À(À#À'\x00k\x00j\x00g\x00@À.À2À-À1À&À*À%À)À")
MONGO
{"t":{"$date":"2023-11-16T14:08:10.267+00:00"},"s":"I", "c":"CONTROL", "id":51765, "ctx":"initandlisten","msg":"Operating System","attr":{"os":{"name":"Ubuntu","version":"20.04"}}}
{"t":{"$date":"2023-11-16T14:08:10.267+00:00"},"s":"I", "c":"CONTROL", "id":21951, "ctx":"initandlisten","msg":"Options set by command line","attr":{"options":{"net":{"bindIp":"*"},"setParameter":{"internalQueryMaxBlockingSortMemoryUsageBytes":"196100200"}}}}
{"t":{"$date":"2023-11-16T14:08:10.267+00:00"},"s":"E", "c":"STORAGE", "id":20568, "ctx":"initandlisten","msg":"Error setting up listener","attr":{"error":{"code":9001,"codeName":"SocketException","errmsg":"Permission denied"}}}
{"t":{"$date":"2023-11-16T14:08:10.267+00:00"},"s":"I", "c":"REPL", "id":4784900, "ctx":"initandlisten","msg":"Stepping down the ReplicationCoordinator for shutdown","attr":{"waitTimeMillis":10000}}
{"t":{"$date":"2023-11-16T14:08:10.268+00:00"},"s":"I", "c":"COMMAND", "id":4784901, "ctx":"initandlisten","msg":"Shutting down the MirrorMaestro"}
{"t":{"$date":"2023-11-16T14:08:10.268+00:00"},"s":"I", "c":"SHARDING", "id":4784902, "ctx":"initandlisten","msg":"Shutting down the WaitForMajorityService"}
{"t":{"$date":"2023-11-16T14:08:10.268+00:00"},"s":"I", "c":"NETWORK", "id":4784905, "ctx":"initandlisten","msg":"Shutting down the global connection pool"}
{"t":{"$date":"2023-11-16T14:08:10.268+00:00"},"s":"I", "c":"NETWORK", "id":4784918, "ctx":"initandlisten","msg":"Shutting down the ReplicaSetMonitor"}
{"t":{"$date":"2023-11-16T14:08:10.268+00:00"},"s":"I", "c":"SHARDING", "id":4784921, "ctx":"initandlisten","msg":"Shutting down the MigrationUtilExecutor"}
{"t":{"$date":"2023-11-16T14:08:10.268+00:00"},"s":"I", "c":"CONTROL", "id":4784925, "ctx":"initandlisten","msg":"Shutting down free monitoring"}
{"t":{"$date":"2023-11-16T14:08:10.268+00:00"},"s":"I", "c":"STORAGE", "id":4784927, "ctx":"initandlisten","msg":"Shutting down the HealthLog"}
{"t":{"$date":"2023-11-16T14:08:10.268+00:00"},"s":"I", "c":"STORAGE", "id":4784929, "ctx":"initandlisten","msg":"Acquiring the global lock for shutdown"}
{"t":{"$date":"2023-11-16T14:08:10.268+00:00"},"s":"I", "c":"-", "id":4784931, "ctx":"initandlisten","msg":"Dropping the scope cache for shutdown"}
{"t":{"$date":"2023-11-16T14:08:10.268+00:00"},"s":"I", "c":"FTDC", "id":4784926, "ctx":"initandlisten","msg":"Shutting down full-time data capture"}
{"t":{"$date":"2023-11-16T14:08:10.268+00:00"},"s":"I", "c":"CONTROL", "id":20565, "ctx":"initandlisten","msg":"Now exiting"}
{"t":{"$date":"2023-11-16T14:08:10.268+00:00"},"s":"I", "c":"CONTROL", "id":23138, "ctx":"initandlisten","msg":"Shutting down","attr":{"exitCode":48}}
{"t":{"$date":"2023-11-16T14:09:10.717+00:00"},"s":"I", "c":"CONTROL", "id":23285, "ctx":"main","msg":"Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'"}
{"t":{"$date":"2023-11-16T14:09:10.719+00:00"},"s":"W", "c":"ASIO", "id":22601, "ctx":"main","msg":"No TransportLayer configured during NetworkInterface startup"}
{"t":{"$date":"2023-11-16T14:09:10.720+00:00"},"s":"I", "c":"NETWORK", "id":4648601, "ctx":"main","msg":"Implicit TCP FastOpen unavailable. If TCP FastOpen is required, set tcpFastOpenServer, tcpFastOpenClient, and tcpFastOpenQueueSize."}
{"t":{"$date":"2023-11-16T14:09:10.720+00:00"},"s":"I", "c":"STORAGE", "id":4615611, "ctx":"initandlisten","msg":"MongoDB starting","attr":{"pid":1,"port":27017,"dbPath":"/data/db","architecture":"64-bit","host":"54deb300813e"}}
{"t":{"$date":"2023-11-16T14:09:10.720+00:00"},"s":"I", "c":"CONTROL", "id":23403, "ctx":"initandlisten","msg":"Build Info","attr":{"buildInfo":{"version":"4.4.9","gitVersion":"b4048e19814bfebac717cf5a880076aa69aba481","openSSLVersion":"OpenSSL 1.1.1f 31 Mar 2020","modules":[],"allocator":"tcmalloc","environment":{"distmod":"ubuntu2004","distarch":"x86_64","target_arch":"x86_64"}}}}
REDIS
1:M 16 Nov 2023 14:09:19.017 * Running mode=standalone, port=6379.
1:M 16 Nov 2023 14:09:19.017 # Server initialized
1:M 16 Nov 2023 14:09:19.017 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1:M 16 Nov 2023 14:09:19.017 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 16 Nov 2023 14:09:19.017 # Fatal error loading the DB: Permission denied. Exiting.
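Redis spells out the fix for the two warnings itself, but the fatal error is what keeps it restarting: it cannot read the persisted DB in its mounted data directory. On the host (the data path is an assumption based on the default layout):

# kernel settings recommended by the Redis log above
sudo sysctl vm.overcommit_memory=1                                  # add 'vm.overcommit_memory = 1' to /etc/sysctl.conf to persist
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
# assumption: redis data mounted from /opt/clearml/data/redis
ls -ln /opt/clearml/data/redis                                      # check owner/mode of the dump file
sudo chown -R 1000:1000 /opt/clearml/data/redis                     # re-apply the install guide's ownership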
Hi @<1630014842674876416:profile|ObnoxiousBluewhale46>, which fields exactly did you change in the docker-compose?
Here's a docker-compose I've been playing with. It doesn't have the same restart problem you're describing, but I did change the volume mounts: None
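If it helps to compare against the stock file, something like this should show the relevant differences and whether the mounted host paths have the expected owner (URL and paths assume the standard Linux install from the docs):

# assumption: standard Linux install paths; stock compose file from the clearml-server repo
curl -fsSL https://raw.githubusercontent.com/allegroai/clearml-server/master/docker/docker-compose.yml -o /tmp/docker-compose.stock.yml
diff -u /tmp/docker-compose.stock.yml /opt/clearml/docker-compose.yml
ls -ln /opt/clearml/config /opt/clearml/data /opt/clearml/logs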