CluelessElephant89 , Hi!
It looks like there is a problem with the API server. Can you please check the docker logs, see what errors it prints, and paste them here 🙂
CluelessElephant89 , the relevant command should be something like `sudo docker logs clearml-apiserver`
hi CostlyOstrich36 thanks for responding. So running that command I get
[2021-10-19 20:53:52,726] [9] [INFO] [clearml.app_sequence] ################ API Server initializing #####################
[2021-10-19 20:53:52,727] [9] [INFO] [clearml.database] Initializing database connections
[2021-10-19 20:53:52,727] [9] [INFO] [clearml.database] Using override mongodb host mongo
[2021-10-19 20:53:52,728] [9] [INFO] [clearml.database] Using override mongodb port 27017
[2021-10-19 20:53:52,729] [9] [INFO] [clearml.database] Registering connection to auth-db ( )
[2021-10-19 20:53:52,731] [9] [INFO] [clearml.database] Registering connection to backend-db ( )
[2021-10-19 20:53:52,736] [9] [WARNING] [clearml.initialize] Could not connect to ElasticSearch Service. Retry 1 of 4. Waiting for 30sec
[2021-10-19 20:54:22,762] [9] [WARNING] [clearml.initialize] Could not connect to ElasticSearch Service. Retry 2 of 4. Waiting for 30sec
[2021-10-19 20:54:52,771] [9] [WARNING] [clearml.initialize] Could not connect to ElasticSearch Service. Retry 3 of 4. Waiting for 30sec
[2021-10-19 20:55:22,782] [9] [ERROR] [clearml.app_sequence] Error connecting to Elasticsearch: ConnectionError(<urllib3.connection.HTTPConnection object at 0x7fb66477f978>: Failed to establish a new connection: [Errno -2] Name or service not known) caused by: NewConnectionError(<urllib3.connection.HTTPConnection object at 0x7fb66477f978>: Failed to establish a new connection: [Errno -2] Name or service not known)
Traceback (most recent call last):
  File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/clearml/apiserver/server.py", line 10, in <module>
    AppSequence(app).start(request_handlers=RequestHandlers())
  File "/opt/clearml/apiserver/server_init/app_sequence.py", line 40, in start
    self._init_dbs()
  File "/opt/clearml/apiserver/server_init/app_sequence.py", line 97, in _init_dbs
    "Error starting server: failed connecting to ElasticSearch service"
Exception: Error starting server: failed connecting to ElasticSearch service
Loading config from /opt/clearml/apiserver/config/default
Loading config from file /opt/clearml/apiserver/config/default/apiserver.conf
Loading config from file /opt/clearml/apiserver/config/default/hosts.conf
Loading config from file /opt/clearml/apiserver/config/default/logging.conf
Loading config from file /opt/clearml/apiserver/config/default/secure.conf
Loading config from file /opt/clearml/apiserver/config/default/services/projects.conf
Loading config from file /opt/clearml/apiserver/config/default/services/organization.conf
Loading config from file /opt/clearml/apiserver/config/default/services/tasks.conf
Loading config from file /opt/clearml/apiserver/config/default/services/events.conf
Loading config from file /opt/clearml/apiserver/config/default/services/auth.conf
Loading config from /opt/clearml/config
[2021-10-19 20:55:25,367] [9] [INFO] [clearml.es_factory] Using override elastic host elasticsearch
[2021-10-19 20:55:25,368] [9] [INFO] [clearml.es_factory] Using override elastic port 9200
[2021-10-19 20:55:25,636] [9] [INFO] [clearml.redis_manager] Using override redis host redis
[2021-10-19 20:55:25,637] [9] [INFO] [clearml.redis_manager] Using override redis port 6379
[2021-10-19 20:55:25,740] [9] [INFO] [clearml.schema_reader] loading schema from cache
[2021-10-19 20:55:25,832] [9] [INFO] [clearml.app_sequence] ################ API Server initializing #####################
[2021-10-19 20:55:25,833] [9] [INFO] [clearml.database] Initializing database connections
[2021-10-19 20:55:25,833] [9] [INFO] [clearml.database] Using override mongodb host mongo
[2021-10-19 20:55:25,834] [9] [INFO] [clearml.database] Using override mongodb port 27017
[2021-10-19 20:55:25,835] [9] [INFO] [clearml.database] Registering connection to auth-db ( )
[2021-10-19 20:55:25,837] [9] [INFO] [clearml.database] Registering connection to backend-db ( )
[2021-10-19 20:55:25,845] [9] [WARNING] [clearml.initialize] Could not connect to ElasticSearch Service. Retry 1 of 4. Waiting for 30sec
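A log this long is easier to scan once it is reduced to its warnings and errors. Here is a minimal Python sketch, assuming the `[timestamp] [pid] [LEVEL] [logger] message` layout seen in the output above. The name `problem_entries` is made up for illustration and is not part of ClearML.

```python
import re

# Matches entry headers shaped like the ones in the log above, e.g.
# [2021-10-19 20:53:52,736] [9] [WARNING] [clearml.initialize] message...
ENTRY = re.compile(
    r"\[(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\] "
    r"\[\d+\] \[(?P<level>[A-Z]+)\] \[(?P<logger>[^\]]+)\] "
)


def problem_entries(log_text, levels=("WARNING", "ERROR")):
    """Return (time, level, logger, message) tuples for the given levels."""
    matches = list(ENTRY.finditer(log_text))
    results = []
    for i, m in enumerate(matches):
        # An entry's message runs until the next entry header (or end of text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(log_text)
        if m.group("level") in levels:
            # Collapse whitespace so multi-line messages become one line.
            message = " ".join(log_text[m.end():end].split())
            results.append(
                (m.group("time"), m.group("level"), m.group("logger"), message)
            )
    return results
```

To use it, save the output of the docker logs command to a file, read the file into a string, and pass that string to `problem_entries`.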
Looks like the ElasticSearch service is down?
CluelessElephant89 try checking the Elasticsearch logs: `sudo docker logs clearml-elastic`
CluelessElephant89 , I'd wager you might have missed one of the installation steps, probably a permissions issue, I hope 🙂
CostlyOstrich36 uh oh... I think I need more memory...
`
There is insufficient memory for the Java Runtime Environment to continue.
Native memory allocation (mmap) failed to map 2060255232 bytes for committing reserved memory.
An error report file with more information is saved as:
logs/hs_err_pid59.log
error:
OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x0000000085330000, 2060255232, 0) failed; error='Not enough space' (errno=12)
at org.elasticsearch.tools.launchers.JvmErgonomics.flagsFinal(JvmErgonomics.java:123)
at org.elasticsearch.tools.launchers.JvmErgonomics.finalJvmOptions(JvmErgonomics.java:88)
at org.elasticsearch.tools.launchers.JvmErgonomics.choose(JvmErgonomics.java:59)
at org.elasticsearch.tools.launchers.JvmOptionsParser.main(JvmOptionsParser.java:95) `
CluelessElephant89 , did you run the vm.max_map_count command for Elasticsearch? Also, how much RAM do you have on the machine you're running on?
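For reference, the Elasticsearch documentation asks for `vm.max_map_count` to be at least 262144 on Linux. Here is a small Python sketch for checking the current value on the host. It assumes a Linux machine where the setting is exposed at `/proc/sys/vm/max_map_count`, and the helper name is hypothetical.

```python
REQUIRED_MAX_MAP_COUNT = 262144  # minimum given in the Elasticsearch docs


def max_map_count_ok(path="/proc/sys/vm/max_map_count",
                     required=REQUIRED_MAX_MAP_COUNT):
    """Read the kernel setting from `path` and compare it to `required`.

    Returns (is_high_enough, current_value).
    """
    with open(path) as f:
        current = int(f.read().strip())
    return current >= required, current
```

Calling `max_map_count_ok()` on the server returns a pair such as `(False, 65530)` when the value has not been raised yet.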
CostlyOstrich36 Oh geez, you're going to laugh, but I'm using an EC2 free tier instance and it only gives me 1 GiB of memory
I believe I ran that vm command already
oh wait, I don't see the 99-clearml.conf yet... let me try that before I kill this instance
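The numbers in the JVM error already tell the story: Elasticsearch tried to commit 2060255232 bytes, which is more than the instance's 1 GiB of total RAM. Here is a quick Python check of that arithmetic (the helper name is illustrative only):

```python
GIB = 1024 ** 3


def fits_in_ram(requested_bytes, total_ram_bytes):
    """True if an allocation of requested_bytes could fit in total RAM at all."""
    return requested_bytes <= total_ram_bytes


requested = 2060255232  # from the os::commit_memory(...) line in the error
total_ram = 1 * GIB     # the free-tier instance mentioned above

print(f"requested: {requested / GIB:.2f} GiB")                 # requested: 1.92 GiB
print(f"fits in 1 GiB: {fits_in_ram(requested, total_ram)}")   # fits in 1 GiB: False
```

This ignores whatever the OS and the other containers already use, so the real headroom is smaller still.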
CluelessElephant89 , I think the RAM requirement for Elasticsearch might be 2 GB. You can try the following hack, which might make it work.
On the machine it's running on there should be a docker-compose.yml file (I'm guessing in the home directory).
For the following line, https://github.com/allegroai/clearml-server/blob/master/docker/docker-compose.yml#L41 , you can try changing it to ES_JAVA_OPTS: -Xms1g -Xmx1g
This might limit the Elasticsearch heap to 1 GB; however, please note this might not work.
After the change, please bring the containers down and up again with the docker-compose command 🙂
CluelessElephant89 actually having some errors on startup with ES is perfectly normal - it takes some time for ES to boot
I would first try curl http://localhost:8008 from the server console (i.e. over ssh)
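The same check can be scripted. Here is a minimal sketch using only the Python standard library. It treats any HTTP response, even an error status, as "something is listening" and a refused or timed-out connection as "not up yet". Port 8008 is the one from the message above, and the function name is made up.

```python
import urllib.error
import urllib.request


def api_reachable(url="http://localhost:8008", timeout=5):
    """Return True if anything answers HTTP at `url`, False if the connection fails."""
    try:
        urllib.request.urlopen(url, timeout=timeout).close()
        return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status; it is still up.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, name not resolved, timeout, and similar failures.
        return False
```

Since Elasticsearch can take a while to boot, it may be worth calling this a few times with a pause in between before concluding the server is down.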
Well CostlyOstrich36 is also right 🙂 - I'm not sure the server will be able to handle running with only 1GB
but you can try reducing ES to ES_JAVA_OPTS: -Xms500m -Xmx500m?
In any case I don't think that would be a reasonable server setup
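One detail worth checking before editing the compose file: the JVM's `-Xms`/`-Xmx` options take a number with an optional single-letter suffix (`k`, `m`, or `g`), so 500 megabytes is spelled `500m`, and a spelling like `500mb` is rejected at startup. Here is a small Python sketch that parses such values. The helpers are hypothetical and only cover those documented suffixes.

```python
import re

_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_heap_size(value):
    """Convert a JVM-style size such as '500m' or '1g' to bytes.

    Raises ValueError for anything else, e.g. '500mb'.
    """
    match = re.fullmatch(r"(\d+)([kKmMgG]?)", value)
    if match is None:
        raise ValueError(f"not a valid JVM heap size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit.lower()]


def heap_bytes_from_opts(es_java_opts):
    """Extract (-Xms, -Xmx) in bytes from a string like '-Xms1g -Xmx1g'."""
    sizes = {}
    for token in es_java_opts.split():
        for flag in ("-Xms", "-Xmx"):
            if token.startswith(flag):
                sizes[flag] = parse_heap_size(token[len(flag):])
    return sizes.get("-Xms"), sizes.get("-Xmx")
```

For example, `heap_bytes_from_opts("-Xms1g -Xmx1g")` gives 1073741824 bytes for both values, which is the whole of a 1 GiB machine before anything else gets to run.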
Okay thanks CostlyOstrich36 and SuccessfulKoala55 I'll beef up my server first and then run this again.
CostlyOstrich36 SuccessfulKoala55 super late update, but it turns out I needed to beef up the machine. Thanks for all the help!