SuccessfulKoala55 I was able to recreate the indices in the new ES cluster. I specified number_of_shards: 4 for the events-log-d1bd92a3b039400cbafc60a7a5b1e52b index. I then copied the documents from the old ES using the _reindex API. On the old ES, the index is 7.5 GB on one shard.
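For reference, a minimal sketch of the settings body such an index creation could send, assuming the index is created with `PUT /<index-name>` before reindexing into it. The helper name `build_index_settings` is hypothetical (not part of ClearML or the ES client), and `number_of_replicas` is included only as an illustrative second setting.

```python
import json


def build_index_settings(number_of_shards: int, number_of_replicas: int = 0) -> dict:
    """Build a create-index request body (hypothetical helper, for illustration)."""
    if number_of_shards < 1:
        raise ValueError("number_of_shards must be at least 1")
    if number_of_replicas < 0:
        raise ValueError("number_of_replicas cannot be negative")
    return {
        "settings": {
            "index": {
                "number_of_shards": number_of_shards,
                "number_of_replicas": number_of_replicas,
            }
        }
    }


# Body to send with PUT /events-log-d1bd92a3b039400cbafc60a7a5b1e52b
body = build_index_settings(4)
print(json.dumps(body, indent=2))
```

number_of_shards can only be set at index creation time, which is why the index has to exist with the right settings before the documents are copied in.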
Now I see that this index on the new ES cluster is ~19.4 GB 🤔 The index is divided into the 4 shards, but each shard is between 4.7 GB and 5 GB!
I was expecting the same index size as in the previous env (7.5 GB), split across the 4 shards (7.5 / 4 ≈ 1.9 GB per shard). What am I missing?
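The expectation here is plain arithmetic, sketched below with the figures quoted in this message. It assumes the data would spread evenly across primary shards with no extra overhead, which is only a rough model; the function name is made up for illustration.

```python
def per_shard_size_gb(total_gb: float, shards: int) -> float:
    """Expected size of each shard if the index were split perfectly evenly."""
    if shards < 1:
        raise ValueError("shards must be at least 1")
    return total_gb / shards


old_total_gb = 7.5    # index size on the old cluster (one shard)
new_total_gb = 19.4   # index size reported on the new cluster (four shards)

expected = per_shard_size_gb(old_total_gb, 4)
growth = new_total_gb / old_total_gb

print(f"expected per shard: {expected:.3f} GB")
print(f"observed total is {growth:.2f}x the old index size")
```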
SuccessfulKoala55
In the docker-compose file, you have an environment setting for the apiserver service host and port (CLEARML_ELASTIC_SERVICE_HOST and CLEARML_ELASTIC_SERVICE_PORT) - changing those will allow you to point the server to another ES service
The ES cluster is running on another machine, how can I set its IP in CLEARML_ELASTIC_SERVICE_HOST? Would I need to add the host to the networks of the apiserver service somehow? How can I do that?
Thanks! I would like to use this opportunity to split the indices into multiple shards, as explained here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-split-index.html#indices-split-index
The api-server shows this when starting:
clearml-apiserver | [2021-07-13 11:09:34,552] [9] [INFO] [clearml.es_factory] Using override elastic host
clearml-apiserver | [2021-07-13 11:09:34,552] [9] [INFO] [clearml.es_factory] Using override elastic port 9200
...
clearml-apiserver | [2021-07-13 11:09:38,407] [9] [WARNING] [clearml.initialize] Could not connect to ElasticSearch Service. Retry 1 of 4. Waiting for 30sec
clearml-apiserver | [2021-07-13 11:10:08,414] [9] [WARNING] [clearml.initialize] Could not connect to ElasticSearch Service. Retry 2 of 4. Waiting for 30sec
clearml-apiserver | [2021-07-13 11:10:38,443] [9] [WARNING] [clearml.initialize] Could not connect to ElasticSearch Service. Retry 3 of 4. Waiting for 30sec
clearml-apiserver | [2021-07-13 11:11:08,468] [9] [ERROR] [clearml.app_sequence] Error connecting to Elasticsearch: ConnectionError(<urllib3.connection.HTTPConnection object at 0x7f4866d5f6a0>: Failed to establish a new connection: [Errno -2] Name or service not known) caused by: NewConnectionError(<urllib3.connection.HTTPConnection object at 0x7f4866d5f6a0>: Failed to establish a new connection: [Errno -2] Name or service not known)
clearml-apiserver | Traceback (most recent call last):
clearml-apiserver |   File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
clearml-apiserver |     "__main__", mod_spec)
clearml-apiserver |   File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
clearml-apiserver |     exec(code, run_globals)
clearml-apiserver |   File "/opt/clearml/apiserver/server.py", line 10, in <module>
clearml-apiserver |     AppSequence(app).start(request_handlers=RequestHandlers())
clearml-apiserver |   File "/opt/clearml/apiserver/server_init/app_sequence.py", line 40, in start
clearml-apiserver |     self._init_dbs()
clearml-apiserver |   File "/opt/clearml/apiserver/server_init/app_sequence.py", line 90, in _init_dbs
clearml-apiserver |     and get_last_server_version() < Version("0.16.0")
clearml-apiserver | TypeError: '<' not supported between instances of 'Version' and 'Version'
Hi JitteryCoyote63 ,
In the docker-compose file, you have an environment setting for the apiserver service host and port (CLEARML_ELASTIC_SERVICE_HOST and CLEARML_ELASTIC_SERVICE_PORT) - changing those will allow you to point the server to another ES service
It's an external server, so no need to add anything to the docker-compose network
ha wait, I removed the http:// in the host and it worked 🎉
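Since dropping the scheme fixed it, the host setting apparently expects a bare hostname, which would fit the "Name or service not known" error seen when http:// was included. A small sketch of splitting a pasted URL into the two values before putting them in the environment; `split_es_endpoint` is a hypothetical helper, not something ClearML provides, and the default port of 9200 is an assumption taken from the log above.

```python
from typing import Tuple
from urllib.parse import urlsplit


def split_es_endpoint(value: str, default_port: int = 9200) -> Tuple[str, int]:
    """Return (hostname, port) from a value that may or may not include a scheme."""
    candidate = value.strip()
    # urlsplit only recognises the host part when the string contains "//",
    # so add it for bare "host" or "host:port" inputs.
    if "://" not in candidate:
        candidate = "//" + candidate
    parts = urlsplit(candidate)
    if not parts.hostname:
        raise ValueError(f"no hostname found in {value!r}")
    return parts.hostname, parts.port or default_port


host, port = split_es_endpoint("http://internal-aws-host-name:9200")
print(f"CLEARML_ELASTIC_SERVICE_HOST={host}")
print(f"CLEARML_ELASTIC_SERVICE_PORT={port}")
```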
Migration of the data is basically reindexing from a different server, I guess you can read about it here: https://www.elastic.co/guide/en/elasticsearch/reference/current/reindex-upgrade-remote.html
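As a rough illustration of what reindexing from a remote server looks like, here is a sketch of the JSON body for `POST /_reindex` on the new cluster. The helper name and the `http://old-es-host:9200` address are placeholders, not values from this thread. Unlike the ClearML host setting, the remote host here does need the scheme and port, and the new cluster must list the old host under `reindex.remote.whitelist` in its elasticsearch.yml.

```python
import json
from typing import Optional


def build_remote_reindex_body(remote_host: str, index: str,
                              dest_index: Optional[str] = None) -> dict:
    """Build a reindex-from-remote request body (hypothetical helper)."""
    if "://" not in remote_host:
        raise ValueError("remote_host must include a scheme, e.g. http://host:9200")
    return {
        "source": {
            "remote": {"host": remote_host},
            "index": index,
        },
        # Keep the same index name on the destination unless told otherwise.
        "dest": {"index": dest_index or index},
    }


reindex_body = build_remote_reindex_body(
    "http://old-es-host:9200",  # placeholder address of the old ES server
    "events-log-d1bd92a3b039400cbafc60a7a5b1e52b",
)
print(json.dumps(reindex_body, indent=2))
```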
You mean it will resolve by itself in the following days or should I do something? Or there is nothing to do and it will stay this way?
There's an ES REST API command to get the current template
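A sketch of the URL such a request would target, assuming the templates were installed as legacy index templates (the `_template` endpoint; newer composable templates live under `_index_template` instead). `template_url` is a made-up helper, and which template names exist on a given server is not something this thread states, so the example lists all of them.

```python
from typing import Optional
from urllib.parse import quote


def template_url(base_url: str, name: Optional[str] = None) -> str:
    """Return the GET URL for one legacy index template, or for all of them."""
    base = base_url.rstrip("/")
    if name is None:
        return f"{base}/_template"
    # "*" is kept unescaped so wildcard patterns can be passed through.
    return f"{base}/_template/{quote(name, safe='*')}"


# Lists every legacy template on the old server, e.g. with curl or a browser.
url = template_url("http://internal-aws-host-name:9200/")
print(url)
```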
http vs https is controlled by the secure flag, I think
The number of documents in the old and the new env are the same though 🤔 I really don’t understand where this extra space used comes from
ha nice, where can I find the mapping template of the original clearml so that I can copy and adapt?
This https://discuss.elastic.co/t/index-size-explodes-after-split/150692 seems to say that with the _split API this situation happens and resolves itself after a couple of days, maybe it's the same case for me?
The host is accessible, I can ping it and even run curl "http://internal-aws-host-name:9200/_cat/shards" and get results from the local machine
If this host is accessible from the machine running ClearML server (for example http://some-host-name ), you can just use the hostname
Well, probably an optimization side-effect
It can be done at the same time, it all depends on the mapping template you set in the new cluster
PS: in the new env, I’ve set num_replicas: 0, so I’m only talking about primary shards…
I am not sure I can do both operations at the same time (migration + splitting), do you think it’s better to do splitting first or migration first?