IdealPanda97 Ok, I see. Can you please run the following command, then restart the docker-compose and see if it makes any difference?
sudo chown -R 1000:1000 /opt/trains
@<1523701066867150848:profile|JitteryCoyote63> The requirements list the client library that the apiserver uses to access Elasticsearch. This library is capable of working with both Elasticsearch 7 and 8
Oh, I see. Then maybe we can see some more info in the browser dev tools
Great! What error do you still see in the UI when comparing more than 20 experiments? At the time of the error, do you see any error response from the apiserver (in the browser network tab)? When the call to compare 20+ task metrics succeeds, how much time does it usually take in your environment?
The data that you sent looks fine. It seems that you actually have these iterations in Elasticsearch. To check whether that is the case, please run the following command in the shell on your host. You should get the first 10 task events with the smallest iterations:
curl -XGET -H "Content-Type: application/json" "localhost:9200/events-training_stats_scalar*/_search?pretty" -d' { "query": { "term": {"task": "d45ecb5ad7084175bd83dd39777b10c5"} }, "sort": {"iter": "asc"} }'
Ok, so there is no mapping for the whole config folder or for the specific config file that you changed. That's why async_delete does not get your updated configuration. You can do one of the following: either add a mapping here for the specific file, like you did earlier, or map the whole config folder the way the apiserver service does:
- /opt/clearml/config:/opt/clearml/config
The second way is probably more flexible
Hi @<1523701868901961728:profile|ReassuredTiger98> , how exactly do you override the values in the storage_credentials file? Do you prepare a new docker image with the changed file, map the file from outside with a volume mapping in the docker compose, or override through env variables? What is also important is that you do this override for the async_delete service. It is the service that actually uses the storage credentials, not the apiserver itself
Hi ResponsiveCamel97 , the shards and indices stats look fine. Can you please try the async delete of the task data? You can run the following line in the shell inside the apiserver container. Just replace <task_id> with your actual task id:
curl -XPOST -H "Content-Type: application/json" "
" -d'{"query": {"term": {"task": "<task_id>"}}}'
You should get a response like this:
{"task":"p6350SG7STmQALxH-E3CLg:1426125"}
Then you can periodically ping ES on the status of the returned task.
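As a sketch, assuming Elasticsearch is reachable at localhost:9200 from where you run the command, the status can be checked with the ES tasks API using the task id returned above:
curl -XGET "localhost:9200/_tasks/<task_id>?pretty"
Once the deletion is finished the response should contain "completed": true.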
@<1585078752969232384:profile|FantasticDuck7> What volume mappings do you have for the async_delete service in the docker-compose.yaml file?
If you run the following command 'sudo chown -R 1000:1000 /opt/trains', does it change anything?
Are you running them on the computer that hosts the server docker containers? What is the port binding for elasticsearch in your docker compose?
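As a quick sanity check, assuming the default binding of Elasticsearch to port 9200 on the host, this should return the basic cluster info when run on the hosting machine:
curl -XGET localhost:9200
If it does not respond, either the commands are being run on a different machine or the elasticsearch port is not exposed in the docker compose.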
@<1585078752969232384:profile|FantasticDuck7> The best option would be to copy this file to the host, edit it, and map it into the container in place of the original one. The single file mapping in the docker-compose file should look like this:
volumes:
  - type: bind
    source: <the path to the config file on the host>
    target: /opt/clearml/apiserver/config/default/services/storage_credentials.conf
You should do it for the async_delete service, not for the apiserver.
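After editing the docker-compose file the container needs to be recreated for the new mapping to take effect. Assuming the service is called async_delete in your docker-compose.yaml, something like this should do it:
docker-compose up -d --force-recreate async_delete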
Hi H4dr1en, there is a chance that the problem is in the parallel reindexing of data. You can try to replace
parallel=max(docker_resources.cpus // 2, 1)
at line 190 with
parallel=1
I think you will need to remove the /opt/trains/data/elastic_7 folder before restarting the script
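As a sketch of that cleanup step, assuming the data lives under the default /opt/trains path, the partially reindexed data can be removed like this before rerunning the script:
sudo rm -rf /opt/trains/data/elastic_7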
Hi @<1673863788857659392:profile|HomelyRabbit25> , yes, it should include the support for the async_delete service. Please provide the storage_credentials configuration to this service instead of the apiserver. To see whether the deletion works or whether it has any issues with the provided configuration, please inspect the logs from the async_delete pod.
Can you please run the following in the command line of the hosting server and share the results?
curl -XGET
Hi DilapidatedDucks58 , I am trying to reproduce the "Connection is full" warning. Do you override any apiserver environment variables in the docker compose? If yes, can you share your version of the docker-compose? Do you provide a configuration file for gunicorn? Can you please share it?
Enjoy the new version :) It would still be interesting to see what caused ES7 to stop responding.
Hi QuaintJellyfish58 , in the latest data that you sent I see only the responses (some of them are marked as payloads but they are actually responses). What would be very interesting is to see the requests (payloads) that resulted in the following empty responses:
# response
{"meta":{"id":"aaaffe49ace64f1a8b0211925afcfd32","trx":"aaaffe49ace64f1a8b0211925afcfd32","endpoint":{"name":"projects.get_all_ex","requested_version":"2.20","actual_version":"1.0"},"result_code":200,"result_subcode":0,...
This one is indeed dynamic but can be set as follows: "plot_len":{"type":"long"}
Hi Elior, chances are that you do not have enough space for Elasticsearch on your storage. Please check the ES logs and increase the available disk space.
Hi JitteryCoyote63 , are you still missing a month of data in the event logs? If you run cat indices, do you see the same number of docs in the original indices and in the new ones?
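Assuming Elasticsearch is reachable at localhost:9200, the doc counts can be compared with the cat indices API, for example:
curl -XGET "localhost:9200/_cat/indices/events-*?v&h=index,docs.count,store.size"
The docs.count values of the original and the reindexed indices should match.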
Hi @<1523707653782507520:profile|MelancholyElk85> , what version of the apiserver are you using?
Hi @<1523701457835003904:profile|AbruptHedgehog21> can you please share the logs from the async_delete service? It is responsible for the actual deletion of the data
Hi @<1523701260895653888:profile|QuaintJellyfish58> , we are in the final stages of preparing the hotfix version open-v1.14.1. It should be released this week
Hi SubstantialElk6 , another thing that can be checked is the health of the particular ES indices. Can you please run the below command in the clearml-elastic container and post the results here?
curl -XGET
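For reference, assuming Elasticsearch listens on localhost:9200 inside the clearml-elastic container, an index-level health overview can be obtained with something like:
curl -XGET "localhost:9200/_cluster/health?level=indices&pretty"
Any index reported as red or yellow is worth a closer look.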
No, there was a problem with the particular version migration. Creating the temporary index allowed this and all subsequent migrations to run successfully. So for now your DB is properly aligned with the latest ClearML, and future upgrades should work fine.
curl -X PUT localhost:9200/_cluster/settings -H 'Content-Type: application/json' -d'{"persistent" : {"search.max_open_scroll_context": 1000}}'
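Assuming the same localhost:9200 endpoint, the applied value can be verified afterwards with:
curl -XGET "localhost:9200/_cluster/settings?pretty"
The persistent section of the response should show search.max_open_scroll_context set to 1000.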