As long as you delete only from the deleted tasks' folders, it should be OK
Hi UnevenDolphin73, how many artifacts do you have on this task? We store task metadata in Mongo, and there is a limit of 16MB per single document. While the artifact itself is not stored under the task, some metadata (notably the uri and display_data/preview) is stored for each artifact
Hi @<1686547380465307648:profile|StrongSeaturtle89>, please put the following setting in the docker-compose.yaml under elasticsearch->environment:
ingest.geoip.downloader.enabled: false
And then restart the docker compose. Does it help?
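For reference, a minimal sketch of how the elasticsearch section might look after the change (the other entries are illustrative and may differ in your file):
```
services:
  elasticsearch:
    environment:
      bootstrap.memory_lock: "true"
      ingest.geoip.downloader.enabled: false
```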
Version 1.10 handles file deletion differently, so there is a chance that it fixes the issue. If you use the default apiserver port, then I would try upgrading. If you override the apiserver port, then please wait for the hotfix version 1.10.1 that should be released soon
Hi RotundSquirrel78, can you please check that your docker compose file has the correct volume mapping for the elasticsearch service? From the output of the upgrade script, I assume it should map /home/orpat/trains/data/elastic_7 into /usr/share/elasticsearch/data
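In case it helps, the mapping in the docker compose would look roughly like this (assuming the paths above are correct):
```
  elasticsearch:
    volumes:
      - /home/orpat/trains/data/elastic_7:/usr/share/elasticsearch/data
```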
Ok, I see. And if you run a new experiment in the new version do you see its logs?
The index events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b status is red, meaning that the data for this index got corrupted. Since there are no replicas, the only feasible option is to delete this index. All the training scalar events for the old tasks will be lost, but the newly created tasks should start working fine:
curl -XDELETE "http://localhost:9200/events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b"
According to the sizes the data is there and ES sees it.
Hi MassiveHippopotamus56
Can you please open the browser developer tools, navigate to the scalars tab for one of the experiments that shows the wrong iteration, and copy here the request payload and response for the events.scalar_metrics_iter_histogram call?
Please run these commands and see if you have any "red" statuses in the output:
curl "http://localhost:9200/_cluster/health?pretty"
curl "http://localhost:9200/_cluster/health?level=indices&pretty"
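If the output is long, you can filter just the status lines, for example:
curl -s "http://localhost:9200/_cluster/health?level=indices&pretty" | grep '"status"'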
Ok, I see. Then you can enter the apiserver container:
sudo docker exec -it clearml-apiserver /bin/bash
And run the following commands inside the container:
curl -XGET
curl -XGET
Hi @<1523707653782507520:profile|MelancholyElk85> , what version of the apiserver are you using?
We found the issue. It will be fixed in the upcoming patch for the open-v1.14 release
Hi VexedPeacock35, I suspect that Elasticsearch is working too hard and periodically misses timeouts on recording events. How much memory and CPU is it using? Can you increase the memory that is allocated to it and see whether this helps?
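In the default docker compose the ES heap is set through ES_JAVA_OPTS under elasticsearch->environment; a sketch of raising it (the 4g value is just an example, pick what fits your machine):
```
  elasticsearch:
    environment:
      ES_JAVA_OPTS: -Xms4g -Xmx4g
```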
If you run the following command, 'sudo chown -R 1000:1000 /opt/trains', does it change anything?
Hi @<1523701868901961728:profile|ReassuredTiger98> , what version of the apiserver are you using?
Hi QuaintJellyfish58, in the latest data that you sent I see only the responses (some of them are marked as payloads, but they are actually responses). What would be very interesting is to see the requests (payloads) that resulted in the following empty responses:
```
# response
{"meta":{"id":"aaaffe49ace64f1a8b0211925afcfd32","trx":"aaaffe49ace64f1a8b0211925afcfd32","endpoint":{"name":"projects.get_all_ex","requested_version":"2.20","actual_version":"1.0"},"result_code":200,"result_subcode":0,...
```
Thanks, I think that I see the problem.
Hi CooperativeFox72, how much free space do you have on your disk now? If you run du on your /opt/trains/data/elastic_7 folder at, say, 5-minute intervals, do you see the folder size growing?
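For example, something like this would print the folder size every 5 minutes (assuming watch is available on your machine):
watch -n 300 du -sh /opt/trains/data/elastic_7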
Hi QuaintJellyfish58, thanks for the feedback. I am trying to compare what you send and receive in the team's view with what you get in the My work view. Can you please also send the data for the same requests and responses in the My work view, structured in the same way as you sent for the team view?
Hi @<1558986867771183104:profile|ShakyKangaroo32> , can you please share the logs from the async_delete docker container?
@<1523701868901961728:profile|ReassuredTiger98> Strange :( in 1.10 we already had the code for clearing ES scrolls created during task deletion. I would recommend upgrading to the latest release v1.12.1 anyway. In addition, you can instruct ES to allow more open scrolls, as shown below. By default it is limited to 500.
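For example, through the ES cluster settings API (the value 1000 is illustrative):
```
curl -XPUT "http://localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "search.max_open_scroll_context": 1000
  }
}'
```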
No, there was a problem with that particular version migration. Creating the temporary index allowed this and all subsequent migrations to run successfully. So for now your DB is properly aligned with the latest ClearML, and future upgrades should work fine.
The index "events-plot-d1bd92a3b039400cbafc60a7a5b1e52b" is red, meaning that it is corrupted and elastic cannot work with it. The most straightforward solution would be to delete this index, but all the plots generated so far will be lost.
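Assuming ES is reachable on localhost:9200 as in the default setup, the deletion would be:
curl -XDELETE "http://localhost:9200/events-plot-d1bd92a3b039400cbafc60a7a5b1e52b"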
Hi Elior, chances are that you do not have enough space for Elasticsearch on your storage. Please check the ES logs and increase the available disk space.
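For a quick check, something like the following (the data path and container name are assumptions based on the default setup):
```
df -h /opt/clearml/data/elastic_7
sudo docker logs --tail 100 clearml-elastic
```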
Hi ImmenseMole52, did you make any changes in the docker compose file? If yes, can you please send your version of the file?
Can you try deleting the application cookie? While on the trains page, open the browser devtools, navigate to Application->Cookies, and delete any trains cookies that are there. I believe you will need to log in again after that
Yes exactly, can you please verify that you use /home/orpat/trains/data/elastic_7 in the docker compose of 1.5?