Hi SarcasticSparrow10, I am trying to understand whether we have some gaps in the instructions. During the upgrade process, did you perform steps 3-10 of the instructions below? Were there any errors when performing these steps?
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_mongo44_migration
Are you running your dockers on Linux or Windows?
SubstantialBaldeagle49 This should collect the logs: 'sudo docker logs trains-apiserver >& apiserver.logs'
Hi ResponsiveCamel97, the shards and indices stats look fine. Can you please try the async delete of the task data? You can run the following line in the shell inside the apiserver container. Just replace <task_id> with your actual task id:
curl -XPOST -H "Content-Type: application/json" "<elasticsearch URL>" -d'{"query": {"term": {"task": "<task_id>"}}}'
You should get in response something like this:
{"task":"p6350SG7STmQALxH-E3CLg:1426125"}
Then you can periodically ping ES on the status of the r...
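As a minimal sketch of that status check (assuming Elasticsearch is reachable on localhost:9200 from inside the apiserver container, and using the task id returned in the example response above):
# Poll the Elasticsearch tasks API until the response contains "completed": true
curl -XGET "localhost:9200/_tasks/p6350SG7STmQALxH-E3CLg:1426125?pretty"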
It seems that the index events-log-d1bd92a3b039400cbafc60a7a5b1e52b got corrupted. In case there are no backups, the only choice would be to delete this index from Elasticsearch.
Actually the task logs will be lost. The tasks themselves and their reported metrics and plots would stay. The command is the following:
curl -XDELETE localhost:9200/events-log-d1bd92a3b039400cbafc60a7a5b1e52b
The index events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b status is red, meaning that the data for this index got corrupted. Since there are no replicas, the only feasible option would be to delete this index. All the training scalar events for the old tasks would be lost then, but the newly created tasks should start working fine. The command is:
curl -XDELETE localhost:9200/events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b
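If it helps, a quick way to double-check which indices are actually in red state before deleting anything (assuming Elasticsearch is reachable on localhost:9200):
# List only the indices whose health is red, with column headers
curl -XGET "localhost:9200/_cat/indices?v&health=red"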
According to the sizes the data is there and ES sees it.
Yes exactly, can you please verify that you use /home/orpat/trains/data/elastic_7 in the docker compose of 1.5?
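A hedged way to verify the mapping from the host side (the container name below is an assumption; older trains setups usually call it trains-elastic, newer ones clearml-elastic):
# Show the volume mounts of the Elasticsearch container;
# /home/orpat/trains/data/elastic_7 should appear as the source of the /usr/share/elasticsearch/data mount
sudo docker inspect trains-elastic --format '{{ json .Mounts }}'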
Are you sure that it was performed fully according to the suggested sequence? The error that you posted says that v3.6 data is incompatible with v4.4 and suggests version 4.2 or earlier. Step 3 starts with MongoDB 4.0, which should be able to open v3.6 data, and then a series of gradual upgrades through versions 4.0 -> 4.2 -> 4.4 is performed.
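As a sketch of a sanity check you can run between the steps (the container name and the use of the mongo shell are assumptions; adjust them to whatever the migration instructions name the intermediate container):
# Ask MongoDB which feature compatibility version the data is currently at;
# it should advance 3.6 -> 4.0 -> 4.2 as you progress through the steps
sudo docker exec clearml-mongo mongo --eval 'db.adminCommand({getParameter: 1, featureCompatibilityVersion: 1})'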
@<1523701868901961728:profile|ReassuredTiger98> Strange :( In 1.10 we already had the code for clearing the ES scrolls created during task deletion. I would recommend upgrading to the latest release v1.12.1 anyway. In addition, you can instruct ES to allow more open scrolls, as shown below. By default it is limited to 500.
Hi QuaintJellyfish58, thanks for the feedback. I am trying to compare what you send and receive in the team's view with what you get in the My Work view. Can you please also send the data for the same requests and responses in the My Work view, structured the same way as what you sent for the team view?
curl -X PUT localhost:9200/_cluster/settings -H 'Content-Type: application/json' -d'{"persistent" : {"search.max_open_scroll_context": 1000}}'
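To see how many search contexts (scrolls included) are currently open, a minimal check (assuming ES answers on localhost:9200) is:
# open_contexts under indices.search shows the number of currently open search/scroll contexts per node
curl -XGET "localhost:9200/_nodes/stats/indices/search?pretty"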
Hi @<1585078752969232384:profile|FantasticDuck7> , there is an apiserver configuration file: apiserver->config->default->services->storage_credentials.conf
It contains the parameters for accessing files on external storage like S3, Google Storage or Azure. Please provide the same MinIO server access parameters there as you do for the SDK configuration.
The actual deletion is performed by the async_delete service. You can inspect its logs with the "sudo docker logs async_delete" command. Before configuring...
@<1585078752969232384:profile|FantasticDuck7> What volume mappings do you have for the async_delete service in the docker-compose.yaml file?
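In case it is easier, a hedged one-liner to list them from the host (assuming the container is named async_delete, as in the docker logs command above):
# Print "host path -> container path" for every mount of the async_delete container
sudo docker inspect async_delete --format '{{ range .Mounts }}{{ printf "%s -> %s\n" .Source .Destination }}{{ end }}'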
Ok, I see. Then you can enter the apiserver container:
sudo docker exec -it clearml-apiserver /bin/bash
And run the following commands inside the container:
curl -XGET
curl -XGET
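If it helps, a couple of typical health checks from inside the container would look like this (the endpoints below are an assumption and may differ from the ones intended above):
# Overall cluster health: status, number of shards, unassigned shards, etc.
curl -XGET "localhost:9200/_cluster/health?pretty"
# Per-shard view, useful to spot UNASSIGNED shards
curl -XGET "localhost:9200/_cat/shards?v"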
Hi JitteryCoyote63, you mentioned that 'download task logs' brings all the events. It would be interesting to compare the events that appear in the downloaded log but not in the task log screen with those that are returned in the screen. Can you please share the downloaded task logs file and the request and response that you get from events.get_task_log for the same task?
Hi QuaintJellyfish58 , I am investigating the issue. Can you please also send the request and response from projects.get_all when you are in the Team's Work view (the case where there is no undefined project)?
With what memory settings do you run ES? How much memory and CPU are currently occupied by the ES container?
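A quick, hedged way to answer both questions from the host (the container name clearml-elastic is an assumption; adjust it to your compose file):
# Current memory and CPU usage of the Elasticsearch container
sudo docker stats --no-stream clearml-elastic
# The configured JVM heap size, usually passed via ES_JAVA_OPTS in the docker compose
sudo docker exec clearml-elastic env | grep ES_JAVA_OPTS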
Hi RattyFish27, it seems that there is some issue with the Elasticsearch cluster. Can you please run the following commands on the server and paste their output here?
curl -XGET
curl -XGET
It seems that Elasticsearch is failing on any search request. Can you please run the following commands and share the results?
curl -XGET
curl -XGET
Hi QuaintJellyfish58, it seems that we identified the problem. The undefined project that you see is not a real project. It is a placeholder where the statistics of the ex-1 project should be shown. We found a bug in the apiserver that, under these particular conditions, fails to return the ex-1 project data, so the placeholder remains empty (undefined). If I understand correctly, it should only be an inconvenience and not influence your workflow. Is that correct? We are fixing the issue in the n...
Do you see any error in the browser network tab?
I mean it is not possible to open v3.6 data in version 4.4. That's why the steps 3-10 are there
Hi ImmenseMole52, did you make any changes to the docker compose file? If yes, can you please send your version of the file?
Are you running them on the computer that hosts the server docker containers? What is the port binding for elasticsearch in your docker compose?
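As a sketch (the container name below is an assumption, adjust it to your setup), the effective binding can be checked with:
# Show which host ports are mapped to the Elasticsearch container ports (9200 should appear here)
sudo docker port clearml-elastic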
Hi Elior, chances are that you do not have enough space for Elasticsearch on your storage. Please check the ES logs and increase the available disk space.
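A minimal sketch of the checks, assuming ES answers on localhost:9200 and the data directory is the default /opt/clearml/data/elastic_7 (adjust the path to your setup):
# Free space on the filesystem holding the Elasticsearch data
df -h /opt/clearml/data/elastic_7
# Disk usage as seen by Elasticsearch itself, per node
curl -XGET "localhost:9200/_cat/allocation?v"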
As long as you delete only from the deleted tasks folders it should be OK
Hi @<1558986867771183104:profile|ShakyKangaroo32> , can you please share the logs from the async_delete docker container?