I am not sure about the reasons. What you can do is to back up your /opt/trains/data folder periodically (preferably stopping the docker compose before it). Another possibility is to configure your Elasticsearch to run as a cluster with 2 or more nodes on the same or different machines. This will allow Elastic to replicate your indices to the other nodes.
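In case it helps, a minimal backup sketch, assuming the default /opt/trains paths and that your docker-compose file lives in /opt/trains (adjust both to your setup):
sudo docker-compose -f /opt/trains/docker-compose.yml down
sudo tar -czf /backups/trains-data-$(date +%F).tar.gz /opt/trains/data
sudo docker-compose -f /opt/trains/docker-compose.yml up -d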
The index events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b status is red, meaning that the data for this index got corrupted. Since there are no replicas, the only feasible option would be to delete this index. All the training scalar events for the old tasks would be lost then, but the newly created tasks should start working fine:
curl -XDELETE "http://localhost:9200/events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b"
Ok, I see. Then you can enter the apiserver container:
sudo docker exec -it clearml-apiserver /bin/bash
And run the following commands inside the container:
curl -XGET
curl -XGET
Are you running your Docker containers on Linux or Windows?
I mean it is not possible to open v3.6 data in version 4.4. That's why steps 3-10 are there.
Are you sure that it was performed fully according to the suggested sequence? The error that you posted says that v3.6 data is incompatible with v4.4 and suggests version 4.2 or earlier. Step 3 starts with Mongo 4.0, which should be able to open v3.6 data, and then a number of gradual upgrades through versions 4.0->4.2->4.4 is performed.
Hi SarcasticSparrow10, I am trying to understand whether we have some gaps in the instructions. In the upgrade process, did you perform steps 3-10 of the instructions below? Were there any errors when performing these steps?
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_mongo44_migration
Hi @<1523701868901961728:profile|ReassuredTiger98> , how exactly do you override the values in the storage_credentials file? Do you prepare a new docker image with the changed file, map this file from outside with a volume mapping in the docker compose, or use env variables? What is also important is that you do this override for the async_delete service. It is the service that actually uses the storage credentials, not the apiserver itself.
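For reference, a minimal volume-mapping sketch for the docker compose, assuming a host-side file at /opt/clearml/config/storage_credentials.conf and the default async_delete service definition (both paths are illustrative, adjust to your setup):
  async_delete:
    volumes:
      - /opt/clearml/config/storage_credentials.conf:/opt/clearml/config/storage_credentials.conf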
Hi ExasperatedCrocodile76 , what version of the clearml server are you using? You can see it in the bottom right corner of the Settings screen
SubstantialBaldeagle49 This is fine. When you start docker-compose it takes a different amount of time for each service to start. The apiserver waits for Elasticsearch to start and proceeds once it is ready. Can you reproduce the buckets issue and share the apiserver log that contains it?
SubstantialBaldeagle49 The log looks OK. Where do you see the error?
The index "events-plot-d1bd92a3b039400cbafc60a7a5b1e52b" is red meaning that it is corrupted and elastic cannot work with it. The most straightforward solution would be to delete this index but it will result in all the plots generated so far will be lost.
What can be seen in the logs is that for some reason Elasticsearch had an internal failure when trying to perform the plots query. I will send you the instructions on how to check the health of the ES nodes. It may provide us with some clues.
Please run these commands and see if you have any "red" statuses in the output:
curl "http://localhost:9200/_cluster/health?pretty"
curl "http://localhost:9200/_cluster/health?level=indices&pretty"
As long as you only delete from the folders of deleted tasks, it should be OK.
The 1.10 version handles file deletion differently, so there is a chance that it fixes the issue. If you use the default apiserver port then I would try upgrading. If you override the apiserver port then please wait for the hotfix version 1.10.1 that should be released soon.
I would back up the DBs prior to the upgrade so that you can roll back in case any issues arise in the upgrade process.
Do you mean the "search_phase_execution" error? Yes, stopping the containers, deleting the data folder and running the containers again would bring you to a "clean install" state. But then you would lose all your data, not only the task scalar results.
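For clarity, a minimal sketch of that "clean install" reset, assuming the /opt/trains/data layout used in this thread (this wipes everything, so back up first):
sudo docker-compose down
sudo rm -rf /opt/trains/data
sudo docker-compose up -d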
Hi MortifiedDove27, you can run the following commands on the clearml server host to get the docker logs for the apiserver and elasticsearch:
sudo docker logs clearml-apiserver > apiserver.logs 2>&1
sudo docker logs clearml-elastic > elastic.logs 2>&1
It seems that elasticsearch is failing on any search request. Can you please run the following commands and share the results?
curl -XGET
curl -XGET
Hi WittyOwl57, there is a chance that the reason is this setting: -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log ...
First, it says something about an invalid log option, which may require further investigation. Second, the file that it tries to write to is logs/gc.log, and it is not under the $clearml folder where you give the write permissions to the ES user. I would try cancelling JVM logging altogether, or specifying the full path to the file so that it would be under the folder that has 1000:1000 ownership.
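As an illustration, both options can be sketched via ES_JAVA_OPTS in the docker compose (the -Xlog:disable flag and the in-container log path are assumptions to verify against your jvm.options):
ES_JAVA_OPTS: "-Xlog:disable"
or, to keep GC logging but write under the folder owned by 1000:1000:
ES_JAVA_OPTS: "-Xlog:gc*,gc+age=trace,safepoint:file=/usr/share/elasticsearch/logs/gc.log"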
Hi CooperativeFox72, how much free space do you have on your disk now? If you run du on your /opt/trains/data/elastic_7 folder in, let's say, 5 minute intervals, do you see the folder size growing?
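For example, something like this would print the folder size every 5 minutes (watch takes the interval in seconds):
watch -n 300 sudo du -sh /opt/trains/data/elastic_7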
The data that you sent looks fine. It seems that you actually have these iterations in Elasticsearch. To check whether this is the case, please run the following command in the shell on your host. You should get the first 10 task events with the smallest iterations:
curl -XGET -H "Content-Type: application/json" localhost:9200/events-training_stats_scalar*/_search?pretty -d'
{
  "query": {"term": {"task": "d45ecb5ad7084175bd83dd39777b10c5"}},
  "sort": {"iter": "asc"}
}'
MassiveHippopotamus56 The data that you posted from the browser developer tools seems to come from the "Headers" tab. Can you please post the data from the "Payload" and "Response" tabs? This is in case you run Chrome; in other browsers the tabs may have different names.
SubstantialBaldeagle49 Well, I see. Elasticsearch does not support putting such a large number into max_buckets. From the error message that I see in the apiserver log I am not sure that the original problem is connected to the buckets number. Can you please revert the max_buckets change, reproduce the original problem and share the elasticsearch log?
Sure, you can delete it with the following command:
curl -XDELETE " http://localhost:9200/events-plot-d1bd92a3b039400cbafc60a7a5b1e52b "
Once deleted, it will be automatically recreated by the apiserver, and you should see the plots from the new tasks that you run afterwards.
SubstantialBaldeagle49 This should collect the logs:
sudo docker logs trains-apiserver >& apiserver.logs
ReassuredTiger98 What are the memory settings for Elasticsearch in your docker compose? If it is 2 GB and you have enough memory on your server, then you can try to increase it to 4 GB like this: ES_JAVA_OPTS: -Xms4g -Xmx4g
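In the docker compose this sits under the Elasticsearch service environment, roughly like this (a sketch based on the default clearml compose layout; the service name may differ in your file):
  elasticsearch:
    environment:
      ES_JAVA_OPTS: -Xms4g -Xmx4g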
Hi UnevenDolphin73, how many artifacts do you have on this task? We are storing task metadata in Mongo and there is a limit of 16 MB per single document. While the artifact itself is not stored under the task, there is some metadata (notably the uri and display_data/preview) that is stored for each artifact.
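If you want to check how close the task document is to that limit, a sketch from the mongo shell inside the mongo container (the backend database and task collection names are assumptions that may differ per version, and <task_id> is a placeholder):
sudo docker exec -it clearml-mongo mongo backend
> Object.bsonsize(db.task.findOne({_id: "<task_id>"}))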