Problem with ClearML Running on Our Private Server

Answered

Hi, I am having a problem with ClearML running on our private server.
This error occurred on an older version of ClearML and the server. Now, after updating and purging all the old databases with docker down -v, the error persists and I have no idea how to fix it. ClearML and the server are both on the latest versions as of 31 May 2022.

  				
Posted 2 years ago by MortifiedDove27

Answers 19

Hi MortifiedDove27, you can run the following commands on the ClearML server host to get the Docker logs for the apiserver and Elasticsearch:
sudo docker logs clearml-apiserver > apiserver.logs 2>&1
sudo docker logs clearml-elastic > elastic.logs 2>&1

  				
Posted 2 years ago by AppetizingMouse58
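
Full container logs can be long, so it may help to share only the relevant part. A minimal sketch, assuming the log files were saved with the commands above; the helper name `recent_errors` and the choice of search words are illustrative, not part of ClearML:

```shell
# recent_errors FILE [N]: print the last N lines (default 50) of FILE that
# mention "error" or "exception", case-insensitively. Hypothetical helper.
recent_errors() {
  grep -iE 'error|exception' "$1" | tail -n "${2:-50}"
}

# Usage on the server host, after saving the logs as shown above:
#   recent_errors apiserver.logs 100
#   recent_errors elastic.logs 100
```

The filter is deliberately crude: it can miss multi-line stack traces, so the full log files are still worth keeping.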

Ok, I see. Then you can enter the apiserver container:
sudo docker exec -it clearml-apiserver /bin/bash
And run the following commands inside the container:
curl -XGET
curl -XGET

  				
Posted 2 years ago by AppetizingMouse58

It seems that Elasticsearch is failing on every search request. Can you please run the following commands and share the results?
curl -XGET
curl -XGET

  				
Posted 2 years ago by AppetizingMouse58

Glad to hear that it helped :)

  				
Posted 2 years ago by AppetizingMouse58

SweetBadger76, sorry to tag you, but I don't know where to find the logs. Are the Elasticsearch logs on the server where I installed clearml-server?

  				
Posted 2 years ago by MortifiedDove27

Can you please provide the apiserver log and the Elasticsearch log?

  				
Posted 2 years ago by SweetBadger76

Hi Igor,
we are working on your issue and will update you as soon as possible.

  				
Posted 2 years ago by SweetBadger76

Hey Igor,
I am not an expert on this topic. A colleague who knows it better will get back to you right after his meeting. 🙂

  				
Posted 2 years ago by SweetBadger76

AppetizingMouse58, everything is on Linux. Our idea was to run Docker on the same server to initiate tasks from the UI, but it was taking too much time, so we gave up and still run "python train.py experiment=myexpname".

  				
Posted 2 years ago by MortifiedDove27

I have a firewall installed on the server, and not all ports are open.

  				
Posted 2 years ago by MortifiedDove27

Hi David, where can I get these logs?

  				
Posted 2 years ago by MortifiedDove27

The status of the index events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b is red, meaning that the data for this index got corrupted. Since there are no replicas, the only feasible option is to delete this index. All the training scalar events for the old tasks would then be lost, but newly created tasks should start working fine.
curl -XDELETE

  				
Posted 2 years ago by AppetizingMouse58
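
To see which indices are affected before deleting anything, the Elasticsearch `_cat/indices` API can list each index together with its health. A minimal sketch, assuming Elasticsearch is reachable at an address you supply in `ES_URL` (the right host and port depend on your deployment and on where you run the command); the helper names `red_indices` and `list_red_indices` are illustrative:

```shell
# red_indices: read "health index" pairs on stdin (the output format of
# _cat/indices?h=health,index) and print the names of the red indices.
red_indices() {
  awk '$1 == "red" { print $2 }'
}

# list_red_indices: query Elasticsearch and print only the red indices.
# ES_URL is an assumption and must be set by you, e.g. http://<host>:9200
list_red_indices() {
  curl -s "${ES_URL:?set ES_URL to your Elasticsearch address}/_cat/indices?h=health,index" | red_indices
}

# Usage:
#   ES_URL=http://<host>:9200 list_red_indices
# Deleting an index is irreversible; run this only for an index you accept losing:
#   curl -XDELETE "$ES_URL/<index-name>"
```

Keeping the parsing in its own function makes it easy to check against saved output without touching the cluster.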

curl: (7) Failed to connect to localhost port 9200: Connection refused

  				
Posted 2 years ago by MortifiedDove27

It doesn't fit in one Slack message.

  				
Posted 2 years ago by MortifiedDove27

Thank you very much, it worked! I hope I will never see this kind of bug again. I will be happy to give more feedback if you would like to find the root cause.

  				
Posted 2 years ago by MortifiedDove27

Do you think that manually deleting the folder /opt/clearml/data/ would solve this problem in the same way?

  				
Posted 2 years ago by MortifiedDove27

Do you mean the "search_phase_execution" error? Yes, stopping the containers, deleting the data folder, and starting the containers again would bring you to a "clean install" state. But then you would lose all your data, not only the task scalar results.

  				
Posted 2 years ago by AppetizingMouse58
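
Because deleting the data folder cannot be undone, it may be worth archiving it first. A minimal sketch, assuming the containers are already stopped and there is enough free disk space; the helper name `backup_dir` and the archive path in the usage line are hypothetical:

```shell
# backup_dir SRC ARCHIVE: pack directory SRC into a gzip-compressed tar ARCHIVE.
# Paths inside the archive are relative to the parent directory of SRC.
backup_dir() {
  tar -czf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}

# Usage (run as a user that can read the data folder):
#   backup_dir /opt/clearml/data /tmp/clearml-data-backup.tar.gz
```

Note that an archive of corrupted index files would still contain the corruption; the backup only protects the rest of the data.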

Are you running your Docker containers on Linux or Windows?

  				
Posted 2 years ago by AppetizingMouse58

AppetizingMouse58 Thanks for the answer, sending the logs

  				
Posted 2 years ago by MortifiedDove27
