Hey There Have The Following Issue After Upgrading Server And Trains To 0.16:

Answered

Hey there,

I have the following issue after upgrading the server and trains to 0.16:
Error 100 : General data error (TransportError(503, 'search_phase_execution_exception', 'Trying to create too many buckets. Must be less than or equal to: [10000] but was [11633]. This limit can be set by changing the [search.max_buckets] cluster level setting.'))

The error appears when checking scalar plots. It appeared randomly after training for a while (it was OK for, e.g., the first epoch).

This seems to be coming from ES: https://discuss.elastic.co/t/search-max-buckets-limit-error-on-7-0-1/179989

  				
Posted 4 years ago by ElegantKangaroo44

Answers 34

  				
SubstantialBaldeagle49 This should collect the logs: 'sudo docker logs trains-apiserver >& apiserver.logs'

  				
Posted 4 years ago by AppetizingMouse58

AppetizingMouse58 Thanks so much! Could you tell me why this happens? If it happens again, is there any other solution?

  				
Posted 4 years ago by SubstantialBaldeagle49

  				
AppetizingMouse58 Great, thanks so much! You have done great work.
Another question: how do I configure Elasticsearch to run as a cluster with 2 or more nodes on the same machine or on different machines? 😅

  				
Posted 4 years ago by SubstantialBaldeagle49

The error is still there for a new experiment.

  				
Posted 4 years ago by SubstantialBaldeagle49

The logs show that, for some reason, Elasticsearch had an internal failure when trying to perform the plots query. I will send you instructions on how to check the health of the ES nodes. That may give us some clues.

  				
Posted 4 years ago by AppetizingMouse58

Can I back up my experiments?

  				
Posted 4 years ago by SubstantialBaldeagle49

I have rebuilt curl, and this command now works:
sudo curl -L "https://github.com/docker/compose/releases/download/1.24.1/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose

  				
Posted 4 years ago by SubstantialBaldeagle49

AppetizingMouse58 here:

  				
Posted 4 years ago by SubstantialBaldeagle49

Setting up an Elasticsearch cluster requires some DevOps work. You can search the internet for "setup elasticsearch 7 cluster"; there are some tutorials. Stopping your docker-compose periodically and backing up the /opt/trains/data folder is more straightforward, and it also backs up the data that we store in MongoDB.
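
As a rough illustration of the folder backup described above, here is a minimal Python sketch that archives a data directory into a timestamped .tar.gz file. The function name, the archive naming scheme, and the destination path are assumptions made for the example, not part of trains. It assumes the services have already been stopped (for example with docker-compose down) so the files are not changing while they are copied, and that the destination is outside the data folder.

```python
import tarfile
import time
from pathlib import Path


def backup_data_dir(data_dir: str, dest_dir: str) -> Path:
    """Archive data_dir into a timestamped .tar.gz file under dest_dir.

    Hypothetical helper: stop the server containers before calling it.
    """
    src = Path(data_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"data directory not found: {src}")

    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Timestamped name so repeated backups do not overwrite each other.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest / f"trains-data-{stamp}.tar.gz"

    with tarfile.open(archive, "w:gz") as tar:
        # Store paths relative to the folder name, e.g. "data/...".
        tar.add(src, arcname=src.name)
    return archive


if __name__ == "__main__":
    # /opt/trains/data comes from the thread; the destination is an assumption.
    print(backup_data_dir("/opt/trains/data", "/opt/trains-backups"))
```

Restoring would be the reverse: stop the services, extract the archive back into /opt/trains, and start docker-compose again.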

  				
Posted 4 years ago by AppetizingMouse58

AppetizingMouse58 OK, I see, thanks!

  				
Posted 4 years ago by SubstantialBaldeagle49

This is the second command

  				
Posted 4 years ago by SubstantialBaldeagle49

SubstantialBaldeagle49 The log looks OK. Where do you see the error?

  				
Posted 4 years ago by AppetizingMouse58

https://stackoverflow.com/questions/28287261/connection-timeout-with-elasticsearch
How do I set the timeout?

  				
Posted 4 years ago by SubstantialBaldeagle49

AppetizingMouse58

  				
Posted 4 years ago by SubstantialBaldeagle49

AppetizingMouse58 After I changed max_buckets to 200000000000, I cannot see any information in the web UI. Here is the log:

  				
Posted 4 years ago by SubstantialBaldeagle49

SubstantialBaldeagle49 This is fine. When you start docker-compose, the services take different amounts of time to start. The apiserver waits for Elasticsearch to start and proceeds once it is ready. Can you reproduce the buckets issue and share the apiserver log that contains it?

  				
Posted 4 years ago by AppetizingMouse58

That's OK, I have just started the server. Could you tell me how to delete this index?

  				
Posted 4 years ago by SubstantialBaldeagle49

Maybe it's due to my curl?

  				
Posted 4 years ago by SubstantialBaldeagle49

SuccessfulKoala55 AppetizingMouse58 I deleted logs/apiserver.log and restarted the server, and here is the log. It shows that it cannot connect to Elasticsearch.

  				
Posted 4 years ago by SubstantialBaldeagle49

OK, I have reverted the change. Here are the two logs:

  				
Posted 4 years ago by SubstantialBaldeagle49

AppetizingMouse58 OK, this is the full log. There seems to be an error here:

  				
Posted 4 years ago by SubstantialBaldeagle49

AppetizingMouse58

  				
Posted 4 years ago by SubstantialBaldeagle49

OK, I will start a new experiment to see whether the error is still there. Sorry, I don't really understand how to show the trains-apiserver log.

  				
Posted 4 years ago by SubstantialBaldeagle49

It might be some other issue related to loading plots from Elasticsearch. Can you show the trains-apiserver log again after you receive the error? There should be some more information there.

  				
Posted 4 years ago by SuccessfulKoala55

Please run these commands and see if you have any "red" statuses in the output:
curl "http://localhost:9200/_cluster/health?pretty"
curl "http://localhost:9200/_cluster/health?level=indices&pretty"
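
If scanning the curl output by eye is tedious, the same check can be sketched in Python. This is only an illustration: the function names are made up for the example, and it assumes Elasticsearch is reachable at http://localhost:9200 as in the commands above. It relies on the response of _cluster/health?level=indices having a top-level "status" field and an "indices" object whose entries each carry their own "status".

```python
import json
from urllib.request import urlopen


def fetch_health(base_url: str = "http://localhost:9200", timeout: float = 10.0) -> dict:
    """Fetch cluster health with per-index detail (same endpoint as the curl command)."""
    url = f"{base_url}/_cluster/health?level=indices"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def red_indices(health: dict) -> list:
    """Return the sorted names of indices whose status is "red"."""
    indices = health.get("indices", {})
    return sorted(name for name, info in indices.items() if info.get("status") == "red")


if __name__ == "__main__":
    health = fetch_health()
    print("cluster status:", health.get("status"))
    print("red indices:", red_indices(health) or "none")
```

An empty list from red_indices means no index reported a red status at the time of the request.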

  				
Posted 4 years ago by AppetizingMouse58

SubstantialBaldeagle49 Well, I see. Elasticsearch does not support putting such a large number into max_buckets. From the error message that I see in the apiserver log, I am not sure that the original problem is connected to the number of buckets. Can you please revert the max_buckets change, reproduce the original problem, and share the Elasticsearch log?
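
For anyone who still wants to experiment with the limit named in the original error, here is a hedged Python sketch of changing search.max_buckets through the cluster settings API (PUT /_cluster/settings). The helper names are invented for the example, and the upper bound is an assumption: it treats the setting as a 32-bit integer, which would be consistent with 200000000000 being rejected. Note that, as said above, raising the limit may not address the real cause.

```python
import json
from urllib.request import Request, urlopen

# Assumption: the setting is stored as a signed 32-bit integer.
INT32_MAX = 2**31 - 1


def max_buckets_payload(value: int) -> dict:
    """Build the request body for updating search.max_buckets."""
    if not 0 < value <= INT32_MAX:
        raise ValueError(f"max_buckets must be between 1 and {INT32_MAX}, got {value}")
    return {"persistent": {"search.max_buckets": value}}


def apply_max_buckets(value: int, base_url: str = "http://localhost:9200",
                      timeout: float = 10.0) -> dict:
    """Send the setting to the cluster and return the parsed response."""
    body = json.dumps(max_buckets_payload(value)).encode("utf-8")
    req = Request(
        f"{base_url}/_cluster/settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # 20000 is an arbitrary example value, not a recommendation.
    print(apply_max_buckets(20000))
```

Validating the value before sending it avoids leaving the cluster with a rejected or unusable setting like the one described in this thread.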

  				
Posted 4 years ago by AppetizingMouse58
