Hi There, Our Self-Hosted Server Is Periodically Very Slow To React In The Web UI. We've Been Debugging For Quite Some Time, And It Would Seem That Elasticsearch Might Be The Culprit. Looking At The Elasticsearch Index, We Have An Index Of Around 80G Of Tr

Answered

Hi there,
Our self-hosted server is periodically very slow to react in the web UI. We've been debugging for quite some time, and it would seem that Elasticsearch might be the culprit. Looking at the Elasticsearch index, we have an index of around 80G of training scalars. Now, I have no idea whether that is OK or not, but I guess that is a rather large index to move around in. We are using the cleanup script from the examples folder, but the large index suggests that we may not have actually deleted old scalars.

Any thoughts? Maybe @CostlyOstrich36?

  				
Posted 2 months ago by GiganticMole91


Answers 15

WebApp: 1.16.0-494 • Server: 1.16.0-494 • API: 2.30
But be careful, upgrading is extremely dangerous

  				
Posted 2 months ago by AmiableSeaturtle81

Hi @GiganticMole91, when experiments are deleted, their associated scalars are deleted as well.

I'd check the ES container logs. Additionally, you can always beef up the machine with more RAM to give Elasticsearch more to work with.
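Alongside the container logs, it can help to see which indices actually account for the disk usage. The following is a minimal sketch, assuming Elasticsearch is reachable at `localhost:9200` without authentication (the address used elsewhere in this thread); the helper names are illustrative and not part of ClearML. It reads the `_cat/indices` API with sizes reported in bytes and ranks the indices by on-disk size.

```python
import json
from urllib.request import urlopen

# Assumption: ES is reachable here without authentication.
ES_URL = "http://localhost:9200"


def fetch_index_rows(es_url=ES_URL):
    """Return one dict per index from _cat/indices, with sizes in bytes."""
    with urlopen(f"{es_url}/_cat/indices?format=json&bytes=b") as resp:
        return json.load(resp)


def largest_indices(rows, top=5):
    """Rank _cat/indices rows by store size; return (name, GiB, doc count)."""
    def size_bytes(row):
        # Closed indices can report null sizes, so fall back to 0.
        return int(row.get("store.size") or 0)

    ranked = sorted(rows, key=size_bytes, reverse=True)[:top]
    return [
        (r["index"], round(size_bytes(r) / 1024**3, 2), int(r.get("docs.count") or 0))
        for r in ranked
    ]


if __name__ == "__main__":
    for name, gib, docs in largest_indices(fetch_index_rows()):
        print(f"{name}: {gib} GiB, {docs} docs")
```

Running it on the server host prints the largest indices first, which shows whether the scalar index really is the one taking up the space.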

  				
Posted 2 months ago by CostlyOstrich36

@GiganticMole91 Those are rookie numbers. We are at 228 GB for Elasticsearch now.

  				
Posted 2 months ago by AmiableSeaturtle81

@AmiableSeaturtle81 this was the last time I tried: https://clearml.slack.com/archives/CTK20V944/p1725534932820309

  				
Posted 2 months ago by GiganticMole91

Which version of the server are you running?

  				
Posted 2 months ago by GiganticMole91

What you want is to have a service script that cleans up archived tasks, here is what we used: None

  				
Posted 2 months ago by AmiableSeaturtle81

@AmiableSeaturtle81 that's the service we are using :-)

How much RAM have you assigned to your Elasticsearch service?

  				
Posted 2 months ago by GiganticMole91

@ResponsiveKoala38 cool, thanks! I guess it will be straightforward to script then.

What is your gut feeling regarding the size of the index? Is 87G a lot for an Elasticsearch index?

  				
Posted 2 months ago by GiganticMole91

7 out of 30 GB is currently used, and that is quite stable.

  				
Posted 2 months ago by AmiableSeaturtle81

Hi @CostlyOstrich36
Is 87G a lot for an index? Enough that you would consider adding more RAM?

And also, how can I check that we are not storing scalars for deleted tasks? ClearML used to write a lot of errors in the cleanup script, although that seems to have been fixed in recent updates.

  				
Posted 2 months ago by GiganticMole91

It has 8 cores, so nothing fancy.

  				
Posted 2 months ago by AmiableSeaturtle81

Any tips on how to check if we are storing data on deleted tasks? Maybe @ResponsiveKoala38 knows? Is there a field on each scalar that I can cross-check with ClearML?

  				
Posted 2 months ago by GiganticMole91

Can confirm that, for me, increasing RAM usually solves the problem. ES is sometimes very aggressive.

  				
Posted 2 months ago by MagnificentBear85

Yes, I tried updating recently. It cost me a full day's work of rolling back versions until I found something that worked 😅

  				
Posted 2 months ago by GiganticMole91

Hi @GiganticMole91, each scalar document in ES has a "task" field that holds a task ID. The query below will show you the first 10 documents for a given task ID:

curl -XGET "localhost:9200/<the scalar index name>/_search?q=task:<task ID>&pretty"
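To cross-check every task at once rather than one ID at a time, the same "task" field can be aggregated. The sketch below is a rough illustration under several assumptions: ES is at `localhost:9200` without authentication, the "task" field is mapped so that a terms aggregation works on it, and the index holds fewer distinct tasks than `max_tasks`. The function names are hypothetical, and the list of task IDs that still exist in ClearML has to be obtained separately and passed in.

```python
import json
from urllib.request import Request, urlopen

# Assumption: ES is reachable here without authentication.
ES_URL = "http://localhost:9200"


def build_task_agg_query(max_tasks=10000):
    """Terms aggregation that counts documents per value of the 'task' field."""
    return {
        "size": 0,  # no individual hits, only the aggregation result
        "aggs": {"by_task": {"terms": {"field": "task", "size": max_tasks}}},
    }


def parse_task_doc_counts(response):
    """Turn the aggregation response into {task_id: document_count}."""
    buckets = response["aggregations"]["by_task"]["buckets"]
    return {b["key"]: b["doc_count"] for b in buckets}


def fetch_task_doc_counts(index, es_url=ES_URL, max_tasks=10000):
    """Run the aggregation against the given scalar index name or pattern."""
    body = json.dumps(build_task_agg_query(max_tasks)).encode()
    req = Request(
        f"{es_url}/{index}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(req) as resp:
        return parse_task_doc_counts(json.load(resp))


def orphaned_tasks(doc_counts, live_task_ids):
    """Keep only task IDs present in ES but absent from the live task list."""
    live = set(live_task_ids)
    return {tid: n for tid, n in doc_counts.items() if tid not in live}
```

Calling `orphaned_tasks(fetch_task_doc_counts("<the scalar index name>"), live_ids)` would then list task IDs that still have scalar documents in ES but no longer exist in ClearML, together with how many documents each one holds. If the index contains more tasks than `max_tasks`, the result is incomplete and a paginated aggregation would be needed instead.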

  				
Posted 2 months ago by ResponsiveKoala38

