@<1855782479961526272:profile|CleanBee5> , I'm guessing that the files server (debug samples are saved there by default) is experiencing load due to the sheer number of debug samples.
Regarding the size, ClearML SDK only logs/uploads whatever you create, so if you want less load, you can also use smaller/lighter images.
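Not an official recommendation, just a minimal sketch of what "smaller/lighter images" could look like in practice, assuming you report the samples yourself via Logger.report_image (the title/series names and max size are placeholders):
```
from PIL import Image
from clearml import Logger


def report_small_debug_sample(image_path, iteration, max_side=512):
    """Downscale an image before reporting it, so the stored debug sample stays small."""
    img = Image.open(image_path)
    img.thumbnail((max_side, max_side))  # resizes in place, keeps aspect ratio
    Logger.current_logger().report_image(
        title="debug samples",   # placeholder metric title
        series="validation",     # placeholder series name
        iteration=iteration,
        image=img,               # report_image accepts PIL images / numpy arrays
    )
```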
Not a solution, but just curious: why would you need that many "debug" images ?
Those are images automatically generated by your training code, which ClearML then uploads automatically. Maybe disable automatic image upload during Task.init?
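A minimal sketch of that idea, assuming the debug samples come from TensorBoard/matplotlib auto-logging (if they are reported another way, the relevant key would differ); disabling those bindings in Task.init stops ClearML from capturing and uploading the images, while the rest of the auto-logging stays on:
```
from clearml import Task

task = Task.init(
    project_name="my project",   # placeholder names
    task_name="my experiment",
    # Selectively turn off the frameworks that produce the image debug samples
    auto_connect_frameworks={
        "tensorboard": False,  # assumption: images come from TensorBoard add_image()
        "matplotlib": False,   # assumption: or from auto-captured matplotlib figures
    },
)
```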
@<1855782479961526272:profile|CleanBee5> , in that case you should beef up the resources running the apiserver.
At the end of the day, this 2-3 MB payload basically contains all the events (including debug sample metadata + links) for thousands of iterations. You could modify the frontend code to load a smaller range of iterations or strip some metadata (the metadata should be minimal, such as the iteration number and the metric name); a rough sketch of fetching a limited range via the API follows below.
Additionally, I believe the enterprise version supports rolling indexes.
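If the web UI payload is the bottleneck, here is a rough sketch of pulling only a small slice of the debug-sample events directly through the API, assuming the server exposes the events.debug_images endpoint (parameter names can vary between server versions, and the task id / metric name are placeholders):
```
from clearml.backend_api.session.client import APIClient

client = APIClient()
# Request debug-sample events for one metric, limited to a few recent
# iterations instead of the full history the UI would otherwise load
response = client.events.debug_images(
    metrics=[{"task": "<task_id>", "metric": "debug samples"}],  # placeholders
    iters=3,                 # number of iterations to return per metric/variant
    navigate_earlier=True,   # start from the most recent iterations
)
print(response)
```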
Then don't use ClearML to look at images.
I don't think ClearML is designed to visualize millions of images per task. At least not the Debug Samples section. That was designed so that, for a given set of images, you can see how the model performs epoch after epoch.
For visualizing millions of images, there are tools like FiftyOne.
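For reference, a minimal FiftyOne sketch for browsing a local directory of images (the path is a placeholder; if the images live in S3 they would need to be synced or mounted locally first):
```
import fiftyone as fo

# Load a folder of images into a FiftyOne dataset and open the app to browse them
dataset = fo.Dataset.from_dir(
    dataset_dir="/path/to/images",         # placeholder path
    dataset_type=fo.types.ImageDirectory,
)
session = fo.launch_app(dataset)
session.wait()  # keep the browser app open until it is closed
```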
We are not using the fileserver to store the images.
We are using S3, and the image loading itself is fast, around 100 ms.
What is slow is the request to the server, which builds a gigantic JSON that is sent back.
Saying to use lower resolution and a smaller number of images is like saying "then don't use ClearML to look at images".