Has anyone else benchmarked ClearML? I'm seeing catastrophic logging overhead
the "spike" is not a spike, it's a simple cache mechanism that is designed to reduce API calls and sends an API request once 100 events are cached
Yes, we realized that later. This synchronous pause is enough to 4x the training time for this model. For a logging library, I think it's fair to call that catastrophic...
What would be the impact if we changed the flush logic to `return` instead of `sleep(0.1)`? Can the queue hold arbitrarily many events in its cache without failing?
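On the `return` vs `sleep(0.1)` question: the usual non-blocking variant is to have `report()` return immediately after handing the event to a bounded queue, with a background thread doing the flushing. That also answers the second question: the cache cannot grow arbitrarily without a memory cost, so some bound and drop/block policy is needed. A hypothetical sketch of that alternative, assuming the same stand-in `send_api_request`, not ClearML's API:

```python
import queue
import threading
import time

MAX_QUEUED_EVENTS = 10_000  # assumed bound; an unbounded cache risks OOM
BATCH_SIZE = 100


def send_api_request(batch):
    time.sleep(0.1)  # stand-in for the real API call


events = queue.Queue(maxsize=MAX_QUEUED_EVENTS)


def report(event):
    """Returns immediately instead of sleeping; the worker flushes later."""
    try:
        events.put_nowait(event)
    except queue.Full:
        pass  # back-pressure point: drop, block, or raise, but bounded


def worker():
    while True:
        batch = [events.get()]  # block until at least one event arrives
        while len(batch) < BATCH_SIZE:
            try:
                batch.append(events.get_nowait())
            except queue.Empty:
                break
        send_api_request(batch)


threading.Thread(target=worker, daemon=True).start()
```

The design trade-off is that the training loop never stalls on network I/O, but events can be lost on a full queue or an abrupt shutdown, so a final blocking flush at exit is typically still needed.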