Has anyone else benchmarked ClearML? I'm seeing catastrophic logging overhead
the "spike" is not a spike, it's a simple cache mechanism that is designed to reduce API calls and sends an API request once 100 events are cached
Yes, we realized that later. This synchronous pause is enough to 4x the training time for this model. For a logging library, I think it's fair to call that catastrophic...
What would be the impact if we changed the flush logic to `return` instead of `sleep(0.1)`? Can the queue hold arbitrarily many events in its cache without failing?
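On the `return` vs `sleep(0.1)` question: the usual non-blocking variant is to have `report()` return immediately after handing the event to a bounded queue, with a background thread doing the flushing. That also answers the second question: the cache cannot grow arbitrarily without a memory cost, so some bound and drop/block policy is needed. A hypothetical sketch of that alternative, assuming the same stand-in `send_api_request`, not ClearML's API:

```python
import queue
import threading
import time

MAX_QUEUED_EVENTS = 10_000  # assumed bound; an unbounded cache risks OOM
BATCH_SIZE = 100


def send_api_request(batch):
    time.sleep(0.1)  # stand-in for the real API call


events = queue.Queue(maxsize=MAX_QUEUED_EVENTS)


def report(event):
    """Returns immediately instead of sleeping; the worker flushes later."""
    try:
        events.put_nowait(event)
    except queue.Full:
        pass  # back-pressure point: drop, block, or raise, but bounded


def worker():
    while True:
        batch = [events.get()]  # block until at least one event arrives
        while len(batch) < BATCH_SIZE:
            try:
                batch.append(events.get_nowait())
            except queue.Empty:
                break
        send_api_request(batch)


threading.Thread(target=worker, daemon=True).start()
```

The design trade-off is that the training loop never stalls on network I/O, but events can be lost on a full queue or an abrupt shutdown, so a final blocking flush at exit is typically still needed.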