I am experiencing performance issues using ClearML together with the PyTorch Lightning CLI for experiment tracking. Essentially, we fetch the logger object through task.get_logger() and then call its reporting methods. However, this adds a large time overhead to our model training.
Attached is a picture comparing time-per-epoch for training with ClearML logging enabled versus disabled.
I'm assuming this is because the logging is synchronous and blocks the training thread. Is there any way to configure the logger to run in a background thread, or to batch a number of messages before sending them, etc.?
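To make the request concrete, here is a minimal stdlib sketch of the behaviour we are asking for: reporting calls only enqueue a message (so the training loop is never blocked on I/O), while a background worker drains the queue and sends messages in batches. The class and parameter names here are hypothetical illustrations, not ClearML API.

```python
import queue
import threading

class BackgroundBatchLogger:
    """Hypothetical sketch: non-blocking reporting with background batching."""

    def __init__(self, send_fn, batch_size=32, flush_secs=1.0):
        self._send = send_fn          # stands in for the actual network call
        self._batch_size = batch_size
        self._flush_secs = flush_secs
        self._queue = queue.Queue()
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def report_scalar(self, title, series, value, iteration):
        # Cheap: just enqueue; no network I/O on the caller's (training) thread.
        self._queue.put((title, series, value, iteration))

    def _run(self):
        batch = []
        # Keep draining until stop is requested AND the queue is empty.
        while not (self._stop.is_set() and self._queue.empty()):
            try:
                batch.append(self._queue.get(timeout=self._flush_secs))
            except queue.Empty:
                pass  # timed out with nothing new -> fall through and flush
            if batch and (len(batch) >= self._batch_size or self._queue.empty()):
                self._send(batch)
                batch = []
        if batch:  # final flush on shutdown
            self._send(batch)

    def close(self):
        self._stop.set()
        self._worker.join()
```

Something along these lines, wired into the ClearML logger (or exposed as a configuration option), would keep per-step reporting cost near zero during training.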