I would assume a lot of them are log streaming? So you can try reducing printouts / progress bars. That seems to help for me.
For context: I have noticed the large number of API calls can be a problem when networking is unreliable. It causes a cascade of slow retries and can really hold up task execution. So do be cautious of where work is occurring relative to where the server is, and what connects the two.
I didn't do a very scientific comparison, but the # of API calls did decrease substantially by turning off auto_connect_streams.
It is probably about 100k API calls per day with 1 experiment running, where before it was maybe 300k API calls per day. Still seems like a lot when I only run 20-30 epochs in a day.
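For a rough sense of scale, here is the same estimate turned into calls per epoch. It only uses the ballpark figures quoted above (300k before, 100k after, 20-30 epochs a day), so treat the result as an order-of-magnitude check and not a measurement.

```python
def calls_per_epoch(calls_per_day: int, epochs_per_day: int) -> float:
    """Average number of API calls attributable to one epoch."""
    return calls_per_day / epochs_per_day


# Ballpark figures from the message above, not measured values.
for label, calls in (("before", 300_000), ("after", 100_000)):
    low = calls_per_epoch(calls, 30)   # more epochs per day -> fewer calls each
    high = calls_per_epoch(calls, 20)  # fewer epochs per day -> more calls each
    print(f"{label}: roughly {low:,.0f} to {high:,.0f} calls per epoch")
```

So even after the change that is still a few thousand calls per epoch.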
For me, it was to raise the log level and reduce the number of prints that my code was doing. Since I was using a logger instead of prints, it was pretty easy.
If you're using some framework that spits out its own progress bars, then I'd look into disabling those through whatever options it exposes.
Turning off logs entirely, I don't know; I'll let the ClearML people respond to that.
For sure though, the comms for CPU monitoring and epoch monitoring will lead to a lot of calls... but I'll agree 80k seems excessive.
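A minimal sketch of the "raise the log level" approach, assuming the training code logs through Python's standard logging module. The logger name "train" and the two helper functions are made up for illustration.

```python
import logging

# Hypothetical logger name; use whatever your training code already uses.
logger = logging.getLogger("train")
logging.basicConfig(format="%(levelname)s %(message)s")

# Only WARNING and above are emitted, so chatty per-batch lines are dropped
# before they reach the console (and anything that captures the console).
logger.setLevel(logging.WARNING)


def log_batch(batch_idx: int, loss: float) -> None:
    # INFO is below WARNING, so this produces no output.
    logger.info("batch %d loss %.4f", batch_idx, loss)


def log_problem(message: str) -> None:
    # WARNING passes the threshold and is still emitted.
    logger.warning(message)
```

With plain print calls you would have to edit each call site, which is why having a logger made this easy.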
To debug how many of them are retries vs original calls: try putting worker and server on the same network, avoid DNS entirely. On my end, this had the biggest impact (most calls were retries due to networking issues).
Thanks! It looks like I can set auto_connect_streams = False in the task init, at least to try.
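Roughly what that would look like, as a sketch. The project and task names are placeholders, and the import is done lazily inside the (hypothetical) start_task helper only so the snippet loads without clearml installed.

```python
# Keyword arguments for clearml.Task.init.
# project_name and task_name are placeholders, not real values.
TASK_INIT_KWARGS = {
    "project_name": "my-project",
    "task_name": "my-experiment",
    "auto_connect_streams": False,  # do not capture console output automatically
}


def start_task():
    # Imported here so the module can be loaded without clearml installed.
    from clearml import Task

    return Task.init(**TASK_INIT_KWARGS)
```

The ClearML docs also describe passing a dict here to pick streams individually, but check that against the version you have installed.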
We are using Keras, so it is logging progress bars by default, which I think we could turn off. I just wouldn't expect logging text to require so many API calls. Especially since they charge by API calls, I assumed it would be better managed.
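For the Keras side, the progress bar is controlled by the verbose argument of model.fit. A small sketch, where fit_without_progress_bar is a hypothetical wrapper and model is assumed to be an already compiled Keras model:

```python
def fit_without_progress_bar(model, x, y, epochs: int):
    # verbose=1 draws the progress bar, which redraws on every batch.
    # verbose=2 prints a single line per epoch; verbose=0 prints nothing.
    return model.fit(x, y, epochs=epochs, verbose=2)
```

verbose=2 keeps a per-epoch summary in the console while dropping the per-batch updates.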
Will do! It probably won't be until next week. I don't plan on stopping this run to try it but will definitely follow up with my results.
Yeah, I think if we self-hosted I wouldn't have noticed it at all.
It's possible. Is there a way to just slow down or turn off the log streaming to see how it affects the API calls?
Ah, I'm self-hosting.
Progress bars could easily take up several thousand calls, as they update with each batch.
Would love to know if the # of API calls decreases substantially by turning off auto_connect_streams. Please post an update when you have one 😃