I have an interesting issue. When I run several agents on the same server, some of them time out when connecting to GitHub. Each task in the pipeline pulls from the same repo, and the clone works for some tasks but not for others. I was thinking it could be a rate-limit issue, but I'm not sure. I'm going to reduce the number of agents to see if that helps, but I'm curious whether it's possible to "retry" in the case of an SSH timeout, maybe some fixed number of times before killing the process. It's a shame that an entire DAG fails because of one or two instances. Any suggestions are welcome as well.


2024-01-24 16:21:36
Could not lock cache folder "LTV.git" (timeout 300 sec), using temp vcs cache.
cloning: git@github.com:TicketSwap/LTV.git
2024-01-24 16:23:50
ssh: connect to host github.com port 22: Connection timed out
fatal: Could not read from remote repository.


Posted 2 months ago
After some digging we found it was actually caused by the router's IPS protection. I thought it would be strange for GitHub to be throttling things at this scale.

Posted 2 months ago

It's a corporate one. We are also looking into options on GitHub's end.

Posted 2 months ago

I know other services (GitLab, DockerHub, etc.) also rate-limit, so I assume it's the same case here.

Posted 2 months ago

Hi @EnthusiasticCow4, it sure looks like a rate limit issue. Are you using a free GitHub account or a paid one?

Posted 2 months ago

Since this could happen with a lot of services, maybe it would be worth a retry option? Especially if it's part of a pipeline.
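To make the idea concrete, here is a rough sketch of the kind of retry I have in mind. It is not an existing agent option, just a generic wrapper: `run_with_retry`, `attempts`, and `base_delay` are made-up names for illustration, and treating any non-zero exit code as retryable is an assumption.

```python
import subprocess
import time


def run_with_retry(cmd, attempts=3, base_delay=5.0,
                   run=subprocess.run, sleep=time.sleep):
    """Run cmd, retrying on a non-zero exit code with exponential backoff.

    Hypothetical helper, not part of any library. Returns the number of
    attempts used; raises RuntimeError if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        result = run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            # Wait base_delay, then 2x, 4x, ... before the next try.
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(
        f"{cmd!r} failed after {attempts} attempts: {result.stderr.strip()}"
    )


# Example usage (needs network access and SSH credentials):
# run_with_retry(["git", "clone", "git@github.com:TicketSwap/LTV.git"])
```

The `run` and `sleep` parameters are only there so the loop can be tried out without touching the network. In a pipeline, the point would be that a single transient timeout costs a short delay instead of failing the whole DAG.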

Posted 2 months ago

Well, usually it's a matter of tier - I would assume adding a retry mechanism to bypass tier rate limit mechanisms in other services is a very nice thing to do 🙂

Posted 2 months ago
