Wait, I might be completely off.
Is it this line that hangs?
task.execute_remotely(..., exit_process=True)
We are using k8s glue to spawn the job. ...
I think this is actual network latency and has nothing to do with the jobs. Could it be that the server is very far away?
What happens when you manually start a Task from your machine?
Is the latency fixed? Is it just when starting a new Task?
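One way to check whether the latency is fixed is to time repeated TCP connections from your machine to the server. This is a minimal diagnostic sketch, not part of ClearML: tcp_connect_latency is a hypothetical helper, and the host and port you pass it are placeholders for your own ClearML API server address.

```python
import socket
import time
from typing import List


def tcp_connect_latency(host: str, port: int, samples: int = 5, timeout: float = 5.0) -> List[float]:
    """Return TCP connect times in milliseconds, one entry per attempt."""
    results: List[float] = []
    for _ in range(samples):
        start = time.perf_counter()
        # create_connection resolves the host name and completes the TCP handshake,
        # so each sample is roughly DNS lookup time plus one network round trip.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        results.append((time.perf_counter() - start) * 1000.0)
    return results
```

If every sample comes back consistently high, that points at distance or routing (for example, the load balancer path) rather than the job itself; if only the first sample is slow, DNS resolution may be the main cost.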
If the only issue is this line:
task.execute_remotely(..., exit_process=True)
then it has to finish the static analysis of the entire repository first (this usually happens in the background, but here we have to wait for it). If the repo is large, this could actually take 20 seconds, depending on the CPU/drive of the machine itself.
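To illustrate that wait, here is a minimal sketch assuming the analysis runs in a background thread that the blocking call has to wait on. It is NOT ClearML's actual implementation: BackgroundRepoAnalysis and execute_remotely_sketch are hypothetical names, and time.sleep stands in for scanning the repository.

```python
import threading
import time
from typing import Optional


class BackgroundRepoAnalysis:
    """Hypothetical stand-in for a repository scan running in a background thread.

    Not ClearML's implementation: it only models "start the analysis early,
    block on it only when its result is needed".
    """

    def __init__(self, analysis_seconds: float) -> None:
        self._analysis_seconds = analysis_seconds
        self._done = threading.Event()
        self.result: Optional[dict] = None
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()  # analysis starts immediately, in the background

    def _run(self) -> None:
        # time.sleep stands in for scanning files; a larger repo or a slower
        # CPU/drive would mean a longer wait here.
        time.sleep(self._analysis_seconds)
        self.result = {"status": "done"}  # placeholder result
        self._done.set()

    def wait(self, timeout: Optional[float] = None) -> bool:
        # Blocks until the analysis has finished (or the timeout expires).
        return self._done.wait(timeout)


def execute_remotely_sketch(analysis: BackgroundRepoAnalysis) -> float:
    """Hypothetical blocking call: returns how many seconds it waited for the analysis."""
    start = time.perf_counter()
    analysis.wait()
    # A real implementation would now send the task to a queue and possibly exit.
    return time.perf_counter() - start
```

In this model, if the blocking call happens early in the script, the caller absorbs almost all of the remaining analysis time; if the analysis already finished, the call returns almost immediately.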
We are using k8s glue to spawn the job. Could you explain in detail what happens when the above code executes?
Hi, I will have to get back to you. I need to check every client's repo to verify your hypothesis.
The server runs only the ClearML components. Could you advise on the ELB part: how should we optimise it?
Hi SubstantialElk6
32 CPU cores, 64 GB RAM
That should be plenty. This sounds like a network bottleneck; I can't imagine the server is actually CPU-bound.
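To tell the two hypotheses in this thread apart on the client side (local repository analysis vs. waiting on the network), one option is to compare wall-clock time with process CPU time around the slow call. This is a minimal sketch: wall_vs_cpu and looks_io_bound are hypothetical helpers, and the 2x ratio is an arbitrary threshold, not a ClearML setting.

```python
import time
from typing import Any, Callable, Tuple


def wall_vs_cpu(fn: Callable[[], Any]) -> Tuple[float, float]:
    """Run fn() and return (wall_seconds, cpu_seconds) for the current process."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time of all threads in this process
    fn()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


def looks_io_bound(wall: float, cpu: float, ratio: float = 2.0) -> bool:
    # Heuristic (an assumption): if wall time greatly exceeds CPU time, the process
    # spent most of the call waiting (network, disk, locks) rather than computing.
    return wall > ratio * cpu
```

Because process CPU time includes background threads, a local repository scan would show up as CPU time, while waiting on the network or the server would not. Disk waits also count as non-CPU time, so a slow drive can look like a network problem, and the server's own CPU load still has to be checked on the server.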