Hi @<1699955693882183680:profile|UpsetSeaturtle37>
What's your clearml-session version? Where is the remote machine?
And yes, we have seen this behavior when the network connection is bad; you can try with --keepalive=true
Notice that these are SSH networking issues, not something to do with the clearml-session layer. The --keepalive option tries to automatically detect these disconnects and reconnect for you.
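To illustrate the general idea of "detect the disconnect and reconnect", here is a minimal Python sketch of a retry loop. It is not the clearml-session implementation; `run_with_reconnect`, `connect`, `max_retries`, and `delay_s` are hypothetical names used only for this example.

```python
import time
from typing import Callable


def run_with_reconnect(
    connect: Callable[[], None],
    max_retries: int = 3,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call connect() until it returns normally, retrying on ConnectionError.

    Returns the number of reconnect attempts that were needed.
    All names here are illustrative assumptions, not clearml-session APIs.
    """
    retries = 0
    while True:
        try:
            connect()  # hypothetical callable that opens and holds the connection
            return retries
        except ConnectionError:
            if retries >= max_retries:
                raise  # give up after the configured number of reconnects
            retries += 1
            sleep(delay_s * retries)  # wait a bit longer before each new attempt
```

The `sleep` parameter is injectable so the loop can be exercised without actually waiting.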
@<1699955693882183680:profile|UpsetSeaturtle37> can you try with the latest clearml-session (0.14.0)? I remember a few improvements there.
The remote machine is in Azure behind the load-balancer, we are using docker images, so directly connecting to pods.
Yeah, an LB in the middle might be introducing SSH hiccups. First, upgrade to the latest clearml-session; it better configures the SSH client/server to support longer connection timeouts. If that does not work, try --keepalive=true
Let me know how it goes
Hello Martin, thanks for the response. My clearml-session version is 0.9.0 and clearml version is 1.14.1. The remote machine is in Azure behind the load-balancer, we are using docker images, so directly connecting to pods.
Thanks for the response, I will update you if anything happens 👍
I updated clearml-session to 0.14.0. When I start the interactive session it seems successful at first and I can see it in the ClearML Web UI; however, the console raises an error and the connection stops:
@<1699955693882183680:profile|UpsetSeaturtle37> good progress. Regarding the error, 0.15.0 is supposed to be out tomorrow, and it includes a fix for that one.
BTW: can you run with --debug?
clearml-session versions 0.11.0 through 0.14.0 all fail (they raise the same error as above); I can only use 0.10.0 (the main clearml version is 1.15.1).