Skip to content

Detect dead connections via keepalive timeout#310131

Open
EhabY wants to merge 2 commits intomicrosoft:mainfrom
EhabY:fix/keepalive-timeout-detection
Open

Detect dead connections via keepalive timeout#310131
EhabY wants to merge 2 commits intomicrosoft:mainfrom
EhabY:fix/keepalive-timeout-detection

Conversation

@EhabY
Copy link
Copy Markdown
Contributor

@EhabY EhabY commented Apr 15, 2026

_recvAckCheck only fires when there are unacknowledged regular messages. If a connection dies silently while idle, the timeout is never detected.

Add _keepAliveTimeoutCheck: since both sides send keepalives every 5s, receiving no data for 20s means the connection is dead. Called from _sendKeepAlive after each keepalive is sent. This assumes that there is symmetry, if the client sends keepalives then so should the server.

Related to microsoft/vscode-remote-release#11538

@vs-code-engineering
Copy link
Copy Markdown
Contributor

📬 CODENOTIFY

The following users are being notified based on files changed in this PR:

@bpasero

Matched files:

  • src/vs/base/parts/ipc/common/ipc.net.ts
  • src/vs/base/parts/ipc/test/node/ipc.net.test.ts

Comment on lines +1190 to +1191
timeSinceLastReceivedSomeData >= ProtocolConstants.TimeoutTime
&& timeSinceLastTimeout >= ProtocolConstants.TimeoutTime
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really need those values to be configurable. There are environments where the 5s KeepAliveSendTime or 20s TimeoutTime are not appropriate.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, a configurable timeout would be useful. These constants live in the base IPC layer though, so it would need some plumbing to get a setting value down there.

For naming, maybe something under a new prefix like ipc.keepAliveTimeout or connection.keepAliveTimeout? Also this would just affect the client dead-connection check, like TimeoutTime would still non-configurable as it is now no?

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds liveness detection to PersistentProtocol so silent/idle connection deaths are detected even when there are no unacknowledged regular messages (i.e., cases where _recvAckCheck would never run). This supports more reliable remote/IPC reconnection behavior (related to microsoft/vscode-remote-release#11538).

Changes:

  • Add _keepAliveTimeoutCheck() and invoke it after each keepalive send to detect “no incoming data for TimeoutTime” cases.
  • Add a regression test ensuring keepalive-based timeouts fire when the peer stops sending data while no regular messages are pending.
Show a summary per file
File Description
src/vs/base/parts/ipc/common/ipc.net.ts Adds keepalive-driven socket timeout detection to catch silent idle disconnects.
src/vs/base/parts/ipc/test/node/ipc.net.test.ts Adds a unit test validating the new keepalive timeout behavior when there are no pending regular messages.

Copilot's findings

  • Files reviewed: 2/2 changed files
  • Comments generated: 3

Comment thread src/vs/base/parts/ipc/common/ipc.net.ts Outdated
Comment thread src/vs/base/parts/ipc/test/node/ipc.net.test.ts Outdated
Comment on lines +1189 to +1192
if (
timeSinceLastReceivedSomeData >= ProtocolConstants.TimeoutTime
&& timeSinceLastTimeout >= ProtocolConstants.TimeoutTime
) {
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_keepAliveTimeoutCheck() can fire a socket-timeout even when there are unacknowledged regular messages (it only checks time since last received data). This changes the existing timeout semantics: e.g. if the last incoming data was >20s ago but a new regular message was sent recently, this method can trigger a timeout earlier than the existing _recvAckCheck() would. If the intent is to catch idle silent deaths, consider guarding this check to only run when there are no unacknowledged regular messages (e.g. _outgoingUnackMsg is empty / unacknowledgedCount===0), or include timeSinceOldestUnacknowledgedMsg in the condition similar to _recvAckCheck().

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent is exactly to fire on idle silence: no data for 20s means the link is dead regardless of outgoing state. Guarding on unacknowledgedCount === 0 or ANDing in timeSinceOldestUnacknowledgedMsg reintroduces the gap this PR closes. The whole point is to measure silence from the peer, not time since our last send

EhabY added 2 commits April 18, 2026 18:49
_recvAckCheck only fires when there are unacknowledged regular messages.
If a connection dies silently while idle, the timeout is never detected.

Add _keepAliveTimeoutCheck: since both sides send keepalives every 5s,
receiving no data for 20s means the connection is dead. Called from
_sendKeepAlive after each keepalive is sent.
@EhabY EhabY force-pushed the fix/keepalive-timeout-detection branch from bd3b9b8 to 6bc287b Compare April 18, 2026 15:50
@EhabY EhabY requested a review from benoit-nexthop April 18, 2026 15:51
@EhabY
Copy link
Copy Markdown
Contributor Author

EhabY commented Apr 21, 2026

@alexdima this is ready for another review, please let me know if this should/will be tackled also by the Remote SSH extension as well (closed source though)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants