Description
When using the Feishu integration, if the underlying websocket connection hits a keepalive ping timeout, the SDK's message loop exits, but the main Hermes agent process neither terminates nor reconnects. This leaves the Gateway in a zombie state: it appears "running" to service managers such as systemd, but it no longer receives any messages.
Logs
[Lark] [ERROR] receive message loop exit, err: sent 1011 (internal error) keepalive ping timeout; no close frame received
[Lark] [WARNING] ping failed, err: sent 1011 (internal error) keepalive ping timeout
Expected Behavior (Crash-Only Architecture)
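If the Feishu websocket loop drops permanently and cannot reconnect on its own, the feishu.py integration should bring the process down: either raise SystemExit(1) from the main thread, or bubble the exception up to the parent thread so it can do so. Raising SystemExit(1) inside the integration thread itself is not enough, because Python treats it as the end of that thread only and the process keeps running. Once the process exits, system-level managers (Restart=always) can forcefully respawn a healthy agent stack.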
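A minimal sketch of the "bubble the exception to the parent thread" option, assuming the SDK's blocking websocket loop can be wrapped in a zero-argument callable. `run_ws` and `supervise` are hypothetical names for illustration, not part of the Lark SDK or Hermes. The worker thread records whatever ended the loop, and the main thread converts any exit, clean or not, into a nonzero exit code for systemd to act on.

```python
import logging
import threading
from typing import Callable, List

log = logging.getLogger("feishu-supervisor")


def supervise(run_ws: Callable[[], None], poll_interval: float = 1.0) -> int:
    """Run a blocking websocket loop (hypothetical run_ws) in a worker thread.

    Blocks until the worker stops, then returns a process exit code. Any stop,
    whether the loop raised or returned normally, is treated as fatal (1).
    """
    failure: List[BaseException] = []

    def worker() -> None:
        try:
            run_ws()
        except BaseException as exc:
            # Raising SystemExit here would only end this thread (Python
            # ignores it in non-main threads), so hand it to the parent.
            failure.append(exc)

    # daemon=True so a wedged worker cannot block interpreter shutdown if the
    # main thread is interrupted (e.g. KeyboardInterrupt during the join loop).
    t = threading.Thread(target=worker, name="feishu-ws", daemon=True)
    t.start()

    # Join with a timeout in a loop so the main thread stays interruptible.
    while t.is_alive():
        t.join(poll_interval)

    if failure:
        log.error("Feishu websocket loop crashed: %r", failure[0])
    else:
        log.error("Feishu websocket loop exited without an error")
    return 1
```

In the gateway entry point this could be used as `sys.exit(supervise(start_feishu_ws))`, where `start_feishu_ws` is a hypothetical wrapper around the SDK client's blocking start call. Returning 1 even when the loop returns cleanly is deliberate: for a long-running gateway, any exit of the message loop means messages are no longer being received, so systemd should restart the service.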
Environment
- OS: Ubuntu 24.04 via WSL2
- Deploy type: systemd service
- Provider: feishu