Thread::DetachThread deadlocks during process shutdown when CoGetErrorInfo returns a cross-apartment proxy #126964

@smourier

Description

During normal process shutdown, the CLR's FLS callback chain FiberDetachCallback → RuntimeThreadShutdown → Thread::DetachThread calls CoGetErrorInfo to clear the COM error state. If the returned IErrorInfo is a cross-apartment COM proxy, the QueryInterface / Release on that proxy triggers a cross-apartment RPC call that deadlocks because the target apartment's thread is already gone.

The process hangs indefinitely on exit with a single thread alive, blocked in NtUserMsgWaitForMultipleObjectsEx.
Note that when it hangs, the blocked thread is the last thread in the process, and it is the main thread, a standard UI thread.

Reproduction Steps

Unfortunately, the hang does not reproduce on every run, and I cannot share the original code, but the stack trace is always the same when it hangs. Copilot (Claude Opus 4.6) told me the stack trace made this a solid case and prepared most of this text.

Expected behavior

Exiting process should not hang.

Root Cause

RtlpFlsDataCleanup (called during LdrShutdownProcess) processes FLS slot callbacks in an arbitrary order with no dependency awareness. The following sequence occurs:

1. COM's own FLS callback (in cerror.cxx / combase.dll) runs and calls CoSetErrorInfo, setting an IErrorInfo that is a cross-apartment proxy.
2. The CLR's FLS callback (FiberDetachCallback) runs and calls CoGetErrorInfo, retrieves that proxy, and calls QueryInterface on it.
3. COM marshals the QueryInterface to the proxy's owning apartment via RPC (CSyncClientCall::SendReceive).
4. The owning apartment's thread no longer exists (we're inside LdrShutdownProcess, and the loader lock is held).
5. The RPC call enters a modal message loop (CCliModalLoop::BlockFn) waiting for a response that will never come.
6. Deadlock: the process hangs on exit.

Stack Trace

0  win32u.dll!NtUserMsgWaitForMultipleObjectsEx+0x14
1  combase.dll!CCliModalLoop::BlockFn+0x110
2  combase.dll!ModalLoop+0xb9
3  combase.dll!ThreadSendReceive+0x1318
4  combase.dll!CSyncClientCall::SwitchAptAndDispatchCall+0x13b3
5  combase.dll!CSyncClientCall::SendReceive2+0x1526
6  combase.dll!SyncClientCallRetryContext::SendReceiveWithRetry+0x37
7  combase.dll!CSyncClientCall::SendReceiveInRetryContext+0x37
8  combase.dll!ClassicSTAThreadSendReceive+0x1a2
9  combase.dll!CSyncClientCall::SendReceive+0x509
10  combase.dll!CClientChannel::SendReceive+0x49
11  combase.dll!NdrExtpProxySendReceive+0xb3
12  rpcrt4.dll!NdrpClientCall3+0x431
13  combase.dll!ObjectStublessClient+0x146
14  combase.dll!ObjectStubless+0x42
15  combase.dll!CStdMarshal::Begin_RemQIAndUnmarshal1+0xd4
16  combase.dll!CStdMarshal::Begin_RemQIAndUnmarshal+0x2b
17  combase.dll!CStdMarshal::Begin_QueryRemoteInterfaces+0x66
18  combase.dll!CStdMarshal::QueryRemoteInterfaces+0xe7
19  combase.dll!CStdIdentity::CInternalUnk::QueryMultipleInterfacesWithCallerAddress+0x1a4
20  combase.dll!CStdIdentity::CInternalUnk::QueryInterfaceWithCallerAddress+0x1c2
21  combase.dll!CStdIdentity::CInternalUnk::QueryInterface+0x1ef
22  combase.dll!CoGetErrorInfo+0x81
23  coreclr.dll!Thread::DetachThread+0x2f
24  coreclr.dll!RuntimeThreadShutdown+0x29
25  coreclr.dll!FiberDetachCallback+0x3c
26  ntdll.dll!RtlpFlsDataCleanup+0xff
27  ntdll.dll!LdrShutdownProcess+0x245
28  ntdll.dll!RtlExitUserProcess+0x9e
29  kernel32.dll!ExitProcessImplementation+0xb
30  ucrtbase.dll!common_exit+0xc7
31  OverPaint.exe!__scrt_common_main_seh+0x173
32  kernel32.dll!BaseThreadInitThunk+0x17
33  ntdll.dll!RtlUserThreadStart+0x2c

Additionally, a breakpoint on combase!CoSetErrorInfo confirms that COM's own FLS callback (from cerror.cxx) is the one setting the IErrorInfo during RtlpFlsDataCleanup, before the CLR's callback retrieves it.

Regression?

I'm not sure.

Suggested Fix

Thread::DetachThread should not call CoGetErrorInfo during process shutdown (i.e., when called from an FLS cleanup callback inside LdrShutdownProcess). At that point, cross-apartment COM calls are inherently unsafe. Possible approaches:

1. Skip the CoGetErrorInfo call in Thread::DetachThread when the process is shutting down.
2. Use CoGetErrorInfo but detect that the returned IErrorInfo is a cross-apartment proxy and avoid calling QueryInterface / Release on it.
3. Call SetErrorInfo(0, NULL) instead of CoGetErrorInfo to simply clear the slot without retrieving (and thus QI-ing) the object.

Note: there's arguably also a bug in combase.dll's FLS callback (cerror.cxx): it should not be setting a cross-apartment IErrorInfo proxy during process teardown. But the CLR should be defensive regardless.

Workaround

Calling SetErrorInfo(0, IntPtr.Zero) via P/Invoke at the end of Main() (before the process exits) prevents the deadlock by clearing the error info before FLS cleanup runs.
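
A minimal sketch of this workaround, assuming a classic `Main` entry point. `SetErrorInfo` is the documented oleaut32.dll export; the `Program` class layout and the `RunApplication` helper are hypothetical placeholders, not taken from the original application. The call has to run on the thread that will perform the exit (here the main thread), because COM error info is stored per thread.

```csharp
using System;
using System.Runtime.InteropServices;

internal static class Program
{
    // Native signature: HRESULT SetErrorInfo(ULONG dwReserved, IErrorInfo* perrinfo);
    // Passing a null pointer clears the calling thread's COM error object.
    [DllImport("oleaut32.dll")]
    private static extern int SetErrorInfo(uint dwReserved, IntPtr perrinfo);

    private static int Main(string[] args)
    {
        try
        {
            return RunApplication(args);
        }
        finally
        {
            // Clear the error slot on this thread before the process exits,
            // so nothing is left there when FLS cleanup runs.
            SetErrorInfo(0, IntPtr.Zero);
        }
    }

    // Hypothetical stand-in for the real application body (UI loop, etc.).
    private static int RunApplication(string[] args)
    {
        return 0;
    }
}
```

Putting the call in a `finally` block is a design choice of this sketch, so that it also runs when `Main` exits through an exception that is caught further up; it does not run if the process is terminated by `Environment.FailFast` or a native crash.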

Configuration

Windows 11 Pro 25H2 with the latest official patches
.NET: 10.0.6 (latest official)
It's an exe. The configuration is AOT compatible, but it is not published as AOT; it is just run normally or from Visual Studio.
