Commit 3f74b3c
Update worker thread pool to use time based wait. (#27916)
# Make thread pool spin duration configurable via session option
## Problem
The ORT Eigen thread pool's `SpinPause` loop uses a fixed iteration
count (`1 << 20` = ~1M iterations) before blocking. The actual
wall-clock spin duration varies dramatically by CPU architecture:
| Pause Instruction | Architecture | Spin Duration (1M iterations) |
|---|---|---|
| `_mm_pause` | Pre-Skylake | ~3ms |
| `_mm_pause` | Skylake+ @ 3 GHz | ~47ms |
| `_tpause` | 3 GHz base | ~333ms |
| `_tpause` | 2 GHz base | ~500ms |
For client/on-device workloads (e.g., Whisper in Edge), this causes high
CPU utilization visible in profilers and Task Manager, even though the
CPU is in a low-power spin state.
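Taking the approximate per-`_mm_pause` latencies implied by the figures below (about 10 cycles pre-Skylake, 140 cycles on Skylake, and 65 cycles on AMD Zen), 1M iterations take: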
- **Pre-Skylake:** 1M × 10 / 3G ≈ **3.3ms**
- **Skylake @ 3 GHz:** 1M × 140 / 3G ≈ **47ms**
- **Skylake @ 5 GHz (turbo):** 1M × 140 / 5G ≈ **28ms**
- **AMD Zen @ 4 GHz:** 1M × 65 / 4G ≈ **16ms**
The total duration scales inversely with clock speed and varies
dramatically across microarchitectures. The 14x increase on Skylake was
deliberate: Intel found that the short pause was causing too
much power waste and memory bus contention in spin loops.
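The arithmetic above can be reproduced with a small helper. This is a minimal sketch: the name `EstimateSpinMs` is hypothetical (it is not part of ORT), and the cycle counts passed to it are the approximate per-pause latencies used in the list above, not measured values.

```cpp
#include <cassert>
#include <cstdint>

// Estimated wall-clock time, in milliseconds, of a fixed-iteration spin loop.
// cycles_per_pause is an assumed, approximate latency of one pause instruction;
// clock_hz is the core clock the loop runs at.
inline double EstimateSpinMs(std::uint64_t iterations,
                             double cycles_per_pause,
                             double clock_hz) {
  assert(clock_hz > 0.0);
  const double total_cycles = static_cast<double>(iterations) * cycles_per_pause;
  return total_cycles / clock_hz * 1e3;  // seconds -> milliseconds
}
```

Calling `EstimateSpinMs(1000000, 140.0, 3e9)` gives the Skylake figure at 3 GHz, and `EstimateSpinMs(1000000, 65.0, 4e9)` gives the AMD Zen figure. The same iteration count maps to very different durations, which is why a fixed count is a poor proxy for time.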
### `_tpause`
`_tpause(0x0, __rdtsc() + 1000)` waits for a fixed number of TSC ticks.
TSC frequency is typically fixed at the processor's base frequency (not
turbo), so:
- **3 GHz base:** 1000 ticks ≈ 333ns per iteration → 1M iterations ≈
**333ms**
- **2 GHz base:** 1000 ticks ≈ 500ns per iteration → 1M iterations ≈
**500ms**
The per-iteration time is more predictable than `_mm_pause` (TSC is
constant-rate on modern CPUs), but still scales with TSC frequency. The
total spin is much longer because each iteration is ~333ns vs ~28–47ns
for `_mm_pause` on Skylake+.
### Profiler visibility
Both `_tpause` and `_mm_pause` are treated as **CPU busy** in Task
Manager and ETW sampling profilers, even though these are low-power CPU
states. This ends up looking like Edge consuming all the CPU during
speech recognition.
## Solution
This PR makes the thread pool spin behavior configurable while
**preserving the default (original) behavior** for backward
compatibility:
- **Default (`-1`)**: Uses the original iteration-count-based spin loop
(1M iterations). Unchanged throughput characteristics.
- **`0`**: Disables spinning entirely (threads block immediately).
- **`> 0`**: Enables time-based spinning for the specified duration in
microseconds using `std::chrono::steady_clock`. Recommended for
power-sensitive workloads.
### Session option usage
```cpp
// Use time-based spinning with 1ms duration (recommended for on-device/client workloads)
session_options.AddConfigEntry("session.intra_op.spin_duration_us", "1000");
// Disable spinning entirely
session_options.AddConfigEntry("session.intra_op.spin_duration_us", "0");
```
Both intra-op and inter-op thread pools are independently configurable
via `session.intra_op.spin_duration_us` and
`session.inter_op.spin_duration_us`.
## Changes
### Core thread pool (EigenNonBlockingThreadPool.h)
- `WorkerLoop` now has two spin paths selected by `spin_duration_us_`:
  - Negative (default): original iteration-count loop, identical to `main`
  - Positive: time-based spin using `steady_clock` with power-of-2 bitmask optimizations for steal interval and clock-read frequency
- Constructor parameter changed from `bool allow_spinning` → `int spin_duration_us`
- `ComputeTimeCheckMask()`: dynamically computes clock-read frequency
based on spin duration (clamped to [128, 4096] iterations) to keep
overhead under 1%
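One way to derive a clock-check mask from the spin duration is sketched below. This is an illustration only: the function name `ComputeTimeCheckMaskSketch` and the heuristic (use the duration in microseconds as the target interval) are assumptions, and the actual `ComputeTimeCheckMask()` may use a different formula. Only the clamp range [128, 4096] and the power-of-2 requirement come from the description above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical sketch: returns (interval - 1), where interval is a power of two
// in [128, 4096]. The spin loop reads the clock when (i & mask) == 0.
inline std::uint32_t ComputeTimeCheckMaskSketch(int spin_duration_us) {
  assert(spin_duration_us > 0);
  constexpr std::uint32_t kMinInterval = 128;
  constexpr std::uint32_t kMaxInterval = 4096;

  // Assumed heuristic: longer spin budgets tolerate coarser deadline checks.
  const std::uint32_t target =
      std::min(kMaxInterval,
               std::max(kMinInterval, static_cast<std::uint32_t>(spin_duration_us)));

  // Round down to a power of two so that (interval - 1) is a valid bitmask.
  std::uint32_t interval = kMinInterval;
  while (interval * 2 <= target) {
    interval *= 2;
  }
  return interval - 1;
}
```

Under this assumed heuristic a 1000 µs budget yields a check every 512 iterations (mask `0x1FF`), while very short and very long budgets are pinned to the ends of the clamp range.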
### Configuration plumbing
- New session config keys: `session.intra_op.spin_duration_us`,
`session.inter_op.spin_duration_us`
- `OrtThreadPoolParams.spin_duration_us` field with sentinel default
`-1`
- `ParseSpinDurationUs()` helper using `TryParseStringWithClassicLocale`
for safe parsing
- `allow_spinning` and `spin_duration_us` merged at
`CreateThreadPoolHelper`: when `allow_spinning=false`, spin duration is
forced to `0`
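The parsing and merging rules above can be sketched as follows. This is not the ORT code: the real helper uses `TryParseStringWithClassicLocale`, whereas this sketch substitutes `std::from_chars` (C++17) to stay self-contained, and the names `ParseSpinDurationUsSketch` and `MergeSpinSettingsSketch` are hypothetical.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Parses a decimal integer; returns std::nullopt for empty, malformed,
// out-of-range, or partially numeric input such as "10ms".
inline std::optional<int> ParseSpinDurationUsSketch(std::string_view text) {
  int value = 0;
  const char* first = text.data();
  const char* last = first + text.size();
  const auto [ptr, ec] = std::from_chars(first, last, value);
  if (ec != std::errc{} || ptr != last) {
    return std::nullopt;
  }
  return value;
}

// Merges the legacy allow_spinning flag with the new duration setting:
// when spinning is disallowed, the duration is forced to 0 (block immediately).
inline int MergeSpinSettingsSketch(bool allow_spinning, int spin_duration_us) {
  const int merged = allow_spinning ? spin_duration_us : 0;
  assert(allow_spinning || merged == 0);
  return merged;
}
```

Rejecting partially numeric input keeps a typo such as `"1000us"` from being silently read as `1000`. How ORT treats a value that fails to parse is not stated in this description, so the sketch leaves that decision to the caller.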
### Test updates
- All 8 internal call sites passing `bool true` updated to
`concurrency::kSpinDurationDefault` to avoid silent implicit bool-to-int
conversion
- `onnxruntime_perf_test` supports `--spin_duration_us` CLI flag
- Thread pool benchmarks use `kSpinDurationDefault`
## Key design decisions
1. **Default preserves original behavior**: No performance regression
for existing users. Benchmarks confirmed the iteration-count path
matches `main`.
2. **`steady_clock` over `high_resolution_clock`**: Monotonic guarantee
prevents spin-deadline issues from clock jumps.
3. **`unsigned int` loop counter**: Prevents signed overflow in the
unbounded time-based spin loop.
4. **Power-of-2 bitmask optimization**: Steal every 128 iterations
(`& 0x7F`), with clock checks at a separate frequency computed from the
spin duration. This avoids modulo operations in the hot loop.
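The design decisions above combine into a loop of roughly the following shape. This is a simplified sketch, not the `WorkerLoop` code: `SpinForWorkSketch` and the `try_steal` callback are hypothetical stand-ins, and the pause instruction that a real spin loop issues on each iteration is omitted.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Spins until try_steal() reports work or the time budget expires.
// Returns true if work was found, false on timeout.
template <typename TrySteal>
bool SpinForWorkSketch(int spin_duration_us,
                       std::uint32_t time_check_mask,
                       TrySteal&& try_steal) {
  assert(spin_duration_us > 0);
  // The mask must have the form 2^k - 1 for the bit test below to be periodic.
  assert((time_check_mask & (time_check_mask + 1)) == 0);

  using Clock = std::chrono::steady_clock;  // monotonic: immune to clock jumps
  const auto deadline =
      Clock::now() + std::chrono::microseconds(spin_duration_us);
  constexpr std::uint32_t kStealMask = 0x7F;  // attempt a steal every 128 iterations

  // Unsigned counter: wraparound is well defined, unlike signed overflow.
  for (unsigned int i = 1;; ++i) {
    if ((i & kStealMask) == 0 && try_steal()) {
      return true;
    }
    if ((i & time_check_mask) == 0 && Clock::now() >= deadline) {
      return false;
    }
    // A real implementation would execute a pause instruction here.
  }
}
```

Because both intervals are powers of two, each iteration costs two bitwise ANDs and compares, and the comparatively expensive `steady_clock::now()` call runs only once per check interval.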
## Results
<img width="3838" height="1478" alt="image"
src="https://github.com/user-attachments/assets/265a0af0-4ed7-46ae-8263-96553bb592b2"
/>
The left-hand side shows the problem: 85% of CPU time is spent in SpinWait.
The right-hand side shows the same trace with the fix: CPU utilization is 50% lower, and the
length of the usage spikes drops from 527ms to 130ms.
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Dmitri Smirnov <dmitrism@microsoft.com>

1 parent 4dd5d36 commit 3f74b3c
File tree: 15 files changed (+256, -34 lines)
- include/onnxruntime/core
  - common
  - platform
  - session
- onnxruntime
  - core
    - common
    - session
    - util
  - test
    - onnx/microbenchmark
    - perftest
    - platform
    - providers