perf(prepare-list): batch list-item insertions into one pass #51
ChristianMurphy wants to merge 1 commit into syntax-tree:main from
Conversation
Pull request overview
This PR refactors `prepareList` in the markdown compiler to avoid repeated in-loop `events.splice()` calls when synthesizing `listItem` enter/exit events. It targets a core parser hot path, so wide or deeply nested list inputs can be processed more efficiently.
Changes:
- Queue synthetic `listItem` enter/exit insertions during the existing list walk instead of mutating `events` immediately.
- Add two post-processing insertion strategies: a small-list reverse-splice fast path and a wide-list batched rebuild path.
- Add chunked `splice` handling for very large replacement ranges to stay under V8 spread/argument limits.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
```js
const replacement = Array.from({
  length: rangeLength + insertions.length
})
```
This conflicts with the lint rule: xo doesn't allow `new Array`, though it is faster.
Maybe worth one disable comment for a perf boost?
Definitely, the linter serves us, not the other way around.
```js
// SMALL_LIST_LIMIT, so prepareList rebuilds the events array. Spot-
// check the count and the last item's start.line so a misalignment
// bug in the suffix shift cannot pass on count alone.
const wideCount = 1000
const tree = fromMarkdown('- a\n'.repeat(wideCount))
if (tree.children[0].type !== 'list') throw new Error('expected list')
assert.equal(tree.children[0].children.length, wideCount)

const lastItemPosition = tree.children[0].children[wideCount - 1].position
if (!lastItemPosition) throw new Error('expected position')
assert.equal(lastItemPosition.start.line, wideCount)
```
I'm less worried about testing this, but happy to add a case if folks feel it's worthwhile.
Initial checklist
Description of changes
prepareList synthesizes listItem enter and exit events one item at a
time using events.splice(at, 0, [event]). Each splice shifts the
suffix of the events array, so a list with K items inside an array of
N events does O(K * N) shift work. This is the dominant cost in
mdast-util-from-markdown's contribution to wide-list inputs and the
slowdown reported at depth on issue #49 / PR #50.
The fix collects the would-be splices into an insertions queue during
the existing walk and applies them outside the loop in one pass. Two
paths handle the work efficiently: lists with up to a small number of
insertions take a fast path that splices each insertion in reverse
order so unspliced positions stay valid, which avoids the cost of
allocating a fresh sub-array; longer lists go through a batched
rebuild that writes the replacement into a fresh array, then either
splices the whole replacement in one call (when it fits below V8's
spread argument limit) or shifts the suffix once and writes the
replacement into the vacated range. The in-place shift avoids the
per-chunk splice loop a chunked spread fallback would use, which
would re-introduce O(K * N) shift cost on very wide lists.
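A minimal standalone sketch of the two strategies described above (the function name `applyInsertions`, the threshold value, and the `[at, event]` queue shape are assumptions for illustration, not the PR's exact code):

```javascript
// Assumed value for illustration; the PR sweeps several candidates.
const SMALL_LIST_LIMIT = 8

// Apply queued [at, event] insertions, assumed sorted by non-decreasing
// `at` as the PR's pass-1 walk guarantees by construction.
function applyInsertions(events, insertions) {
  if (insertions.length <= SMALL_LIST_LIMIT) {
    // Fast path: splice in reverse order so earlier, not-yet-spliced
    // indices stay valid without any index adjustment.
    for (let i = insertions.length - 1; i >= 0; i--) {
      events.splice(insertions[i][0], 0, insertions[i][1])
    }
    return events
  }

  // Rebuild path: one O(N + K) copy into a fresh array, interleaving
  // original events and queued insertions in a single forward pass.
  const result = []
  let from = 0
  for (const [at, event] of insertions) {
    while (from < at) result.push(events[from++])
    result.push(event)
  }
  while (from < events.length) result.push(events[from++])
  return result
}
```

The sketch returns a fresh array on the rebuild path for simplicity; the PR instead writes the replacement back into `events`, as described in the SAFE_SPREAD section below.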
How the cut points were chosen:
There are two thresholds in the new code: SMALL_LIST_LIMIT chooses
between fast-path splice loop and rebuild; SAFE_SPREAD chooses
between a single spread and an in-place shift.
SMALL_LIST_LIMIT is a workload-dependent crossover. The fast path
costs O(K * suffix) because each of the K splices shifts the events
suffix; the rebuild path costs O(N + K) plus a fixed allocation
overhead. Below some K the rebuild's constant overhead dominates;
above some K the fast path's K * suffix dominates. Because suffix
size and per-insertion splice cost both vary with document shape,
no single value is universally best: deeper nesting prefers higher
limits (everything stays on the splice loop), and documents with a
few moderately-sized lists prefer lower limits (the rebuild's lower
per-item cost wins). The threshold was chosen by sweeping
{0, 2, 4, 8, 16, 32, 64} with representative inputs from both
regimes and picking the value that kept the validated multi-run wins
on typical-document inputs without regressing the deep-nest case
beyond its own noise band.
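The sweep described above could be harnessed roughly like this (a hypothetical sketch; `parseWithLimit` and `corpus` are stand-ins, not names from the PR):

```javascript
// Median-of-runs timing, the aggregation the PR's methodology relies on
// to suppress single-run noise.
function medianMs(fn, runs = 9) {
  const samples = []
  for (let i = 0; i < runs; i++) {
    const start = process.hrtime.bigint()
    fn()
    samples.push(Number(process.hrtime.bigint() - start) / 1e6)
  }
  return samples.sort((a, b) => a - b)[Math.floor(runs / 2)]
}

for (const limit of [0, 2, 4, 8, 16, 32, 64]) {
  // parseWithLimit is hypothetical: parse a representative corpus with
  // SMALL_LIST_LIMIT forced to `limit`, then compare medians per input.
  // console.log(limit, medianMs(() => parseWithLimit(corpus, limit)))
}
```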
SAFE_SPREAD is set by V8's argument count limit. Spreading a very
large array into events.splice can throw a stack overflow in some
V8 versions, so above the threshold the rebuild instead resizes
events once, shifts the suffix to its target position, and writes
the replacement into the vacated range. Total work is O(suffix +
replacement.length), independent of how many insertions were
queued. The single-spread threshold was tested at
{1000, 2000, 5000, 10000, 20000, 70000}; 10000 had the lowest median
across the wide-list and spec-derived inputs and matches the
threshold micromark-util-chunked already uses for its own splice
helper.
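A sketch of the two write-back strategies under assumed names (`writeBack` and its signature are illustrative; it assumes the replacement is at least as long as the deleted range, as it is in prepareList):

```javascript
// Matches the constant micromark-util-chunked uses for its splice helper.
const SAFE_SPREAD = 10000

function writeBack(events, start, deleteCount, replacement) {
  if (replacement.length < SAFE_SPREAD) {
    // Small enough for a single spread without risking V8's argument
    // count limit.
    events.splice(start, deleteCount, ...replacement)
    return
  }
  const tailStart = start + deleteCount
  const grow = replacement.length - deleteCount
  const oldLength = events.length
  events.length = oldLength + grow
  // Shift the suffix once, back to front so nothing is overwritten:
  // O(suffix), independent of how many insertions were queued.
  for (let i = oldLength - 1; i >= tailStart; i--) {
    events[i + grow] = events[i]
  }
  // Write the replacement into the vacated range.
  for (let i = 0; i < replacement.length; i++) {
    events[start + i] = replacement[i]
  }
}
```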
Pass 1 records insertions in non-decreasing `at` order by
construction (each boundary records its exit insertion at
`lineIndex || index` followed by its enter insertion at `index`,
and the next boundary's tail walk is bounded by the previous
boundary's listItemPrefix), so no sort is needed in the slow path.
A dev-only assertion verifies the invariant on every slow-path
call.
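The invariant check could look like this (a hypothetical helper; the PR's actual assertion lives inline in the slow path):

```javascript
// Dev-only guard: queued [at, event] insertions must arrive in
// non-decreasing `at` order, or the single-pass rebuild would misplace
// events. Equal `at` values are fine (exit-then-enter at one boundary).
function assertNonDecreasing(insertions) {
  for (let i = 1; i < insertions.length; i++) {
    if (insertions[i][0] < insertions[i - 1][0]) {
      throw new Error('insertions must be in non-decreasing `at` order')
    }
  }
}
```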
Inputs that benefit, with multi-run median-of-medians vs the baseline
(spread in parentheses):
10,000 single-level list items -40.8% (5.6%)
5,000 ATX headings -20.6% (4.3%)
CommonMark spec * 35 (~564 KB) -11.7% (1.6%, very clean)
one CommonMark example -9.1% (105%, NOISY)
256 nested ordered list levels -8.8% (20.3%)
full CommonMark spec (~16 KB) -7.0% (35.4%, NOISY)
CommonMark spec * 7 (~113 KB) -3.6% (1.0%, very clean)
Single-run full corpus runs show the same direction on every other
input that contains at least one list, with wins of -17% to -27% on
inputs heavy in fenced code blocks, images, character references,
inline links, tabs, and HTML blocks. The largest improvement is the
10,000-item single-level list input, which is the worst case for the
old per-item splice loop.
Trade-offs and inputs that do not move:
Inputs that contain no lists are unaffected by the change because
prepareList is never invoked. The pure emphasis stress inputs ('a**b'
repeated 10,000 times and similar) reported large deltas in single
runs, but those inputs have a cross-run spread of 44 to 52% on the
baseline alone, so the apparent regressions sit inside their own
noise band. A 1 MB single paragraph, a Unicode-heavy 256 KB input,
and 10,000 unmatched asterisks all moved within +/- 3% of baseline.
Tests pass: dev + prod 1454/1454, 100% coverage maintained. Three
new tests cover the rebuild path: a wide-list parse-and-line
spot-check, plus first-item deepEqual against a 4-item fast-path
reference for both tight and loose lists (so a bug confined to the
rebuild branch diverges from the fast path the reference uses).
mdast-util-gfm 54/54, mdast-util-mdx 11/13. The two failing mdx tests
reproduce on upstream/main and are not introduced by this branch.
Closes #49
Refs #50