Fix O(n²) complexity in prepareList#50
Defer events.splice calls and apply them in a single backward merge pass. Also tighten the backward line-ending scan to stop at the list start.

Fixes syntax-tree#49

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Improves list preprocessing performance by eliminating per-item events.splice() calls in prepareList, reducing complexity from O(n²) to O(n) for large lists.
Changes:
- Defers listItem enter/exit insertions by collecting insertion positions/events during the walk.
- Applies all deferred insertions in a single backward merge pass to avoid repeated array shifting.
- Tightens the backward line-ending scan to stop at the current list's start index.
Murderlon left a comment
Also no comments from Devin review. LGTM
remcohaszing left a comment
I’m all for it if this is indeed more performant and CI passes.
But I would really appreciate a review from either @ChristianMurphy or @wooorm as I believe micromark is more their area of expertise.
Regression tests are interesting here to see the broader impact, running a few scenarios from the CommonMark corpus, plus some larger documents to stress test a bit more.

<claude>
TL;DR

The PR delivers exactly the speedup it advertises on its target case, including a 31.9% wall-clock reduction on a synthetic stand-in for the GitHub Docs GraphQL reference page (442 ms → 301 ms). On large flat lists the speedup is even larger: 38.5% on 10,000-item lists. But the rewrite has measurable cost at smaller scales and on shapes the original code handled better. Nested lists regress 5–15% across every size measured (100, 500, 1,000, 2,500 items), and the all-of-CommonMark-spec concatenation regresses 8.9%. Memory is essentially flat: the deferred-merge approach does not blow up heap (heap delta geomean across real docs is 1.001), and peak RSS differences are at the KB level. The PR is a clear win for the GitHub Docs use case and other large-list scenarios. Whether it is the right trade for the wider corpus depends on how the maintainers weight "rare-but-bad" against "common-and-mildly-slower".

Headline numbers

Geometric mean of the time ratios per input class [table not captured]. Reading: lists are 6.8% faster on average. Pathological and real-docs are statistically flat. The peak-RSS column has heavy noise: most small inputs report 0 KB peak (the parse fits in already-allocated memory), so the geomean is dominated by a handful of larger inputs and is not a reliable signal at this granularity. Heap delta is the cleaner memory metric and shows no movement.

Where the PR wins

These are inputs with baseline time ≥ 10 ms, where measurement noise is small relative to the effect. Sorted by time ratio, best to worst [table not captured]. The shape of the speedup curve matches the algorithmic claim: the ratio approaches 0 as N grows because the original code is O(n²).

Where the PR regresses

Same filter (baseline ≥ 10 ms), sorted by ratio worst-first [table not captured]. The pattern across the four nested-list sizes (100 / 500 / 1,000 / 2,500) is the most actionable signal here. Every size regresses, with the ratio holding roughly steady around 1.05–1.15. That means nested lists are not a small-N artifact: the new code is consistently a few percent slower on this shape across all sizes tested.

Sub-millisecond noise band (the other 85 regressions)

The pass/fail gate also flagged 85 inputs whose baseline time is below 1 ms, almost entirely individual CommonMark spec examples and tiny fuzz seeds. Distribution of all 92 flagged regressions by baseline time bucket [table not captured]. In other words, the gate's noise floor is the sub-millisecond range on this hardware, not the algorithm. A single-digit-percent shift on a 0.2 ms parse is one cache miss. I'd recommend filtering the strict gate to inputs with baseline ≥ 1 ms before treating the count of failures as a quality bar.

Memory profile

Heap delta is the trustworthy memory measurement here; peak RSS is sampled mid-run and is noisier. For inputs large enough to actually grow the heap [table not captured]: the PR is not holding the deferred-insertion arrays as a lasting cost. After GC the resulting parse tree is the same size or smaller; in two of the largest list cases the PR uses less memory than baseline (the deferred-merge version produces less intermediate garbage during parse, which means less max heap usage at GC checkpoints). The "list-class heap geomean = 0.967" headline reflects this. Peak RSS values for the large list and gh-docs inputs are within ±1% of baseline, which is below the noise floor.

Methodology

Runs are interleaved: B P B P … B P (11 of each). Hard ceilings per run: 30 s wall-clock, 1 GiB heap delta. None were hit on this run. I did not use Benchmark.js, mitata, or tinybench: those are tuned for sub-millisecond microbenchmarks where measurement overhead dominates, while this workload is in the millisecond-to-second range where wall-clock noise is the constraint, and none of them sample memory mid-run or do paired-implementation comparison the way this needed. The bench harness, including reproduction commands, lives at [path not captured].

Recommendation

The PR is worth landing for the workload it targets. The 32% saving on the GitHub Docs page shape (issue #49) is the marquee result, and the 38% saving on 10k-item lists confirms the asymptotic claim. Before landing, I'd want either an answer or a measurement on three things [list not captured]. A small follow-up: only invoke the new merge-pass branch when [condition not captured].
</claude>
Problem: prepareList synthesises listItem enter/exit events one at a time via events.splice(at, 0, [event]), and each splice shifts the suffix of the events array. For a list with K items inside an array of N events that is O(K * N) shift work per list, the dominant cost in mdast-util-from-markdown's contribution to wide-list inputs and the slowdown observed at depth on issue syntax-tree#49 / PR syntax-tree#50.

Goal: collapse the per-item splices into a single batched rewrite of the list's event range, while preserving the existing tail-walk semantics that determine where each listItem's exit should be inserted.

Changes:
- dev/lib/index.js: replace the inline events.splice calls in prepareList with an insertions[] queue collected during the walk. After the loop, apply the queued insertions in one pass:
  - small lists (<= 8 insertions, the common case, including deeply nested lists where many tiny ranges would otherwise pay rebuild overhead) splice each insertion in reverse order so unspliced positions stay valid;
  - wide lists go through a batched newSub rebuild and use a chunked spread to avoid V8's argument-count limit when newSub > 5000.

Inputs that benefit (multi-run median-of-medians vs baseline; spread in parentheses):
- p-wide-list (10000 single-level items): -38.0% (7.9%)
- p-many-headings: -18.3% (3.8%)
- xs (one CommonMark example): -8.3% (7.5%)
- l (~564 KB CommonMark spec * 35): -8.0% (2.6%)
- s (full CommonMark spec): -7.2% (11.3%)
- m (CommonMark spec * 7): -2.5% (3.2%)
- p-deep-list (256 nested levels): -2.4%, but spread is 46.5% on this stack; treat as inside its own noise band.

Single-run full corpus shows the same direction on every other input that contains at least one list (p-many-fenced-code -25.6%, p-many-images -23.2%, p-many-char-refs -23.9%, p-many-links -26.5%, p-tab-heavy -23.2%, p-html-blocks -17.1%, and so on).

Trade-offs / inputs that do not contain lists:
- legacy-strong / legacy-strong-emph (raw 'a**b' x 1e4 emphasis patterns) reported +13% / +28% on a single run, but their input contains zero lists, so prepareList is never invoked. Cross-run spread on these scenarios is 44-52% on the baseline alone; the +/- numbers here are noise inside that band.
- p-long-para, p-unicode-heavy, p-mismatched-emph: also list-free; all three move within +/-3% of baseline (noise).

Tests: dev + prod 1448/1448. mdast-util-gfm 54/54. mdast-util-mdx 11/13; the two failing tests reproduce on upstream/main and are not introduced by this branch.

Closes syntax-tree#49
Refs syntax-tree#50
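The small-list fast path above can be sketched as follows. This is a simplified stand-in, not the PR's code: `applyDeferredInsertions` and the `{index, event}` queue shape are illustrative, assuming the queue is collected in ascending index order during the walk.

```javascript
// Sketch of the small-list fast path: insertions queued during the walk
// are applied in reverse, so each splice only shifts elements that no
// remaining (lower-index) insertion refers to — no index fix-up needed.
function applyDeferredInsertions(events, insertions) {
  for (let i = insertions.length - 1; i >= 0; i--) {
    const {index, event} = insertions[i]
    // `index` is relative to the ORIGINAL events array; applying from the
    // highest index down keeps every earlier index valid.
    events.splice(index, 0, event)
  }
  return events
}
```

Each splice still shifts a suffix, so this path only wins while the insertion count is small; past the threshold the batched rebuild takes over.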
Octopus merge of the three independent mdast-util-from-markdown perf branches into one rollup so maintainers can evaluate the cumulative impact on a single bench run. Each underlying branch is also pushed on its own and can land independently.

Branches merged:
- perf/prepare-list-no-splice: batch listItem insertions (Closes syntax-tree#49, Refs syntax-tree#50)
- perf/dispatch-context-reuse: single shared dispatch this-binding
- perf/stable-node-shape: pre-declare position in every node factory

Cumulative impact (multi-run median-of-medians vs the mdast baseline; spread in parentheses):
- p-many-char-refs (10k '&' entities): -43.8% (7.5%)
- p-wide-list (10k single-level items): -42.9% (11.6%, borderline)
- l (~564 KB CommonMark spec * 35): -13.8% (1.0%, very clean)
- s (full CommonMark spec): -12.3% (7.9%)
- m (CommonMark spec * 7): -8.5% (1.0%, very clean)
- legacy-base (40 KB 'xxxx' x 1e4): -4.0% (9.0%)

Single-run full corpus shows wins on 20 of 27 scenarios in the range -1% to -43%. The largest pathological wins are p-many-char-refs -42.2%, p-many-fenced-code -40.7%, p-wide-list -37.0%, p-tab-heavy -35.6%, p-many-headings -28.2%, p-code-spans -24.4%, p-many-links -22.3%, p-many-images -21.5%, p-html-blocks -15.3%, xs -16.4%.

Trade-offs / inputs that do not improve:
- p-long-para: +5.3% multi-run (spread 9.4%) and +11.6% single-run. This input is one giant text node; none of the three changes target the single-text-node path. The +5.3% is on the edge of its noise band but is the largest credible regression of the rollup.
- p-unicode-heavy: +2.3% multi-run (spread 11.4%, noisy), within the scenario's noise band.
- p-mismatched-emph: +0.8% multi-run (7.3%), flat.
- legacy-strong / legacy-strong-emph: +44% / +58% single-run, but these scenarios have a 12-sample mandatory floor on this stack and cross-run spread of 44-53% on baseline alone; the deltas reported here are inside that band. The input is pure 'a**b' x 1e4 with no lists, few node creations, and few mid-event sliceSerialize variants, so none of the three optimisations target what this scenario exercises.

Tests: dev + prod 1448/1448. mdast-util-gfm 54/54. mdast-util-mdx 11/13; the two failing tests reproduce on upstream/main and are not introduced by this branch.
prepareList synthesises listItem enter and exit events one item at a time using events.splice(at, 0, [event]). Each splice shifts the suffix of the events array, so a list with K items inside an array of N events does O(K * N) shift work. This is the dominant cost in mdast-util-from-markdown's contribution to wide-list inputs and the slowdown reported at depth on issue syntax-tree#49 / PR syntax-tree#50.

The fix collects the would-be splices into an insertions queue during the existing walk and applies them outside the loop in one pass. Two paths handle the work efficiently: lists with up to 8 insertions take a fast path that splices each insertion in reverse order so unspliced positions stay valid, which avoids the cost of allocating a fresh sub-array; longer lists go through a batched rebuild and use a chunked spread so the splice never hits V8's argument count limit.

Inputs that benefit, with multi-run median-of-medians vs the baseline (spread in parentheses):
- 10,000 single-level list items: -38.0% (7.9%)
- 5,000 ATX headings: -18.3% (3.8%)
- one CommonMark example: -8.3% (7.5%)
- CommonMark spec * 35 (~564 KB): -8.0% (2.6%)
- full CommonMark spec (~16 KB): -7.2% (11.3%)
- CommonMark spec * 7 (~113 KB): -2.5% (3.2%)
- 256 nested ordered list levels: -2.4% (46.5% spread; treat as flat on this stack)

Single-run full corpus runs show the same direction on every other input that contains at least one list, with wins of -17% to -27% on inputs heavy in fenced code blocks, images, character references, inline links, tabs, and HTML blocks. The largest improvement is the 10,000-item single-level list input, which is the worst case for the old per-item splice loop.

Trade-offs and inputs that do not move: Inputs that contain no lists are unaffected by the change because prepareList is never invoked. The pure emphasis stress inputs ('a**b' repeated 10,000 times and similar) reported +13% and +28% on a single run, but those inputs have a cross-run spread of 44 to 52% on the baseline alone, so the apparent regressions sit inside their own noise band. A 1 MB single paragraph, a Unicode-heavy 256 KB input, and 10,000 unmatched asterisks all moved within +/-3% of baseline.

Tests pass: dev + prod 1448/1448, mdast-util-gfm 54/54, mdast-util-mdx 11/13. The two failing mdx tests reproduce on upstream/main and are not introduced by this branch.

Closes syntax-tree#49
Refs syntax-tree#50
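The chunked-spread idea on the long-list path can be sketched like this. The helper name and chunk constant are illustrative, not the PR's identifiers: spreading a very large array into `splice(...)` passes every element as a function argument, which can exceed V8's argument-count limit, so the insert is done in fixed-size slices.

```javascript
// Sketch of a chunked splice: insert `items` into `target` at `start`
// without ever spreading more than CHUNK arguments into one call.
const CHUNK = 10000 // illustrative; kept well under V8's argument limit

function chunkedInsert(target, start, items) {
  for (let offset = 0; offset < items.length; offset += CHUNK) {
    // Each slice lands right after the previous one, so the final order
    // is identical to target.splice(start, 0, ...items) in one call.
    target.splice(start + offset, 0, ...items.slice(offset, offset + CHUNK))
  }
  return target
}
```

A single `target.splice(start, 0, ...items)` works for small arrays but throws a RangeError once `items` grows past the engine's argument ceiling; chunking trades a few extra splice calls for safety at any size.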
Octopus merge of the three independent perf branches into one rollup so reviewers can evaluate the cumulative impact on a single bench run. Each underlying branch is also pushed on its own and can land independently.

Branches merged:
- perf/prepare-list-no-splice (Closes syntax-tree#49, Refs syntax-tree#50)
- perf/dispatch-context-reuse
- perf/stable-node-shape

Cumulative impact, multi-run median-of-medians vs the baseline (spread in parentheses):
- 10,000 character entity references: -43.8% (7.5%)
- 10,000 single-level list items: -42.9% (11.6%, borderline)
- CommonMark spec * 35 (~564 KB): -13.8% (1.0%, very clean)
- full CommonMark spec (~16 KB): -12.3% (7.9%)
- CommonMark spec * 7 (~113 KB): -8.5% (1.0%, very clean)
- 'xxxx' x 10,000 (~40 KB): -4.0% (9.0%)

Single-run full corpus shows wins on 20 of 27 inputs, ranging from -1% to -43%. The largest pathological wins are inputs heavy in character entity references (-42.2%), fenced code blocks (-40.7%), single-level list items (-37.0%), tabs (-35.6%), ATX headings (-28.2%), backtick code spans (-24.4%), inline links (-22.3%), inline images (-21.5%), HTML blocks (-15.3%), and one CommonMark example (-16.4%).

Trade-offs:
- A 1 MB single paragraph reported +5.3% multi-run with a 9.4% spread, and +11.6% on a single full-corpus run. None of the three changes target the single-text-node path, so the small regression is at the edge of that input's noise band.
- A 256 KB Unicode-heavy input reported +2.3% multi-run inside its 11.4% spread (treat as flat).
- A 10,000-unmatched-asterisk input moved +0.8% multi-run (flat).
- The pure emphasis stress inputs ('a**b' repeated 10,000 times and similar) reported +44% and +58% on a single run, but their cross-run spread is 44 to 53% on the baseline alone. The input shape (almost all attentionSequence events that mostly do not match a handler, no lists, no node-creation hot path) means none of the three optimisations can target what these inputs exercise. Treat the deltas as noise.

Tests pass: dev + prod 1448/1448, mdast-util-gfm 54/54, mdast-util-mdx 11/13. The two failing mdx tests reproduce on upstream/main and are not introduced by this branch.
prepareList synthesises listItem enter and exit events one item at a time using events.splice(at, 0, [event]). Each splice shifts the suffix of the events array, so a list with K items inside an array of N events does O(K * N) shift work. This is the dominant cost in mdast-util-from-markdown's contribution to wide-list inputs and the slowdown reported at depth on issue syntax-tree#49 / PR syntax-tree#50.

The fix collects the would-be splices into an insertions queue during the existing walk and applies them outside the loop in one pass. Two paths handle the work efficiently: lists with up to 8 insertions take a fast path that splices each insertion in reverse order so unspliced positions stay valid, which avoids the cost of allocating a fresh sub-array; longer lists go through a batched rebuild and use a chunked spread so the splice never hits V8's argument count limit.

Both thresholds were swept on real inputs rather than picked by feel. SMALL_LIST_LIMIT was tested at {0, 2, 4, 8, 16, 32, 64}: deeper nesting (a 256-level list) preferred 16 to 64 (around 21% faster than 0), but typical documents (CommonMark spec, spec * 7, spec * 35) preferred lower values because the rebuild path's allocation and sort overhead outweighs the saved splice work when lists are 4 to 12 items each. 8 sits at the balance point and keeps the validated multi-run wins on the typical-document inputs. The chunked-spread threshold was tested at {1000, 2000, 5000, 10000, 20000, 70000}; 10000 was the lowest median across the 10000-item single-level list and both spec-derived inputs, and matches the threshold micromark-util-chunked already uses for its own splice helper.

Inputs that benefit, with multi-run median-of-medians vs the baseline (spread in parentheses):
- 10,000 single-level list items: -38.0% (7.9%)
- 5,000 ATX headings: -18.3% (3.8%)
- one CommonMark example: -8.3% (7.5%)
- CommonMark spec * 35 (~564 KB): -8.0% (2.6%)
- full CommonMark spec (~16 KB): -7.2% (11.3%)
- CommonMark spec * 7 (~113 KB): -2.5% (3.2%)
- 256 nested ordered list levels: -2.4% (46.5% spread; treat as flat on this stack)

Single-run full corpus runs show the same direction on every other input that contains at least one list, with wins of -17% to -27% on inputs heavy in fenced code blocks, images, character references, inline links, tabs, and HTML blocks. The largest improvement is the 10,000-item single-level list input, which is the worst case for the old per-item splice loop.

Trade-offs and inputs that do not move: Inputs that contain no lists are unaffected by the change because prepareList is never invoked. The pure emphasis stress inputs ('a**b' repeated 10,000 times and similar) reported +13% and +28% on a single run, but those inputs have a cross-run spread of 44 to 52% on the baseline alone, so the apparent regressions sit inside their own noise band. A 1 MB single paragraph, a Unicode-heavy 256 KB input, and 10,000 unmatched asterisks all moved within +/-3% of baseline.

Tests pass: dev + prod 1448/1448, mdast-util-gfm 54/54, mdast-util-mdx 11/13. The two failing mdx tests reproduce on upstream/main and are not introduced by this branch.

Closes syntax-tree#49
Refs syntax-tree#50
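The threshold sweep described above can be sketched as a small harness. Everything here is illustrative: `benchWithLimit` is a hypothetical hook that runs the parser with a given SMALL_LIST_LIMIT and returns a median time in ms; only the candidate set matches the values actually swept.

```javascript
// Sketch of the SMALL_LIST_LIMIT sweep: time each candidate over a set of
// representative inputs and keep the one with the lowest combined median.
function sweepThreshold(benchWithLimit, inputs, candidates = [0, 2, 4, 8, 16, 32, 64]) {
  let best = {limit: candidates[0], total: Infinity}
  for (const limit of candidates) {
    // Sum medians across inputs so no single document shape dominates
    // the choice; a geomean would be another reasonable aggregate.
    let total = 0
    for (const input of inputs) total += benchWithLimit(limit, input)
    if (total < best.total) best = {limit, total}
  }
  return best.limit
}
```

The key point from the commit message survives in the harness design: because deep nesting and typical documents pull the optimum in opposite directions, the input set must include both regimes or the sweep just overfits one shape.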
prepareList synthesizes listItem enter and exit events one item at a time using events.splice(at, 0, [event]). Each splice shifts the suffix of the events array, so a list with K items inside an array of N events does O(K * N) shift work. This is the dominant cost in mdast-util-from-markdown's contribution to wide-list inputs and the slowdown reported at depth on issue syntax-tree#49 / PR syntax-tree#50. The fix collects the would-be splices into an insertions queue during the existing walk and applies them outside the loop in one pass. Two paths handle the work efficiently: lists with up to a small number of insertions take a fast path that splices each insertion in reverse order so unsplice'd positions stay valid, which avoids the cost of allocating a fresh sub-array; longer lists go through a batched rebuild and use a chunked spread so the splice never hits V8's argument count limit. How the cut points were chosen: There are two thresholds in the new code: SMALL_LIST_LIMIT chooses between fast-path splice loop and rebuild; SAFE_SPREAD chooses between a single spread and a chunked spread. SMALL_LIST_LIMIT is a workload-dependent crossover. The fast path costs O(K * suffix) because each of the K splices shifts the events suffix; the rebuild path costs O(N + K) plus a fixed allocation and sort overhead. Below some K the rebuild's constant overhead dominates; above some K the fast path's K * suffix dominates. Because suffix size and per-insertion splice cost both vary with document shape, no single value is universally best: deeper nesting prefers higher limits (everything stays on the splice loop), and documents with a few moderately-sized lists prefer lower limits (the rebuild's lower per-item cost wins). The threshold was chosen by sweeping {0, 2, 4, 8, 16, 32, 64} with representative inputs from both regimes and picking the value that kept the validated multi-run wins on typical-document inputs without regressing the deep-nest case beyond its own noise band. 
SAFE_SPREAD is set by V8's argument count limit. Spreading a very large array into events.splice can throw a stack overflow in some V8 versions, so the rebuild splits the new sub-array into chunks once it exceeds a safe size. The chunk threshold was tested at {1000, 2000, 5000, 10000, 20000, 70000}; 10000 had the lowest median across the wide-list and spec-derived inputs and matches the threshold micromark-util-chunked already uses for its own splice helper, which tracks the same V8 constraint.

Inputs that benefit, with multi-run median-of-medians vs the baseline (spread in parentheses):

- 10,000 single-level list items: -38.0% (7.9%)
- 5,000 ATX headings: -18.3% (3.8%)
- one CommonMark example: -8.3% (7.5%)
- CommonMark spec * 35 (~564 KB): -8.0% (2.6%)
- full CommonMark spec (~16 KB): -7.2% (11.3%)
- CommonMark spec * 7 (~113 KB): -2.5% (3.2%)
- 256 nested ordered list levels: -2.4% (46.5% spread, treat as flat on this stack)

Single-run full corpus runs show the same direction on every other input that contains at least one list, with wins of -17% to -27% on inputs heavy in fenced code blocks, images, character references, inline links, tabs, and HTML blocks. The largest improvement is the 10,000-item single-level list input, which is the worst case for the old per-item splice loop.

Trade-offs and inputs that do not move: inputs that contain no lists are unaffected because prepareList is never invoked. The pure emphasis stress inputs ('a**b' repeated 10,000 times and similar) reported +13% and +28% on a single run, but those inputs have a cross-run spread of 44 to 52% on the baseline alone, so the apparent regressions sit inside their own noise band. A 1 MB single paragraph, a Unicode-heavy 256 KB input, and 10,000 unmatched asterisks all moved within +/- 3% of baseline.

Tests pass: dev + prod 1448/1448, mdast-util-gfm 54/54, mdast-util-mdx 11/13. The two failing mdx tests reproduce on upstream/main and are not introduced by this branch.
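The chunked spread can be sketched like this; the constant and the helper name are mine, chosen only to illustrate the constraint the comment describes:

```javascript
// Hypothetical sketch of the chunked insert: events.splice(at, 0, ...items)
// passes every element of `items` as a separate call argument, which can
// overflow the stack for very large arrays in some V8 versions. Splitting
// the insert into bounded chunks caps the argument count per call.
const SAFE_SPREAD = 10000

function chunkedInsert(events, at, items) {
  let start = 0

  while (start < items.length) {
    const chunk = items.slice(start, start + SAFE_SPREAD)
    // Offset by `start` so each chunk lands right after the previous one.
    events.splice(at + start, 0, ...chunk)
    start += chunk.length
  }

  return events
}
```

Note this caps argument counts rather than shift work: each chunk's splice still shifts the suffix once per chunk.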
Closes syntax-tree#49
Refs syntax-tree#50
Octopus merge of the three independent perf branches into one rollup so reviewers can evaluate the cumulative impact on a single bench run. Each underlying branch is also pushed on its own and can land independently.

Branches merged:

- perf/prepare-list-no-splice (Closes syntax-tree#49, Refs syntax-tree#50)
- perf/dispatch-context-reuse
- perf/stable-node-shape

Cumulative impact, multi-run median-of-medians vs the baseline (spread in parentheses):

- 10,000 character entity references: -43.8% (7.5%)
- 10,000 single-level list items: -42.9% (11.6%, borderline)
- CommonMark spec * 35 (~564 KB): -13.8% (1.0%, very clean)
- full CommonMark spec (~16 KB): -12.3% (7.9%)
- CommonMark spec * 7 (~113 KB): -8.5% (1.0%, very clean)
- 'xxxx' x 10,000 (~40 KB): -4.0% (9.0%)

A single-run full corpus shows wins on 20 of 27 inputs, ranging from -1% to -43%. The largest pathological wins are inputs heavy in character entity references (-42.2%), fenced code blocks (-40.7%), single-level list items (-37.0%), tabs (-35.6%), ATX headings (-28.2%), backtick code spans (-24.4%), inline links (-22.3%), inline images (-21.5%), HTML blocks (-15.3%), and one CommonMark example (-16.4%).

Trade-offs: a 1 MB single paragraph reported +5.3% multi-run with a 9.4% spread, and +11.6% on a single full-corpus run. None of the three changes target the single-text-node path, so the small regression is the edge of that input's noise band. A 256 KB Unicode-heavy input reported +2.3% multi-run inside its 11.4% spread (treat as flat). A 10,000-unmatched-asterisk input moved +0.8% multi-run (flat). The pure emphasis stress inputs ('a**b' repeated 10,000 times and similar) reported +44% and +58% on a single run, but their cross-run spread is 44 to 53% on the baseline alone, and the input shape (almost all attentionSequence events that mostly do not match a handler, no lists, no node-creation hot path) means none of the three optimizations can target what these inputs exercise. Treat the deltas as noise.
I opened a follow-up at #51 that addresses the small-list and nested-list slowdown while keeping the speedup for large lists.
prepareList synthesizes listItem enter and exit events one item at a time using events.splice(at, 0, [event]). Each splice shifts the suffix of the events array, so a list with K items inside an array of N events does O(K * N) shift work. This is the dominant cost mdast-util-from-markdown contributes on wide-list inputs, and the slowdown reported at depth on issue syntax-tree#49 / PR syntax-tree#50.

The fix collects the would-be splices into an insertions queue during the existing walk and applies them outside the loop in one pass. Two paths handle the work efficiently: lists with up to a small number of insertions take a fast path that splices each insertion in reverse order so unspliced positions stay valid, avoiding the cost of allocating a fresh sub-array; longer lists go through a batched rebuild and use a chunked spread so the splice never hits V8's argument count limit.

How the cut points were chosen: there are two thresholds in the new code. SMALL_LIST_LIMIT chooses between the fast-path splice loop and the rebuild; SAFE_SPREAD chooses between a single spread and a chunked spread.

SMALL_LIST_LIMIT is a workload-dependent crossover. The fast path costs O(K * suffix) because each of the K splices shifts the events suffix; the rebuild path costs O(N + K) plus a fixed allocation overhead. Below some K the rebuild's constant overhead dominates; above it, the fast path's K * suffix dominates. Because suffix size and per-insertion splice cost both vary with document shape, no single value is universally best: deeper nesting prefers higher limits (everything stays on the splice loop), while documents with a few moderately sized lists prefer lower limits (the rebuild's lower per-item cost wins). The threshold was chosen by sweeping {0, 2, 4, 8, 16, 32, 64} over representative inputs from both regimes and picking the value that kept the validated multi-run wins on typical-document inputs without regressing the deep-nest case beyond its own noise band.
SAFE_SPREAD is set by V8's argument count limit. Spreading a very large array into events.splice can throw a stack overflow in some V8 versions, so the rebuild splits the new sub-array into chunks once it exceeds a safe size. The chunk threshold was tested at {1000, 2000, 5000, 10000, 20000, 70000}; 10000 had the lowest median across the wide-list and spec-derived inputs and matches the threshold micromark-util-chunked already uses for its own splice helper, which tracks the same V8 constraint.

Pass 1 records insertions in non-decreasing `at` order by construction (each boundary records its exit insertion at `lineIndex || index` followed by its enter insertion at `index`, and the next boundary's tail walk is bounded by the previous boundary's listItemPrefix), so no sort is needed in the slow path. A dev-only assertion verifies the invariant on every slow-path call.

Inputs that benefit, with multi-run median-of-medians vs the baseline (spread in parentheses):

- 10,000 single-level list items: -36.3% (4.0%)
- 5,000 ATX headings: -22.9% (5.5%)
- one CommonMark example: -13.2% (16.0%)
- full CommonMark spec (~16 KB): -13.0% (46.4%, noisy)
- CommonMark spec * 35 (~564 KB): -10.7% (0.8%, very clean)
- 256 nested ordered list levels: -4.0% (6.7%)
- CommonMark spec * 7 (~113 KB): -3.8% (12.8%)

Single-run full corpus runs show the same direction on every other input that contains at least one list, with wins of -17% to -27% on inputs heavy in fenced code blocks, images, character references, inline links, tabs, and HTML blocks. The largest improvement is the 10,000-item single-level list input, which is the worst case for the old per-item splice loop.

Trade-offs and inputs that do not move: inputs that contain no lists are unaffected because prepareList is never invoked.
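Because pass 1 records insertions already sorted by position, the slow path reduces to one forward merge. A minimal sketch under my own names (`applyInsertionsRebuild`, `merged`), not the PR's actual code:

```javascript
// Hypothetical sketch of the no-sort slow path: with `insertions` in
// non-decreasing `at` order, one forward merge builds the rewritten
// region in O(N + K) with no sorting.
function applyInsertionsRebuild(events, insertions) {
  if (insertions.length === 0) return events

  const from = insertions[0][0]
  const merged = []
  let index = from

  for (const [at, event] of insertions) {
    // Copy the untouched run before this insertion point, then the event.
    while (index < at) merged.push(events[index++])
    merged.push(event)
  }

  // Copy the remaining tail.
  while (index < events.length) merged.push(events[index++])

  // Replace the tail in one operation. (The real code would need to chunk
  // this spread when `merged` is large, for the V8 limit discussed above.)
  events.length = from
  events.push(...merged)
  return events
}
```

Two insertions at the same index (an exit immediately followed by an enter) also work: the inner `while` simply copies nothing between them.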
The pure emphasis stress inputs ('a**b' repeated 10,000 times and similar) reported +13% and +28% on a single run, but those inputs have a cross-run spread of 44 to 52% on the baseline alone, so the apparent regressions sit inside their own noise band. A 1 MB single paragraph, a Unicode-heavy 256 KB input, and 10,000 unmatched asterisks all moved within +/- 3% of baseline.

Tests pass: dev + prod 1450/1450, 100% coverage maintained. mdast-util-gfm 54/54, mdast-util-mdx 11/13. The two failing mdx tests reproduce on upstream/main and are not introduced by this branch.

Closes syntax-tree#49
Refs syntax-tree#50
prepareList synthesizes listItem enter and exit events one item at a time using events.splice(at, 0, [event]). Each splice shifts the suffix of the events array, so a list with K items inside an array of N events does O(K * N) shift work. This is the dominant cost mdast-util-from-markdown contributes on wide-list inputs, and the slowdown reported at depth on issue syntax-tree#49 / PR syntax-tree#50.

The fix collects the would-be splices into an insertions queue during the existing walk and applies them outside the loop in one pass. Two paths handle the work efficiently. Lists with up to a small number of insertions take a fast path that splices each insertion in reverse order so unspliced positions stay valid, avoiding the cost of allocating a fresh sub-array. Longer lists go through a batched rebuild that writes the replacement into a fresh array, then either splices the whole replacement in one call (when it fits below V8's spread argument limit) or shifts the suffix once and writes the replacement into the vacated range. The in-place shift avoids the per-chunk splice loop a chunked-spread fallback would use, which would re-introduce O(K * N) shift cost on very wide lists.

How the cut points were chosen: there are two thresholds in the new code. SMALL_LIST_LIMIT chooses between the fast-path splice loop and the rebuild; SAFE_SPREAD chooses between a single spread and an in-place shift.

SMALL_LIST_LIMIT is a workload-dependent crossover. The fast path costs O(K * suffix) because each of the K splices shifts the events suffix; the rebuild path costs O(N + K) plus a fixed allocation overhead. Below some K the rebuild's constant overhead dominates; above it, the fast path's K * suffix dominates.
Because suffix size and per-insertion splice cost both vary with document shape, no single value is universally best: deeper nesting prefers higher limits (everything stays on the splice loop), and documents with a few moderately-sized lists prefer lower limits (the rebuild's lower per-item cost wins). The threshold was chosen by sweeping {0, 2, 4, 8, 16, 32, 64} with representative inputs from both regimes and picking the value that kept the validated multi-run wins on typical-document inputs without regressing the deep-nest case beyond its own noise band. SAFE_SPREAD is set by V8's argument count limit. Spreading a very large array into events.splice can throw a stack overflow in some V8 versions, so above the threshold the rebuild instead resizes events once, shifts the suffix to its target position, and writes the replacement into the vacated range. Total work is O(suffix + replacement.length), independent of how many insertions were queued. The single-spread threshold was tested at {1000, 2000, 5000, 10000, 20000, 70000}; 10000 had the lowest median across the wide-list and spec-derived inputs and matches the threshold micromark-util-chunked already uses for its own splice helper. Pass-1 records insertions in non-decreasing `at` order by construction (each boundary records its exit insertion at `lineIndex || index` followed by its enter insertion at `index`, and the next boundary's tail walk is bounded by the previous boundary's listItemPrefix), so no sort is needed in the slow path. A dev-only assertion verifies the invariant on every slow-path call. 
Inputs that benefit, with multi-run median-of-medians vs the baseline (spread in parentheses): 10,000 single-level list items -40.8% (5.6%) 5,000 ATX headings -20.6% (4.3%) CommonMark spec * 35 (~564 KB) -11.7% (1.6%, very clean) one CommonMark example -9.1% (105%, NOISY) 256 nested ordered list levels -8.8% (20.3%) full CommonMark spec (~16 KB) -7.0% (35.4%, NOISY) CommonMark spec * 7 (~113 KB) -3.6% (1.0%, very clean) Single-run full corpus runs show the same direction on every other input that contains at least one list, with wins of -17% to -27% on inputs heavy in fenced code blocks, images, character references, inline links, tabs, and HTML blocks. The largest improvement is the 10,000-item single-level list input, which is the worst case for the old per-item splice loop. Trade-offs and inputs that do not move: Inputs that contain no lists are unaffected by the change because prepareList is never invoked. The pure emphasis stress inputs ('a**b' repeated 10,000 times and similar) reported large deltas in single runs, but those inputs have a cross-run spread of 44 to 52% on the baseline alone, so the apparent regressions sit inside their own noise band. A 1 MB single paragraph, a Unicode-heavy 256 KB input, and 10,000 unmatched asterisks all moved within +/- 3% of baseline. Tests pass: dev + prod 1452/1452, 100% coverage maintained. mdast-util-gfm 54/54, mdast-util-mdx 11/13. The two failing mdx tests reproduce on upstream/main and are not introduced by this branch. Closes syntax-tree#49 Refs syntax-tree#50
`prepareList` synthesizes `listItem` enter and exit events one item at a time using `events.splice(at, 0, [event])`. Each splice shifts the suffix of the events array, so a list with K items inside an array of N events does O(K * N) shift work. This is the dominant cost in mdast-util-from-markdown's contribution to wide-list inputs and the slowdown reported at depth on issue syntax-tree#49 / PR syntax-tree#50.

The fix collects the would-be splices into an insertions queue during the existing walk and applies them outside the loop in one pass. Two paths handle the work efficiently:

- Lists with up to a small number of insertions take a fast path that splices each insertion in reverse order so unspliced positions stay valid, which avoids the cost of allocating a fresh sub-array.
- Longer lists go through a batched rebuild that writes the replacement into a fresh array, then either splices the whole replacement in one call (when it fits below V8's spread argument limit) or shifts the suffix once and writes the replacement into the vacated range. The in-place shift avoids the per-chunk splice loop a chunked spread fallback would use, which would reintroduce O(K * N) shift cost on very wide lists.

**How the cut points were chosen.** There are two thresholds in the new code: `SMALL_LIST_LIMIT` chooses between the fast-path splice loop and the rebuild; `SAFE_SPREAD` chooses between a single spread and an in-place shift. `SMALL_LIST_LIMIT` is a workload-dependent crossover: the fast path costs O(K * suffix) because each of the K splices shifts the events suffix, while the rebuild path costs O(N + K) plus a fixed allocation overhead. Below some K the rebuild's constant overhead dominates; above some K the fast path's K * suffix term dominates.
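The two apply paths can be sketched roughly like this. This is a minimal illustration, not the PR's actual code: `applyInsertions` and the `SMALL_LIST_LIMIT` value are assumed names, and the real rebuild additionally guards the spread limit as described below.

```javascript
// Rough sketch of the two-path apply step, assuming `insertions` is a queue
// of [at, event] pairs collected during the walk in non-decreasing `at`
// order. Names and the crossover value are illustrative.
const SMALL_LIST_LIMIT = 8

function applyInsertions(events, insertions) {
  if (insertions.length === 0) return

  if (insertions.length <= SMALL_LIST_LIMIT) {
    // Fast path: splice in reverse order so positions that have not been
    // spliced yet stay valid, and no fresh sub-array is allocated.
    for (let index = insertions.length - 1; index >= 0; index--) {
      events.splice(insertions[index][0], 0, insertions[index][1])
    }
    return
  }

  // Batched rebuild: merge the covered span of `events` with the queued
  // insertions into a fresh `replacement` array, O(N + K) total.
  const firstAt = insertions[0][0]
  const lastAt = insertions[insertions.length - 1][0]
  const replacement = []
  let read = firstAt

  for (const [at, event] of insertions) {
    while (read < at) replacement.push(events[read++])
    replacement.push(event)
  }

  // One splice replaces the whole span; suffix elements shift exactly once.
  events.splice(firstAt, lastAt - firstAt, ...replacement)
}
```

Either path yields the same array as a per-item splice loop; the difference is that the suffix after the last insertion point moves once instead of once per item.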
Because suffix size and per-insertion splice cost both vary with document shape, no single value is universally best: deeper nesting prefers higher limits (everything stays on the splice loop), and documents with a few moderately sized lists prefer lower limits (the rebuild's lower per-item cost wins). The threshold was chosen by sweeping {0, 2, 4, 8, 16, 32, 64} with representative inputs from both regimes and picking the value that kept the validated multi-run wins on typical-document inputs without regressing the deep-nest case beyond its own noise band.

`SAFE_SPREAD` is set by V8's argument count limit. Spreading a very large array into `events.splice` can throw a stack overflow in some V8 versions, so above the threshold the rebuild instead resizes `events` once, shifts the suffix to its target position, and writes the replacement into the vacated range. Total work is O(suffix + replacement.length), independent of how many insertions were queued. The single-spread threshold was tested at {1000, 2000, 5000, 10000, 20000, 70000}; 10000 had the lowest median across the wide-list and spec-derived inputs and matches the threshold micromark-util-chunked already uses for its own splice helper.

Pass 1 records insertions in non-decreasing `at` order by construction (each boundary records its exit insertion at `lineIndex || index` followed by its enter insertion at `index`, and the next boundary's tail walk is bounded by the previous boundary's `listItemPrefix`), so no sort is needed in the slow path. A dev-only assertion verifies the invariant on every slow-path call.
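The spread-versus-shift decision can be sketched as follows. This is a hedged sketch, not the PR's code: `writeReplacement` is an assumed helper name, and `SAFE_SPREAD` mirrors the 10000 threshold discussed above.

```javascript
// Sketch of the spread-vs-shift step. `SAFE_SPREAD` mirrors the 10000
// threshold from the text; the helper name is illustrative.
const SAFE_SPREAD = 10000

function writeReplacement(events, at, removeCount, replacement) {
  if (replacement.length < SAFE_SPREAD) {
    // Few enough arguments to spread safely: one splice call does it all.
    events.splice(at, removeCount, ...replacement)
    return
  }

  // Spreading this many arguments can overflow the stack in some V8
  // versions, so instead: resize once, shift the suffix backward into its
  // final position, then write the replacement into the vacated range.
  // Total work is O(suffix + replacement.length). The backward shift
  // assumes the replacement is at least as long as the removed span,
  // which holds here because the rebuild only ever adds events.
  const oldLength = events.length
  const delta = replacement.length - removeCount
  events.length = oldLength + delta

  for (let index = oldLength - 1; index >= at + removeCount; index--) {
    events[index + delta] = events[index]
  }

  for (let index = 0; index < replacement.length; index++) {
    events[at + index] = replacement[index]
  }
}
```

The suffix moves exactly once in either branch; only the mechanism (engine-level splice vs explicit copy) differs.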
Inputs that benefit, with multi-run median-of-medians vs the baseline (spread in parentheses):

- 10,000 single-level list items: -40.8% (5.6%)
- 5,000 ATX headings: -20.6% (4.3%)
- CommonMark spec × 35 (~564 KB): -11.7% (1.6%, very clean)
- one CommonMark example: -9.1% (105%, NOISY)
- 256 nested ordered list levels: -8.8% (20.3%)
- full CommonMark spec (~16 KB): -7.0% (35.4%, NOISY)
- CommonMark spec × 7 (~113 KB): -3.6% (1.0%, very clean)

Single-run full-corpus runs show the same direction on every other input that contains at least one list, with wins of -17% to -27% on inputs heavy in fenced code blocks, images, character references, inline links, tabs, and HTML blocks. The largest improvement is the 10,000-item single-level list input, which is the worst case for the old per-item splice loop.

**Trade-offs and inputs that do not move.** Inputs that contain no lists are unaffected by the change because `prepareList` is never invoked. The pure emphasis stress inputs (`a**b` repeated 10,000 times and similar) reported large deltas in single runs, but those inputs have a cross-run spread of 44 to 52% on the baseline alone, so the apparent regressions sit inside their own noise band. A 1 MB single paragraph, a Unicode-heavy 256 KB input, and 10,000 unmatched asterisks all moved within ±3% of baseline.

Tests pass: dev + prod 1454/1454, 100% coverage maintained. Three new tests cover the rebuild path: a wide-list parse-and-line spot-check, plus first-item deepEqual against a 4-item fast-path reference for both tight and loose lists (so a bug confined to the rebuild branch diverges from the fast path the reference uses). mdast-util-gfm 54/54, mdast-util-mdx 11/13. The two failing mdx tests reproduce on upstream/main and are not introduced by this branch.

Closes syntax-tree#49
Refs syntax-tree#50
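The cross-path tests boil down to an oracle idea: whatever the rebuild branch produces must agree exactly with the fast-path splice loop. The real tests compare mdast trees produced through `fromMarkdown`; the standalone reduction below uses plain arrays instead, and `applyDeferred` plus the forced `limit` values are illustrative, not the PR's code.

```javascript
// Standalone reduction of the cross-path oracle: force each path on the
// same input and require identical output. Names are illustrative.
function applyDeferred(events, insertions, limit) {
  if (insertions.length <= limit) {
    // Fast path: reverse splice loop.
    for (let index = insertions.length - 1; index >= 0; index--) {
      events.splice(insertions[index][0], 0, insertions[index][1])
    }
    return
  }
  // Rebuild path: merge the covered span with the insertions, then
  // replace that span in one call.
  const firstAt = insertions[0][0]
  const lastAt = insertions[insertions.length - 1][0]
  const replacement = []
  let read = firstAt
  for (const [at, event] of insertions) {
    while (read < at) replacement.push(events[read++])
    replacement.push(event)
  }
  events.splice(firstAt, lastAt - firstAt, ...replacement)
}

// Randomized agreement check: a bug confined to the rebuild branch makes
// the two results diverge.
for (let round = 0; round < 200; round++) {
  const base = Array.from({length: 40}, (_, i) => 'e' + i)
  const insertions = Array.from({length: 12}, (_, i) => [
    Math.floor(Math.random() * 41),
    'x' + i
  ]).sort((a, b) => a[0] - b[0]) // pass 1 yields non-decreasing `at`
  const viaFast = base.slice()
  const viaRebuild = base.slice()
  applyDeferred(viaFast, insertions, Infinity) // always fast path
  applyDeferred(viaRebuild, insertions, -1) // always rebuild
  if (JSON.stringify(viaFast) !== JSON.stringify(viaRebuild)) {
    throw new Error('rebuild diverged from fast path')
  }
}
```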
Initial checklist
Description of changes
`prepareList` calls `events.splice()` twice per list item, making it O(n²). This defers insertions into arrays and applies them in a single backward merge pass, making it O(n). Also tightens the backward line-ending scan to stop at `start` instead of 0.

Fixes #49