fix: flip attention marker to avoid nested-emphasis fusion by ChristianMurphy · Pull Request #74 · syntax-tree/mdast-util-to-markdown

ChristianMurphy · 2026-04-23T23:05:14Z

Initial checklist

I read the support docs
I read the contributing guide
I agree to follow the code of conduct
I searched issues and discussions and couldn’t find anything or linked relevant results below
I made sure the docs are up to date
I included tests (or that’s not needed)

Description of changes

Problem: emphasis uses the configured marker unconditionally, so
emphasis > emphasis > text serializes as **a**, which re-parses
as strong > text. The same path erases the outer type from
strong > emphasis > text (***a*** re-parses as emphasis > strong)
and collapses any strict three-or-more nested emphasis chain
(***a*** re-parses as emphasis > strong rather than three
nested emphases).

#12 catalogued a broader family of related shapes:
***emphasis*in emphasis*, *a*_b__, a ***b*c d*, and variants
with different leading run lengths. As wooorm noted in the thread,
"pulling a thread somewhere will have something happen somewhere
entirely different": marker choice in one place interacts with
flanking rules elsewhere, and covering every shape needs
coordination beyond picking a single marker per node.

Scope: land the minimal change that fixes the shapes guaranteed to
drift under the current serializer, without regressing shapes where
CommonMark's attention algorithm already recovers the original tree
through fusion. Escape-based work on the remaining #12 shapes is
left for follow-up.

Approach: introduce lib/util/emphasis-marker.js. Both the
emphasis handler and its peek route marker selection through it,
keeping lookahead in container-phrasing consistent with what the
handler emits. The helper flips in two narrow situations:

The emphasis is the only child of an attention parent (emphasis
or strong), and both its opening and closing markers would sit
immediately next to the parent's primary marker. Using the
opposite marker (*_a_*, **_a_**) breaks the fusion into
strong or em+strong.
The emphasis sits at the top of a strict same-type chain of
depth two or more (every link has exactly one emphasis child)
when the primary marker is *. Three-deep emphasis only
round-trips with _ on the outside, because _'s flanking
rules are stricter than *'s. The rule is asymmetric on
purpose: with primary _ the first rule's adjacency flip
alternates correctly on its own.

Strong is never flipped. A run of four asterisks already pairs as
two strong delimiters, six as three, and so on, so strong
round-trips without help.

Journey (what was tried and why the scope narrowed):

An earlier iteration flipped strong too and regressed ~18 corpus
fixtures whose nested-strong shapes relied on long fused runs of
asterisks. Strong was dropped from the flip to recover them.
Flipping whenever info.before or info.after matches the
primary caused cascading flips on paragraph-level attention
siblings: [emphasis, strong, emphasis] serialized as
_a___a__*a*, where _ + __ at the em/strong boundary
re-tokenised as a single ___ run. The flip was narrowed to
attention parents only.
Widening rule 1 to first-or-last-child of any attention parent
fixed several roundtripping emphasis in emphasis edge case changes document structure #12 shapes (***emphasis*in emphasis*,
***x*y z*, ****x*y z*) but regressed ***a*a*-*
(emphasis > [emphasis > [emphasis, text], text]): CommonMark's
rule 17 uses the leading *** fusion to recover the three-deep
structure, and the flip broke that recovery. The rule was
tightened to only-child plus strict-chain.
The only-child formulation plus the strict-chain rule is the
widest version verified to cause zero transitions from ok to
finding across 600 corpus files (commonmark, gfm, all
configurations).

Edge cases covered by new tests:

Plain emphasis and strong, with primaries * and _, showing
the helper is inert on non-nested attention.
emphasis > emphasis with each of primary * and _, yielding
*_a_* and _*a*_.
strong > emphasis yielding **_a_**.
emphasis > strong and strong > strong preserved at ***a***
and ****a****, proving strong is untouched.
Strict three-deep emphasis chains with both primaries, both
yielding _*_a_*_ (chain flip vs adjacency flip arrive at the
same output by different routes).
Emphasis parents with more than one child, demonstrating the
only-child guard preserves shapes vanilla handles.
Middle-sibling emphasis, confirming no flip at non-boundary
positions.
Top-level [emphasis, strong, emphasis] round-trip preserved.
***a*a*-* fusion shape explicitly preserved as a regression
guard against future widening.
Round-trips for parsed *_a_*, parsed _*_a_*_, synthesised
emphasis > emphasis, synthesised strong > emphasis, and a
three-deep chain preceded by a text sibling.

Refs: #12

Disclosure: written with AI assistance

Copilot

Pull request overview

Fixes markdown round-tripping for nested attention (emphasis/strong) by choosing markers that avoid delimiter fusion and CommonMark re-tokenization, addressing the edge case described in #12.

Changes:

Added a new utility to pick * vs _ per emphasis/strong node based on surrounding adjacency and a three-deep boundary-chain rule.
Updated emphasis and strong handlers to use the new marker-selection utility.
Added tests for nested attention marker flipping and for some round-trip scenarios.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File	Description
`lib/util/emphasis-strong-marker.js`	New marker-selection logic to avoid attention delimiter fusion and handle three-deep same-type boundary chains.
`lib/handle/emphasis.js`	Uses the new utility to choose the emphasis marker at serialization time.
`lib/handle/strong.js`	Uses the new utility to choose the strong marker at serialization time.
`test/index.js`	Adds tests covering nested attention marker flipping and round-trip behavior for selected cases.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Problem: emphasis uses the configured marker unconditionally, so `emphasis > emphasis > text` serializes as `**a**`, which re-parses as `strong > text`. The same path erases the outer type from `strong > emphasis > text` (`***a***` re-parses as `emphasis > strong`) and collapses any strict three-or-more nested emphasis chain (`***a***` re-parses as `emphasis > strong` rather than three nested emphases). GH-12 catalogued a broader family of related shapes: `***emphasis*in emphasis*`, `*a*_b__`, `a ***b*c d*`, and variants with different leading run lengths. As wooorm noted in the thread, "pulling a thread somewhere will have something happen somewhere entirely different": marker choice in one place interacts with flanking rules elsewhere, and covering every shape needs coordination beyond picking a single marker per node. Scope: land the minimal change that fixes the shapes guaranteed to drift under the current serializer, without regressing shapes where CommonMark's attention algorithm already recovers the original tree through fusion. Escape-based work on the remaining GH-12 shapes is left for follow-up. Approach: introduce `lib/util/emphasis-marker.js`. Both the emphasis handler and its peek route marker selection through it, keeping lookahead in `container-phrasing` consistent with what the handler emits. The helper flips in two narrow situations: 1. The emphasis is the only child of an attention parent (emphasis or strong), and both its opening and closing markers would sit immediately next to the parent's primary marker. Using the opposite marker (`*_a_*`, `**_a_**`) breaks the fusion into strong or em+strong. 2. The emphasis sits at the top of a strict same-type chain of depth two or more (every link has exactly one emphasis child) when the primary marker is `*`. Three-deep emphasis only round-trips with `_` on the outside, because `_`'s flanking rules are stricter than `*`'s. The rule is asymmetric on purpose: with primary `_` the first rule's adjacency flip alternates correctly on its own. Strong is never flipped. A run of four asterisks already pairs as two strong delimiters, six as three, and so on, so strong round-trips without help. Journey (what was tried and why the scope narrowed): - An earlier iteration flipped strong too and regressed ~18 corpus fixtures whose nested-strong shapes relied on long fused runs of asterisks. Strong was dropped from the flip to recover them. - Flipping whenever `info.before` or `info.after` matches the primary caused cascading flips on paragraph-level attention siblings: `[emphasis, strong, emphasis]` serialized as `_a___a__*a*`, where `_` + `__` at the em/strong boundary re-tokenised as a single `___` run. The flip was narrowed to attention parents only. - Widening rule 1 to first-or-last-child of any attention parent fixed several GH-12 shapes (`***emphasis*in emphasis*`, `***x*y z*`, `****x*y z*`) but regressed `***a*a*-*` (`emphasis > [emphasis > [emphasis, text], text]`): CommonMark's rule 17 uses the leading `***` fusion to recover the three-deep structure, and the flip broke that recovery. The rule was tightened to only-child plus strict-chain. - The only-child formulation plus the strict-chain rule is the widest version verified to cause zero transitions from ok to finding across 600 corpus files (commonmark, gfm, all configurations). Edge cases covered by new tests: - Plain emphasis and strong, with primaries `*` and `_`, showing the helper is inert on non-nested attention. - `emphasis > emphasis` with each of primary `*` and `_`, yielding `*_a_*` and `_*a*_`. - `strong > emphasis` yielding `**_a_**`. - `emphasis > strong` and `strong > strong` preserved at `***a***` and `****a****`, proving strong is untouched. - Strict three-deep emphasis chains with both primaries, both yielding `_*_a_*_` (chain flip vs adjacency flip arrive at the same output by different routes). - Emphasis parents with more than one child, demonstrating the only-child guard preserves shapes vanilla handles. - Middle-sibling emphasis, confirming no flip at non-boundary positions. - Top-level `[emphasis, strong, emphasis]` round-trip preserved. - `***a*a*-*` fusion shape explicitly preserved as a regression guard against future widening. - Round-trips for parsed `*_a_*`, parsed `_*_a_*_`, synthesised `emphasis > emphasis`, synthesised `strong > emphasis`, and a three-deep chain preceded by a text sibling. Refs: #12

Copilot

Pull request overview

This PR adjusts how emphasis delimiters are selected during serialization to prevent CommonMark attention delimiter fusion from changing the parsed AST shape (notably for nested emphasis), and adds regression tests for the affected roundtrip cases.

Changes:

Add lib/util/emphasis-marker.js to centralize emphasis marker selection and selectively flip markers in specific nesting scenarios.
Update the emphasis handler (and its peek) to use the new marker-selection helper.
Expand the emphasis test suite with targeted cases and roundtrip guards for nested attention edge cases.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File	Description
test/index.js	Adds new serialization and roundtrip tests covering nested emphasis/strong and known fusion edge cases.
lib/util/emphasis-marker.js	Introduces helper to choose `*` vs `_` for emphasis to avoid delimiter fusion changing parse results.
lib/handle/emphasis.js	Routes emphasis delimiter selection through the new helper for both handler and peek behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

ChristianMurphy · 2026-04-24T17:09:10Z

+  if (primary === '*' && strictChainDepth(node) >= 2) return other
+
+  return primary
+}
+
+/**


observation is correct: strictChainDepth(node) >= 2 does match multiple nodes in a strict chain of depth 4 or deeper, which I'll grant wasn't by explicit design. But applying the proposed gating is a regression, not an improvement. Comparison across depths:

depth current output current reparses as copilot output copilot reparses as

2 *_a_* em > em > text("a") ✅ *_a_* em > em > text("a") ✅

3 _*_a_*_ em > em > em > text("a") ✅ _*_a_*_ same ✅

4 __*_a_*__ strong > em > em > text("a") _*_*a*_*_ [em(text("*")), em("a"), em(text("*"))]

5 ___*_a_*___ em > strong > em > em > text("a") _*_*_a_*_*_ 4 flat em with literal * text

6 ____*_a_*____ strong > strong > em > em > text("a") _*_*_*a*_*_*_ 4 flat em with literal * text

Both approaches drift at depth 4+, CommonMark has no round-tripping representation for a strict 4-deep emphasis chain. But the current output is strictly more faithful to the input tree:

The text content ("a") is preserved.

Attention nesting is preserved (the inner em > em survives).

Only the outermost element type changes (em → strong).

The proposed fix:

Flattens the nesting entirely into sibling em nodes.

Injects literal * or _ characters into text nodes that weren't in the original tree (content corruption).

So the __...__ at the outer boundary looks alarming, but it's producing the least-bad degradation at depth 4+. I'd rather keep the accidental over-match and document it than trade it for a drift that corrupts content.

Murderlon

Looks high-quality 👍 . Double checked with Devin review as well and no issues found: https://app.devin.ai/review/syntax-tree/mdast-util-to-markdown/pull/74

ChristianMurphy requested a review from Copilot April 23, 2026 23:05

github-actions Bot added 👋 phase/new Post is being triaged automatically 🤞 phase/open Post is being triaged manually and removed 👋 phase/new Post is being triaged automatically labels Apr 23, 2026

Copilot started reviewing on behalf of ChristianMurphy April 23, 2026 23:05 View session

Copilot AI reviewed Apr 23, 2026

View reviewed changes

Comment thread lib/handle/emphasis.js Outdated

Comment thread lib/handle/strong.js

Comment thread test/index.js

ChristianMurphy force-pushed the fix/nested-attention-marker-flip branch from 121fc17 to 9f155f4 Compare April 24, 2026 13:54

ChristianMurphy requested a review from Copilot April 24, 2026 14:01

Copilot started reviewing on behalf of ChristianMurphy April 24, 2026 14:01 View session

Copilot AI reviewed Apr 24, 2026

View reviewed changes

Murderlon approved these changes Apr 27, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: flip attention marker to avoid nested-emphasis fusion#74

fix: flip attention marker to avoid nested-emphasis fusion#74
ChristianMurphy wants to merge 1 commit intomainfrom
fix/nested-attention-marker-flip

ChristianMurphy commented Apr 23, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

ChristianMurphy Apr 24, 2026

Uh oh!

Murderlon left a comment

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

3 participants

depth	current output	current reparses as	copilot output	copilot reparses as
2	`_a_`	`em > em > text("a")` ✅	`_a_`	`em > em > text("a")` ✅
3	`__a__`	`em > em > em > text("a")` ✅	`__a__`	same ✅
4	`___a___`	`strong > em > em > text("a")`	`__a__`	`[em(text("")), em("a"), em(text(""))]`
5	`____a____`	`em > strong > em > em > text("a")`	`___a___`	4 flat `em` with literal `*` text
6	`_____a_____`	`strong > strong > em > em > text("a")`	`___a___`	4 flat `em` with literal `*` text

Uh oh!

Conversation

ChristianMurphy commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Initial checklist

Description of changes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

ChristianMurphy Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

Murderlon left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

3 participants

ChristianMurphy commented Apr 23, 2026 •

edited

Loading