Commit 9f155f4

Author: Chad Murphy
fix(emphasis): flip markers for nested emphasis round-trip (GH-12)
Problem: emphasis uses the configured marker unconditionally, so `emphasis > emphasis > text` serializes as `**a**`, which re-parses as `strong > text`. The same path erases the outer type from `strong > emphasis > text` (`***a***` re-parses as `emphasis > strong`) and collapses any strict three-or-more nested emphasis chain (`***a***` re-parses as `emphasis > strong` rather than three nested emphases).

GH-12 catalogued a broader family of related shapes: `***emphasis*in emphasis*`, `*a*_b__`, `a ***b*c d*`, and variants with different leading run lengths. As wooorm noted in the thread, "pulling a thread somewhere will have something happen somewhere entirely different": marker choice in one place interacts with flanking rules elsewhere, and covering every shape needs coordination beyond picking a single marker per node.

Scope: land the minimal change that fixes the shapes guaranteed to drift under the current serializer, without regressing shapes where CommonMark's attention algorithm already recovers the original tree through fusion. Escape-based work on the remaining GH-12 shapes is left for follow-up.

Approach: introduce `lib/util/emphasis-marker.js`. Both the emphasis handler and its peek route marker selection through it, keeping lookahead in `container-phrasing` consistent with what the handler emits. The helper flips in two narrow situations:

1. The emphasis is the only child of an attention parent (emphasis or strong), and both its opening and closing markers would sit immediately next to the parent's primary marker. Using the opposite marker (`*_a_*`, `**_a_**`) breaks the fusion into strong or em+strong.
2. The emphasis sits at the top of a strict same-type chain of depth two or more (every link has exactly one emphasis child) when the primary marker is `*`. Three-deep emphasis only round-trips with `_` on the outside, because `_`'s flanking rules are stricter than `*`'s.

The rule is asymmetric on purpose: with primary `_` the first rule's adjacency flip alternates correctly on its own.

Strong is never flipped. A run of four asterisks already pairs as two strong delimiters, six as three, and so on, so strong round-trips without help.

Journey (what was tried and why the scope narrowed):

- An earlier iteration flipped strong too and regressed ~18 corpus fixtures whose nested-strong shapes relied on long fused runs of asterisks. Strong was dropped from the flip to recover them.
- Flipping whenever `info.before` or `info.after` matches the primary caused cascading flips on paragraph-level attention siblings: `[emphasis, strong, emphasis]` serialized as `_a___a__*a*`, where `_` + `__` at the em/strong boundary re-tokenised as a single `___` run. The flip was narrowed to attention parents only.
- Widening rule 1 to first-or-last-child of any attention parent fixed several GH-12 shapes (`***emphasis*in emphasis*`, `***x*y z*`, `****x*y z*`) but regressed `***a*a*-*` (`emphasis > [emphasis > [emphasis, text], text]`): CommonMark's rule 17 uses the leading `***` fusion to recover the three-deep structure, and the flip broke that recovery. The rule was tightened to only-child plus strict-chain.
- The only-child formulation plus the strict-chain rule is the widest version verified to cause zero transitions from ok to finding across 600 corpus files (commonmark, gfm, all configurations).

Edge cases covered by new tests:

- Plain emphasis and strong, with primaries `*` and `_`, showing the helper is inert on non-nested attention.
- `emphasis > emphasis` with each of primary `*` and `_`, yielding `*_a_*` and `_*a*_`.
- `strong > emphasis` yielding `**_a_**`.
- `emphasis > strong` and `strong > strong` preserved at `***a***` and `****a****`, proving strong is untouched.
- Strict three-deep emphasis chains with both primaries, both yielding `_*_a_*_` (chain flip vs adjacency flip arrive at the same output by different routes).
- Emphasis parents with more than one child, demonstrating the only-child guard preserves shapes vanilla handles.
- Middle-sibling emphasis, confirming no flip at non-boundary positions.
- Top-level `[emphasis, strong, emphasis]` round-trip preserved.
- `***a*a*-*` fusion shape explicitly preserved as a regression guard against future widening.
- Round-trips for parsed `*_a_*`, parsed `_*_a_*_`, synthesised `emphasis > emphasis`, synthesised `strong > emphasis`, and a three-deep chain preceded by a text sibling.

Refs: #12
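The two flip rules can be traced by hand with a toy model. The sketch below is not the library's serializer: `pickMarker`, `serialize`, and the node builders are hypothetical names, strong is assumed to use the doubled primary character, and every child is assumed to be an only child, so its neighbours are simply the parent's marker. Under those assumptions it reproduces the outputs listed in the test cases above.

```javascript
// Toy model of the marker selection described in the commit message.
// All names here are illustrative, not mdast-util-to-markdown API.

/** Depth of a strict single-child emphasis chain below `node`. */
function chainDepth(node) {
  const children = node.children || []
  if (children.length !== 1 || children[0].type !== 'emphasis') return 0
  return 1 + chainDepth(children[0])
}

/** Choose `*` or `_` for an emphasis node, following the two rules. */
function pickMarker(node, parent, primary, before, after) {
  const other = primary === '*' ? '_' : '*'
  const attentionParent =
    parent && (parent.type === 'emphasis' || parent.type === 'strong')
  // Rule 1: only child of an attention parent, hugged by the primary marker.
  if (
    attentionParent &&
    parent.children.length === 1 &&
    before.slice(-1) === primary &&
    after.charAt(0) === primary
  ) {
    return other
  }
  // Rule 2: top of a strict chain of depth >= 2, only when primary is `*`.
  if (primary === '*' && chainDepth(node) >= 2) return other
  return primary
}

/** Serialize text, emphasis, and strong (assumes only-child nesting). */
function serialize(node, parent, primary, before = '', after = '') {
  if (node.type === 'text') return node.value
  const marker =
    node.type === 'strong'
      ? primary + primary // assumption: strong reuses the emphasis character
      : pickMarker(node, parent, primary, before, after)
  const inner = node.children
    .map((child) => serialize(child, node, primary, marker, marker))
    .join('')
  return marker + inner + marker
}

const text = (value) => ({type: 'text', value})
const em = (...children) => ({type: 'emphasis', children})
const strong = (...children) => ({type: 'strong', children})

console.log(serialize(em(em(text('a'))), undefined, '*')) // *_a_*
console.log(serialize(em(em(text('a'))), undefined, '_')) // _*a*_
console.log(serialize(strong(em(text('a'))), undefined, '*')) // **_a_**
console.log(serialize(em(strong(text('a'))), undefined, '*')) // ***a***
console.log(serialize(em(em(em(text('a')))), undefined, '*')) // _*_a_*_
console.log(serialize(em(em(em(text('a')))), undefined, '_')) // _*_a_*_
```

With primary `*` the three-deep chain flips at the top through rule 2 and at the bottom through rule 1; with primary `_` only rule 1 fires, on the middle link, and both routes end at the same string.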
1 parent ee3b345 commit 9f155f4

3 files changed

Lines changed: 385 additions & 8 deletions

lib/handle/emphasis.js

Lines changed: 8 additions & 8 deletions
@@ -3,21 +3,21 @@
  * @import {Emphasis, Parents} from 'mdast'
  */
 
-import {checkEmphasis} from '../util/check-emphasis.js'
+import {emphasisMarker} from '../util/emphasis-marker.js'
 import {encodeCharacterReference} from '../util/encode-character-reference.js'
 import {encodeInfo} from '../util/encode-info.js'
 
 emphasis.peek = emphasisPeek
 
 /**
  * @param {Emphasis} node
- * @param {Parents | undefined} _
+ * @param {Parents | undefined} parent
  * @param {State} state
  * @param {Info} info
  * @returns {string}
  */
-export function emphasis(node, _, state, info) {
-  const marker = checkEmphasis(state)
+export function emphasis(node, parent, state, info) {
+  const marker = emphasisMarker(node, parent, state, info)
   const exit = state.enter('emphasis')
   const tracker = state.createTracker(info)
   const before = tracker.move(marker)
@@ -59,11 +59,11 @@ export function emphasis(node, _, state, info) {
 }
 
 /**
- * @param {Emphasis} _
- * @param {Parents | undefined} _1
+ * @param {Emphasis} node
+ * @param {Parents | undefined} parent
  * @param {State} state
  * @returns {string}
  */
-function emphasisPeek(_, _1, state) {
-  return state.options.emphasis || '*'
+function emphasisPeek(node, parent, state) {
+  return emphasisMarker(node, parent, state, {before: '', after: ''})
 }
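The peek calls the helper with empty `before` and `after` strings. A minimal sketch of rule 1's adjacency test, pulled out into a hypothetical standalone predicate (`adjacentToPrimary` is not part of the commit), suggests what that implies: with empty strings neither comparison can match, so from the peek only the strict-chain rule can flip the marker. Whether that matters depends on when `container-phrasing` consults the peek, which this diff does not show.

```javascript
// Rule 1's adjacency comparisons, extracted for illustration only.
// `adjacentToPrimary` is a hypothetical name; the commit inlines these checks.
function adjacentToPrimary(info, primary) {
  return (
    info.before.charAt(info.before.length - 1) === primary &&
    info.after.charAt(0) === primary
  )
}

// Peek-style call: ''.charAt(-1) and ''.charAt(0) are both '', so no match.
console.log(adjacentToPrimary({before: '', after: ''}, '*')) // false

// Handler-style calls, where the parent's marker surrounds the child.
console.log(adjacentToPrimary({before: '*', after: '*'}, '*')) // true
console.log(adjacentToPrimary({before: '**', after: '**'}, '*')) // true
console.log(adjacentToPrimary({before: '_', after: '_'}, '*')) // false
```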

lib/util/emphasis-marker.js

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
+/**
+ * @import {Emphasis, Parents} from 'mdast'
+ * @import {State} from 'mdast-util-to-markdown'
+ */
+
+import {checkEmphasis} from './check-emphasis.js'
+
+/**
+ * Pick the marker to use for an emphasis node, flipping from the configured
+ * marker to its opposite when the configured marker would fuse with an
+ * adjacent attention delimiter and re-parse as a different construct.
+ *
+ * Only emphasis gets the flip. Strong already round-trips through the
+ * spec's attention algorithm because a run of 4 asterisks pairs as two
+ * strong delimiters, and a run of 6 as three, and so on. Nested emphasis
+ * is the asymmetric case: a run of 2 asterisks pairs as one strong, not as
+ * two nested emphases, so without a flip `emphasis > emphasis > text`
+ * round-trips as `strong > text`.
+ *
+ * Two situations drive a flip, both narrowly scoped to avoid disturbing
+ * shapes the serializer already handles via fusion:
+ *
+ * 1. The emphasis is an only child of an attention parent (emphasis or
+ *    strong), and both its opening and closing markers would be adjacent
+ *    to the parent's primary marker. Using the opposite marker (for
+ *    example, `*_a_*` for `emphasis > emphasis > text` with primary
+ *    `*`) breaks the fusion.
+ *
+ * 2. The emphasis sits at the top of a strict same-type chain of depth at
+ *    least 2 (each link has exactly one emphasis child), with primary
+ *    `*`. Three-deep emphasis collapses under rule 17 unless the
+ *    outermost marker is `_`, because `_`'s flanking rules are stricter
+ *    than `*`'s. The check is asymmetric by design: when the configured
+ *    marker is already `_`, the adjacency flip in rule 1 alone is enough.
+ *
+ * @param {Emphasis} node
+ * @param {Parents | undefined} parent
+ * @param {State} state
+ * @param {{before: string, after: string}} info
+ *   Only the `before` and `after` fields are read.
+ * @returns {'*' | '_'}
+ */
+export function emphasisMarker(node, parent, state, info) {
+  const primary = checkEmphasis(state)
+  const other = primary === '*' ? '_' : '*'
+
+  if (
+    parent &&
+    (parent.type === 'emphasis' || parent.type === 'strong') &&
+    'children' in parent &&
+    parent.children.length === 1 &&
+    info.before.charAt(info.before.length - 1) === primary &&
+    info.after.charAt(0) === primary
+  ) {
+    return other
+  }
+
+  if (primary === '*' && strictChainDepth(node) >= 2) return other
+
+  return primary
+}
+
+/**
+ * Count the depth of a strict single-child emphasis chain descending from
+ * `node`. A chain is strict when every link has exactly one child and that
+ * child is also `emphasis`.
+ *
+ * @param {Emphasis} node
+ * @returns {number}
+ */
+function strictChainDepth(node) {
+  const children = node.children
+  if (!children || children.length !== 1) return 0
+  const only = children[0]
+  if (only.type !== 'emphasis') return 0
+  return 1 + strictChainDepth(only)
+}
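To see which shapes the strict-chain rule reaches, `strictChainDepth` can be run on plain objects. The sketch below copies the function from the diff above and feeds it hand-built, mdast-like nodes; the `text` and `em` builders are hypothetical helpers for this example only.

```javascript
// Copied from lib/util/emphasis-marker.js in this commit.
function strictChainDepth(node) {
  const children = node.children
  if (!children || children.length !== 1) return 0
  const only = children[0]
  if (only.type !== 'emphasis') return 0
  return 1 + strictChainDepth(only)
}

// Hypothetical builders for mdast-like nodes.
const text = (value) => ({type: 'text', value})
const em = (...children) => ({type: 'emphasis', children})

// emphasis > text: no emphasis child, depth 0.
console.log(strictChainDepth(em(text('a')))) // 0

// emphasis > emphasis > text: depth 1, below the rule 2 threshold.
console.log(strictChainDepth(em(em(text('a'))))) // 1

// Strict three-deep chain: depth 2, so rule 2 flips the outermost marker.
console.log(strictChainDepth(em(em(em(text('a')))))) // 2

// The `***a*a*-*` shape: emphasis > [emphasis > [emphasis, text], text].
// The outer node has two children, so the chain is not strict.
const fused = em(em(em(text('a')), text('a')), text('-'))
console.log(strictChainDepth(fused)) // 0
```

The last case is the regression guard from the commit message: because the depth is 0 and no node in it is an only child, neither rule fires and the shape keeps its fused `***` opening.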
