Commit 9f155f4
Chad Murphy
fix(emphasis): flip markers for nested emphasis round-trip (GH-12)
Problem: emphasis uses the configured marker unconditionally, so
`emphasis > emphasis > text` serializes as `**a**`, which re-parses
as `strong > text`. The same path inverts the nesting of
`strong > emphasis > text` (`***a***` re-parses as `emphasis > strong`)
and collapses any strict chain of three or more nested emphases
(`***a***` again re-parses as `emphasis > strong` rather than three
nested emphases).
GH-12 catalogued a broader family of related shapes:
`***emphasis*in emphasis*`, `*a*_b__`, `a ***b*c d*`, and variants
with different leading run lengths. As wooorm noted in the thread,
"pulling a thread somewhere will have something happen somewhere
entirely different": marker choice in one place interacts with
flanking rules elsewhere, and covering every shape needs
coordination beyond picking a single marker per node.
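To make the flanking interaction concrete, here is a minimal sketch of CommonMark's "can open emphasis" test for a delimiter run. The function names are hypothetical and not part of this commit, `before` and `after` are assumed to be the single characters on either side of the whole run (an empty string stands for the start or end of the line), and the whitespace check only approximates the spec's Unicode whitespace class.

```javascript
// Hypothetical helpers illustrating the CommonMark flanking rules.
// `before` / `after` are the characters around the delimiter run;
// '' means start or end of line, which counts as whitespace.
const isWhitespace = (ch) => ch === '' || /^\s$/u.test(ch)
const isPunctuation = (ch) => /^[\p{P}\p{S}]$/u.test(ch)

function isLeftFlanking(before, after) {
  return (
    !isWhitespace(after) &&
    (!isPunctuation(after) || isWhitespace(before) || isPunctuation(before))
  )
}

function isRightFlanking(before, after) {
  return (
    !isWhitespace(before) &&
    (!isPunctuation(before) || isWhitespace(after) || isPunctuation(after))
  )
}

// `*` opens whenever the run is left-flanking. `_` additionally requires
// that the run is not right-flanking, unless it is preceded by punctuation.
function canOpenEmphasis(marker, before, after) {
  const left = isLeftFlanking(before, after)
  if (marker === '*') return left
  return left && (!isRightFlanking(before, after) || isPunctuation(before))
}

console.log(canOpenEmphasis('*', 'a', 'b')) // true: intraword `*` can open
console.log(canOpenEmphasis('_', 'a', 'b')) // false: intraword `_` cannot
console.log(canOpenEmphasis('_', '', 'a')) // true: `_` at start of line
```

The intraword case is where the two markers diverge, which is why swapping one marker can change how a neighbouring run is classified.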
Scope: land the minimal change that fixes the shapes guaranteed to
drift under the current serializer, without regressing shapes where
CommonMark's attention algorithm already recovers the original tree
through fusion. Escape-based work on the remaining GH-12 shapes is
left for follow-up.
Approach: introduce `lib/util/emphasis-marker.js`. Both the
emphasis handler and its peek route marker selection through it,
keeping lookahead in `container-phrasing` consistent with what the
handler emits. The helper flips in two narrow situations:
1. The emphasis is the only child of an attention parent (emphasis
or strong), and both its opening and closing markers would sit
immediately next to the parent's primary marker. Using the
opposite marker (`*_a_*`, `**_a_**`) breaks the fusion into
strong or em+strong.
2. The emphasis sits at the top of a strict same-type chain of
depth two or more (every link has exactly one emphasis child)
when the primary marker is `*`. Three-deep emphasis only
round-trips with `_` on the outside, because `_`'s flanking
rules are stricter than `*`'s. The rule is asymmetric on
purpose: with primary `_` the first rule's adjacency flip
alternates correctly on its own.
Strong is never flipped. A run of four asterisks already pairs as
two strong delimiters, six as three, and so on, so strong
round-trips without help.
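The two rules can be sketched as follows. This is an illustrative reconstruction, not the contents of `lib/util/emphasis-marker.js`: the function names, the `parentMarker` argument, and the toy `serialize` are all assumptions, strong is assumed to use the same primary character as emphasis, and "depth two or more" is read as two or more emphasis links below the top node.

```javascript
// Hypothetical sketch; names and signatures are assumptions.
const opposite = (marker) => (marker === '*' ? '_' : '*')
const isAttention = (node) =>
  node.type === 'emphasis' || node.type === 'strong'

// Count emphasis links below `node` where each link is an only child.
function strictChainDepth(node) {
  let depth = 0
  let current = node
  while (
    current.children.length === 1 &&
    current.children[0].type === 'emphasis'
  ) {
    depth++
    current = current.children[0]
  }
  return depth
}

// `parentMarker` is the character the parent actually emitted.
function emphasisMarker(node, parent, parentMarker, primary) {
  const onlyChildOfAttention =
    parent !== undefined && isAttention(parent) && parent.children.length === 1
  const insideChain = onlyChildOfAttention && parent.type === 'emphasis'

  // Rule 2: top of a strict emphasis chain, only when the primary is `*`.
  if (primary === '*' && !insideChain && strictChainDepth(node) >= 2) {
    return '_'
  }
  // Rule 1: only child whose markers would touch the same parent marker.
  if (onlyChildOfAttention && parentMarker === primary) {
    return opposite(primary)
  }
  return primary
}

// Toy serializer covering only text, emphasis, and strong.
function serialize(node, primary, parent, parentMarker) {
  if (node.type === 'text') return node.value
  const marker =
    node.type === 'strong'
      ? primary // strong is never flipped
      : emphasisMarker(node, parent, parentMarker, primary)
  const fence = node.type === 'strong' ? marker + marker : marker
  const inner = node.children
    .map((child) => serialize(child, primary, node, marker))
    .join('')
  return fence + inner + fence
}

const text = (value) => ({type: 'text', value})
const em = (...children) => ({type: 'emphasis', children})
const strong = (...children) => ({type: 'strong', children})

console.log(serialize(em(em(text('a'))), '*')) // *_a_*
console.log(serialize(strong(em(text('a'))), '*')) // **_a_**
console.log(serialize(em(em(em(text('a')))), '*')) // _*_a_*_
console.log(serialize(em(em(em(text('a')))), '_')) // _*_a_*_
```

Passing the parent's emitted marker down the tree, rather than recomputing it, is one way to let the adjacency flip alternate on its own when the primary is `_`.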
Journey (what was tried and why the scope narrowed):
- An earlier iteration flipped strong too and regressed ~18 corpus
fixtures whose nested-strong shapes relied on long fused runs of
asterisks. Strong was dropped from the flip to recover them.
- Flipping whenever `info.before` or `info.after` matches the
primary caused cascading flips on paragraph-level attention
siblings: `[emphasis, strong, emphasis]` serialized as
`_a___a__*a*`, where `_` + `__` at the em/strong boundary
re-tokenised as a single `___` run. The flip was narrowed to
attention parents only.
- Widening rule 1 to first-or-last-child of any attention parent
fixed several GH-12 shapes (`***emphasis*in emphasis*`,
`***x*y z*`, `****x*y z*`) but regressed `***a*a*-*`
(`emphasis > [emphasis > [emphasis, text], text]`): CommonMark's
rule 17 uses the leading `***` fusion to recover the three-deep
structure, and the flip broke that recovery. The rule was
tightened to only-child plus strict-chain.
- The only-child formulation plus the strict-chain rule is the
widest version verified to cause zero transitions from ok to
finding across 600 corpus files (commonmark, gfm, all
configurations).
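The sibling regression in the second bullet is easy to see by splitting the output into delimiter runs. The helper below is a hypothetical illustration, not code from this commit, and it deliberately ignores backslash escapes and code spans.

```javascript
// Hypothetical helper: list maximal runs of `*` or `_` in a string.
// Simplification: escapes and code spans are not handled.
function delimiterRuns(source) {
  return [...source.matchAll(/\*+|_+/g)].map((match) => ({
    marker: match[0][0],
    length: match[0].length,
    index: match.index
  }))
}

// `_a_` + `__a__` + `*a*` concatenated, as in the cascading-flip case.
const runs = delimiterRuns('_a___a__*a*')
console.log(runs.map((run) => run.marker.repeat(run.length)))
// [ '_', '___', '__', '*', '*' ]
```

The closing `_` of the first emphasis and the opening `__` of the strong show up as one three-character run, so the parser no longer sees the boundary between the two nodes.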
Edge cases covered by new tests:
- Plain emphasis and strong, with primaries `*` and `_`, showing
the helper is inert on non-nested attention.
- `emphasis > emphasis` with each of primary `*` and `_`, yielding
`*_a_*` and `_*a*_`.
- `strong > emphasis` yielding `**_a_**`.
- `emphasis > strong` and `strong > strong` preserved at `***a***`
and `****a****`, proving strong is untouched.
- Strict three-deep emphasis chains with each primary, both
yielding `_*_a_*_` (the chain flip and the adjacency flip arrive
at the same output by different routes).
- Emphasis parents with more than one child, demonstrating that the
only-child guard preserves shapes the unmodified serializer
already handles.
- Middle-sibling emphasis, confirming no flip at non-boundary
positions.
- Top-level `[emphasis, strong, emphasis]` round-trip preserved.
- `***a*a*-*` fusion shape explicitly preserved as a regression
guard against future widening.
- Round-trips for parsed `*_a_*`, parsed `_*_a_*_`, synthesised
`emphasis > emphasis`, synthesised `strong > emphasis`, and a
three-deep chain preceded by a text sibling.
Refs: #121
Parent ee3b345, commit 9f155f4
3 files changed
Lines changed: 385 additions & 8 deletions