Commit 82301d7
perf: Full byte[] rendering pipeline with SWAR escape scanning and fused materializer (#745)
## Motivation
The rendering pipeline is the dominant cost in sjsonnet's output path.
On Scala Native, `realistic2` materialization alone takes ~190ms out of
~270ms total (70%). The existing pipeline routes through `char[]`
buffers → `OutputStreamWriter` → UTF-8 encoding → `byte[]` →
`OutputStream`, adding unnecessary conversion layers for what is
predominantly ASCII JSON output.
This PR introduces a full `byte[]` rendering pipeline that eliminates
the char-to-byte conversion entirely, adds SWAR (SIMD Within A Register)
escape-character scanning, zero-allocation integer rendering, and a
fused materializer that bypasses the upickle Visitor dispatch interface.
## Key Design Decisions
1. **byte[] pipeline over char[]**: `BaseByteRenderer` mirrors
`BaseCharRenderer` but uses `upickle.core.ByteBuilder` (byte[]) instead
of `CharBuilder` (char[]), writing directly to `OutputStream`. This
eliminates the `OutputStreamWriter` UTF-8 encoding layer and halves
buffer memory for ASCII content.
2. **SWAR escape-char scanning**: `CharSWAR` processes 8 bytes per
iteration using bitwise parallel techniques (Hacker's Delight Ch. 6
zero-detection) to detect `"`, `\`, and control chars. Platform-specific
implementations: JVM uses `VarHandle` for misaligned reads, Scala Native
uses `Intrinsics.loadLong` + `ByteArray.atRawUnsafe`, JS falls back to
scalar loops.
3. **Two-tier string rendering**: Short strings (< 128 chars) use a
fused encode+check loop with zero allocation. Long strings (≥ 128 chars)
use `getBytes(UTF-8)` + SWAR bulk scan + `arraycopy`. The SWAR pre-scan
determines if the fast path (direct copy) can be taken, avoiding
per-character escape processing for clean strings.
4. **Digit-pair lookup table**: Integer rendering uses
two-digits-at-a-time conversion via `DIGIT_TENS`/`DIGIT_ONES` lookup
tables, writing backward into a scratch buffer then bulk-copying.
Eliminates `Long.toString` allocation for the most common numeric
output.
5. **Fused materializer+renderer**: `ByteRenderer.materializeDirect()`
walks the `Val` tree and writes JSON bytes directly, bypassing the
upickle `Visitor` interface entirely (no
`visitObject`/`visitArray`/`visitKey`/`visitValue`/`subVisitor` virtual
dispatch). Uses `@switch` on `valTag` for O(1) type routing. Falls back
to the generic `Materializer.apply0` path for deeply nested structures.
6. **Reusable visitor instances**: Pre-allocated
`ArrVisitor`/`ObjVisitor` fields with a `Long` bitset for empty-state
tracking (one bit per nesting level, so up to 64 levels). Eliminates
per-array/per-object anonymous class allocation in the non-fused visitor
path.
7. **Bulk indentation**: `renderIndent` uses `System.arraycopy` from a
pre-allocated 64-byte spaces buffer instead of character-by-character
append.
8. **Direct `fwrite` to stdout on Native**: `NativeOutputStream` bypasses
Scala Native's JVM-compatibility layer (synchronized `PrintStream.write` →
`FileOutputStream` → `FileChannelImpl` → `unistd.write`) and calls
`stdio.fwrite(buf.at(off), 1, len, file)` directly. Eliminates per-write
synchronization and the stream layers in front of the `write` syscall.
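
The bulk indentation in item 7 can be illustrated with a minimal Java sketch. The class `BulkIndent`, the method `writeIndent`, and its parameters are hypothetical names, not taken from the PR; only the idea of copying from a pre-filled 64-byte spaces buffer with `System.arraycopy` comes from the description above. Chunking for indents wider than 64 bytes is an assumption about how such a buffer would be reused.

```java
import java.util.Arrays;

final class BulkIndent {
    // Pre-filled once; every indent is copied from this buffer.
    private static final byte[] SPACES = new byte[64];
    static {
        Arrays.fill(SPACES, (byte) ' ');
    }

    /**
     * Writes depth * indentWidth spaces into out starting at pos and returns
     * the new write position. The caller is assumed to have ensured capacity.
     */
    static int writeIndent(byte[] out, int pos, int depth, int indentWidth) {
        int remaining = depth * indentWidth;
        while (remaining > 0) {
            // Copy at most one buffer's worth per iteration (assumed chunking).
            int chunk = Math.min(remaining, SPACES.length);
            System.arraycopy(SPACES, 0, out, pos, chunk);
            pos += chunk;
            remaining -= chunk;
        }
        return pos;
    }
}
```

For typical nesting depths the loop body runs once, so an indent costs a single `arraycopy` call rather than one append per space.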
## Modifications
### New files
**`BaseByteRenderer.scala`** (shared `src/`): Byte-oriented JSON
renderer extending `ujson.JsVisitor[OutputStream, OutputStream]`.
Handles all JSON primitives, string rendering (short/long paths),
integer rendering (digit-pair tables), and indentation. Provides
`renderQuotedString` for the fused path.
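
As an illustration of the digit-pair technique, the following Java sketch converts a `long` two digits at a time, writing backward into a scratch buffer and then bulk-copying. The table names `DIGIT_TENS`/`DIGIT_ONES` come from the description above; the class `DigitPairLong`, the method `writeLong`, and its signature are hypothetical, and the PR's actual `writeLongDirect` may differ in detail.

```java
final class DigitPairLong {
    // DIGIT_TENS[n] and DIGIT_ONES[n] hold the ASCII digits of n for 0 <= n < 100.
    private static final byte[] DIGIT_TENS = new byte[100];
    private static final byte[] DIGIT_ONES = new byte[100];
    static {
        for (int i = 0; i < 100; i++) {
            DIGIT_TENS[i] = (byte) ('0' + i / 10);
            DIGIT_ONES[i] = (byte) ('0' + i % 10);
        }
    }

    /**
     * Writes the decimal form of value into out at pos and returns the new
     * position. scratch must hold at least 20 bytes (sign plus 19 digits).
     */
    static int writeLong(long value, byte[] scratch, byte[] out, int pos) {
        boolean negative = value < 0;
        // Work on the non-positive value so Long.MIN_VALUE needs no special case.
        long n = negative ? value : -value;
        int p = scratch.length;
        while (n <= -100) {
            long q = n / 100;
            int r = (int) (q * 100 - n); // next two digits, 0..99
            n = q;
            scratch[--p] = DIGIT_ONES[r];
            scratch[--p] = DIGIT_TENS[r];
        }
        int r = (int) -n; // leading one or two digits
        scratch[--p] = DIGIT_ONES[r];
        if (r >= 10) scratch[--p] = DIGIT_TENS[r];
        if (negative) scratch[--p] = '-';
        int len = scratch.length - p;
        System.arraycopy(scratch, p, out, pos, len);
        return pos + len;
    }
}
```

Each loop iteration performs one division and two table lookups for two output digits, and no `String` is allocated.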
**`ByteRenderer.scala`** (shared `src/`): sjsonnet-specific byte
renderer with custom double formatting (matching google/jsonnet output),
empty `{ }`/`[ ]` rendering, reusable visitor instances, and the fused
materializer (`materializeDirect`, `materializeChild`,
`materializeDirectObj`, `materializeDirectArr`).
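
The empty-state tracking behind the reusable visitors can be sketched as a single `long` used as a bitset, one bit per nesting level. The class `NestingEmptyBits` and its method names are hypothetical and only illustrate the idea; how the PR's `ArrVisitor`/`ObjVisitor` actually consult the bitset is not shown in the description.

```java
final class NestingEmptyBits {
    static final int MAX_DEPTH = 64; // one bit per level in a 64-bit long

    private long emptyBits = 0L; // bit d set => container at depth d has no value yet
    private int depth = 0;

    /** Opens a container. Returns false when nesting exceeds 64 levels. */
    boolean enter() {
        if (depth >= MAX_DEPTH) return false; // caller would need a fallback path
        emptyBits |= 1L << depth;
        depth++;
        return true;
    }

    /**
     * Records a value in the innermost container (must be called inside one).
     * Returns true for the first value, i.e. when no comma is needed yet.
     */
    boolean markValue() {
        long bit = 1L << (depth - 1);
        boolean first = (emptyBits & bit) != 0;
        emptyBits &= ~bit;
        return first;
    }

    /** Closes the innermost container. Returns true if it stayed empty. */
    boolean exit() {
        depth--;
        long bit = 1L << depth;
        boolean empty = (emptyBits & bit) != 0;
        emptyBits &= ~bit;
        return empty;
    }
}
```

Because the state for every open container lives in one primitive field, a single visitor instance can be reused across nesting levels without allocating per-container objects.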
**`CharSWAR.java`** (JVM `src-jvm/`): SWAR scanner using
`VarHandle.get(byte[], offset)` for misaligned 8-byte reads. Handles
both `String` (via `getChars` to char[]) and `byte[]` inputs.
**`CharSWAR.scala`** (Native `src-native/`): SWAR scanner using
`Intrinsics.loadLong` + `ByteArray.atRawUnsafe` for zero-overhead bulk
reads.
**`CharSWAR.scala`** (JS `src-js/`): Scalar fallback for Scala.js (no
SWAR — JS lacks raw memory access).
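
To make the SWAR scan concrete, here is a minimal Java sketch of the `byte[]` case. It reads 8 bytes at a time through a byte-array view `VarHandle` and applies the classic zero-byte and less-than tests to detect `"`, `\`, and bytes below `0x20`. The class name `SwarEscapeScan` and the method shape are hypothetical; the PR's `CharSWAR` may organize this differently, and the `String` path is omitted.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

final class SwarEscapeScan {
    // Plain get() on a byte-array view VarHandle tolerates misaligned offsets.
    private static final VarHandle LONG_VIEW =
        MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    private static final long ONES = 0x0101010101010101L;
    private static final long HIGHS = 0x8080808080808080L;
    private static final long QUOTES = ONES * 0x22;        // '"' in every byte
    private static final long BACKSLASHES = ONES * 0x5C;   // '\' in every byte
    private static final long CONTROL_LIMIT = ONES * 0x20; // first non-control byte

    // Non-zero iff some byte of v is zero.
    private static long zeroBytes(long v) {
        return (v - ONES) & ~v & HIGHS;
    }

    // True iff any of the 8 bytes in w is '"', '\', or below 0x20.
    static boolean wordHasEscape(long w) {
        long quote = zeroBytes(w ^ QUOTES);
        long backslash = zeroBytes(w ^ BACKSLASHES);
        // Bytes >= 0x80 (UTF-8 continuation/lead bytes) are masked out by ~w.
        long control = (w - CONTROL_LIMIT) & ~w & HIGHS;
        return (quote | backslash | control) != 0;
    }

    /** Scans buf[from, to) and reports whether any byte needs JSON escaping. */
    static boolean hasEscapeChar(byte[] buf, int from, int to) {
        int i = from;
        for (; i + 8 <= to; i += 8) {
            long w = (long) LONG_VIEW.get(buf, i);
            if (wordHasEscape(w)) return true;
        }
        for (; i < to; i++) { // scalar tail for the last 0..7 bytes
            int b = buf[i] & 0xFF;
            if (b < 0x20 || b == '"' || b == '\\') return true;
        }
        return false;
    }
}
```

The scan only answers whether an escape is present. That is enough to gate a bulk-copy fast path, with the per-character escaping loop reserved for strings where the answer is true.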
**`NativeOutputStream.scala`** (Native `src-native/`): Direct
`fwrite`-based OutputStream for Scala Native, bypassing the JVM compat
chain.
### Modified files
**`SjsonnetMainBase.scala`**: File output and stdout paths now use
`ByteRenderer` directly (bypassing `OutputStreamWriter`). Stdout path
returns a sentinel value to avoid re-printing already-written output.
Added `rawOutputStream` parameter to support Native fwrite bypass.
**`SjsonnetMain.scala`** (Native): Passes
`NativeOutputStream(stdio.stdout)` as `rawOutputStream`.
**`Interpreter.scala`**: `materialize()` detects `ByteRenderer` and
routes to the fused `materializeDirect()` path, bypassing the generic
`Materializer.apply0` visitor dispatch.
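
The difference between visitor dispatch and the fused path can be illustrated with a small Java sketch: a recursive writer that switches on an integer tag and emits bytes directly, with no visitor objects in between. Everything here is hypothetical (the `Node` type stands in for sjsonnet's `Val`, and the tag constants, class name, and compact output format are invented for illustration); it is not the PR's `materializeDirect`.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class FusedWriterSketch {
    static final int TAG_NULL = 0, TAG_TRUE = 1, TAG_FALSE = 2, TAG_INT = 3, TAG_ARR = 4;

    /** Hypothetical tagged tree node standing in for an evaluated value. */
    static final class Node {
        final int tag;
        final long intValue;
        final Node[] children;

        Node(int tag, long intValue, Node... children) {
            this.tag = tag;
            this.intValue = intValue;
            this.children = children;
        }
    }

    private static final byte[] NULL = "null".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] TRUE = "true".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] FALSE = "false".getBytes(StandardCharsets.US_ASCII);

    /** Walks the tree and writes JSON bytes directly; dense int tags compile to a jump table. */
    static void write(Node n, ByteArrayOutputStream out) {
        switch (n.tag) {
            case TAG_NULL:
                out.write(NULL, 0, NULL.length);
                break;
            case TAG_TRUE:
                out.write(TRUE, 0, TRUE.length);
                break;
            case TAG_FALSE:
                out.write(FALSE, 0, FALSE.length);
                break;
            case TAG_INT: {
                // Kept simple here; a real renderer would avoid this String.
                byte[] digits = Long.toString(n.intValue).getBytes(StandardCharsets.US_ASCII);
                out.write(digits, 0, digits.length);
                break;
            }
            case TAG_ARR:
                out.write('[');
                for (int i = 0; i < n.children.length; i++) {
                    if (i > 0) out.write(',');
                    write(n.children[i], out); // direct recursive call, no sub-visitor
                }
                out.write(']');
                break;
            default:
                throw new IllegalArgumentException("unknown tag " + n.tag);
        }
    }
}
```

Every step is a static call or a switch branch, which matters most on targets without a JIT to devirtualize interface calls.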
**`BaseCharRenderer.scala`**: `visitNonNullString` now uses
`CharSWAR.hasEscapeChar` for pre-scanning. Added `writeLongDirect` with
digit-pair lookup tables. Added companion object with lookup tables.
**`Renderer.scala`**: `visitFloat64` inlined to avoid
`RenderUtils.renderDouble` String allocation — uses `writeLongDirect`
for integers, `BigDecimal` for whole-number doubles, `d.toString` for
fractionals.
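
The three-way split described for `visitFloat64` could look roughly like the following Java sketch. It returns a `String` for readability, whereas the PR writes integers directly as bytes to avoid that allocation. The class and method names, the range guard, and the handling of special values are assumptions, not the PR's code.

```java
import java.math.BigDecimal;

final class DoubleFormatSketch {
    static String render(double d) {
        long asLong = (long) d;
        // Integer-valued and inside the long range: take the integer path.
        // Long.MAX_VALUE is excluded because the cast saturates there.
        if (asLong == d && asLong != Long.MAX_VALUE) {
            return Long.toString(asLong);
        }
        // Whole number too large for a long: print all digits, no exponent.
        if (d % 1 == 0) {
            return new BigDecimal(d).toPlainString();
        }
        // Fractional value: default double formatting.
        // NaN and infinities also land here; they are assumed to be
        // rejected before rendering.
        return Double.toString(d);
    }
}
```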
**`Materializer.scala`**: Fixed `Apply`/`Apply0-3` pattern match arity
for auto-TCO `strict` field (upstream `ecdd0b6`).
## Benchmark Results
### Hyperfine (Scala Native, `realistic2`, averaged over 2 rounds)
| Config | Master (ms) | This PR (ms) | Speedup |
|--------|:-----------:|:------------:|:-------:|
| stdout | 270 ± 5 | 175 ± 6 | **1.55x (35% faster)** |
| stdout `-p` | 250 ± 4 | 162 ± 3 | **1.54x (35% faster)** |
| file `-o` | 449 ± 69 | 405 ± 69 | 1.11x (IO bound) |
Output correctness verified: `diff` confirms byte-identical output
between master and this PR.
## Analysis
The byte[] pipeline optimization stacks four independent wins:
1. **OutputStreamWriter elimination** (~10%): Removing the
char[]→UTF-8→byte[] conversion layer. Most impactful for file output
where the full `OutputStreamWriter` synchronization overhead applies.
2. **SWAR escape scanning** (~5%): Escape-char detection examines 8 bytes per
iteration instead of 1 on clean strings (the common case). The SWAR pre-scan gates a
fast bulk-copy path, avoiding per-character processing.
3. **Fused materializer** (~15-20%): Eliminating Visitor interface
virtual dispatch. On JVM with JIT, devirtualization handles most of this
automatically. On Scala Native without JIT, every
`visitObject`/`subVisitor`/`visitKey`/`visitValue`/`visitEnd` call is a
vtable lookup + indirect branch — the fused path replaces all of these
with direct method calls.
4. **Native fwrite bypass** (~5%): Eliminating `PrintStream`
synchronized lock + `FileChannelImpl` indirection on every write.
`stdio.fwrite` has internal buffering and batches small writes before
syscall.
## Notes
The `lazy_reverse_correctness.jsonnet` test failure on Scala 2.13.18 is
a **pre-existing upstream bug** from PR #741 (lazy reverse array).
Upstream master itself does not compile on 2.13 due to the auto-TCO
pattern match arity issue (ecdd0b6), so this test was never run on 2.13
upstream. This PR fixes the compilation issue but exposes the runtime
bug. This is not a regression introduced by this PR.
## Result
- All test suites pass on Scala 3.3.7, JS, WASM, Native
- Scala 2.13.18: 1 pre-existing upstream failure
(`lazy_reverse_correctness.jsonnet`)
- No regressions detected
- Output is byte-identical to master for all test cases
1 parent 097f44e commit 82301d7
10 files changed
Lines changed: 1232 additions & 53 deletions
File tree
- sjsonnet
  - src-js/sjsonnet
  - src-jvm-native/sjsonnet
  - src-jvm/sjsonnet
  - src-native/sjsonnet
  - src/sjsonnet