Commit 4d16e17

perf: add canDirectIterate fast path to fused ByteRenderer materializer (#773)
## Motivation

The fused ByteRenderer (`materializeDirect`) path used by Scala Native bypasses the upickle Visitor interface for direct byte-level JSON rendering. However, it was missing the `canDirectIterate` optimization already present in `Materializer.materializeRecursiveObj`, meaning every object field went through `visibleKeyNames` allocation + `value()` HashMap lookup, even for simple inline objects.

Profiling showed **realistic2 spends 99% of its time in materialization** (155ms materialize vs 1.7ms eval), processing ~125K objects and producing 28MB JSON output. This makes materializer optimization the highest-leverage target.

## Key Design Decision

Mirror the `canDirectIterate` fast path from `Materializer.scala` into `ByteRenderer.scala`, splitting `materializeDirectObj` into three specialized methods that avoid HashMap lookup for the common case of inline objects.

## Modification

Split `ByteRenderer.materializeDirectObj` into:

- **`materializeDirectInlineObj`**: Iterates raw `inlineFieldKeys`/`inlineFieldMembers` arrays directly, invoking members by index. Handles both multi-field and single-field objects.
- **`materializeDirectSortedInlineObj`**: Uses `_sortedInlineOrder` cached sort order (shared across all objects from same MemberList) for sorted output.
- **`materializeDirectGenericObj`**: Fallback to `visibleKeyNames` + `value()` for complex objects with super chains or excludedKeys.
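The split described above can be illustrated with a minimal sketch. It assumes a simplified object model: `Field`, `SketchObj`, and `DirectIterateSketch` are hypothetical names invented for this example, not sjsonnet types, and the output format is a plain one-line rendering rather than ByteRenderer's indented output. The fast path walks the field array by index, while the fallback allocates a key-name array and resolves each key through a hash map.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for illustration; these are NOT sjsonnet's Val.Obj or Expr.Member.
final case class Field(key: String, hidden: Boolean, compute: () => String)

final class SketchObj(val fields: Array[Field], val hasSuperChain: Boolean) {
  // Assumed meaning of canDirectIterate: a plain inline object with no super chain.
  def canDirectIterate: Boolean = !hasSuperChain

  // Generic path, step 1: allocate a fresh array of visible key names.
  def visibleKeyNames: Array[String] = fields.filter(f => !f.hidden).map(_.key)

  // Generic path, step 2: resolve each key through a hash map.
  private val byKey = new mutable.HashMap[String, Field]()
  fields.foreach(f => byKey.put(f.key, f))
  def value(key: String): String = byKey(key).compute()
}

object DirectIterateSketch {
  def render(obj: SketchObj): String = {
    val sb = new StringBuilder("{")
    var first = true
    def emit(key: String, v: String): Unit = {
      if (!first) sb.append(", ")
      first = false
      sb.append('"').append(key).append("\": ").append(v)
    }
    if (obj.canDirectIterate) {
      // Fast path: walk the field array by index, skipping hidden fields.
      var i = 0
      while (i < obj.fields.length) {
        val f = obj.fields(i)
        if (!f.hidden) emit(f.key, f.compute())
        i += 1
      }
    } else {
      // Fallback: key-name array plus per-key lookup.
      val names = obj.visibleKeyNames
      var i = 0
      while (i < names.length) {
        emit(names(i), obj.value(names(i)))
        i += 1
      }
    }
    sb.append('}').toString
  }
}
```

Both branches produce the same text for the same fields. The difference is only in how many intermediate arrays and map lookups each object costs, which is the overhead this commit targets.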
## Benchmark Results

### JMH (JVM, Scala 3.3.7)

Baseline: master @ `0d13274`

| Benchmark | Before (ms) | After (ms) | Change |
|-----------|-------------|------------|--------|
| **realistic2** | **57.541** | **49.391** | **-14.2%** ✅ |
| comparison2 | 18.681 | 17.606 | -5.8% ✅ |
| base64Decode | 0.123 | 0.118 | -4.1% |
| bench.02 | 35.401 | 32.904 | -7.1% |
| reverse | 6.717 | 6.883 | +2.5% (noise) |

### Scala Native (hyperfine, 15 runs, 5 warmup)

| Binary | realistic2 (ms) | Relative |
|--------|-----------------|----------|
| **sjsonnet (this PR)** | **96.1 ± 2.4** | **1.00x** ✅ |
| jrsonnet (Rust) | 112.8 ± 5.5 | 1.17x slower |
| sjsonnet (master) | 171.8 ± 5.6 | 1.79x slower |

**sjsonnet now beats jrsonnet by 17% on realistic2!**

## Analysis

The `canDirectIterate` fast path eliminates:

1. **`visibleKeyNames` allocation**: No more `ArrayBuffer` → `Array` creation per object
2. **`value()` HashMap lookup**: No more key-based cache lookup per field (replaced by direct index invocation)
3. **Validation checks**: Inline fields skip the `value()` validation path

For realistic2 with 125K objects, this removes ~125K HashMap lookups and ~125K array allocations from the hot materialization loop.

## References

- Mirrors `Materializer.materializeInlineObj` / `materializeSortedInlineObj` logic
- Related profiling: `sjsonnet --debug-stats bench/resources/cpp_suite/realistic2.jsonnet`

## Result

All 420 tests pass across JVM/JS/WASM/Native × Scala 2.12/2.13/3.3.7.
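The Change and Relative columns in the benchmark tables can be reproduced from the raw timings. The sketch below uses only figures quoted in this commit message; `BenchArithmetic`, `pctChange`, and `relative` are hypothetical helper names introduced for illustration.

```scala
object BenchArithmetic {
  // Percentage change from `before` to `after`, rounded to one decimal place.
  def pctChange(before: Double, after: Double): Double =
    math.round((after - before) / before * 1000.0) / 10.0

  // How many times slower `other` is than `baseline`, rounded to two decimals.
  def relative(baseline: Double, other: Double): Double =
    math.round(other / baseline * 100.0) / 100.0

  def main(args: Array[String]): Unit = {
    println(pctChange(57.541, 49.391)) // realistic2, JMH
    println(relative(96.1, 112.8))     // jrsonnet vs this PR
    println(relative(96.1, 171.8))     // master vs this PR
  }
}
```

The 17% figure in the headline is the jrsonnet ratio read from the faster binary's side: jrsonnet takes about 1.17 times as long as this PR's binary on realistic2.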
1 parent 0020cec commit 4d16e17

1 file changed

Lines changed: 139 additions & 34 deletions

sjsonnet/src/sjsonnet/ByteRenderer.scala

@@ -2,6 +2,7 @@ package sjsonnet
 
 import java.io.OutputStream
 
+import scala.inline
 import upickle.core.{ArrVisitor, ObjVisitor}
 
 /**
@@ -255,54 +256,158 @@ class ByteRenderer(out: OutputStream = new java.io.ByteArrayOutputStream(), inde
       Error.fail("Stackoverflow while materializing, possibly due to recursive value", obj.pos)
     try {
       obj.triggerAllAsserts(ctx.brokenAssertionLogic)
-      val keys =
-        if (ctx.sort) obj.visibleKeyNames.sorted(Util.CodepointStringOrdering)
-        else obj.visibleKeyNames
-
-      // Inline of visitObject — open brace
-      elemBuilder.append('{')
-      newlineBuffered = true
-      depth += 1
-      resetEmpty()
+      if (obj.canDirectIterate) {
+        // Fast path: inline objects (no super chain, no excludedKeys).
+        // Bypasses visibleKeyNames allocation and value() HashMap lookup per key,
+        // invoking members directly by array index.
+        if (ctx.sort) materializeDirectSortedInlineObj(obj, matDepth, ctx)
+        else materializeDirectInlineObj(obj, matDepth, ctx)
+      } else {
+        materializeDirectGenericObj(obj, matDepth, ctx)
+      }
+    } finally {
+      ctx.exitObject(obj)
+    }
+  }
 
-      var i = 0
-      while (i < keys.length) {
-        val key = keys(i)
-        val childVal = obj.value(key, ctx.emptyPos)
+  /** Open an object brace and initialize depth/empty state. */
+  @inline private def openObjBrace(): Unit = {
+    elemBuilder.append('{')
+    newlineBuffered = true
+    depth += 1
+    resetEmpty()
+  }
 
-        markNonEmpty()
+  /** Close an object brace, handling empty vs non-empty formatting. */
+  @inline private def closeObjBrace(): Unit = {
+    commaBuffered = false
+    newlineBuffered = false
+    val wasEmpty = isEmpty
+    resetEmpty()
+    depth -= 1
+    if (wasEmpty) elemBuilder.append(' ')
+    else renderIndent()
+    elemBuilder.append('}')
+    flushByteBuilder()
+  }
 
-        // Flush comma+indent from previous pair, then render key+value
-        // without intermediate flushes
-        flushBuffer()
-        renderQuotedString(key)
+  /** Render a single key-value pair (comma buffering assumed by caller). */
+  @inline private def renderKeyValue(
+      key: String,
+      childVal: Val,
+      matDepth: Int,
+      ctx: Materializer.MaterializeContext)(implicit evaluator: EvalScope): Unit = {
+    markNonEmpty()
+    flushBuffer()
+    renderQuotedString(key)
+    elemBuilder.append(':')
+    elemBuilder.append(' ')
+    materializeChild(childVal, matDepth, ctx)
+  }
+
+  /** Fused inline object rendering — bypasses visibleKeyNames and value() lookup. */
+  private def materializeDirectInlineObj(
+      obj: Val.Obj,
+      matDepth: Int,
+      ctx: Materializer.MaterializeContext)(implicit evaluator: EvalScope): Unit = {
+    val fs = ctx.emptyPos.fileScope
+    val rawKeys = obj.inlineKeys
+    if (rawKeys != null) {
+      val rawMembers = obj.inlineMembers
+      val rawN = rawKeys.length
+
+      openObjBrace()
+
+      var i = 0
+      while (i < rawN) {
+        val m = rawMembers(i)
+        if (m.visibility != Expr.Member.Visibility.Hidden) {
+          val childVal = m.invoke(obj, null, fs, evaluator)
+          if (!obj._skipFieldCache) obj.cacheFieldValue(rawKeys(i), childVal)
+          renderKeyValue(rawKeys(i), childVal, matDepth, ctx)
+          commaBuffered = true
+        }
+        i += 1
+      }
 
-        // Key-value separator ": "
-        elemBuilder.append(':')
+      closeObjBrace()
+    } else {
+      // Single-field object
+      val sfm = obj.singleMem
+      if (sfm.visibility != Expr.Member.Visibility.Hidden) {
+        openObjBrace()
+        val childVal = sfm.invoke(obj, null, fs, evaluator)
+        if (!obj._skipFieldCache) obj.cacheFieldValue(obj.singleKey, childVal)
+        renderKeyValue(obj.singleKey, childVal, matDepth, ctx)
+        closeObjBrace()
+      } else {
+        // Empty object (single hidden field)
+        elemBuilder.append('{')
         elemBuilder.append(' ')
+        elemBuilder.append('}')
+        flushByteBuilder()
+      }
+    }
+  }
+
+  /** Fused sorted inline object rendering — uses cached sorted field order. */
+  private def materializeDirectSortedInlineObj(
+      obj: Val.Obj,
+      matDepth: Int,
+      ctx: Materializer.MaterializeContext)(implicit evaluator: EvalScope): Unit = {
+    val fs = ctx.emptyPos.fileScope
+    val rawKeys = obj.inlineKeys
+    if (rawKeys != null) {
+      val rawMembers = obj.inlineMembers
+      val order = {
+        val cached = obj._sortedInlineOrder
+        if (cached != null) cached
+        else Materializer.computeSortedInlineOrder(rawKeys, rawMembers)
+      }
+      val visCount = order.length
 
-        // Render value directly — no flush overhead
-        materializeChild(childVal, matDepth, ctx)
+      openObjBrace()
 
+      var i = 0
+      while (i < visCount) {
+        val idx = order(i)
+        val childVal = rawMembers(idx).invoke(obj, null, fs, evaluator)
+        if (!obj._skipFieldCache) obj.cacheFieldValue(rawKeys(idx), childVal)
+        renderKeyValue(rawKeys(idx), childVal, matDepth, ctx)
         commaBuffered = true
         i += 1
       }
 
-      // Inline of visitEnd — close brace
-      commaBuffered = false
-      newlineBuffered = false
-      val wasEmpty = isEmpty
-      resetEmpty()
-      depth -= 1
-      if (wasEmpty) elemBuilder.append(' ')
-      else renderIndent()
-      elemBuilder.append('}')
-      flushByteBuilder()
-    } finally {
-      ctx.exitObject(obj)
+      closeObjBrace()
+    } else {
+      // Single-field: sorted = unsorted
+      materializeDirectInlineObj(obj, matDepth, ctx)
     }
   }
 
+  /** Generic object rendering — uses visibleKeyNames + value() lookup. */
+  private def materializeDirectGenericObj(
+      obj: Val.Obj,
+      matDepth: Int,
+      ctx: Materializer.MaterializeContext)(implicit evaluator: EvalScope): Unit = {
+    val keys =
+      if (ctx.sort) obj.visibleKeyNames.sorted(Util.CodepointStringOrdering)
+      else obj.visibleKeyNames
+
+    openObjBrace()
+
+    var i = 0
+    while (i < keys.length) {
+      val key = keys(i)
+      val childVal = obj.value(key, ctx.emptyPos)
+      renderKeyValue(key, childVal, matDepth, ctx)
+      commaBuffered = true
+      i += 1
+    }
+
+    closeObjBrace()
+  }
+
   private def materializeDirectArr(
       xs: Val.Arr,
       matDepth: Int,
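
The sorted path in the diff reads `_sortedInlineOrder` and falls back to `Materializer.computeSortedInlineOrder` when the cache is null. That helper's body is not part of this commit, so the following is only a sketch of the general idea under stated assumptions: `FieldLayout` and `SortedOrderSketch` are hypothetical names, and plain `String` ordering is used in place of sjsonnet's `Util.CodepointStringOrdering`. The point is that the order is an index permutation over visible fields, computed once and then reused.

```scala
object SortedOrderSketch {
  // Indices of visible fields, ordered by key.
  // Assumption: default String ordering, not sjsonnet's codepoint ordering.
  def computeSortedOrder(keys: Array[String], hidden: Array[Boolean]): Array[Int] = {
    val visible = keys.indices.filter(i => !hidden(i)).toArray
    visible.sortBy(i => keys(i))
  }
}

// Hypothetical holder shared by every object built from the same field list.
final class FieldLayout(val keys: Array[String], val hidden: Array[Boolean]) {
  private var cachedOrder: Array[Int] = null

  // Computes the permutation on first use and returns the same array afterwards.
  def sortedOrder: Array[Int] = {
    if (cachedOrder == null) cachedOrder = SortedOrderSketch.computeSortedOrder(keys, hidden)
    cachedOrder
  }
}
```

Because hidden fields are filtered out while the permutation is built, a render loop over `sortedOrder` needs no per-field visibility check, which matches the shape of the sorted loop in the diff.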
