perf: eliminate closure allocations in evaluator hot paths #775

Draft
He-Pin wants to merge 2 commits into databricks:master from He-Pin:perf/closure-elimination
Contributor

@He-Pin He-Pin commented Apr 12, 2026

Summary

Replace .map()/.filter()/.foreach() calls with explicit while loops in the evaluator to eliminate per-call closure allocations on hot paths.

Note: Array.map already creates the target array directly (no "intermediate" array). The saved allocation is the closure/lambda passed to these methods, plus the method call overhead of the Scala collections layer.
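To make the note above concrete, here is a minimal sketch of the before/after shape. It is not sjsonnet code: `Expr`, `Lazy`, `MiniEvaluator`, and the shared empty array are hypothetical stand-ins chosen for illustration, and the real evaluator's types and its empty-array shortcut may look different.

```scala
object MapVsWhileSketch {
  // Hypothetical stand-ins for the evaluator's types; not sjsonnet's real classes.
  final case class Expr(n: Int)
  final class Lazy(val value: Int)

  final class MiniEvaluator(factor: Int) {
    // Assumption: one shared zero-length array serves as the "empty array shortcut".
    private val emptyLazyArray = new Array[Lazy](0)

    def visitAsLazy(e: Expr): Lazy = new Lazy(e.n * factor)

    // Before: the method reference captures `this`, so a function object is
    // typically created per call, and the call goes through ArrayOps.map.
    def viaMap(exprs: Array[Expr]): Array[Lazy] = exprs.map(visitAsLazy)

    // After: a plain while loop that writes straight into the target array.
    def viaWhile(exprs: Array[Expr]): Array[Lazy] = {
      val len = exprs.length
      if (len == 0) emptyLazyArray
      else {
        val out = new Array[Lazy](len)
        var i = 0
        while (i < len) {
          out(i) = visitAsLazy(exprs(i))
          i += 1
        }
        out
      }
    }
  }
}
```

Both methods allocate exactly one result array for non-empty input; the loop version only drops the function object and the collections-layer call.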

Changes:

  • visitArr: replace .map(visitAsLazy) with while loop + empty array shortcut
  • visitApply: extract evalArgsToArray helper to replace 2× .map calls (tailstrict and non-tailstrict paths)
  • visitExprWithTailCallSupport: reuse evalArgsToArray for tail-position Apply
  • visitImportBin: replace .map with while loop for raw bytes → Array[Eval] conversion
  • visitComp IfSpec: replace .filter with manual ArrayBuilder while loop
  • visitMemberList: replace fields.foreach with while loop (eliminates per-object construction closure)
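As an illustration of the `.filter` replacement listed above, the following sketch filters an array with `scala.collection.mutable.ArrayBuilder` (Scala 2.13+ syntax) inside a while loop. `Scope`, `MiniComp`, and `keep` are hypothetical names, not the evaluator's real types; the actual `visitComp` code evaluates a Jsonnet condition rather than a numeric threshold.

```scala
import scala.collection.mutable.ArrayBuilder

object FilterSketch {
  // Hypothetical stand-in for a comprehension scope; not sjsonnet's real type.
  final case class Scope(x: Int)

  final class MiniComp(threshold: Int) {
    // Hypothetical condition standing in for evaluating an `if` clause.
    def keep(s: Scope): Boolean = s.x >= threshold

    // Before: passes a function value (capturing `this`) to ArrayOps.filter.
    def viaFilter(scopes: Array[Scope]): Array[Scope] = scopes.filter(keep)

    // After: manual loop that appends matching elements to an ArrayBuilder.
    def viaWhile(scopes: Array[Scope]): Array[Scope] = {
      val b = ArrayBuilder.make[Scope]
      var i = 0
      while (i < scopes.length) {
        val s = scopes(i)
        if (keep(s)) b += s
        i += 1
      }
      b.result()
    }
  }
}
```

Element order is preserved in both versions, so the two are interchangeable from the caller's point of view.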

Benchmark Results

Hyperfine (20 runs, JVM, M4 Max macOS):

| Benchmark | Before | After | Δ |
|---|---|---|---|
| realistic2 | 418.9ms | 397.5ms | +5.1% |
| bench.02 | 336.5ms | 322.8ms | +4.1% |
| realistic1 | 292.5ms | 287.8ms | +1.6% |
| bench.08 | 250.0ms | 249.4ms | flat |

Test plan

  • ./mill 'sjsonnet.jvm[3.3.7]'.test — all 4 test suites pass
  • ./mill __.checkFormat — formatting clean

Notes

Low-risk, localized changes (1 file, 70 insertions, 37 deletions). Each method replaces a single .map/.filter/.foreach call with an equivalent while loop — the same pattern already used in visitLocalExpr, visitApplyBuiltin, and several stdlib functions (std.map, std.filter, SetModule, etc.).

The evalArgsToArray helper is reused by both visitApply and visitExprWithTailCallSupport, keeping the code DRY.
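A shared helper of this kind could look roughly like the sketch below. The signature, the `tailstrict` flag, and the `Strict`/`Deferred` classes are assumptions made for illustration; the PR text only names the helper and says it covers the tailstrict and non-tailstrict paths, so the real `evalArgsToArray` may differ.

```scala
object EvalArgsSketch {
  // Hypothetical stand-ins; not sjsonnet's real Expr or Eval hierarchy.
  final case class Expr(n: Int)

  sealed trait Eval { def value: Int }
  // Already-computed argument (assumed shape of the tailstrict path).
  final class Strict(val value: Int) extends Eval
  // Argument computed on first access (assumed shape of the lazy path).
  final class Deferred(expr: Expr) extends Eval {
    lazy val value: Int = expr.n + 1
  }

  // One loop serves both call sites, so neither needs its own `.map`.
  def evalArgsToArray(args: Array[Expr], tailstrict: Boolean): Array[Eval] = {
    val out = new Array[Eval](args.length)
    var i = 0
    while (i < args.length) {
      out(i) =
        if (tailstrict) new Strict(args(i).n + 1) // evaluate now
        else new Deferred(args(i))                // evaluate on demand
      i += 1
    }
    out
  }
}
```

Keeping the strict/lazy decision inside the loop body is one way to let two callers share the code; passing the decision in as a function value would reintroduce the closure the change is trying to avoid.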

Why JIT doesn't fully optimize these away: Array.map goes through scala.collection.ArrayOps (implicit conversion), adding an indirection layer. Lambda bodies with pattern matching (e.g., fields.foreach { case ... }) produce larger bytecode that C2 may decide not to inline.
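The `fields.foreach { case ... }` shape mentioned above can be sketched as follows. `Member`, `Field`, and `AssertMember` are hypothetical types, not the ones in `visitMemberList`. The sketch also shows a side cost that is standard Scala behavior: a `var` captured by a closure is boxed into a `scala.runtime.IntRef`, whereas the loop version keeps it as a plain local.

```scala
object ForeachSketch {
  // Hypothetical member types; not sjsonnet's real AST.
  sealed trait Member
  final case class Field(name: String, value: Int) extends Member
  final case class AssertMember(msg: String) extends Member

  // Before: the pattern-matching lambda is compiled to a separate function
  // body, and the captured `total` is boxed so the closure can mutate it.
  def viaForeach(members: Array[Member]): Int = {
    var total = 0
    members.foreach {
      case Field(_, v) => total += v
      case _           => ()
    }
    total
  }

  // After: the match is inlined into the loop and `total` stays a local Int.
  def viaWhile(members: Array[Member]): Int = {
    var total = 0
    var i = 0
    while (i < members.length) {
      members(i) match {
        case Field(_, v) => total += v
        case _           => ()
      }
      i += 1
    }
    total
  }
}
```

Whether C2 would have inlined the lambda in a given run depends on bytecode size and profiling, so the loop version mainly removes that uncertainty rather than guaranteeing a speedup.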

He-Pin added 2 commits April 13, 2026 04:11
Replace .map() and .filter() calls with explicit while loops in the
evaluator to eliminate intermediate Array allocations that increase GC
pressure in hot paths.

Changes:
- visitArr: replace .map(visitAsLazy) with while loop
- visitApply: extract evalArgsToArray helper to replace .map calls
- visitExprWithTailCallSupport: reuse evalArgsToArray for tail Apply
- visitImportBin: replace .map with while loop
- visitComp IfSpec: replace .filter with manual filtered array builder

Benchmark results (hyperfine, 20 runs, M4 Max macOS):
  realistic2:  418.9ms -> 397.5ms (+5.1%)
  bench.02:    336.5ms -> 322.8ms (+4.1%)
  realistic1:  292.5ms -> 287.8ms (+1.6%)
  bench.08:    250.0ms -> 249.4ms (flat)

🤖 Generated with [Qoder](https://qoder.com)
Replace .map()/.filter()/.foreach() calls with explicit while loops in
the evaluator to eliminate per-call closure allocations on hot paths.

Note: Array.map already creates the target array directly (no "intermediate"
array). The saved allocation is the closure/lambda passed to these methods,
plus the method call overhead of the Scala collections layer.

Changes:
- visitArr: replace .map(visitAsLazy) with while loop + empty array shortcut
- visitApply: extract evalArgsToArray helper to replace 2x .map calls
- visitExprWithTailCallSupport: reuse evalArgsToArray for tail Apply
- visitImportBin: replace .map with while loop for raw bytes conversion
- visitComp IfSpec: replace .filter with manual ArrayBuilder while loop
- visitMemberList: replace fields.foreach with while loop (per-object closure)

Benchmark results (hyperfine, 20 runs, JVM, M4 Max macOS):
  realistic2:  418.9ms -> 397.5ms (+5.1%)
  bench.02:    336.5ms -> 322.8ms (+4.1%)
  realistic1:  292.5ms -> 287.8ms (+1.6%)
  bench.08:    250.0ms -> 249.4ms (flat)

🤖 Generated with [Qoder](https://qoder.com)
@He-Pin He-Pin marked this pull request as draft April 12, 2026 21:28