
perf: lazy stdlib initialization with shared members and unsynchronized lookup#769

Draft
He-Pin wants to merge 2 commits into databricks:master from He-Pin:perf/lazy-stdlib-init

Conversation

Contributor

@He-Pin He-Pin commented Apr 12, 2026

Motivation

The sjsonnet standard library eagerly constructs 145+ Val.Builtin objects across 11 modules at startup, even though most programs use only a small subset. Each builtin allocates an anonymous class instance + Params + Array[String] for parameter names — ~420 objects total. On the JVM this is masked by JIT warmup, but on Scala Native (AOT) it contributes measurable startup overhead.

Key Design Decisions

  1. LazyConstMember: A new final class in Val.Obj that stores a () => Val thunk and resolves it on first invoke(). After resolution, the closure is nulled out (init = null) to release references for GC. The null-check fast path has zero overhead after first access.

  2. Shared members in companion object: LazyConstMember objects are created once in StdLibModule companion object and shared across all StdLibModule instances via putAll. After the first evaluation run, all members have cached values — subsequent Interpreter instances pay only a null-check per member access (no closure invocation, no Map lookup).

  3. Unsynchronized lazy lookup: Since the evaluator is single-threaded, AbstractFunctionModule.functionMap uses a plain var + null-check instead of Scala's lazy val (which compiles to synchronized in Scala 2 / volatile CAS in Scala 3). Eliminates unnecessary synchronization overhead.

  4. functionNames: Array[String]: Each module declares a cheap string array for name registration at startup — no Val.Func allocation. The actual builtin objects are created only when the module's lazy val functions is first triggered.

Modification

Val.scala (LazyConstMember):

  • final class with private var init: () => Val and private var _val: Val
  • invoke(): null-check → return cached; else resolve, cache, null out closure
  • final enables JIT devirtualization
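
The bullets above can be sketched roughly as follows. This is a minimal illustration of the described shape, not the actual diff: `Val` stands for sjsonnet's value type, and the Member superclass and evaluator-facing signature in `Val.Obj` are elided.

```scala
// Sketch only: the real class lives inside Val.Obj and extends its Member
// hierarchy; here only the thunk-caching mechanism is shown.
final class LazyConstMember(private var init: () => Val) {
  private var _val: Val = null

  def invoke(): Val = {
    if (_val == null) {  // after first access this null-check is the only cost
      _val = init()      // run the thunk exactly once
      init = null        // release the closure's captures for GC
    }
    _val
  }
}
```

The `final` modifier matters for the hot path: with no subclasses possible, the JIT can devirtualize and inline `invoke()` at call sites.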

AbstractFunctionModule.scala:

  • Add abstract functionNames: Array[String] for cheap name-only registration
  • Change functions from val to def (abstract); concrete modules override as lazy val
  • Replace lazy val functionMap with var _functionMap + null-check in getFunction()
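
A rough sketch of the resulting module API, with assumed signatures. The plain `var` + null-check stands in for `lazy val`, which compiles to a synchronized block on Scala 2 and a volatile CAS on Scala 3; dropping that is safe only because the evaluator is single-threaded.

```scala
// Names and types are assumptions based on the description above.
abstract class AbstractFunctionModule {
  def functionNames: Array[String]        // cheap name-only registration
  def functions: Seq[(String, Val.Func)]  // concrete modules override as lazy val

  private var _functionMap: java.util.Map[String, Val.Func] = null

  final def getFunction(name: String): Val.Func = {
    if (_functionMap == null) {           // first call forces the module's lazy val
      val m = new java.util.HashMap[String, Val.Func]
      for ((n, f) <- functions) m.put(n, f)
      _functionMap = m
    }
    _functionMap.get(name)
  }
}
```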

StdLibModule.scala:

  • Companion object: nameToModule maps function names → module references using only string arrays
  • Companion object: sharedLazyMembers pre-builds LazyConstMember entries (created once, shared across instances)
  • Instance: entries.putAll(sharedLazyMembers) — no per-instance closure allocation
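
The companion-object sharing might look like this (identifiers follow the description but are not copied from the diff):

```scala
// Sketch: built once per process and reused by every StdLibModule instance.
object StdLibModule {
  private val allModules: Seq[AbstractFunctionModule] = Seq(/* the 11 modules */)

  // function name -> owning module, built from the functionNames arrays
  // alone, so no Val.Func is allocated here
  private val nameToModule: Map[String, AbstractFunctionModule] =
    allModules.flatMap(m => m.functionNames.map(_ -> m)).toMap

  // each thunk defers to the owning module; after the first evaluation run
  // every LazyConstMember holds a cached value
  val sharedLazyMembers: java.util.Map[String, LazyConstMember] = {
    val m = new java.util.HashMap[String, LazyConstMember]
    for ((name, module) <- nameToModule)
      m.put(name, new LazyConstMember(() => module.getFunction(name)))
    m
  }
}
```

Per-instance construction then reduces to a single bulk copy, `entries.putAll(StdLibModule.sharedLazyMembers)`, with no closure allocation.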

11 module files (ArrayModule, StringModule, ObjectModule, MathModule, TypeModule, EncodingModule, ManifestModule, SetModule, NativeRegex, NativeGzip, NativeXz):

  • Add val functionNames: Array[String] with function name literals
  • Change val functions to lazy val functions to defer builtin creation
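
An illustrative per-module change, using a made-up two-function module; `builtin` is a hypothetical helper standing in for the anonymous Val.Builtin subclasses, and real modules list their full builtin sets:

```scala
// Sketch: what the per-module diff looks like in spirit, not verbatim.
object ExampleModule extends AbstractFunctionModule {
  // eager: just string literals, no Val.Func construction at startup
  val functionNames: Array[String] = Array("abs", "floor")

  // deferred: builtin objects are instantiated only when getFunction
  // first forces this lazy val
  lazy val functions: Seq[(String, Val.Func)] = Seq(
    "abs"   -> builtin("abs")(x => math.abs(x)),
    "floor" -> builtin("floor")(x => math.floor(x))
  )
}
```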

Benchmark Results

JMH (JVM, 2 forks × 10 warmup × 10 measurement)

| Benchmark | Master (ms/op) | PR (ms/op) | Change |
| --- | --- | --- | --- |
| MainBenchmark (stdlib.jsonnet) | 2.481 ± 0.057 | 2.510 ± 0.057 | +1.2% (within CI) |
| RegressionBenchmark | 0.203 ± 0.004 | 0.214 ± 0.013 | within CI |
| OptimizerBenchmark | 0.517 ± 0.002 | 0.524 ± 0.007 | within CI |
| ParserBenchmark | 1.545 ± 0.431 | 1.482 ± 0.018 | within CI |

No JVM regressions — confidence intervals overlap on all benchmarks.

Hyperfine (Scala Native, 50-100 runs)

| Benchmark | Master (ms) | PR (ms) | Change |
| --- | --- | --- | --- |
| empty {} | 5.8 ± 0.7 | 5.6 ± 0.4 | -3% (stable σ reduction) |
| base64 | 7.7 ± 0.9 | 7.4 ± 0.4 | -4% |
| foldl | 6.7 ± 0.6 | 6.5 ± 0.5 | -3% |
| reverse | 20.2 ± 4.4 | 20.2 ± 1.1 | flat (σ reduced) |
| comparison | 18.6 ± 1.0 | 18.6 ± 0.9 | flat |

No Native regressions. Consistent σ reduction suggests more predictable initialization.

Internal timing (--debug-stats, warm runs)

| Metric | Master | PR |
| --- | --- | --- |
| parse_time (empty {}) | ~65μs | ~65μs |
| eval_time | ~100μs | ~100μs |

Application-level timing identical — lazy init overhead is negligible after first access.

Analysis

The primary value of this PR is architectural: stdlib functions are now created on-demand rather than eagerly. Programs that use only a few stdlib functions (e.g., simple config generators using std.format + std.manifestJsonEx) skip construction of ~130 unused builtins.

The shared LazyConstMember design ensures that repeated Interpreter instantiation (as in JMH benchmarks or server mode) pays the lazy resolution cost only once — subsequent instances reuse cached values via the companion object.

For Scala Native wall-clock benchmarks, the remaining gap with jrsonnet (~2ms on small benchmarks) is process-level startup overhead (Scala Native runtime init, GC setup, dyld) — not application-level initialization.

Result

  • All 23 JVM tests, 399 JS tests, 420 Native tests pass
  • scalafmt clean across all platforms and Scala versions
  • No JVM performance regression (JMH CI overlap)
  • No Native performance regression
  • 14 files changed, ~300 lines added

@He-Pin He-Pin marked this pull request as draft April 12, 2026 15:14
He-Pin added 2 commits April 13, 2026 01:24
Motivation:
Startup with empty `{}` showed parse_time=3.1ms due to eager construction
of 140+ Val.Builtin objects across 11 modules. Most programs only use a
small subset of stdlib functions, wasting allocation and init time.

Modification:
- Add `final class LazyConstMember` to `Val.Obj` — defers builtin creation
  until first access via a `() => Val` thunk, with null-check fast path.
- Add `functionNames: Array[String]` to `AbstractFunctionModule` — cheap
  name-only registration at startup with zero Val.Func allocation.
- Add `getFunction(name)` to `AbstractFunctionModule` — lazy per-module
  lookup that triggers module initialization on first access.
- Change `functions` from eager `val` to `lazy val` in all 11 modules
  (ArrayModule, StringModule, ObjectModule, MathModule, TypeModule,
  EncodingModule, ManifestModule, SetModule, NativeRegex, NativeGzip,
  NativeXz).
- Rewrite `StdLibModule` to build the std object with `LazyConstMember`
  entries backed by a `nameToModule` index (string→module), replacing
  the eager `allModuleFunctions` aggregation.

Result:
parse_time for empty `{}` drops from 3.1ms to ~60μs (50x improvement).
Wall-clock startup (hyperfine) drops from ~5.9ms to ~4.4ms (-25%).
No impact on steady-state performance — lazy init is one-time per module.

- Share LazyConstMember objects in StdLibModule companion object so they
  are reused across Interpreter instances — after first evaluation, cached
  values are returned directly without closure invocation or Map lookup.
- Null out init closure after first invoke to release references for GC.
- Replace synchronized lazy val functionMap with unsynchronized var +
  null-check in AbstractFunctionModule (single-threaded evaluator).
@He-Pin He-Pin force-pushed the perf/lazy-stdlib-init branch from d4eaaa3 to 268ef50 Compare April 12, 2026 17:24
@He-Pin He-Pin marked this pull request as ready for review April 12, 2026 17:48
Contributor Author

He-Pin commented Apr 12, 2026

Closing: re-benchmarked against current master (0d13274) — severe regression detected on gen_big_object (+206%), likely the lazy init overhead outweighs savings after recent merges (#764, #765, #766, #770, #771). Will revisit with a different approach.

@He-Pin He-Pin closed this Apr 12, 2026
@He-Pin He-Pin reopened this Apr 13, 2026
@He-Pin He-Pin marked this pull request as draft April 13, 2026 09:40