perf: lazy stdlib initialization with shared members and unsynchronized lookup #769
Draft
He-Pin wants to merge 2 commits into databricks:master from
Conversation
Motivation:
Startup with empty `{}` showed parse_time=3.1ms due to eager construction
of 140+ Val.Builtin objects across 11 modules. Most programs only use a
small subset of stdlib functions, wasting allocation and init time.
Modification:
- Add `final class LazyConstMember` to `Val.Obj` — defers builtin creation
until first access via a `() => Val` thunk, with null-check fast path.
- Add `functionNames: Array[String]` to `AbstractFunctionModule` — cheap
name-only registration at startup with zero Val.Func allocation.
- Add `getFunction(name)` to `AbstractFunctionModule` — lazy per-module
lookup that triggers module initialization on first access.
- Change `functions` from eager `val` to `lazy val` in all 11 modules
(ArrayModule, StringModule, ObjectModule, MathModule, TypeModule,
EncodingModule, ManifestModule, SetModule, NativeRegex, NativeGzip,
NativeXz).
- Rewrite `StdLibModule` to build the std object with `LazyConstMember`
entries backed by a `nameToModule` index (string→module), replacing
the eager `allModuleFunctions` aggregation.
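The `LazyConstMember` mechanism above can be sketched in isolation. The following is a minimal illustration only: `SimpleVal`, `Str`, and `LazyDemo` are hypothetical stand-ins, not sjsonnet's actual `Val` hierarchy or member API.

```scala
// Hypothetical stand-in value type (the real code uses sjsonnet's Val).
sealed trait SimpleVal
final case class Str(s: String) extends SimpleVal

// Defers construction until first access; caches the result and nulls
// out the thunk so references captured by the closure become collectable.
final class LazyConstMember(private var init: () => SimpleVal) {
  private var cached: SimpleVal = null
  def invoke(): SimpleVal = {
    if (cached == null) { // null-check fast path after first access
      cached = init()
      init = null         // release the closure for GC
    }
    cached
  }
}

object LazyDemo {
  var constructions = 0
  def main(args: Array[String]): Unit = {
    val m = new LazyConstMember(() => { constructions += 1; Str("hello") })
    assert(constructions == 0)   // nothing is built at registration time
    val a = m.invoke()
    val b = m.invoke()
    assert(constructions == 1)   // built exactly once
    assert(a eq b)               // same cached instance returned
    println(constructions)
  }
}
```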
Result:
parse_time for empty `{}` drops from 3.1ms to ~60μs (50x improvement).
Wall-clock startup (hyperfine) drops from ~5.9ms to ~4.4ms (-25%).
No impact on steady-state performance — lazy init is one-time per module.
- Share `LazyConstMember` objects in the `StdLibModule` companion object so they are reused across `Interpreter` instances — after first evaluation, cached values are returned directly without closure invocation or Map lookup.
- Null out the init closure after first invoke to release references for GC.
- Replace the synchronized `lazy val functionMap` with an unsynchronized `var` + null-check in `AbstractFunctionModule` (single-threaded evaluator).
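The unsynchronized lookup can be sketched as follows, assuming a single-threaded caller. `Module`, `getFunction`, and the string-valued "builtins" here are simplified stand-ins for the real `AbstractFunctionModule` API, which returns `Val.Func` instances.

```scala
object UnsyncLookupDemo {
  // Stand-in for AbstractFunctionModule: `build` plays the role of
  // builtin construction (the real code allocates Val.Func objects).
  final class Module(names: Array[String], build: String => String) {
    // Plain var + null check instead of `lazy val`, which would add
    // synchronization (Scala 2) or a volatile CAS (Scala 3). Safe only
    // because the evaluator is single-threaded.
    private var _functionMap: Map[String, String] = null
    def getFunction(name: String): Option[String] = {
      if (_functionMap == null)                            // first access:
        _functionMap = names.map(n => n -> build(n)).toMap // init whole module
      _functionMap.get(name)
    }
  }

  var builds = 0
  def main(args: Array[String]): Unit = {
    val m = new Module(Array("length", "join"),
                       n => { builds += 1; s"<builtin $n>" })
    assert(builds == 0)                       // registration allocates nothing
    assert(m.getFunction("length").isDefined) // first lookup builds all entries
    assert(m.getFunction("join").isDefined)   // map is reused, no rebuild
    assert(builds == 2)
    println(builds)
  }
}
```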
Motivation
The sjsonnet standard library eagerly constructs 145+ Val.Builtin objects across 11 modules at startup, even though most programs only use a small subset. Each builtin allocates an anonymous class instance + Params + Array[String] for parameter names — ~420 objects total. On JVM this is masked by JIT warmup, but on Scala Native (AOT) it contributes measurable startup overhead.
Key Design Decisions
- `LazyConstMember`: A new `final class` in `Val.Obj` that stores a `() => Val` thunk and resolves it on first `invoke()`. After resolution, the closure is nulled out (`init = null`) to release references for GC. The null-check fast path has zero overhead after first access.
- Shared members in companion object: `LazyConstMember` objects are created once in the `StdLibModule` companion object and shared across all `StdLibModule` instances via `putAll`. After the first evaluation run, all members have cached values — subsequent `Interpreter` instances pay only a null-check per member access (no closure invocation, no Map lookup).
- Unsynchronized lazy lookup: Since the evaluator is single-threaded, `AbstractFunctionModule.functionMap` uses a plain `var` + null-check instead of Scala's `lazy val` (which compiles to `synchronized` in Scala 2 / a volatile CAS in Scala 3). Eliminates unnecessary synchronization overhead.
- `functionNames: Array[String]`: Each module declares a cheap string array for name registration at startup — no `Val.Func` allocation. The actual builtin objects are created only when the module's `lazy val functions` is first triggered.
Modification
Val.scala:
- `LazyConstMember`: `final class` with `private var init: () => Val` and `private var _val: Val`
- `invoke()`: null-check → return cached; else resolve, cache, null out closure
- `final` enables JIT devirtualization
AbstractFunctionModule.scala:
- `functionNames: Array[String]` for cheap name-only registration
- `functions` changed from `val` to `def` (abstract); concrete modules override as `lazy val`
- `lazy val functionMap` replaced with `var _functionMap` + null-check in `getFunction()`
StdLibModule.scala:
- `nameToModule` maps function names → module references using only string arrays
- `sharedLazyMembers` pre-builds `LazyConstMember` entries (created once, shared across instances)
- `entries.putAll(sharedLazyMembers)` — no per-instance closure allocation
11 module files (ArrayModule, StringModule, ObjectModule, MathModule, TypeModule, EncodingModule, ManifestModule, SetModule, NativeRegex, NativeGzip, NativeXz):
- `val functionNames: Array[String]` with function name literals
- `val functions` changed to `lazy val functions` to defer builtin creation
Benchmark Results
JMH (JVM, 2 forks × 10 warmup × 10 measurement)
No JVM regressions — confidence intervals overlap on all benchmarks.
Hyperfine (Scala Native, 50-100 runs)
No Native regressions. Consistent σ reduction suggests more predictable initialization.
Internal timing (`--debug-stats`, warm runs)
Application-level timing identical — lazy init overhead is negligible after first access.
Analysis
The primary value of this PR is architectural: stdlib functions are now created on demand rather than eagerly. Programs that use only a few stdlib functions (e.g., simple config generators using `std.format` + `std.manifestJsonEx`) skip construction of ~130 unused builtins.
The shared `LazyConstMember` design ensures that repeated `Interpreter` instantiation (as in JMH benchmarks or server mode) pays the lazy resolution cost only once — subsequent instances reuse cached values via the companion object.
For Scala Native wall-clock benchmarks, the remaining gap with jrsonnet (~2ms on small benchmarks) is process-level startup overhead (Scala Native runtime init, GC setup, dyld) — not application-level initialization.
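The cross-instance sharing discussed above can be sketched as follows. All names here (`SharedMembersDemo`, `LazyMember`, the single-entry `nameToModule`) are illustrative simplifications, not the real sjsonnet types.

```scala
object SharedMembersDemo {
  var moduleInits = 0

  // name -> thunk index built from strings only; stand-in for nameToModule.
  // The real code maps names to module references, not to String thunks.
  val nameToModule: Map[String, () => String] =
    Map("length" -> (() => { moduleInits += 1; "<ArrayModule.length>" }))

  final class LazyMember(private var init: () => String) {
    private var cached: String = null
    def invoke(): String = {
      if (cached == null) { cached = init(); init = null } // resolve once
      cached
    }
  }

  // Built once at object initialization and shared by every Interpreter
  // instance, mirroring the companion-object sharing described above.
  val sharedLazyMembers: Map[String, LazyMember] =
    nameToModule.map { case (n, thunk) => n -> new LazyMember(thunk) }

  final class Interpreter {
    val std: Map[String, LazyMember] = sharedLazyMembers
  }

  def main(args: Array[String]): Unit = {
    val first  = new Interpreter
    val second = new Interpreter
    first.std("length").invoke()  // triggers module init exactly once
    second.std("length").invoke() // second instance hits the cached value
    assert(moduleInits == 1)      // resolution cost paid once process-wide
    println(moduleInits)
  }
}
```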
Result