perf: SIMD base64 via aklomp/base64 + ByteArr/RangeArr/asciiSafe #778
Open
He-Pin wants to merge 1 commit into databricks:master from
@JoshRosen @stephenamar-db This new implementation uses a mature third-party library; please take another look.
He-Pin commented Apr 15, 2026
stephenamar-db requested changes Apr 17, 2026
He-Pin commented Apr 18, 2026
I have updated the PR, @stephenamar-db.
He-Pin added a commit to He-Pin/sjsonnet that referenced this pull request Apr 18, 2026
Motivation: Combined review of PR databricks#776 + databricks#778 identified ~130 lines of duplicated SWAR string rendering and long-to-char conversion code, plus two missing overflow checks in StringModule.
Modification:
- Extract renderQuotedStringSWAR as protected method in BaseCharRenderer, delegate from MaterializeJsonRenderer (removes ~60 lines duplication)
- Make escapeCharInline protected, remove duplicate in Renderer
- Consolidate Renderer.visitFloat64 onto inherited writeLongDirect, remove standalone RenderUtils.appendLong (~40 lines)
- Add totalLen > Int.MaxValue guard in Join pre-sized allocation
- Add Long overflow detection in parseDigits
- Leverage _asciiSafe flag in Substr/Join to skip redundant scans
Result: Net -132 lines. All tests pass across JVM/JS/Native/WASM.
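As an illustration of the "Long overflow detection in parseDigits" item, here is a minimal sketch of one way to detect overflow while accumulating decimal digits. It is not the code from the commit: the object name, the Option return type, and the use of Math.multiplyExact/Math.addExact are assumptions made for this example.

```scala
object ParseDigitsSketch {
  // Hypothetical helper: parses a non-empty run of ASCII digits into a Long.
  // Returns None for empty input, non-digit characters, or Long overflow.
  def parseDigits(s: String): Option[Long] = {
    if (s.isEmpty) return None
    var acc = 0L
    var i = 0
    while (i < s.length) {
      val c = s.charAt(i)
      if (c < '0' || c > '9') return None
      try {
        // multiplyExact/addExact throw ArithmeticException instead of wrapping around.
        acc = Math.addExact(Math.multiplyExact(acc, 10L), (c - '0').toLong)
      } catch {
        case _: ArithmeticException => return None
      }
      i += 1
    }
    Some(acc)
  }
}
```

Without the exact-arithmetic calls, an input one past Long.MaxValue would silently wrap to a negative number rather than being rejected.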
stephenamar-db requested changes Apr 21, 2026
stephenamar-db approved these changes Apr 21, 2026
Motivation: PR databricks#749 added SIMD base64 and runtime optimizations (ByteArr, RangeArr, asciiSafe) but was reverted by databricks#777 due to incorrect hand-written x86 SIMD C code. This PR restores all optimizations while replacing the buggy SIMD code with the battle-tested aklomp/base64 library.
Modification:
- Replace hand-written C SIMD with aklomp/base64 (BSD-2-Clause), which provides correct SIMD dispatch (SSSE3/AVX2/AVX512/NEON64) via runtime CPU detection
- Add PlatformBase64 abstraction: JVM/JS use java.util.Base64 with strict RFC 4648 padding validation, Native uses aklomp/base64 FFI
- Switch to strict mode aligned with go-jsonnet: reject unpadded base64 input (e.g. "YQ" without "=="). java.util.Base64 is lenient, so JVM/JS add an explicit length check for ASCII input, matching go-jsonnet's len(str) % 4 != 0 check (builtins.go:1467)
- Restore Val.ByteArr: compact byte-backed array for base64DecodeBytes
- Restore Val.RangeArr subclass from flag-based _isRange
- Restore Val.Str._asciiSafe + renderAsciiSafeString
- Restore Materializer/ByteRenderer fast paths for ByteArr
- Add comprehensive test suite (56+ Scala tests + 4 Jsonnet golden tests)
Result: Beats jrsonnet on DecodeBytes benchmarks (1.47x faster). Overall 15-38% faster than master on base64 workloads.
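The strict-mode item above can be sketched roughly as follows. This is a minimal illustration, not the PR's PlatformBase64 code: the object name, method name, error strings, and the Either return type are assumptions, and the sketch checks the string length directly rather than distinguishing ASCII from non-ASCII input.

```scala
import java.util.Base64

object StrictBase64Sketch {
  // Hypothetical helper: rejects input whose length is not a multiple of 4
  // before handing it to java.util.Base64, whose basic decoder would
  // otherwise accept an unpadded final group such as "YQ".
  def decodeStrict(s: String): Either[String, Array[Byte]] =
    if (s.length % 4 != 0)
      Left(s"invalid base64: length ${s.length} is not a multiple of 4")
    else
      try Right(Base64.getDecoder.decode(s))
      catch {
        // The JDK decoder signals malformed input with IllegalArgumentException.
        case e: IllegalArgumentException => Left(s"invalid base64: ${e.getMessage}")
      }
}
```

With this pre-check, "YQ==" decodes to the single byte for "a", while "YQ" is rejected even though the JDK decoder alone would accept it.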
Summary
Rollforward of #749 (reverted by #777) with the buggy hand-written C SIMD replaced by the battle-tested aklomp/base64 library (BSD-2-Clause). Also restores the non-SIMD optimizations from #749 (ByteArr, RangeArr subclass, asciiSafe rendering) and adds strict RFC 4648 padding validation aligned with go-jsonnet.
How the SIMD bug was fixed
PR #749's hand-written C SIMD code had an incorrect x86 implementation (the reason for the revert in #777). Instead of fixing the hand-written code, this PR replaces it entirely with aklomp/base64, a well-tested C library that selects the SIMD code path (SSSE3/AVX2/AVX512/NEON64) through runtime CPU detection.
The library is built as a static library via CMake and linked via nativeLinkingOptions. No hand-written SIMD code remains.
Strict mode aligned with go-jsonnet
Switched base64 decoding to strict RFC 4648 mode. Unpadded input (e.g. "YQ" instead of "YQ==") is now rejected on all platforms, matching go-jsonnet behavior:
- len(str) % 4 != 0 check before base64.StdEncoding.DecodeString (builtins.go:1467)
- std.length(str) % 4 != 0 check in stdlib
- java.util.Base64 was lenient, accepting unpadded input (a pre-existing behavioral divergence)
Changes
- PlatformBase64 abstraction: platform-specific base64 implementations. JVM/JS use java.util.Base64 plus a strict padding pre-check; Native uses aklomp/base64 via FFI.
- Val.ByteArr: compact byte-backed array for base64DecodeBytes. Stores Array[Byte] directly instead of N Val.Num wrappers (80%+ memory savings). Zero-copy rawBytes access for re-encoding.
- Val.RangeArr subclass: extracted from flag-based _isRange in Arr to reduce per-Arr memory footprint. O(1) creation for std.range.
- Val.Str._asciiSafe + renderAsciiSafeString: marks strings that need no JSON escaping (e.g. base64 output). Renderer skips SWAR escape scanning, writing bytes directly.
- Materializer/ByteRenderer fast paths: direct byte iteration for ByteArr, skipping per-element type dispatch.
- Comprehensive test suite: 56+ Scala unit tests + 4 Jsonnet golden file tests covering RFC 4648 vectors, SIMD boundary sizes, bidirectional verification, strict padding enforcement, all 256 byte values, and error handling.
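To make the ByteArr idea concrete, here is a minimal sketch contrasting a byte-backed array with one wrapper object per element. The types below (NumBox, CompactBytes) are hypothetical stand-ins invented for this example, not sjsonnet's actual Val.Num or Val.ByteArr classes.

```scala
object ByteArrSketch {
  // Hypothetical stand-in for a per-element numeric wrapper.
  final case class NumBox(value: Double)

  // Hypothetical compact representation: keeps the decoded bytes as-is and
  // converts an element to a number only when it is actually read.
  final class CompactBytes(private val bytes: Array[Byte]) {
    def length: Int = bytes.length
    // JVM bytes are signed, so mask to recover the unsigned value 0..255.
    def apply(i: Int): Double = (bytes(i) & 0xff).toDouble
    // Exposes the backing array without copying, e.g. for re-encoding.
    def rawBytes: Array[Byte] = bytes
  }

  // The wrapper-per-element layout: allocates one object for every byte.
  def boxed(bytes: Array[Byte]): Array[NumBox] =
    bytes.map(b => NumBox((b & 0xff).toDouble))
}
```

The compact form allocates a single wrapper around the existing array, whereas the boxed form allocates one object per byte, which is the overhead the ByteArr item describes avoiding.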
Benchmark Results — Scala Native vs jrsonnet (Rust)
Machine: Apple Silicon (AArch64/NEON), macOS. Tool: hyperfine --warmup 3 --runs 10 -N.
Both master and simd-full binaries were built from the same upstream/master base (4123ac3). The only difference is this PR's changes.
SIMD base64 throughput (large payloads)
Larger payloads isolate base64 codec performance from Jsonnet interpreter overhead. The improvement scales with data size:
User CPU time (excluding process overhead) tells the same story:
ByteArr compact storage (DecodeBytes / byte_array)
sjsonnet's ByteArr stores decoded bytes as Array[Byte] directly (vs N Val.Num wrappers), beating jrsonnet (Rust) on byte-oriented operations.
Small payload benchmarks (interpreter-dominated)
These benchmarks process ~3KB payloads. Base64 codec time is negligible compared to process startup (~3ms) and Jsonnet parsing/evaluation, so codec improvements don't show here:
Test plan
- ./mill 'sjsonnet.jvm[3.3.7]'.test: 61 tests pass (including 56 Base64Tests with strict padding)
- ./mill 'sjsonnet.js[3.3.7]'.test: 455 tests pass
- ./mill 'sjsonnet.native[3.3.7]'.test: 476 tests pass
- ./mill __.checkFormat: scalafmt passes
Closes #777