Commit 4d521a8
perf: optimize string comparison fast path and array flatten (#768)
## Motivation
The `realistic2` benchmark (105 lines; a complex workload with format
strings, cross-product comparisons, and array flattening) shows one of our
biggest performance gaps vs jrsonnet. This PR targets two hot paths
identified in that benchmark.
## Key Design Decision
1. **String comparison**: In `compareStringsByCodepoint`, check
character equality before performing surrogate checks. For strings
with long shared prefixes (common in the `realistic2` benchmark, where
generated names differ only in their suffixes), this skips two
`Character.isSurrogate()` calls per matching position.
2. **Array flatten**: When `std.join([], arrays)` is used to flatten
arrays (as in realistic2), pre-compute the total size and use
`System.arraycopy` for bulk transfer instead of incremental
`ArrayBuilder` growth.
## Modification
**Util.scala** — `compareStringsByCodepoint`:
- Check `c1 == c2` first → skip surrogate checks (equal chars produce
equal codepoints regardless of surrogate status, so surrogate pairs can
safely be compared char-by-char)
- For non-surrogate differences, use `c1 - c2` (direct subtraction)
instead of `Character.compare`
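
The reordered checks can be sketched roughly as follows. This is a minimal illustration and not the actual `Util.scala` code: the object and method names (`CodepointCompareSketch`, `compareByCodepointSketch`) are hypothetical, and the sketch assumes well-formed UTF-16 input.

```scala
// Hypothetical sketch; names are illustrative and not taken from Util.scala.
object CodepointCompareSketch {
  def compareByCodepointSketch(s1: String, s2: String): Int = {
    val n1 = s1.length
    val n2 = s2.length
    var i1 = 0
    var i2 = 0
    while (i1 < n1 && i2 < n2) {
      val c1 = s1.charAt(i1)
      val c2 = s2.charAt(i2)
      if (c1 == c2) {
        // Fast path: identical UTF-16 units, no surrogate checks needed.
        i1 += 1
        i2 += 1
      } else if (!Character.isSurrogate(c1) && !Character.isSurrogate(c2)) {
        // Both are plain BMP chars, so char order equals codepoint order.
        return c1 - c2
      } else {
        // At least one side is a surrogate: compare full codepoints.
        // The units differ, so the codepoints differ as well.
        return Integer.compare(s1.codePointAt(i1), s2.codePointAt(i2))
      }
    }
    // One string is a prefix of the other (or they are equal).
    if (i1 < n1) 1 else if (i2 < n2) -1 else 0
  }
}
```

The surrogate branch matters for cases such as U+FFFF vs U+10000, where comparing raw chars would order the strings the wrong way round.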
**StringModule.scala** — `Join` (array separator path):
- Detect empty separator (`sepArr.length == 0`) for flatten fast path
- First pass: count total elements across all sub-arrays
- Allocate exact-sized `Array[Eval]` once
- Second pass: `System.arraycopy` from each sub-array into result
- Convert `for` loop to `while` loop for non-empty separator path
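
The two-pass flatten described above might look like the sketch below. It is an illustration under assumptions, not the `StringModule.scala` code: `FlattenSketch` and `flattenSketch` are hypothetical names, and `AnyRef` stands in for the `Eval` element type.

```scala
// Hypothetical sketch; AnyRef stands in for the Eval element type.
object FlattenSketch {
  def flattenSketch(arrays: Array[Array[AnyRef]]): Array[AnyRef] = {
    // First pass: count the total number of elements across all sub-arrays.
    var total = 0
    var i = 0
    while (i < arrays.length) {
      total += arrays(i).length
      i += 1
    }
    // Allocate the exact-sized result once, so no growth or re-copy occurs.
    val result = new Array[AnyRef](total)
    // Second pass: bulk-copy each sub-array into its slot in the result.
    var offset = 0
    i = 0
    while (i < arrays.length) {
      val sub = arrays(i)
      System.arraycopy(sub, 0, result, offset, sub.length)
      offset += sub.length
      i += 1
    }
    result
  }
}
```

The trade-off is one extra cheap pass over the outer array in exchange for a single allocation and no intermediate copies.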
## Benchmark Results
### JMH (JVM, single fork)
| Benchmark | Before (ms/op) | After (ms/op) | Change |
|-----------|----------------|---------------|--------|
| realistic2 | 61.774 | 54.572 | **-11.7%** ✅ |
| comparison | 16.204 | 15.982 | -1.4% |
| setUnion | 0.638 | 0.593 | -7.1% |
| gen_big_object | 1.122 | 0.934 | -16.8% |
| reverse | 7.033 | 6.706 | -4.7% |
No regressions across 35 benchmarks.
### Hyperfine (Scala Native vs jrsonnet)
| Benchmark | sjsonnet (ms) | jrsonnet (ms) | Ratio |
|-----------|--------------|--------------|-------|
| realistic2 | 155.0 ± 2.1 | 100.6 ± 1.9 | 1.54x (was 1.61x) |
| comparison | 16.9 ± 0.9 | 12.4 ± 1.0 | 1.36x (unchanged) |
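
For reference, the Change and Ratio columns above follow from simple arithmetic on the reported timings. The helper names in this sketch (`BenchMathSketch`, `percentChange`, `ratio`) are hypothetical and not part of the benchmark tooling.

```scala
// Hypothetical helpers for reproducing the derived table columns.
object BenchMathSketch {
  // Relative change in percent; negative means the "after" run is faster.
  def percentChange(before: Double, after: Double): Double =
    (after - before) / before * 100.0

  // How many times slower `ours` is than `theirs`.
  def ratio(ours: Double, theirs: Double): Double = ours / theirs
}
```

With the `realistic2` rows, `percentChange(61.774, 54.572)` rounds to -11.7 and `ratio(155.0, 100.6)` rounds to 1.54, matching the tables.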
## Analysis
The `realistic2` benchmark generates ~63,500 objects using cross-product
comprehensions where `p != q` requires string comparisons. Most
generated strings share long prefixes (e.g.
`AAAAAAA...xxxxxxxBBBBBBBlocation...`), making the `c1 == c2` fast path very
effective: it skips surrogate checks for 90%+ of character positions.
The array flatten optimization benefits the `std.join([], [...])` calls
that concatenate 25 arrays of 50 to 2450 elements each. Pre-sizing
eliminates ~5 `ArrayBuilder` resize-and-copy cycles.
## References
- jit branch commit: `af4832f2` (compareStringsByCodepoint optimization)
- Benchmark: `bench/resources/cpp_suite/realistic2.jsonnet`
## Result
✅ All 420 tests pass across all platforms and Scala versions.
✅ JMH: 11.7% improvement on realistic2, no regressions.
✅ Hyperfine: realistic2 gap reduced from 1.61x to 1.54x vs jrsonnet.
---
## JMH Benchmark Results (vs master 0d13274)
| Benchmark | Master (ms/op) | This PR (ms/op) | Change | Status |
|-----------|---------------:|----------------:|-------:|--------|
| assertions | 0.207 | 0.209 | +1.0% | neutral |
| base64 | 0.156 | 0.152 | -2.6% | improved |
| base64Decode | 0.123 | 0.116 | -5.7% | improved |
| base64DecodeBytes | 5.899 | 6.215 | +5.4% | regressed |
| base64_byte_array | 0.803 | 0.788 | -1.9% | neutral |
| bench.01 | 0.052 | 0.052 | +0.0% | neutral |
| bench.02 | 35.401 | 35.695 | +0.8% | neutral |
| bench.03 | 9.583 | 10.129 | +5.7% | regressed |
| bench.04 | 0.122 | 0.119 | -2.5% | improved |
| bench.06 | 0.224 | 0.221 | -1.3% | neutral |
| bench.07 | 3.332 | 3.183 | -4.5% | improved |
| bench.08 | 0.038 | 0.039 | +2.6% | regressed |
| bench.09 | 0.041 | 0.044 | +7.3% | regressed |
| comparison | 0.028 | 0.028 | +0.0% | neutral |
| comparison2 | 18.681 | 18.590 | -0.5% | neutral |
| escapeStringJson | 0.032 | 0.031 | -3.1% | improved |
| foldl | 0.077 | 0.082 | +6.5% | regressed |
| gen_big_object | 0.918 | 0.908 | -1.1% | neutral |
| large_string_join | 0.555 | 0.551 | -0.7% | neutral |
| large_string_template | 1.600 | 1.609 | +0.6% | neutral |
| lstripChars | 0.113 | 0.116 | +2.7% | regressed |
| manifestJsonEx | 0.052 | 0.052 | +0.0% | neutral |
| manifestTomlEx | 0.069 | 0.070 | +1.4% | neutral |
| manifestYamlDoc | 0.055 | 0.057 | +3.6% | regressed |
| member | 0.656 | 0.684 | +4.3% | regressed |
| parseInt | 0.032 | 0.033 | +3.1% | regressed |
| realistic1 | 1.661 | 1.666 | +0.3% | neutral |
| realistic2 | 57.541 | 57.650 | +0.2% | neutral |
| reverse | 6.717 | 6.707 | -0.1% | neutral |
| rstripChars | 0.119 | 0.116 | -2.5% | improved |
| setDiff | 0.431 | 0.423 | -1.9% | neutral |
| setInter | 0.371 | 0.369 | -0.5% | neutral |
| setUnion | 0.604 | 0.598 | -1.0% | neutral |
| stripChars | 0.117 | 0.117 | +0.0% | neutral |
| substr | 0.057 | 0.059 | +3.5% | regressed |
**Summary**: 6 improvements, 10 regressions, 19 neutral
**Platform**: Apple Silicon, JMH single-shot avg

Parent commit: bd09b2b
1 file changed
Lines changed: 7 additions & 6 deletions