Commit a17ec44
perf: optimize sort allocation paths in std.sort and set operations (#752)
## Motivation
`std.sort` and the set operations (`std.setDiff`, `std.setInter`,
`std.setUnion`) sort arrays internally, but they do so with
allocation-heavy patterns: `.map()` to force lazy values,
`.map().sortBy()` chains that create intermediate copies, and a new
`Array(1)` allocated for every key function call.
## Key Design Decisions
- **Keep in-place Comparator sort for numerics** rather than primitive
double sort + reconstruction. While primitive `Arrays.sort(double[])` is
faster on JVM, the `Val.cachedNum` reconstruction step creates GC
pressure on Scala Native (measured 1.26x regression). In-place
Comparator sort avoids any reconstruction.
- **Reuse argument buffer** for key function calls: single
`Array[Val](1)` shared across all iterations instead of `Array(v.value)`
per element.
- **While-loops over `.map()`**: eliminates closure allocation, iterator
overhead, and intermediate array copies.
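
The buffer-reuse and while-loop decisions above can be sketched roughly as follows. This is an illustrative sketch only, not the code from `SetModule.scala`: `Lazy`, `KeyFn`, and `computeKeys` are hypothetical stand-ins, and plain `Int` values replace sjsonnet's `Val` hierarchy.

```scala
object ArgBufSketch {
  // Hypothetical stand-in for a lazily evaluated value (not sjsonnet's real type).
  final class Lazy[A](thunk: () => A) {
    lazy val value: A = thunk()
  }

  // Hypothetical key function that receives its arguments as an array.
  type KeyFn = Array[Int] => Int

  def computeKeys(vs: Array[Lazy[Int]], keyF: KeyFn): Array[Int] = {
    val keys = new Array[Int](vs.length) // pre-allocated result, no .map() copy
    val argBuf = new Array[Int](1)       // one argument buffer shared by every call
    var i = 0
    while (i < vs.length) {
      argBuf(0) = vs(i).value            // force the lazy value into the shared slot
      keys(i) = keyF(argBuf)             // no per-element Array(...) allocation
      i += 1
    }
    keys
  }
}
```

Sharing the buffer is only safe under the assumption that the key function reads its argument during the call and does not keep a reference to the array afterwards.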
## Modifications
`sjsonnet/src/sjsonnet/stdlib/SetModule.scala`:
1. **Key function path**: Reuse single-element `argBuf` across all key
function calls, use while-loop for key computation
2. **Result construction**: Pre-allocated `Array[Eval]` with while-loop
instead of `sortedIndices.map(i => vs(i))`
3. **Strict force**: `while` loop with pre-allocated `Array[Val]`
instead of `vs.map(_.value)`
4. **String sort (no key)**: In-place `Arrays.sort` with Comparator
instead of `.map(_.cast[Val.Str]).sortBy(_.asString)` (2 intermediate
array copies)
5. **Array sort (no key)**: In-place `Arrays.sort` with Comparator
instead of `.map(_.cast[Val.Arr]).sortBy(identity)` (2 intermediate
copies)
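
To illustrate items 4 and 5, the sketch below contrasts a `sortBy` call, which returns a newly built array, with `java.util.Arrays.sort` plus a `Comparator`, which reorders the caller's array. `Str`, `sortCopy`, and `sortInPlace` are hypothetical names for this example, and the ordering shown (`String.compareTo`) is an assumption rather than a statement about sjsonnet's actual string comparison.

```scala
import java.util.{Arrays, Comparator}

object InPlaceSortSketch {
  // Hypothetical stand-in for a string-valued runtime value.
  final case class Str(asString: String)

  // Allocation-heavy shape: sortBy builds and returns a new array.
  def sortCopy(vs: Array[Str]): Array[Str] = vs.sortBy(_.asString)

  // Comparator created once and reused for every sort call.
  private val byString: Comparator[Str] = new Comparator[Str] {
    def compare(a: Str, b: Str): Int = a.asString.compareTo(b.asString)
  }

  // Reorders the elements of `vs` itself; no result copy is returned.
  def sortInPlace(vs: Array[Str]): Unit = Arrays.sort(vs, byString)
}
```

Because `sortInPlace` mutates its argument, it fits cases where the array was freshly allocated by the caller (for example by the strict-force loop) and is not shared elsewhere.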
## Benchmark Results
### JMH (JVM, single iteration)
| Benchmark | Before (ms/op) | After (ms/op) | Change |
|-----------|----------------|---------------|--------|
| bench.06 (sort) | 0.359 | 0.251 | **-30.1%** |
| setDiff | 0.533 | 0.446 | **-16.3%** |
| setInter | 0.367 | 0.386 | neutral |
| setUnion | 0.727 | 0.677 | **-6.9%** |
### Scala Native hyperfine (`-N --warmup 10`)
| Benchmark | Before (ms) | After (ms) | Speedup |
|-----------|-------------|------------|---------|
| bench.06 (sort, 30 runs) | 7.6 ± 0.4 | 5.5 ± 0.2 | **1.39x** |
| setDiff (20 runs) | 8.8 ± 0.6 | 7.7 ± 0.6 | **1.13x** |
| setInter (20 runs) | 8.6 ± 1.3 | 8.3 ± 0.8 | neutral |
## Analysis
The allocation reduction benefits both JVM and Native, but the impact is
more pronounced on Native where GC overhead is higher. The sort
benchmark sees the largest improvement because it exercises all the
optimized paths (force + sort + result construction). Set operations see
a more moderate improvement because the merge-based intersection and
difference call sort only once, on inputs that are already sorted.
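
For readers unfamiliar with the merge-based approach mentioned above, here is a minimal sketch of a merge-style intersection. It assumes both inputs are already sorted and duplicate-free, uses plain `Int` elements, and `intersectSorted` is a hypothetical name; it is not the sjsonnet implementation.

```scala
object MergeSketch {
  // Intersects two ascending, duplicate-free arrays in a single linear pass.
  def intersectSorted(a: Array[Int], b: Array[Int]): Array[Int] = {
    val out = new Array[Int](math.min(a.length, b.length)) // upper bound on result size
    var i = 0
    var j = 0
    var n = 0
    while (i < a.length && j < b.length) {
      if (a(i) < b(j)) i += 1        // a(i) cannot appear in the rest of b
      else if (a(i) > b(j)) j += 1   // b(j) cannot appear in the rest of a
      else {                         // common element: keep it, advance both sides
        out(n) = a(i)
        n += 1
        i += 1
        j += 1
      }
    }
    java.util.Arrays.copyOf(out, n)  // trim to the number of matches found
  }
}
```

The merge pass itself does no sorting, which is consistent with the set benchmarks being less sensitive to the sort-path changes than the sort benchmark.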
## References
- Upstream exploration: he-pin/sjsonnet jit branch
[`b1f64df0`](He-Pin@b1f64df0)
## Result
Sort and set operations are faster on both JVM and Scala Native with
zero semantic changes. All existing tests pass.

1 parent b35cde7 commit a17ec44
1 file changed
Lines changed: 41 additions & 10 deletions