UPSTREAM PR #26907: upb: add ASLR-based seed to integer hash function #142
The integer hash function `upb_inthash()` is completely deterministic (no seed), unlike the string hash function which uses an ASLR-based seed via `_upb_seed`. This allows an attacker to trivially precompute keys that all hash to the same bucket, causing O(N^2) insertion time for N map entries during protobuf parsing. For example, a ~500KB protobuf message with 50,000 colliding int32 map keys takes ~1000x longer to parse than the same number of non-colliding keys. This affects all UPB-backed runtimes: Python, Ruby, PHP, and Rust. The fix adds a separate ASLR-based seed (`_upb_int_seed`) that is XORed into the key before hashing, matching the approach already used for string-keyed maps. This makes the hash function non-deterministic across process invocations when ASLR is enabled.
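The precomputation described above can be sketched in a few lines of C. This is a simplified model, not upb's code: it assumes, as the summary further down states, that the unseeded hash of an int32 key is just the key's low 32 bits and that the bucket is `hash & (table_size - 1)`. The names `toy_inthash`, `toy_bucket`, `make_colliding_keys`, and `all_collide` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: models the unseeded int32 hash as the key's low 32 bits. */
static uint32_t toy_inthash(uint64_t key) { return (uint32_t)key; }

/* Bucket index in a power-of-two sized table: hash & mask. */
static size_t toy_bucket(uint64_t key, size_t table_size) {
  return toy_inthash(key) & (table_size - 1);
}

/* Fills out[0..n) with multiples of final_table_size (a power of two).
 * The caller must keep n * final_table_size below 2^32. */
static void make_colliding_keys(uint32_t *out, size_t n,
                                uint32_t final_table_size) {
  for (size_t i = 0; i < n; i++) {
    out[i] = (uint32_t)i * final_table_size;
  }
}

/* Returns 1 if every key lands in bucket 0 at every power-of-two table
 * size from 1 up to final_table_size, and 0 otherwise. */
static int all_collide(const uint32_t *keys, size_t n,
                       size_t final_table_size) {
  for (size_t size = 1; size <= final_table_size; size <<= 1) {
    for (size_t i = 0; i < n; i++) {
      if (toy_bucket(keys[i], size) != 0) return 0;
    }
  }
  return 1;
}
```

Because the model has no per-process input, the same key list collides in every run, which is what makes offline precomputation possible.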
Overview
Analysis of 10,167 functions in build.protoc-stable shows 16 modified functions (0.16%) implementing ASLR-based hash seeding for security hardening. Power consumption increased by 0.07% (+410.11 nJ: 587,492.36 nJ → 587,902.47 nJ).
Function Analysis
Commit 0954699: "upb: add ASLR-based seed to integer hash function"
Most Impacted Functions:
Functions Showing Improvements:
Other analyzed functions (lookup, replace, string operations) showed 4-7 ns overhead, consistent with ASLR seed XOR operations.
Assessment: Changes are intentional security enhancements. The absolute overhead (4-30 ns per operation) is minimal compared to the security benefit of preventing O(N²) hash collision attacks. Several functions demonstrate net improvements, indicating effective compiler optimization. All modified functions are in μpb's hash table infrastructure, affecting Ruby, PHP, and Python protobuf bindings.
Note
Source pull request: protocolbuffers/protobuf#26907
Summary
The integer hash function `upb_inthash()` in `upb/hash/common.c` is completely deterministic: it uses no seed or randomization. In contrast, the string hash function already uses an ASLR-based seed (`_upb_seed`) via Wyhash.
This asymmetry allows an attacker to trivially precompute integer keys that all hash to the same bucket in `upb_inttable`, causing O(N²) insertion time for N entries. Any protobuf message with a `map<int32, ...>`, `map<int64, ...>`, `map<uint32, ...>`, or `map<uint64, ...>` field parsed from untrusted input is affected.
Impact
All UPB-backed protobuf runtimes are affected: Python, Ruby, PHP, and Rust.
Empirical measurements (Python 3.13, protobuf 6.33.6, x86-64):
A ~500 KB protobuf message with 50,000 colliding keys takes over 1,000× longer to parse than the same message with non-colliding keys. The scaling is quadratic — 200,000 entries would take ~37 seconds.
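The quadratic scaling can be illustrated with a deliberately simplified model: if every key lands in the same bucket and each insertion first walks the existing chain, then inserting N distinct keys costs 0 + 1 + ... + (N - 1) = N(N - 1)/2 key comparisons. The sketch below counts those comparisons. It uses a flat array as the chain and is not upb's actual table layout; `comparisons_single_bucket` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Inserts keys[0..n) into a single collision chain, scanning the chain for
 * an existing entry before each append. `chain` is caller-provided scratch
 * space with room for n entries. Returns the total number of comparisons. */
static uint64_t comparisons_single_bucket(const uint32_t *keys, size_t n,
                                          uint32_t *chain) {
  uint64_t comparisons = 0;
  size_t len = 0;
  for (size_t i = 0; i < n; i++) {
    int found = 0;
    for (size_t j = 0; j < len; j++) {
      comparisons++;
      if (chain[j] == keys[i]) {
        found = 1;
        break;
      }
    }
    if (!found) chain[len++] = keys[i];
  }
  return comparisons;
}
```

In this model, quadrupling the number of colliding keys multiplies the work by roughly 16, which is the same ratio as going from 50,000 to 200,000 entries.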
Collision construction
For `map<int32, ...>` on 64-bit, the hash is just `(uint32_t)key` (high 32 bits are zero). The bucket is `hash & mask` where `mask = table_size - 1`. Keys that are multiples of the final table size (a power of 2) all hash to bucket 0 at every intermediate table size, because each smaller power-of-2 mask only selects low bits that are already zero in such keys.
Prior art
`_upb_seed` (line 448, 455)
6bde8c417, Feb 2025) due to Ruby test flakiness, then restored (commit 8ef81fbd9)
`absl::Hash` with per-allocation randomized salt: not affected
`LinkedHashMap` with tree fallback at 8 collisions: partially mitigated
Fix
Adds a separate ASLR-based seed variable (`_upb_int_seed`) and XORs it into the key before hashing, matching the approach already used for string-keyed maps. This makes the hash function non-deterministic across process invocations when ASLR is enabled.
Files changed
`upb/hash/common.c`: source
`ruby/ext/google/protobuf_c/ruby-upb.c`: Ruby amalgamation
`php/ext/google/protobuf/php-upb.c`: PHP amalgamation
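As a rough illustration of the approach described under Fix, the sketch below derives a per-process value from the address of a static object (which varies between runs when ASLR is enabled) and XORs it into the key before mixing. Everything here is an assumption made for illustration: `toy_seed_anchor`, `toy_int_seed`, `toy_mix64`, and `toy_seeded_inthash` are hypothetical names, and the mixing step is a MurmurHash3-style 64-bit finalizer swapped in for the example, not upb's actual hash.

```c
#include <stdint.h>

/* Hypothetical stand-in for a seed variable such as _upb_int_seed: only its
 * address matters, since that address changes per process under ASLR. */
static const char toy_seed_anchor = 0;

/* Returns the ASLR-dependent seed for this process. */
static uint64_t toy_int_seed(void) {
  return (uint64_t)(uintptr_t)&toy_seed_anchor;
}

/* Assumption: MurmurHash3-style 64-bit finalizer used as the mixing step. */
static uint64_t toy_mix64(uint64_t x) {
  x ^= x >> 33;
  x *= 0xff51afd7ed558ccdULL;
  x ^= x >> 33;
  x *= 0xc4ceb9fe1a85ec53ULL;
  x ^= x >> 33;
  return x;
}

/* XORs the seed into the key, mixes, and keeps the low 32 bits. */
static uint32_t toy_seeded_inthash(uint64_t key, uint64_t seed) {
  return (uint32_t)toy_mix64(key ^ seed);
}
```

A caller would compute the bucket as `toy_seeded_inthash(key, toy_int_seed()) & mask`. The sketch assumes a nonlinear mixing step after the XOR; without one, XORing a constant into the key would only relabel the buckets and leave the colliding groups intact.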