
UPSTREAM PR #26907: upb: add ASLR-based seed to integer hash function#142

Open
loci-dev wants to merge 1 commit into main from loci/pr-26907-fix-upb-inthash-seed

@loci-dev

Note

Source pull request: protocolbuffers/protobuf#26907

Summary

The integer hash function upb_inthash() in upb/hash/common.c is completely deterministic — it uses no seed or randomization. In contrast, the string hash function already uses an ASLR-based seed (_upb_seed) via Wyhash.

This asymmetry allows an attacker to trivially precompute integer keys that all hash to the same bucket in upb_inttable. Each insertion then has to scan past every colliding entry already in that bucket, so inserting N entries takes O(N²) time in total. Any protobuf message with a map<int32, ...>, map<int64, ...>, map<uint32, ...>, or map<uint64, ...> field parsed from untrusted input is affected.
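To see why a single overloaded bucket makes insertion quadratic, here is a minimal Python model. It assumes a simplified separate-chaining table (not upb's actual upb_inttable layout) with no resizing, a bucket index of (uint32_t)key & (table_size - 1), and an insert that compares the new key against every entry already in its bucket. `count_comparisons` is a hypothetical helper for illustration only. With N keys in one bucket, the i-th insert does i comparisons, N(N-1)/2 in total; spread-out keys need none.

```python
def count_comparisons(keys: list[int], table_size: int) -> int:
    """Hypothetical helper: insert `keys` into a simplified separate-chaining
    table and return how many key comparisons the inserts perform.

    Assumptions (not upb's real layout): no resizing, bucket = (uint32_t)key &
    (table_size - 1), and each insert compares the new key against every entry
    already in its bucket to rule out a duplicate.
    """
    mask = table_size - 1
    buckets: list[list[int]] = [[] for _ in range(table_size)]
    comparisons = 0
    for key in keys:
        chain = buckets[(key & 0xFFFF_FFFF) & mask]
        comparisons += len(chain)  # one comparison per entry already in the chain
        if key not in chain:
            chain.append(key)
    return comparisons


N = 1_000
TABLE_SIZE = 1_024
colliding = [i * TABLE_SIZE for i in range(N)]  # every key lands in bucket 0
normal = list(range(N))                         # one key per bucket

print(count_comparisons(colliding, TABLE_SIZE))  # 499500, i.e. N * (N - 1) / 2
print(count_comparisons(normal, TABLE_SIZE))     # 0
```

Doubling N roughly quadruples the comparison count for the colliding keys, which is the quadratic growth the measurements below show.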

Impact

All UPB-backed protobuf runtimes are affected: Python, Ruby, PHP, and Rust.

Empirical measurements (Python 3.13, protobuf 6.33.6, x86-64):

Map entries   Parse time (colliding keys)   Parse time (normal keys)   Ratio
1,000         0.001s                        0.000s                     13×
5,000         0.023s                        0.000s                     57×
10,000        0.097s                        0.001s                     164×
50,000        2.3s                          0.002s                     1,078×

A ~500 KB protobuf message with 50,000 colliding keys takes over 1,000× longer to parse than a message with the same number of non-colliding keys. Because the scaling is quadratic, 4× as many entries take roughly 16× as long: 200,000 entries would take ~37 seconds.
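The quadratic claim can be checked against the table's own numbers. This back-of-the-envelope Python snippet (not a benchmark) fits t ∝ n^k between the 10,000- and 50,000-entry rows, then scales the 50,000-entry time by (200,000 / 50,000)² = 16. `scaling_exponent` is a hypothetical helper; the timings are the ones reported above.

```python
import math

# Parse times for colliding keys, copied from the table above (entries -> seconds).
colliding_times = {1_000: 0.001, 5_000: 0.023, 10_000: 0.097, 50_000: 2.3}


def scaling_exponent(n1: int, t1: float, n2: int, t2: float) -> float:
    """Exponent k in t ~ n**k implied by two (size, time) measurements."""
    return math.log(t2 / t1) / math.log(n2 / n1)


k = scaling_exponent(10_000, colliding_times[10_000], 50_000, colliding_times[50_000])
# Quadratic extrapolation from the largest measured point.
estimate_200k = colliding_times[50_000] * (200_000 / 50_000) ** 2
print(f"exponent ~ {k:.2f}, 200,000 entries ~ {estimate_200k:.1f}s")
# exponent ~ 1.97, 200,000 entries ~ 36.8s
```

A fitted exponent close to 2 is what quadratic growth predicts, and 2.3 s × 16 = 36.8 s matches the ~37 second estimate.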

Collision construction

For map<int32, ...> on 64-bit platforms, the hash is just (uint32_t)key, since the high 32 bits are zero. The bucket index is hash & mask, where mask = table_size - 1. Keys that are multiples of the final table size (a power of 2) are also multiples of every smaller power of 2, so they all land in bucket 0 at every intermediate table size as the table grows:

n_entries = 50_000
step = 1 << (n_entries - 1).bit_length()  # next power of 2 >= n_entries
colliding_keys = [i * step for i in range(n_entries)]
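The construction can be checked with a short Python model. It assumes, as the description does, that the bucket is (uint32_t)key & (table_size - 1) and that the table ends at next_power_of_2(n_entries); `bucket` and `all_in_bucket_zero` are hypothetical helpers, not upb functions. The check confirms that every constructed key lands in bucket 0 at each power-of-2 size the table passes through, while sequential keys spread out immediately.

```python
from collections import Counter


def bucket(key: int, table_size: int) -> int:
    """Bucket index as described: hash = (uint32_t)key, bucket = hash & mask."""
    return (key & 0xFFFF_FFFF) & (table_size - 1)


def all_in_bucket_zero(keys: list[int], final_size: int) -> bool:
    """True if every key maps to bucket 0 at each power-of-2 size up to final_size."""
    size = 1
    while size <= final_size:
        if any(bucket(k, size) != 0 for k in keys):
            return False
        size *= 2
    return True


n_entries = 50_000
step = 1 << (n_entries - 1).bit_length()  # next power of 2 >= n_entries (65,536)
colliding_keys = [i * step for i in range(n_entries)]

print(all_in_bucket_zero(colliding_keys, step))          # True
print(all_in_bucket_zero(list(range(n_entries)), step))  # False
print(Counter(bucket(k, step) for k in colliding_keys))  # Counter({0: 50000})
```

The largest key here is 49,999 × 65,536, which still fits in 32 unsigned bits, so the (uint32_t) truncation in the model does not change any key.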

Prior art

  • String-keyed maps already use an ASLR-based seed via _upb_seed (upb/hash/common.c, lines 448 and 455)
  • The string hash seed was briefly reverted to a hardcoded constant (commit 6bde8c417, Feb 2025) due to Ruby test flakiness, then restored (commit 8ef81fbd9)
  • C++ protobuf uses absl::Hash with per-allocation randomized salt — not affected
  • Java uses LinkedHashMap with tree fallback at 8 collisions — partially mitigated

Fix

Adds a separate ASLR-based seed variable (_upb_int_seed) and XORs it into the key before hashing, matching the approach already used for string-keyed maps. This makes the hash function non-deterministic across process invocations when ASLR is enabled.

Files changed

  • upb/hash/common.c — source
  • ruby/ext/google/protobuf_c/ruby-upb.c — Ruby amalgamation
  • php/ext/google/protobuf/php-upb.c — PHP amalgamation

The integer hash function `upb_inthash()` is completely deterministic
(no seed), unlike the string hash function which uses an ASLR-based seed
via `_upb_seed`. This allows an attacker to trivially precompute keys
that all hash to the same bucket, causing O(N^2) insertion time for N
map entries during protobuf parsing.

For example, a ~500KB protobuf message with 50,000 colliding int32 map
keys takes ~1000x longer to parse than a message with the same number of
non-colliding keys. This affects all UPB-backed runtimes: Python, Ruby, PHP, and Rust.

The fix adds a separate ASLR-based seed (`_upb_int_seed`) that is XORed
into the key before hashing, matching the approach already used for
string-keyed maps. This makes the hash function non-deterministic across
process invocations when ASLR is enabled.
@loci-review

loci-review Bot commented Apr 15, 2026

Overview

Analysis of 10,167 functions in build.protoc-stable shows 16 modified functions (0.16%) implementing ASLR-based hash seeding for security hardening. Power consumption increased by 0.07% (+410.11 nJ: 587,492.36 nJ → 587,902.47 nJ).

Function Analysis

Commit 0954699: "upb: add ASLR-based seed to integer hash function" — adds key ^= (uintptr_t)_upb_IntSeed() to prevent hash collision DoS attacks.

Most Impacted Functions:

  • common.c_inthash: Response time +140.4% (+5.46ns: 3.89ns → 9.36ns). Core hash function now XORs keys with ASLR-derived seed, adding 2 address-load instructions and 1 XOR operation. Affects all integer hash operations.

  • common.c_insert.constprop.0: Response time +10.5% (+13.28ns: 126.91ns → 140.19ns). Hash table insertion helper adds ASLR seed logic, increasing hash computation block time by 82%.

  • upb_inttable_insert: Response time +3.35% (+30.74ns: 918.91ns → 949.65ns). Calls insertion helper twice, amplifying overhead. Throughput time +1.27% (+4.18ns).

Functions Showing Improvements:

  • upb_exttable_remove: Response time -4.46% (-9.90ns: 222.36ns → 212.46ns). Compiler optimizations compensated for security overhead.

  • upb_inttable_remove: Response time -3.49% (-6.87ns: 196.96ns → 190.09ns). Entry block improved 16.5% through better register allocation.

  • upb_exttable_insert: Throughput time -3.96% (-19.34ns: 488.38ns → 469.04ns). Hash computation block optimized by 5.1%.

Other analyzed functions (lookup, replace, string operations) showed 4-7ns overhead, consistent with ASLR seed XOR operations.

Assessment: Changes are intentional security enhancements. The absolute overhead (4-30ns per operation) is minimal compared to the security benefit of preventing O(N²) hash collision attacks. Several functions demonstrate net improvements, indicating effective compiler optimization. All modified functions are in μpb's hash table infrastructure, affecting Ruby, PHP, and Python protobuf bindings.


loci-dev force-pushed the main branch 27 times, most recently from fa3f834 to 96f6b2a on April 21, 2026 22:43