Commit 7375578
[CUDA] PagedAttention: early-return on empty query input (token_count == 0)
Found while verifying the MEA-path edge cases from the round-2 review:
token_count == 0 with non-zero past_seqlens would still enter backend
preprocessing. On the FA path, LaunchReshapeAndCache gets total_size = 0,
so threads = min(0, max_threads) = 0 and
blocks = (total_size + threads - 1) / threads = (0 + 0 - 1) / 0,
a division by zero. The MEA path would also mis-report "total_kv_tokens
is zero for non-empty input" even though the query input is empty
(token_count == 0).
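
The launch arithmetic above can be reproduced in a small host-side sketch. This is not the actual LaunchReshapeAndCache code: `LaunchConfig` and `ComputeLaunchConfig` are hypothetical names, and the ceil-division form is assumed from the numbers quoted in the message. It shows why total_size == 0 must be rejected before the block count is computed.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical launch configuration (illustration only).
struct LaunchConfig {
  int threads;
  int blocks;
};

// Fills *out and returns true when there is work to launch.
// Returns false, leaving *out untouched, when total_size == 0,
// because threads would be 0 and the ceil-division below would
// divide by zero.
bool ComputeLaunchConfig(int64_t total_size, int max_threads, LaunchConfig* out) {
  if (total_size == 0) {
    return false;  // nothing to launch; skip the division entirely
  }
  // threads = min(total_size, max_threads), as in the commit message
  const int threads = static_cast<int>(
      std::min<int64_t>(total_size, static_cast<int64_t>(max_threads)));
  // blocks = ceil(total_size / threads)
  const int blocks = static_cast<int>((total_size + threads - 1) / threads);
  out->threads = threads;
  out->blocks = blocks;
  return true;
}
```

With total_size = 1000 and max_threads = 256 this yields threads = 256 and blocks = 4; with total_size = 0 the function returns false instead of evaluating (0 + 0 - 1) / 0.
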
Move the empty-query check right after the cache-aliasing verification
(output is already [0, hidden_size] and the cache outputs alias the
inputs, so no backend work is needed). This protects both backends
with a single guard and removes the now-redundant nested check inside
the MEA block.
1 parent f345865 commit 7375578
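
The guard placement described in the commit message can be sketched as follows. This is a minimal illustration under assumptions, not the operator's real code: `Backend`, `Outcome`, and `RunPagedAttention` are hypothetical names, and the backend bodies are stubbed out. The point is only that one check placed ahead of backend dispatch covers both the FA and MEA paths.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the two backends named in the commit message.
enum class Backend { kFlashAttention, kMemoryEfficientAttention };

// Hypothetical result tag so the control flow can be observed.
enum class Outcome { kEarlyReturn, kRanFlashAttention, kRanMemoryEfficient };

Outcome RunPagedAttention(int64_t token_count, Backend backend) {
  // (The cache-aliasing verification would run here, before the guard.)

  // Single guard: with token_count == 0 the output is already
  // [0, hidden_size], so no backend preprocessing is needed.
  if (token_count == 0) {
    return Outcome::kEarlyReturn;
  }

  // Backend dispatch happens only for non-empty query input, so neither
  // path needs its own empty-input check.
  switch (backend) {
    case Backend::kFlashAttention:
      return Outcome::kRanFlashAttention;
    case Backend::kMemoryEfficientAttention:
      return Outcome::kRanMemoryEfficient;
  }
  return Outcome::kEarlyReturn;  // unreachable for valid enum values
}
```

Because the check sits before the switch, a nested token_count test inside either backend branch would be dead code, which matches the removal of the check from the MEA block.
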
1 file changed
Lines changed: 6 additions & 6 deletions
@@ -143,6 +143,12 @@  6 lines added (new lines 146-151)
@@ -254,12 +260,6 @@  6 lines removed (old lines 257-262)