
feat(queuev2): add InMemQueue, cache config, and task key helpers#7954

Closed

arzonus wants to merge 1 commit into cadence-workflow:master from arzonus:queuev2-foundation

Conversation

arzonus (Contributor) commented Apr 15, 2026

What changed?

Adds the foundational building blocks for the cached timer queue reader (relates to #7953):

  • InMemQueue — a sorted in-memory task store backed by binary-search insertion, with PutTasks, GetTasks, LookAHead, LTrim, RTrimBySize, and Clear. Deduplicates on insert by task key.
  • Nine new dynamic config properties for the cache (TimerProcessorCacheMaxSize, CacheMaxLookAheadWindow, CachePrefetchTriggerWindow, CacheWarmupGracePeriod, CacheEvictionSafeWindow, CacheMinPrefetchInterval, CacheTimeEvictionInterval, CachedQueueReaderMode, EnableCachedScheduledQueue). All are global (no shard filter).
  • HistoryTaskKey.Less and HistoryTaskKey.Equal helpers used by InMemQueue for ordering and deduplication.
  • New metric definitions for cache hit/miss counters and a size histogram.
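
To make the ordering and deduplication behaviour concrete, here is a minimal sketch of the sorted-insert idea described above. The names (taskKey, inMemQueue, PutTask) and the key fields are hypothetical stand-ins for the PR's HistoryTaskKey and InMemQueue, not the actual implementation; the point is only how binary-search insertion plus a key-equality check gives sorted order with dedup on insert.

```go
package main

import (
	"fmt"
	"sort"
)

// taskKey is a hypothetical stand-in for HistoryTaskKey: timers order
// primarily by fire time, then by task ID.
type taskKey struct {
	FireTime int64
	TaskID   int64
}

// Less defines the total order used for sorted insertion.
func (k taskKey) Less(o taskKey) bool {
	if k.FireTime != o.FireTime {
		return k.FireTime < o.FireTime
	}
	return k.TaskID < o.TaskID
}

// Equal is used to deduplicate on insert.
func (k taskKey) Equal(o taskKey) bool { return k == o }

// inMemQueue keeps keys sorted; PutTask finds the insertion point with
// sort.Search and skips keys that are already present.
type inMemQueue struct {
	keys []taskKey
}

func (q *inMemQueue) PutTask(k taskKey) {
	i := sort.Search(len(q.keys), func(i int) bool { return !q.keys[i].Less(k) })
	if i < len(q.keys) && q.keys[i].Equal(k) {
		return // duplicate key: drop it
	}
	q.keys = append(q.keys, taskKey{}) // grow by one
	copy(q.keys[i+1:], q.keys[i:])     // shift the tail right
	q.keys[i] = k                      // place the new key
}

func main() {
	q := &inMemQueue{}
	for _, k := range []taskKey{{10, 2}, {5, 1}, {10, 2}, {7, 3}} {
		q.PutTask(k)
	}
	fmt.Println(q.keys) // sorted, with the duplicate {10,2} inserted once
}
```
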

Why?

The timer queue reader currently issues a DB round-trip for every GetTask call. The goal is to eliminate those reads for tasks that fall within a pre-fetched look-ahead window. This PR lays the groundwork — the InMemQueue is the in-memory store the cached reader will use, and the config properties control its behaviour. Nothing is wired into production code yet.

How did you test it?

go test -race -count=1 ./service/history/queuev2/... ./service/history/config/... ./common/persistence/...
make pr GEN_DIR=service/history/queuev2
make build

All passed.

Potential risks

None — no changes to existing production code paths. All new types and config keys.

Release notes

N/A — internal change, not user-facing.

Documentation Changes

N/A

arzonus force-pushed the queuev2-foundation branch from 45aa0a2 to ae626f8 on April 15, 2026 at 09:44
arzonus force-pushed the queuev2-foundation branch 6 times, most recently from a3bb473 to ab9fe3a, on April 15, 2026 at 14:00
Signed-off-by: Seva Kaloshin <seva.kaloshin@gmail.com>
arzonus force-pushed the queuev2-foundation branch from ab9fe3a to 42af244 on April 15, 2026 at 14:14
gitar-bot commented Apr 15, 2026

CI failed: The `TestReplicationSimulation` integration test failed due to a `deadline-exceeded` error during a regional failover operation, likely caused by latency or synchronization issues introduced by the new `queuev2` components.

Overview

The TestReplicationSimulation integration test consistently failed during the regional failover simulation. This failure occurs when attempting to start a workflow immediately following a change to cluster attributes, suggesting a potential regression in request handling or service responsiveness in the new queuev2 implementation.

Failures

Replication Simulation Failure (confidence: medium)

  • Type: test
  • Affected jobs: 71469692572
  • Related to change: yes
  • Root cause: The system failed to handle a workflow start request following a failover, resulting in a deadline-exceeded error and RST_STREAM (CANCEL). This points to either a deadlock, increased latency, or incomplete state propagation within the InMemQueue or cache configuration components introduced in this PR.
  • Suggested fix: Investigate the interaction between InMemQueue and the replication control plane. Verify if the failover status propagation is being blocked by the new queue logic and consider increasing the timeout for simulation steps if the behavior is due to expected initialization overhead.

Summary

  • Change-related failures: 1, specifically within the replication integration test suite.
  • Infrastructure/flaky failures: 0.
  • Recommended action: Review the queuev2 integration points. Use local simulation logs to determine if the InMemQueue is correctly handling requests during cluster attribute transitions and ensure that no blocking calls were introduced in the request path.
Code Review: ✅ Approved (3 resolved / 3 findings)

InMemQueue implementation now safely handles zero-page-size requests, optimizes memory efficiency by truncating before collection, and prevents memory leaks by clearing interface references. All identified logic and memory management issues are resolved.

✅ 3 resolved
Bug: GetTasks panics when PageSize is 0 and tasks match

📄 service/history/queuev2/queue_mem.go:139-141
In GetTasks, after the scan loop, when len(tasks) > req.PageSize is true (e.g. 1 > 0), tasks is truncated to tasks[:0] (empty slice). Then tasks[len(tasks)-1] evaluates to tasks[-1], causing an index-out-of-range panic.

While callers are expected to pass a positive PageSize, PageSize is a plain int field defaulting to 0. A defensive check avoids a production panic if a caller ever passes 0.
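
A minimal sketch of the defensive check, under assumed names (getTasksPage and an int64 key are stand-ins, not the PR's actual signature): guard the non-positive page size before truncating, so the last-element read can never index an empty slice.

```go
package main

import "fmt"

// getTasksPage is a hypothetical sketch of the defensive truncation:
// a non-positive pageSize (or an empty result set) returns early
// instead of evaluating tasks[len(tasks)-1] on an empty slice.
func getTasksPage(tasks []int64, pageSize int) (page []int64, lastKey int64, ok bool) {
	if pageSize <= 0 || len(tasks) == 0 {
		return nil, 0, false // no page, and no last key to report
	}
	if len(tasks) > pageSize {
		tasks = tasks[:pageSize] // truncate to the requested page
	}
	return tasks, tasks[len(tasks)-1], true
}

func main() {
	// PageSize 0 no longer panics; it simply yields no page.
	if _, _, ok := getTasksPage([]int64{1, 2, 3}, 0); !ok {
		fmt.Println("pageSize 0 handled without panic")
	}
	page, last, _ := getTasksPage([]int64{1, 2, 3}, 2)
	fmt.Println(page, last)
}
```
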

Performance: GetTasks collects all matching tasks before truncating to PageSize

📄 service/history/queuev2/queue_mem.go:128-141
The scan loop (lines 129-136) appends every predicate-matching task to tasks across the entire [min, max) key range, then truncates to PageSize afterward. If the range contains many more matching tasks than PageSize, this unnecessarily allocates and fills a large slice.

The simpleQueueReader.GetTask in queue_reader.go delegates page-size limiting to the DB, so it doesn't have this issue. The in-memory variant should break early too.
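
The early-exit pattern the finding asks for can be sketched as follows (collectUpTo and the int keys are illustrative stand-ins for the PR's scan loop, not its real code): stop appending the moment the page is full, so matches beyond PageSize are never collected.

```go
package main

import "fmt"

// collectUpTo scans in key order and stops as soon as pageSize
// predicate matches are gathered, rather than filtering the whole
// range and truncating afterwards.
func collectUpTo(keys []int, match func(int) bool, pageSize int) []int {
	out := make([]int, 0, pageSize) // never allocate beyond one page
	for _, k := range keys {
		if !match(k) {
			continue
		}
		out = append(out, k)
		if len(out) == pageSize {
			break // remaining matches are never visited or stored
		}
	}
	return out
}

func main() {
	even := func(k int) bool { return k%2 == 0 }
	fmt.Println(collectUpTo([]int{1, 2, 3, 4, 5, 6, 7, 8}, even, 2))
}
```
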

Performance: RTrimBySize retains GC references to trimmed interface values

📄 service/history/queuev2/queue_mem.go:198-201
RTrimBySize truncates the slice with q.tasks = q.tasks[:maxSize] (line 200), but since persistence.Task is an interface type, the backing array still holds references to the removed task objects beyond maxSize. These objects won't be garbage-collected until the backing array itself is reallocated.

Contrast with LTrim (line 179) which correctly copies into a fresh slice, releasing old references. For consistency and to avoid unbounded memory retention after repeated trims, RTrimBySize should nil out the tail elements or allocate a new slice.
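
The nil-out variant of the fix can be sketched like this (rTrimBySize over a plain []interface{} is a stand-in for the method on the real task slice): clear the tail elements before reslicing so the backing array no longer pins the trimmed objects.

```go
package main

import "fmt"

// rTrimBySize shrinks a slice of interface values to maxSize, nilling
// the trimmed tail so the shared backing array stops retaining
// references to the removed objects.
func rTrimBySize(tasks []interface{}, maxSize int) []interface{} {
	if len(tasks) <= maxSize {
		return tasks
	}
	for i := maxSize; i < len(tasks); i++ {
		tasks[i] = nil // release the reference so the GC can collect it
	}
	return tasks[:maxSize]
}

func main() {
	tasks := []interface{}{"a", "b", "c", "d"}
	trimmed := rTrimBySize(tasks, 2)
	// The original full-length view shows the tail was cleared.
	fmt.Println(len(trimmed), tasks[2], tasks[3])
}
```

Allocating a fresh slice (as LTrim does) achieves the same release; nilling in place avoids the copy when the queue is trimmed repeatedly.
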

Rules ✅ All requirements met

Repository Rules

GitHub Issue Linking Requirement: The PR description contains a valid issue link (#7953) which references a cadence-workflow issue.
PR Description Quality Standards: The PR description contains all required sections with substantive content, clearly explains the 'Why' for the changes, and includes specific test commands.

1 rule not applicable.


arzonus (Contributor, Author) commented Apr 16, 2026

Closing in favour of a single consolidated draft PR. All changes are preserved in the feature branch.
