feat(simulation): add cached scheduled queue simulation scenarios (#7957)
arzonus wants to merge 4 commits into cadence-workflow:master
```go
func (q *cachedQueueReader) updateExclusiveUpperBound(key persistence.HistoryTaskKey) {
	q.logger.Debug("upper bound advancing",
		tag.Dynamic("prevUpperBound", q.exclusiveUpperBound),
		tag.Dynamic("newUpperBound", key),
	)
	q.exclusiveUpperBound = key
	q.notifyPrefetch()
}
```
💡 Bug: notifyPrefetch called while holding q.mu may delay prefetch
updateExclusiveUpperBound (line 242) sends on q.prefetchCh via notifyPrefetch. The comment on line 241 says 'Caller must hold q.mu', and indeed it is called from prefetch() and putTasks() under q.mu.Lock(). In prefetchLoop, when prefetchCh fires (line 191), nextPrefetchDelay() tries to acquire q.mu.RLock() (line 255). Because Go's sync.RWMutex is writer-preferring, if any other goroutine is waiting for q.mu.Lock(), the RLock in nextPrefetchDelay will block. This isn't a deadlock (the lock holder is the sender, not the receiver), but it's worth noting that notifyPrefetch is non-blocking (buffered channel, line 234-237), so the signal is just queued for the next loop iteration. No functional bug, but the pattern is subtle and a comment would help.
CI failed: multiple integration test suites are failing with "logged too late" errors caused by race conditions in the global ratelimiter's background update loop, which logs warnings after test teardown (Global Ratelimiter Background Log Mismatch, confidence: high).
Summary
Code Review: 👍 Approved with suggestions (5 resolved / 6 findings). Implements cached scheduled queue simulation scenarios and resolves performance issues related to scan efficiency, cache hit boundaries, and warmup synchronization.

Remaining finding:
💡 Bug: notifyPrefetch called while holding q.mu may delay prefetch
📄 service/history/queuev2/queue_reader_cached.go:242-249
📄 service/history/queuev2/queue_reader_cached.go:254-268
✅ 5 resolved:
✅ Performance: InMemQueue.GetTasks scans all matching tasks before truncating
✅ Bug: LookAHead cache hit ignores inclusiveLowerBound, may skip tasks
✅ Edge Case: Inject proceeds before Start if called early (warmup bypassed)
✅ Edge Case: Inject bypasses warmup when called before Start()
✅ Performance: InMemQueue.PutTasks is O(n*m) for unsorted batch inserts
Closing in favour of a single consolidated draft PR. All changes are preserved in the feature branch.
What changed?
Adds two end-to-end simulation scenarios for the cached timer queue reader (relates to #7953).
Part of a stack; merge after PR #7956. Requires the simulation infra from PR #7952 (`NumWorkflowSleeps`, `SleepBetweenWorkflowStarts`, configurable timeout) to be merged first.
Why?
The cached scheduled queue needs end-to-end validation in a real cluster before being enabled in production. Simulation tests spin up a local Docker cluster and run real workflows, which catches integration issues that unit tests cannot (e.g. task loss, double-processing, cache/DB divergence under load).
Two scenarios cover the two planned rollout stages:
- `queuev2_scheduled_cache_shadow`: runs 1 000 workflows with 10 timer sleeps each under shadow mode. The cache is populated and compared against DB reads, but correctness is unaffected. Mismatches surface as warnings before enabling the cache.
- `queuev2_scheduled_cache_enabled`: same workload under enabled mode. Timer task reads are served from the in-memory cache where the look-ahead window covers the task.
How did you test it?
Both scenarios passed. All created transfer tasks were executed. Unexecuted timer tasks (types 3 and 4, the next-day retention timers) are expected and are not a failure condition.
Potential risks
None — changes are confined to simulation test infrastructure and YAML config files. No production code is affected.
Release notes
N/A — simulation test infrastructure only.
Documentation Changes
N/A