feat: add disableCache flag to ApiServerSource adapter #9036
Ankitsinghsisodya wants to merge 6 commits into knative:main
Pull request overview
Adds an opt-in disableCache flag to ApiServerSource so the adapter can skip the initial LIST phase and rely on WATCH-only startup, reducing API server load in large clusters (e.g., many namespaces).
Changes:
- Introduces `spec.disableCache` (API + CRD schema) and propagates it through reconciler/receive-adapter config into the adapter.
- Implements a watch-only startup path (`startWatchOnly`) in the apiserver adapter when `DisableCache=true`.
- Adds unit tests for config serialization and to verify the no-cache path does not perform LIST calls; bumps Slack notify GitHub Action versions in workflows.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| pkg/apis/sources/v1/apiserver_types.go | Adds DisableCache field to ApiServerSourceSpec with documentation. |
| config/core/resources/apiserversource.yaml | Exposes disableCache in the CRD OpenAPI schema. |
| pkg/reconciler/apiserversource/apiserversource.go | Passes src.Spec.DisableCache into receive-adapter args. |
| pkg/reconciler/apiserversource/resources/receive_adapter.go | Adds DisableCache to adapter args and serializes into K_SOURCE_CONFIG. |
| pkg/reconciler/apiserversource/resources/receive_adapter_test.go | Tests K_SOURCE_CONFIG serialization includes/omits disableCache. |
| pkg/adapter/apiserver/config.go | Adds DisableCache to adapter runtime config. |
| pkg/adapter/apiserver/adapter.go | Adds startWatchOnly and routes to it when DisableCache=true. |
| pkg/adapter/apiserver/adapter_test.go | Adds adapter unit tests for no-cache startup and LIST-skipping. |
| .github/workflows/weekly-office-hours-slack-reminder.yaml | Updates Slack notify action version. |
| .github/workflows/kind-e2e.yaml | Updates Slack notify action version. |
Codecov Report
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #9036   +/-   ##
========================================
  Coverage   50.51%   50.51%
========================================
  Files         409      409
  Lines       27505    27635   +130
========================================
+ Hits        13893    13961    +68
- Misses      12752    12803    +51
- Partials      860      871    +11
Address review feedback on knative#9036:
- Fix Watch(rv="") replay bug: perform a lightweight LIST(limit=1) before each Watch to obtain the current resourceVersion, so Watch starts from that point and does not replay synthetic ADDED events for pre-existing objects (which is what rv="" triggers per the Kubernetes API contract).
- Handle watch.Error 410 Gone: detect StatusReasonGone in drainWatchEvents and return so watchResourceLoop re-lists for a fresh resourceVersion, breaking the infinite tight-retry loop that would otherwise occur.
- Replace hardcoded 5s retry with exponential backoff+jitter (wait.Backoff, 1s→60s cap) via watchResourceLoop, preventing thundering herd on reconnect.
- Log warning when both DisableCache and FailFast are set, documenting that DisableCache takes precedence.
- Replace TestAdapter_DisableCacheSkipsList (wrong: asserted no LIST was called) with TestAdapter_DisableCacheLightweightList (correct: asserts LIST is called with Limit=1).
- Add TestAdapter_DisableCacheEventDelivery, which injects a watch event via a fake watcher and asserts a CloudEvent is delivered to the sink client.
Adds a disableCache field to ApiServerSourceSpec that, when true, causes the adapter to skip the initial LIST call and only watch for new events. This reduces API server load from O(resources*namespaces) LIST requests to O(resources) Watch connections — critical for clusters with 1000+ namespaces (resolves knative#8642). The flag is propagated from spec → ReceiveAdapterArgs → K_SOURCE_CONFIG JSON → adapter Config. A new startWatchOnly mode handles the watch-only loop with reconnect on error.
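The propagation path in this commit message (adapter args → `K_SOURCE_CONFIG` JSON → adapter `Config`) comes down to a JSON round trip. The sketch below assumes a trimmed-down config struct; `adapterConfig`, `encodeConfig`, `decodeConfig`, the `namespaces` field, and the `omitempty` tag are hypothetical, chosen so the key is omitted when the flag is unset, which matches the "includes/omits disableCache" tests listed in the review table.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// adapterConfig is a hypothetical, reduced stand-in for the adapter's runtime
// config. Only the DisableCache field name comes from the PR description.
type adapterConfig struct {
	Namespaces   []string `json:"namespaces"`
	DisableCache bool     `json:"disableCache,omitempty"`
}

// encodeConfig plays the reconciler's role: it builds the JSON string that
// would be placed in the K_SOURCE_CONFIG environment variable.
func encodeConfig(cfg adapterConfig) (string, error) {
	b, err := json.Marshal(cfg)
	return string(b), err
}

// decodeConfig plays the adapter's role: it parses the env value back.
func decodeConfig(raw string) (adapterConfig, error) {
	var cfg adapterConfig
	err := json.Unmarshal([]byte(raw), &cfg)
	return cfg, err
}

func main() {
	raw, err := encodeConfig(adapterConfig{Namespaces: []string{"default"}, DisableCache: true})
	if err != nil {
		panic(err)
	}
	fmt.Println(raw)

	cfg, err := decodeConfig(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.DisableCache)
}
```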
…ntation
- Remove DisableCache from stable v1 spec; move to annotation-driven pattern matching FailFast (features.knative.dev/apiserversource-disable-cache)
- Export DisableCacheAnnotation and SkipPermissionsAnnotation as typed constants in apiserver_types.go; reconciler uses them
- Validate at admission: reject when both disable-cache and skip-permissions annotations are set simultaneously
- Fix goroutine leak in startWatchOnly: add sync.WaitGroup + unified watchCtx derived from both ctx and stopCh signals (S2/M5)
- Log errors from delegate.Add/Update/Delete instead of silently discarding with _ (S3)
- Add TimeoutSeconds=5min to all Watch calls to force periodic reconnection on stale streams (M1)
- Replace time.After with time.NewTimer+Stop to eliminate timer leak in watchResourceLoop backoff select (M3)
- Remove source.EnvKlogVerbosity from receive_adapter_test.go; constant belongs to feat/klog-verbosity and was accidentally merged via stash
- Remove trailing blank line in ApiServerSourceSpec struct body to satisfy goimports check
Force-pushed from 2e3e22c to a9f7673.
Backoff was reset on every successful LIST and Watch creation, so LIST-ok -> Watch-fail could hammer the API server at 1 req/s forever without ever reaching the 60s cap. Now resets only when a watch survives >30s; short-lived drains incur a backoff step so normal watch-close does not spin hot. Test assertion errors.Is(err, err) was always true; replaced with checks that error contains "mutually exclusive" and path "metadata.annotations".
- Remove DisableCache/SkipPermissions mutual exclusion (semantically orthogonal); dead branch in startWatchOnly becomes live again
- Guard rv=="" after LIST to prevent watch cache replay on empty resourceVersion
- Replace package-level noCacheWatchTimeout var with call-site local to eliminate cross-goroutine mutation footgun
- Fix flaky tests: remove time.Sleep in favour of polling loops and pre-buffered fake watcher events
@creydr can you please run /retest? I think these test cases are flaky.
/ok-to-test |
@creydr can we /retest the failed test cases again?
/retest |
Fixes #8642
Problem
ApiServerSource watching 5 resources across 1000+ namespaces triggers ~5000 LIST calls on startup, causing client-side throttling.
Root cause: the adapter's reflector always does LIST+WATCH per namespace per resource. With 1000+ namespaces that's O(resources × namespaces) LIST requests.
Solution
Add a `disableCache` field to `ApiServerSourceSpec`. When `true`, the adapter skips the initial LIST call entirely and only opens Watch connections, reducing startup API calls from ~5000 to ~5 (one Watch per resource type).
Semantic trade-off: pre-existing objects do not emit events on adapter startup. Only events for objects created/modified/deleted after the adapter starts are emitted.
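The request counts quoted in the Problem and Solution sections follow from simple arithmetic. The sketch below reproduces them under the description's own assumptions (5 resource types, 1000 namespaces, one Watch per resource type); the function names are hypothetical.

```go
package main

import "fmt"

// listCalls estimates startup LIST requests for the default LIST+WATCH path:
// one LIST per resource type per namespace.
func listCalls(resources, namespaces int) int {
	return resources * namespaces
}

// watchConns estimates connections opened in watch-only mode, using the
// description's figure of one Watch per resource type.
func watchConns(resources int) int {
	return resources
}

func main() {
	fmt.Println(listCalls(5, 1000)) // default path
	fmt.Println(watchConns(5))      // watch-only path
}
```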
Changes
| File | Change |
|---|---|
| pkg/apis/sources/v1/apiserver_types.go | `DisableCache bool` field on `ApiServerSourceSpec` |
| pkg/adapter/apiserver/config.go | `DisableCache bool` in adapter `Config` |
| pkg/adapter/apiserver/adapter.go | `startWatchOnly()`: Watch-only loop with reconnect; branch before FailFast/Resilient |
| pkg/reconciler/apiserversource/apiserversource.go | `src.Spec.DisableCache` to adapter args |
| pkg/reconciler/apiserversource/resources/receive_adapter.go | `DisableCache` in `ReceiveAdapterArgs`, serialized into `K_SOURCE_CONFIG` |
| config/core/resources/apiserversource.yaml | `disableCache: boolean` in CRD openAPIV3Schema |
| *_test.go | |
Usage
Test plan
- `go test ./pkg/adapter/apiserver/...`: passes, includes `TestAdapter_DisableCache` and `TestAdapter_DisableCacheSkipsList`
- `go test ./pkg/reconciler/apiserversource/...`: passes, includes `TestMakeReceiveAdapterWithDisableCache`