feat: add disableCache flag to ApiServerSource adapter#9036

Open
Ankitsinghsisodya wants to merge 6 commits into knative:main from Ankitsinghsisodya:feat/apiserversource-disable-cache

Conversation

@Ankitsinghsisodya
Contributor

Fixes #8642

Problem

ApiServerSource watching 5 resources across 1000+ namespaces triggers ~5000 LIST calls on startup, causing client-side throttling:

Waited for 17m12s due to client-side throttling... GET .../taskruns?resourceVersion=0

Root cause: the adapter's reflector always does LIST+WATCH per namespace per resource. With 1000+ namespaces that's O(resources × namespaces) LIST requests.

Solution

Add a disableCache field to ApiServerSourceSpec. When true, the adapter skips the initial LIST call entirely and only opens Watch connections — reducing startup API calls from ~5000 to ~5 (one Watch per resource type).

Semantic trade-off: pre-existing objects do not emit events on adapter startup. Only events for objects created/modified/deleted after the adapter starts are emitted.
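
As a rough illustration of the arithmetic above, the following sketch compares the two startup modes. The function names are hypothetical and not part of the adapter; the counts assume one LIST per resource per namespace in the default mode and one cluster-wide Watch per resource type in watch-only mode, as the description states.

```go
package main

import "fmt"

// startupListCalls estimates the LIST requests issued at startup when every
// (resource, namespace) pair gets its own LIST+WATCH reflector.
func startupListCalls(resources, namespaces int) int {
	return resources * namespaces
}

// startupWatchConns estimates the connections opened at startup in watch-only
// mode, assuming a single Watch per resource type.
func startupWatchConns(resources int) int {
	return resources
}

func main() {
	fmt.Println(startupListCalls(5, 1000)) // 5000
	fmt.Println(startupWatchConns(5))      // 5
}
```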

Changes

| File | Change |
| --- | --- |
| pkg/apis/sources/v1/apiserver_types.go | DisableCache bool field on ApiServerSourceSpec |
| pkg/adapter/apiserver/config.go | DisableCache bool in adapter Config |
| pkg/adapter/apiserver/adapter.go | startWatchOnly(): Watch-only loop with reconnect; branch before FailFast/Resilient |
| pkg/reconciler/apiserversource/apiserversource.go | Pass src.Spec.DisableCache to adapter args |
| pkg/reconciler/apiserversource/resources/receive_adapter.go | DisableCache in ReceiveAdapterArgs, serialized into K_SOURCE_CONFIG |
| config/core/resources/apiserversource.yaml | disableCache: boolean in CRD openAPIV3Schema |
| *_test.go | Unit tests covering no-cache start path and verifying no LIST call is made |
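
To make the propagation into K_SOURCE_CONFIG concrete, here is a minimal sketch of how a boolean flag could be serialized so that it appears only when set. The `adapterConfig` type is a hypothetical, trimmed-down stand-in, and the `disableCache,omitempty` JSON tag is an assumption; the real Config struct has more fields and may use a different tag.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// adapterConfig is a hypothetical stand-in for the adapter Config.
// The JSON tag on DisableCache is an assumption for illustration.
type adapterConfig struct {
	Namespaces   []string `json:"namespaces"`
	DisableCache bool     `json:"disableCache,omitempty"`
}

// sourceConfigJSON returns the JSON string that would be placed in the
// K_SOURCE_CONFIG environment variable.
func sourceConfigJSON(cfg adapterConfig) (string, error) {
	b, err := json.Marshal(cfg)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	on, _ := sourceConfigJSON(adapterConfig{Namespaces: []string{"default"}, DisableCache: true})
	off, _ := sourceConfigJSON(adapterConfig{Namespaces: []string{"default"}})
	fmt.Println(on)  // {"namespaces":["default"],"disableCache":true}
	fmt.Println(off) // {"namespaces":["default"]}
}
```

With `omitempty`, existing sources that never set the flag produce the same JSON as before, so the adapter sees no change unless the flag is enabled.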

Usage

apiVersion: sources.knative.dev/v1
kind: ApiServerSource
metadata:
  name: tekton-taskrun-source
spec:
  disableCache: true
  resources:
    - apiVersion: tekton.dev/v1
      kind: TaskRun
  namespaceSelector: {}
  sink:
    ref:
      apiVersion: v1
      kind: Service
      name: my-sink

Test plan

  • go test ./pkg/adapter/apiserver/... — passes, includes TestAdapter_DisableCache and TestAdapter_DisableCacheSkipsList
  • go test ./pkg/reconciler/apiserversource/... — passes, includes TestMakeReceiveAdapterWithDisableCache
  • Integration test in a cluster with many namespaces (reproducer pending from issue author)

Copilot AI review requested due to automatic review settings April 21, 2026 19:16
@knative-prow knative-prow Bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 21, 2026
@knative-prow

knative-prow Bot commented Apr 21, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Ankitsinghsisodya
Once this PR has been reviewed and has the lgtm label, please assign pierdipi for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow knative-prow Bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 21, 2026
@knative-prow

knative-prow Bot commented Apr 21, 2026

Hi @Ankitsinghsisodya. Thanks for your PR.

I'm waiting for a knative member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Tip

We noticed you've done this a few times! Consider joining the org to skip this step and gain /lgtm and other bot rights. We recommend asking approvers on your previous PRs to sponsor you.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Copilot AI left a comment

Pull request overview

Adds an opt-in disableCache flag to ApiServerSource so the adapter can skip the initial LIST phase and rely on WATCH-only startup, reducing API server load in large clusters (e.g., many namespaces).

Changes:

  • Introduces spec.disableCache (API + CRD schema) and propagates it through reconciler/receive-adapter config into the adapter.
  • Implements a watch-only startup path (startWatchOnly) in the apiserver adapter when DisableCache=true.
  • Adds unit tests for config serialization and to verify the no-cache path does not perform LIST calls; bumps Slack notify GitHub Action versions in workflows.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

Show a summary per file
| File | Description |
| --- | --- |
| pkg/apis/sources/v1/apiserver_types.go | Adds DisableCache field to ApiServerSourceSpec with documentation. |
| config/core/resources/apiserversource.yaml | Exposes disableCache in the CRD OpenAPI schema. |
| pkg/reconciler/apiserversource/apiserversource.go | Passes src.Spec.DisableCache into receive-adapter args. |
| pkg/reconciler/apiserversource/resources/receive_adapter.go | Adds DisableCache to adapter args and serializes into K_SOURCE_CONFIG. |
| pkg/reconciler/apiserversource/resources/receive_adapter_test.go | Tests K_SOURCE_CONFIG serialization includes/omits disableCache. |
| pkg/adapter/apiserver/config.go | Adds DisableCache to adapter runtime config. |
| pkg/adapter/apiserver/adapter.go | Adds startWatchOnly and routes to it when DisableCache=true. |
| pkg/adapter/apiserver/adapter_test.go | Adds adapter unit tests for no-cache startup and LIST-skipping. |
| .github/workflows/weekly-office-hours-slack-reminder.yaml | Updates Slack notify action version. |
| .github/workflows/kind-e2e.yaml | Updates Slack notify action version. |

Comment thread pkg/adapter/apiserver/adapter.go Outdated
Comment thread pkg/adapter/apiserver/adapter.go Outdated
Comment thread pkg/adapter/apiserver/adapter_test.go Outdated
Comment thread pkg/apis/sources/v1/apiserver_types.go Outdated
Comment thread config/core/resources/apiserversource.yaml Outdated
@codecov

codecov Bot commented Apr 21, 2026

Codecov Report

❌ Patch coverage is 54.54545% with 60 lines in your changes missing coverage. Please review.
✅ Project coverage is 50.51%. Comparing base (d5f437f) to head (d8067fa).
⚠️ Report is 5 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| pkg/adapter/apiserver/adapter.go | 53.12% | 52 Missing and 8 partials ⚠️ |
Additional details and impacted files
@@           Coverage Diff            @@
##             main    #9036    +/-   ##
========================================
  Coverage   50.51%   50.51%            
========================================
  Files         409      409            
  Lines       27505    27635   +130     
========================================
+ Hits        13893    13961    +68     
- Misses      12752    12803    +51     
- Partials      860      871    +11     

Ankitsinghsisodya added a commit to Ankitsinghsisodya/eventing that referenced this pull request Apr 21, 2026
Address review feedback on knative#9036:

- Fix Watch(rv="") replay bug: perform a lightweight LIST(limit=1) before
  each Watch to obtain the current resourceVersion, so Watch starts from
  that point and does not replay synthetic ADDED events for pre-existing
  objects (which is what rv="" triggers per the Kubernetes API contract).

- Handle watch.Error 410 Gone: detect StatusReasonGone in drainWatchEvents
  and return so watchResourceLoop re-lists for a fresh resourceVersion,
  breaking the infinite tight-retry loop that would otherwise occur.

- Replace hardcoded 5s retry with exponential backoff+jitter (wait.Backoff,
  1s→60s cap) via watchResourceLoop, preventing thundering herd on reconnect.

- Log warning when both DisableCache and FailFast are set, documenting that
  DisableCache takes precedence.

- Replace TestAdapter_DisableCacheSkipsList (wrong: asserted no LIST was
  called) with TestAdapter_DisableCacheLightweightList (correct: asserts
  LIST is called with Limit=1). Add TestAdapter_DisableCacheEventDelivery
  which injects a watch event via a fake watcher and asserts a CloudEvent
  is delivered to the sink client.
@knative-prow knative-prow Bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 21, 2026
Adds a disableCache field to ApiServerSourceSpec that, when true, causes
the adapter to skip the initial LIST call and only watch for new events.
This reduces API server load from O(resources*namespaces) LIST requests
to O(resources) Watch connections — critical for clusters with 1000+
namespaces (resolves knative#8642).

The flag is propagated from spec → ReceiveAdapterArgs → K_SOURCE_CONFIG
JSON → adapter Config. A new startWatchOnly mode handles the watch-only
loop with reconnect on error.
…ntation

- Remove DisableCache from stable v1 spec; move to annotation-driven
  pattern matching FailFast (features.knative.dev/apiserversource-disable-cache)
- Export DisableCacheAnnotation and SkipPermissionsAnnotation as typed
  constants in apiserver_types.go; reconciler uses them
- Validate at admission: reject when both disable-cache and
  skip-permissions annotations are set simultaneously
- Fix goroutine leak in startWatchOnly: add sync.WaitGroup + unified
  watchCtx derived from both ctx and stopCh signals (S2/M5)
- Log errors from delegate.Add/Update/Delete instead of silently
  discarding with _ (S3)
- Add TimeoutSeconds=5min to all Watch calls to force periodic
  reconnection on stale streams (M1)
- Replace time.After with time.NewTimer+Stop to eliminate timer leak
  in watchResourceLoop backoff select (M3)
- Remove source.EnvKlogVerbosity from receive_adapter_test.go; constant
  belongs to feat/klog-verbosity and was accidentally merged via stash
- Remove trailing blank line in ApiServerSourceSpec struct body
  to satisfy goimports check
@Ankitsinghsisodya Ankitsinghsisodya force-pushed the feat/apiserversource-disable-cache branch from 2e3e22c to a9f7673 Compare April 21, 2026 21:19
Backoff was reset on every successful LIST and Watch creation, so
LIST-ok -> Watch-fail could hammer the API server at 1 req/s forever
without ever reaching the 60s cap. Now resets only when a watch
survives >30s; short-lived drains incur a backoff step so normal
watch-close does not spin hot.

Test assertion errors.Is(err, err) was always true; replaced with
checks that error contains "mutually exclusive" and path
"metadata.annotations".
- Remove DisableCache/SkipPermissions mutual exclusion — semantically
  orthogonal; dead branch in startWatchOnly becomes live again
- Guard rv=="" after LIST to prevent watch cache replay on empty
  resourceVersion
- Replace package-level noCacheWatchTimeout var with call-site local to
  eliminate cross-goroutine mutation footgun
- Fix flaky tests: remove time.Sleep in favour of polling loops and
  pre-buffered fake watcher events
@Ankitsinghsisodya
Contributor Author

@creydr can you please run /retest? I think these test cases are flaky.

@creydr
Member

creydr commented Apr 22, 2026

/ok-to-test

@knative-prow knative-prow Bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 22, 2026
@Ankitsinghsisodya
Contributor Author

@creydr can we /retest the failed test cases again?

@creydr
Member

creydr commented Apr 23, 2026

/retest

Labels

ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

Flag to disable caching on ApiServerSource adapter

3 participants