test: fix flaky bfloat16 pagination test by pinning DISKANN search_list_size #49033

Open
yanliang567 wants to merge 1 commit into milvus-io:master from yanliang567:fix/bfloat16-pagination-diskann-flaky

Conversation

@yanliang567
Contributor

Summary

Fixes the flaky test test_search_bfloat16_with_pagination_default, which failed in 2 of 5 local runs (~40%) with per-page overlaps of 76–79% against the 80% threshold.

issue: #49030

Root Cause

When offset is set, Milvus proxy computes queryTopK = limit + offset and passes it directly to DISKANN as the search depth. Without an explicit search_list_size, DISKANN scales its graph traversal proportionally to queryTopK:

| Call | Internal topk | Graph depth |
| --- | --- | --- |
| Page 1 (offset=100, limit=100) | 200 | shallow (5× less than full) |
| Page 3 (offset=300, limit=100) | 400 | medium (2.5× less than full) |
| Full search (limit=1000) | 1000 | deep |

Boundary candidates at the edge of each page differ between the paginated and full searches, producing ~76–79% overlap and flaky failures.
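
The arithmetic behind the table can be checked with a small sketch. It only models the `limit + offset` rule described above; the function name `query_topk` is hypothetical and is not taken from the Milvus source.

```python
def query_topk(limit: int, offset: int = 0) -> int:
    """Internal top-k requested for one call, modelled as limit + offset."""
    return limit + offset


FULL_LIMIT = 1000

# Internal top-k for the three calls listed in the table.
internal_topk = {
    "page 1": query_topk(limit=100, offset=100),
    "page 3": query_topk(limit=100, offset=300),
    "full": query_topk(limit=FULL_LIMIT),
}

# How many times smaller each call's top-k is than the full search's.
depth_ratio = {name: FULL_LIMIT / k for name, k in internal_topk.items()}

for name, k in internal_topk.items():
    print(f"{name}: internal topk = {k}, {depth_ratio[name]:g}x less than full")
```

With no pinned search depth, each of these calls would traverse the graph to a different extent, which is the mismatch the fix removes.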

Fix

Pin search_list_size=1200 on both paginated and full searches. DISKANN uses this as a fixed exploration budget regardless of topk, so both calls examine the same candidate pool and agree on boundary positions.
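
A minimal sketch of the idea, not the test's actual code: build one search-parameter dict and reuse it for every call, so the paginated and full searches share the same budget. The helper name, the nesting under `"params"`, and the `"COSINE"` metric are assumptions for illustration; only `search_list_size=1200` comes from this PR.

```python
from typing import Any, Dict

PINNED_SEARCH_LIST_SIZE = 1200  # value pinned by this PR


def diskann_search_params(metric_type: str,
                          search_list_size: int = PINNED_SEARCH_LIST_SIZE) -> Dict[str, Any]:
    """Hypothetical helper: one dict shared by paginated and full searches."""
    return {
        "metric_type": metric_type,
        "params": {"search_list_size": search_list_size},  # assumed nesting
    }


shared_params = diskann_search_params("COSINE")  # metric is a placeholder

# (limit, offset) for the calls in the table; the full search has offset 0.
calls = [(100, 100), (100, 300), (1000, 0)]
largest_internal_topk = max(limit + offset for limit, offset in calls)

# The pinned budget is at least as large as every call's internal top-k.
budget_covers_all_calls = (
    shared_params["params"]["search_list_size"] >= largest_internal_topk
)
```

Passing the same `shared_params` to every search keeps the exploration budget independent of `limit` and `offset`.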

Additional improvements:

  • Per-page overlap threshold: 80% → 90% (reflects actual overlap with matched search depth)
  • Added overall recall assertion (≥95%) to validate cross-page coverage
  • Expanded docstring explaining the DISKANN pagination consistency model
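
The two assertions can be sketched as follows. This is an illustration under assumptions, not the test's implementation: per-page overlap is taken as the fraction of a page's IDs that also appear in the matching slice of the full result, and overall recall as the fraction of the full result covered by the union of all pages. Both function names are hypothetical.

```python
from typing import List, Sequence


def page_overlap(page_ids: Sequence[int], full_ids: Sequence[int], offset: int) -> float:
    """Fraction of one page's IDs found in the same slice of the full result."""
    if not page_ids:
        raise ValueError("page_ids must not be empty")
    expected = set(full_ids[offset:offset + len(page_ids)])
    return len(set(page_ids) & expected) / len(page_ids)


def overall_recall(pages: List[Sequence[int]], full_ids: Sequence[int]) -> float:
    """Fraction of the full result covered by the union of all pages."""
    seen = set()
    for page in pages:
        seen.update(page)
    return len(seen & set(full_ids)) / len(full_ids)


# Toy data: two pages of 5 whose boundary items (4 and 9) swapped places.
full_ids = list(range(10))
pages = [[0, 1, 2, 3, 9], [5, 6, 7, 8, 4]]

overlaps = [page_overlap(p, full_ids, offset=i * 5) for i, p in enumerate(pages)]
recall = overall_recall(pages, full_ids)
```

In the toy data each page overlaps its slice by only 80%, yet the pages together still cover the whole full result. That is why the two checks are complementary: per-page overlap catches boundary drift, while overall recall checks cross-page coverage.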

Test Plan

  • Reproduced original failure: 2/5 runs failed (40% rate), overlap 76–79%
  • Verified fix: 5/5 runs passed after pinning search_list_size
  • Tested against Milvus master-20260413 on standalone instance

🤖 Generated with Claude Code

@sre-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: yanliang567

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sre-ci-robot added the approved and size/M labels Apr 15, 2026
@mergify bot added the dco-passed and kind/test labels Apr 15, 2026
@sre-ci-robot
Contributor

[ci-v2-notice]
Notice: New ci-v2 system is enabled for this PR.

To rerun ci-v2 checks, comment with:

  • /ci-rerun-code-check // for ci-v2/code-check
  • /ci-rerun-build // for ci-v2/build
  • /ci-rerun-build-all // for ci-v2/build-all (multi-arch builds)
  • /ci-rerun-buildenv // for ci-v2/build-env (build milvus-env builder images)
  • /ci-rerun-ut-integration // for ci-v2/ut-integration, will rerun ci-v2/build
  • /ci-rerun-ut-go // for ci-v2/ut-go, will rerun ci-v2/build
  • /ci-rerun-ut-cpp // for ci-v2/ut-cpp
  • /ci-rerun-ut // for all ci-v2/ut-integration, ci-v2/ut-go, ci-v2/ut-cpp, will rerun ci-v2/build
  • /ci-rerun-e2e-default // for ci-v2/e2e-default
  • /ci-rerun-e2e-amd // for ci-v2/e2e-amd (e2e pool dispatcher)
  • /ci-rerun-build-ut-cov // for ci-v2/build-ut-cov (build + unit tests in one pipeline)
  • /ci-rerun-gosdk // for ci-v2/go-sdk (Go SDK E2E tests, ARM)

If you have any questions or requests, please contact @zhikunyao.

…st_size

Without an explicit search_list_size, DISKANN adjusts its graph traversal
depth proportionally to the internal queryTopK (= limit + offset). This
caused paginated searches and the full reference search to explore different
candidate neighbourhoods:

  page 1 (offset=100, limit=100) → internal topk = 200  (shallow)
  page 3 (offset=300, limit=100) → internal topk = 400  (medium)
  full search (limit=1000)       → internal topk = 1000 (deep)

Boundary candidates at the edge of each page differ between the two calls,
producing 76-79% overlap and flaky failures against the 80% threshold (40%
failure rate in local reproduction).

Fix: pin search_list_size=1200 on both paginated and full searches so that
both explore the same DISKANN candidate pool regardless of topk. With the
same exploration depth, per-page overlap consistently exceeds 90% and
overall recall across all pages exceeds 95%.

Additional improvements:
- Per-page overlap threshold raised from 80% to 90%
- Added overall recall assertion (>=95%) to validate cross-page coverage
- Expanded docstring explaining the DISKANN pagination consistency model

issue: milvus-io#49030

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: yanliang567 <82361606+yanliang567@users.noreply.github.com>
@yanliang567 force-pushed the fix/bfloat16-pagination-diskann-flaky branch from 6d7bf91 to 8fbf27f April 15, 2026 09:16
@mergify mergify bot added the ci-passed label Apr 15, 2026

Labels

approved, area/test, ci-passed, dco-passed, kind/test, sig/testing, size/M
