enhance: replace stale columns when load-diff groups keep shape but swap files#49066
enhance: replace stale columns when load-diff groups keep shape but swap files#49066congqixia wants to merge 1 commit intomilvus-io:masterfrom
Conversation
…wap files Related to milvus-io#46358 ComputeDiffBinlogs and ComputeDiffColumnGroups in SegmentLoadInfo only compared field-level membership, so a field that stayed under the same group id / column-group index was treated as unchanged even when the underlying files were rewritten (e.g. post-compaction manifests that preserve the column-group layout but point at new parquet files). ApplyLoadDiff gates reader rebuild and column eviction on *_to_replace entries, so the loader silently kept serving stale cached chunks. Compare the ordered file-path sequence per group and route pre-existing fields into binlogs_to_replace / column_groups_to_replace (respecting the existing eager-vs-lazy policy) when files differ; genuinely new children in a files-changed group still go to binlogs_to_load so the replace path isn't asked to evict a column that was never loaded. Add SetColumnGroupsForTesting as a narrow hook so unit tests can exercise ComputeDiffColumnGroups without constructing real manifest files, and cover both diff functions for: identical files, different file paths, different file counts, new child under changed files, and non-double-emission for moved fields whose old group's files also changed. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: congqixia The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
[ci-v2-notice] To rerun ci-v2 checks, comment with:
If you have any questions or requests, please contact @zhikunyao. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #49066 +/- ##
==========================================
+ Coverage 77.99% 78.01% +0.01%
==========================================
Files 2168 2168
Lines 356784 356952 +168
==========================================
+ Hits 278279 278466 +187
+ Misses 69941 69918 -23
- Partials 8564 8568 +4
🚀 New features to boost your workflow:
|
| if (!cur_cg || !new_cg) { | ||
| continue; | ||
| } | ||
| const auto& cur_files = cur_cg->files; |
There was a problem hiding this comment.
In fact, this all assumes that columns are consistent within column groups. Is there a part logical to verify this?
Related to #46358
ComputeDiffBinlogs and ComputeDiffColumnGroups in SegmentLoadInfo only compared field-level membership, so a field that stayed under the same group id / column-group index was treated as unchanged even when the underlying files were rewritten (e.g. post-compaction manifests that preserve the column-group layout but point at new parquet files). ApplyLoadDiff gates reader rebuild and column eviction on *_to_replace entries, so the loader silently kept serving stale cached chunks.
Compare the ordered file-path sequence per group and route pre-existing fields into binlogs_to_replace / column_groups_to_replace (respecting the existing eager-vs-lazy policy) when files differ; genuinely new children in a files-changed group still go to binlogs_to_load so the replace path isn't asked to evict a column that was never loaded.
Add SetColumnGroupsForTesting as a narrow hook so unit tests can exercise ComputeDiffColumnGroups without constructing real manifest files, and cover both diff functions for: identical files, different file paths, different file counts, new child under changed files, and non-double-emission for moved fields whose old group's files also changed.