enhance: decouple parquet metadata loading from GroupChunkTranslator #49026
sre-ci-robot merged 2 commits into milvus-io:master
Previously, GroupChunkTranslator loaded and retained the full parquet::FileMetaData in its constructor even when the parquet stats skip index was disabled (the default). This wasted memory across long-lived translators and coupled the translator to parquet-level types.

Extract metadata loading into a free function, LoadGroupChunkMetadata, declared in ChunkedSegmentSealedImpl.h. Callers pass field_ids_for_stats only when ENABLE_PARQUET_STATS_SKIP_INDEX is on; otherwise the loader skips GetParquetMetadata() entirely and returns only the lightweight RowGroupMetadataVector. The translator now takes row_group_meta_list directly and no longer holds parquet FileMetaData.

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
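To make the two loading paths concrete, here is a minimal sketch of the idea, not the code from this PR. Every type and name in it (`RowGroupMeta` and its fields, `FieldStats`, `FakeParquetFile`, `GroupChunkMetadataSketch`, `LoadGroupChunkMetadataSketch`) is a hypothetical stand-in for the real milvus_storage / parquet types, and the loop is sequential for brevity. It assumes that an empty `field_ids_for_stats` means the skip index is off.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for the lightweight per-row-group record.
// The field names are assumptions chosen only to add up to 24 bytes.
struct RowGroupMeta {
    std::int64_t memory_size;
    std::int64_t row_num;
    std::int64_t row_offset;
};
static_assert(sizeof(RowGroupMeta) == 24, "three 8-byte fields");
using RowGroupMetadataVector = std::vector<RowGroupMeta>;

// Hypothetical per-field statistics taken from the heavy file metadata.
struct FieldStats {
    std::int64_t min_value;
    std::int64_t max_value;
};

// Hypothetical file handle. GetFullMetadata() plays the role of the
// expensive metadata read and counts how often it is called.
struct FakeParquetFile {
    RowGroupMetadataVector row_groups;
    std::map<std::int64_t, FieldStats> stats_by_field;
    mutable int full_metadata_reads = 0;

    const std::map<std::int64_t, FieldStats>& GetFullMetadata() const {
        ++full_metadata_reads;
        return stats_by_field;
    }
};

// Result pair mirroring { row_group_meta_list, parquet_stats_by_field }.
struct GroupChunkMetadataSketch {
    std::vector<RowGroupMetadataVector> row_group_meta_list;
    std::map<std::int64_t, std::vector<FieldStats>> parquet_stats_by_field;
};

// Sketch of the free loader: the heavy metadata is read only when the
// caller asks for stats, and nothing heavy is kept in the result.
GroupChunkMetadataSketch LoadGroupChunkMetadataSketch(
    const std::vector<FakeParquetFile>& files,
    const std::vector<std::int64_t>& field_ids_for_stats) {
    GroupChunkMetadataSketch out;
    for (const auto& file : files) {
        out.row_group_meta_list.push_back(file.row_groups);
        if (field_ids_for_stats.empty()) {
            continue;  // skip index off: never touch the full metadata
        }
        const auto& full = file.GetFullMetadata();
        for (std::int64_t field_id : field_ids_for_stats) {
            auto it = full.find(field_id);
            if (it != full.end()) {
                out.parquet_stats_by_field[field_id].push_back(it->second);
            }
        }
    }
    // Postcondition: one lightweight metadata vector per input file.
    assert(out.row_group_meta_list.size() == files.size());
    return out;
}
```

In this sketch a caller with the skip index disabled would pass an empty list and get back only the row-group vectors; a caller with it enabled would pass the field IDs it wants stats for.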
Codecov Report
@@ Coverage Diff @@
## master #49026 +/- ##
==========================================
- Coverage 77.97% 77.95% -0.03%
==========================================
Files 2167 2167
Lines 356240 356274 +34
==========================================
- Hits 277783 277734 -49
- Misses 69915 69984 +69
- Partials 8542 8556 +14
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: sparknack, zhengbuqian
/lgtm
…lator (#49027)

issue: #49025
pr: #49026

## Summary

Cherry-pick of #49026 to the 3.0 branch.

- Extract parquet metadata loading out of `GroupChunkTranslator`'s constructor into a free `LoadGroupChunkMetadata` function declared in `segcore/ChunkedSegmentSealedImpl.h`. It returns `{ row_group_meta_list, parquet_stats_by_field }`.
- When `ENABLE_PARQUET_STATS_SKIP_INDEX` is off (default), the loader only reads the lightweight `RowGroupMetadataVector` (24 B per row group) and skips `GetParquetMetadata()` entirely; no `parquet::FileMetaData` is loaded or retained.
- When on, stats are extracted during the same parallel file pass and the `FileMetaData` is released before the translator is constructed.
- `GroupChunkTranslator` now accepts `std::vector<milvus_storage::RowGroupMetadataVector>&&` directly; the `parquet_file_metas()` / `field_id_mapping()` getters are removed.
- Updated both call sites (`ChunkedSegmentSealedImpl::load_column_group_data_internal`, `JsonKeyStats`) and the translator unit test.

## Test plan

- [x] Local build passes (`milvus_segcore`, `milvus_index`) on 3.0
- [x] Cherry-pick applied with auto-merge, no manual conflict resolution needed
- [ ] Full CI: C++ unit tests + integration tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
issue: #49025
Summary
- Extract parquet metadata loading out of `GroupChunkTranslator`'s constructor into a free `LoadGroupChunkMetadata` function declared in `segcore/ChunkedSegmentSealedImpl.h`. It returns `{ row_group_meta_list, parquet_stats_by_field }`.
- When `ENABLE_PARQUET_STATS_SKIP_INDEX` is off (default), the loader only reads the lightweight `RowGroupMetadataVector` (24 B per row group) and skips `GetParquetMetadata()` entirely; no `parquet::FileMetaData` is loaded or retained.
- When on, stats are extracted during the same parallel file pass and the `FileMetaData` is released before the translator is constructed.
- `GroupChunkTranslator` now accepts `std::vector<milvus_storage::RowGroupMetadataVector>&&` directly; the `parquet_file_metas()` / `field_id_mapping()` getters are removed.
- Updated both call sites (`ChunkedSegmentSealedImpl::load_column_group_data_internal`, `JsonKeyStats`) and the translator unit test.

Test plan

- Local build passes (`milvus_segcore`, `milvus_index`)
- `GroupChunkTranslatorTest` updated and compiles

🤖 Generated with Claude Code
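The constructor change in the summary can be pictured with a small sketch. This is not the real `GroupChunkTranslator`: the class name `GroupChunkTranslatorSketch`, its accessors, and the `RowGroupMeta` field names are all hypothetical. It shows only that the translator takes ownership of the row-group metadata through an rvalue reference and keeps no parquet-level state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for the lightweight row-group record.
struct RowGroupMeta {
    std::int64_t memory_size;
    std::int64_t row_num;
    std::int64_t row_offset;
};
using RowGroupMetadataVector = std::vector<RowGroupMeta>;

// Sketch of a translator that is handed pre-loaded metadata.
class GroupChunkTranslatorSketch {
 public:
    // Takes the metadata by rvalue reference and moves it into place,
    // so the caller's buffers are reused instead of copied.
    explicit GroupChunkTranslatorSketch(
        std::vector<RowGroupMetadataVector>&& row_group_meta_list)
        : row_group_meta_list_(std::move(row_group_meta_list)) {
    }

    // Total number of row groups across all files.
    std::size_t
    num_row_groups() const {
        std::size_t n = 0;
        for (const auto& per_file : row_group_meta_list_) {
            n += per_file.size();
        }
        return n;
    }

    // Total number of rows across all row groups.
    std::int64_t
    total_rows() const {
        std::int64_t rows = 0;
        for (const auto& per_file : row_group_meta_list_) {
            for (const auto& rg : per_file) {
                rows += rg.row_num;
            }
        }
        assert(rows >= 0);  // row counts are expected to be non-negative
        return rows;
    }

 private:
    // Only the lightweight metadata is held; no file-level metadata.
    std::vector<RowGroupMetadataVector> row_group_meta_list_;
};
```

A call site would typically load the metadata first and then construct the translator with `std::move(...)` on the loaded list, which is consistent with the summary's note that the heavy metadata is released before the translator is constructed.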