ORC-2160: [C++] Expose prefetch range planning via Reader::preBufferRange and refactor preBuffer to reuse it#2616
Open
lucasfang wants to merge 1 commit intoapache:mainfrom
Open
ORC-2160: [C++] Expose prefetch range planning via Reader::preBufferRange and refactor preBuffer to reuse it#2616lucasfang wants to merge 1 commit intoapache:mainfrom
lucasfang wants to merge 1 commit intoapache:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR adds a new Reader::preBufferRange() API to expose prefetch range planning (offset/length pairs), implements it in ReaderImpl, and refactors preBuffer() to reuse the shared range-planning logic before caching.
Changes:
- Introduce public
Reader::preBufferRange(stripes, includeTypes)returning offset/length ranges. - Implement
ReaderImpl::preBufferRange()by reusing existing stripe footer parsing + range extraction. - Refactor
preBuffer()to callpreBufferRange()and then populate the cache.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| c++/src/Reader.hh | Adds the ReaderImpl::preBufferRange() override declaration. |
| c++/src/Reader.cc | Implements preBufferRange() and refactors preBuffer() to reuse it + cache computed ranges. |
| c++/include/orc/Reader.hh | Adds the new public API and documentation; includes <utility> for std::pair. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+700
to
+708
| /** | ||
| * Calculate prefetch ranges by selected stripes and columns. | ||
| * It is thread safe and does not cache data. | ||
| * @param stripes the stripes to prefetch | ||
| * @param includeTypes the types to prefetch | ||
| * @return prefetch ranges as offset/length pairs | ||
| */ | ||
| virtual std::vector<std::pair<uint64_t, uint64_t>> preBufferRange( | ||
| const std::vector<uint32_t>& stripes, const std::list<uint64_t>& includeTypes) = 0; |
| ranges.emplace_back(range.offset, range.length); | ||
| } | ||
| } | ||
| return ranges; |
Comment on lines
+1811
to
+1821
| auto ranges = preBufferRange(stripes, includeTypes); | ||
| if (ranges.empty()) { | ||
| return; | ||
| } | ||
|
|
||
| std::vector<ReadRange> readRanges; | ||
| readRanges.reserve(ranges.size()); | ||
| for (const auto& range : ranges) { | ||
| readRanges.emplace_back(range.first, range.second); | ||
| } | ||
| contents_->cacheRanges(std::move(readRanges)); |
…efactor preBuffer to reuse it
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
This PR introduces a new Reader API, preBufferRange(stripes, includeTypes), which computes and returns the exact prefetch ranges (offset/length pairs), implements it in ReaderImpl, and refactors the existing preBuffer path to reuse that range-planning logic before populating cache.
Why are the changes needed?
The change is needed because preBuffer previously did not expose the planned ranges, but external users now have a multi-Reader scenario where different Readers prefetch different ranges of the same file and share an external prebuffer cache, so they must know the precise ranges in advance to coordinate prefetching and avoid duplicated work.
Was this patch authored or co-authored using generative AI tooling?
No