Skip to content

ORC-2160: [C++] Expose prefetch range planning via Reader::preBufferRange and refactor preBuffer to reuse it#2616

Open
lucasfang wants to merge 1 commit intoapache:mainfrom
lucasfang:prebuffer
Open

ORC-2160: [C++] Expose prefetch range planning via Reader::preBufferRange and refactor preBuffer to reuse it#2616
lucasfang wants to merge 1 commit intoapache:mainfrom
lucasfang:prebuffer

Conversation

@lucasfang
Copy link
Copy Markdown

What changes were proposed in this pull request?

This PR introduces a new Reader API, preBufferRange(stripes, includeTypes), which computes and returns the exact prefetch ranges (offset/length pairs), implements it in ReaderImpl, and refactors the existing preBuffer path to reuse that range-planning logic before populating cache.

Why are the changes needed?

The change is needed because preBuffer previously did not expose the planned ranges, but external users now have a multi-Reader scenario where different Readers prefetch different ranges of the same file and share an external prebuffer cache, so they must know the precise ranges in advance to coordinate prefetching and avoid duplicated work.

Was this patch authored or co-authored using generative AI tooling?

No

Copilot AI review requested due to automatic review settings May 7, 2026 03:17
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR adds a new Reader::preBufferRange() API to expose prefetch range planning (offset/length pairs), implements it in ReaderImpl, and refactors preBuffer() to reuse the shared range-planning logic before caching.

Changes:

  • Introduce public Reader::preBufferRange(stripes, includeTypes) returning offset/length ranges.
  • Implement ReaderImpl::preBufferRange() by reusing existing stripe footer parsing + range extraction.
  • Refactor preBuffer() to call preBufferRange() and then populate the cache.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
c++/src/Reader.hh Adds the ReaderImpl::preBufferRange() override declaration.
c++/src/Reader.cc Implements preBufferRange() and refactors preBuffer() to reuse it + cache computed ranges.
c++/include/orc/Reader.hh Adds the new public API and documentation; includes <utility> for std::pair.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread c++/include/orc/Reader.hh
Comment on lines +700 to +708
/**
* Calculate prefetch ranges by selected stripes and columns.
* It is thread safe and does not cache data.
* @param stripes the stripes to prefetch
* @param includeTypes the types to prefetch
* @return prefetch ranges as offset/length pairs
*/
virtual std::vector<std::pair<uint64_t, uint64_t>> preBufferRange(
const std::vector<uint32_t>& stripes, const std::list<uint64_t>& includeTypes) = 0;
Comment thread c++/src/Reader.cc
ranges.emplace_back(range.offset, range.length);
}
}
return ranges;
Comment thread c++/src/Reader.cc
Comment on lines +1811 to +1821
auto ranges = preBufferRange(stripes, includeTypes);
if (ranges.empty()) {
return;
}

std::vector<ReadRange> readRanges;
readRanges.reserve(ranges.size());
for (const auto& range : ranges) {
readRanges.emplace_back(range.first, range.second);
}
contents_->cacheRanges(std::move(readRanges));
@lucasfang lucasfang changed the title [C++] Expose prefetch range planning via Reader::preBufferRange and refactor preBuffer to reuse it ORC-2160: [C++] Expose prefetch range planning via Reader::preBufferRange and refactor preBuffer to reuse it May 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants