
GH-46994: [C++][Parquet] Reuse BinaryView headers for repeated values in dictionary and DELTA_BYTE_ARRAY decoding #49850

Open: abhishek593 wants to merge 1 commit into apache:main from abhishek593:decoding



abhishek593 (Contributor) commented Apr 23, 2026

Rationale for this change

When decoding Parquet ByteArray columns into Arrow BinaryView arrays, repeated dictionary values previously caused redundant data copies: the bytes of a value were copied into the output again for every occurrence.

What changes are included in this PR?

Dictionary decoding: Pre-build a cache of BinaryViewType::c_type headers, one per dictionary entry. Out-of-line dictionary data is registered once as a shared heap buffer via a new BinaryViewBuilder::AppendBuffer API. Decoding then emits the cached header for each index directly, so repeated values are not copied again.
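To illustrate the dictionary path, here is a minimal standalone sketch that does not use Arrow. `View`, `ViewArray`, `MakeView`, `BuildHeaderCache`, `DecodeIndices` and `GetValue` are hypothetical names; `View` only mimics the 16-byte layout of `BinaryViewType::c_type` (values of at most 12 bytes inline, longer values as a 4-byte prefix plus buffer index and offset), and a single `std::string` stands in for the shared heap buffer that the PR registers through `BinaryViewBuilder::AppendBuffer`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for arrow::BinaryViewType::c_type (16 bytes).
struct View {
  int32_t size;
  std::array<char, 12> bytes;  // inline data, or prefix(4) + buffer_index(4) + offset(4)
};
constexpr std::size_t kInlineSize = 12;

// Hypothetical output array: view headers plus variadic data buffers.
struct ViewArray {
  std::vector<View> views;
  std::vector<std::string> buffers;
};

View MakeView(std::string_view value, int32_t buffer_index, int32_t offset) {
  View view{};
  view.size = static_cast<int32_t>(value.size());
  if (value.size() <= kInlineSize) {
    if (!value.empty()) std::memcpy(view.bytes.data(), value.data(), value.size());
  } else {
    std::memcpy(view.bytes.data(), value.data(), 4);  // 4-byte prefix
    std::memcpy(view.bytes.data() + 4, &buffer_index, 4);
    std::memcpy(view.bytes.data() + 8, &offset, 4);
  }
  return view;
}

// Build one header per dictionary entry. Out-of-line bytes are concatenated
// into one heap buffer that is registered exactly once.
std::vector<View> BuildHeaderCache(const std::vector<std::string>& dictionary,
                                   ViewArray* out) {
  const auto buffer_index = static_cast<int32_t>(out->buffers.size());
  std::string heap;
  std::vector<View> cache;
  cache.reserve(dictionary.size());
  for (const std::string& entry : dictionary) {
    cache.push_back(MakeView(entry, buffer_index, static_cast<int32_t>(heap.size())));
    if (entry.size() > kInlineSize) heap += entry;
  }
  if (!heap.empty()) out->buffers.push_back(std::move(heap));
  return cache;
}

// Each index copies only the cached 16-byte header; the value bytes are
// never copied again, however often an entry repeats.
void DecodeIndices(const std::vector<int32_t>& indices,
                   const std::vector<View>& cache, ViewArray* out) {
  for (int32_t idx : indices) {
    assert(idx >= 0 && static_cast<std::size_t>(idx) < cache.size());
    out->views.push_back(cache[idx]);
  }
}

// Resolve a view back to its bytes (used to check the result).
std::string_view GetValue(const ViewArray& arr, std::size_t i) {
  const View& v = arr.views[i];
  const auto size = static_cast<std::size_t>(v.size);
  if (size <= kInlineSize) return std::string_view(v.bytes.data(), size);
  int32_t buffer_index = 0;
  int32_t offset = 0;
  std::memcpy(&buffer_index, v.bytes.data() + 4, 4);
  std::memcpy(&offset, v.bytes.data() + 8, 4);
  return std::string_view(arr.buffers[buffer_index].data() + offset, size);
}
```

With the dictionary `{"short", "a value longer than twelve bytes"}` and indices `{1, 0, 1, 1}`, the long value is written to the heap buffer once and all three of its occurrences share the same cached header.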

DELTA_BYTE_ARRAY decoding: When a decoded value has the same pointer and length as the previous value (the delta encoder's representation of an exact repeat), reuse the last BinaryView header instead of appending a duplicate.
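A similar standalone sketch for the DELTA_BYTE_ARRAY path, assuming (as described above) that the decoder hands back the previous value's pointer and length for an exact repeat. `DecodeDelta` and `ViewBuilder` are hypothetical; the decode loop is a simplified prefix/suffix reconstruction, not Arrow's actual decoder, and `View` again only mimics the 16-byte view layout.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <deque>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for arrow::BinaryViewType::c_type (16 bytes).
struct View {
  int32_t size;
  std::array<char, 12> bytes;  // inline data, or prefix(4) + buffer_index(4) + offset(4)
};
constexpr std::size_t kInlineSize = 12;

// Hypothetical delta reconstruction: each entry is (prefix length shared with
// the previous value, suffix). An exact repeat (prefix == previous length,
// empty suffix) returns the previous value's pointer unchanged.
std::vector<std::string_view> DecodeDelta(
    const std::vector<std::pair<std::size_t, std::string>>& encoded,
    std::deque<std::string>* storage) {
  std::vector<std::string_view> out;
  std::string_view prev;
  for (const auto& [prefix_len, suffix] : encoded) {
    assert(prefix_len <= prev.size());
    if (prefix_len == prev.size() && suffix.empty() && !out.empty()) {
      out.push_back(prev);  // same pointer, same length
      continue;
    }
    // deque::push_back keeps references to earlier elements valid.
    storage->push_back(std::string(prev.substr(0, prefix_len)) + suffix);
    prev = storage->back();
    out.push_back(prev);
  }
  return out;
}

// Hypothetical builder that reuses the last header when the incoming value
// has the same pointer and length as the previous one.
class ViewBuilder {
 public:
  void Append(std::string_view value) {
    if (has_last_ && value.data() == last_ptr_ && value.size() == last_size_) {
      views_.push_back(last_view_);  // exact repeat: no data copy
      return;
    }
    View view{};
    view.size = static_cast<int32_t>(value.size());
    if (value.size() <= kInlineSize) {
      if (!value.empty()) std::memcpy(view.bytes.data(), value.data(), value.size());
    } else {
      const int32_t buffer_index = 0;  // single data buffer in this sketch
      const auto offset = static_cast<int32_t>(heap_.size());
      std::memcpy(view.bytes.data(), value.data(), 4);  // 4-byte prefix
      std::memcpy(view.bytes.data() + 4, &buffer_index, 4);
      std::memcpy(view.bytes.data() + 8, &offset, 4);
      heap_.append(value.data(), value.size());
    }
    views_.push_back(view);
    has_last_ = true;
    last_ptr_ = value.data();
    last_size_ = value.size();
    last_view_ = view;
  }

  const std::vector<View>& views() const { return views_; }
  const std::string& heap() const { return heap_; }

 private:
  std::vector<View> views_;
  std::string heap_;
  bool has_last_ = false;
  const char* last_ptr_ = nullptr;
  std::size_t last_size_ = 0;
  View last_view_{};
};
```

Decoding `{0, "parquet-binary-view-value"}, {25, ""}, {25, ""}, {7, "-other"}` yields four values, but the heap receives only `parquet-binary-view-value` and `parquet-other`; the two repeats reuse the first header. Comparing pointer and length keeps the check O(1), but in this sketch it only catches repeats the decoder reports this way; equal values reconstructed into separate memory are still copied.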

Are these changes tested?

Yes. Added new tests.

Are there any user-facing changes?

No

…values in dictionary and DELTA_BYTE_ARRAY decoding

When decoding Parquet ByteArray columns into Arrow BinaryView arrays, repeated dictionary values previously caused redundant data copies. This change pre-builds a cache of BinaryView headers per dictionary entry and registers the dictionary data as a single shared heap buffer, so all occurrences of the same dictionary value reuse the same view without copying.
abhishek593 requested a review from wgtmac as a code owner April 23, 2026 20:26
@github-actions

⚠️ GitHub issue #46994 has been automatically assigned in GitHub to PR creator.
