
GH-1125: Read empty ListVector correctly using UnionListReader after IPC deser #1136

Open
bodduv wants to merge 5 commits into apache:main from bodduv:arrow-java/GH-1125-empty-list-reader-position

@bodduv bodduv commented May 6, 2026

Arrow IPC represents variable-length vectors with an offset buffer containing valueCount + 1 offsets. An empty ListVector can therefore still carry a non-empty offset buffer after serialization and deserialization: it holds the single leading zero offset. This is correct according to the Arrow layout, but it exposes a bug in UnionListReader.setPosition and similar places.

UnionListReader.setPosition(0) used the offset-buffer capacity as its empty-vector check, which worked only when the offset buffer had zero capacity. After an IPC round trip, the empty vector has a non-zero offset-buffer capacity, so the reader could throw IndexOutOfBoundsException. UnionLargeListReader has the same logical issue and also lacked the empty-buffer guard.
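To make the failure mode concrete, here is a minimal sketch that does not use Arrow at all. A plain `int[]` stands in for the offset buffer, and the class and method names (`EmptyListOffsetsSketch`, `capacityLooksEmpty`, `readRange`) are hypothetical, chosen only for illustration. It assumes 4-byte list offsets and mimics a capacity-based emptiness check followed by a read of `offsets[index]` and `offsets[index + 1]`.

```java
// Hypothetical sketch: an int[] stands in for Arrow's offset buffer.
// None of these names come from the Arrow code base.
final class EmptyListOffsetsSketch {
  static final int OFFSET_WIDTH = 4; // bytes per 32-bit list offset (assumed)

  /** Capacity-based check: treats the vector as empty only if the buffer has zero bytes. */
  static boolean capacityLooksEmpty(int[] offsets) {
    return offsets.length * OFFSET_WIDTH == 0;
  }

  /** Reads the start and end offsets of a row; touches offsets[index] and offsets[index + 1]. */
  static int[] readRange(int[] offsets, int index) {
    return new int[] {offsets[index], offsets[index + 1]};
  }

  public static void main(String[] args) {
    int[] freshEmpty = new int[0];      // offset buffer that was never allocated
    int[] postIpcEmpty = new int[] {0}; // valueCount = 0, so valueCount + 1 = 1 offset

    System.out.println("fresh looks empty: " + capacityLooksEmpty(freshEmpty));
    System.out.println("post-IPC looks empty: " + capacityLooksEmpty(postIpcEmpty));

    // The capacity check does not catch the post-IPC case, so the read is attempted.
    try {
      readRange(postIpcEmpty, 0);
    } catch (IndexOutOfBoundsException e) {
      System.out.println("read failed: " + e.getClass().getSimpleName());
    }
  }
}
```

The point of the sketch is that both buffers describe a vector with zero rows, yet only the zero-length one passes the capacity-based check.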

What's Changed

Update UnionListReader and UnionLargeListReader to validate reader positioning against valueCount instead of treating offset-buffer capacity as the logical row boundary. All out-of-range positions will throw. For valid non-empty positions, the readers also defensively verify that the offset buffer has enough capacity for both index and index + 1 before reading offsets.

The shared bounds logic is kept in a package-private UnionListReaderBoundsChecker helper so that UnionListReader and UnionLargeListReader share the same code.
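For readers who want the shape of the check without opening the diff, the following is a rough, standalone sketch of a valueCount-based bounds check. It is not the code from this PR: the class name `BoundsCheckSketch`, the method `canReadOffsets`, and its signature are hypothetical. It also assumes that position 0 on an empty vector is accepted as a "nothing to read" position, and that the offset width (4 bytes for list, 8 bytes for large list) is passed in so one helper can serve both readers.

```java
// Hypothetical sketch of a valueCount-based position check; not the PR's actual helper.
final class BoundsCheckSketch {

  /**
   * Returns true if offsets for {@code index} can be read, false if the vector is empty
   * and {@code index} is 0 (assumed to mean "no rows, nothing to read").
   * Throws for any other out-of-range position or for a too-small offset buffer.
   */
  static boolean canReadOffsets(int index, int valueCount, long offsetCapacityBytes, int offsetWidth) {
    if (valueCount == 0 && index == 0) {
      return false; // assumption: empty vector, caller resets its offsets to zero
    }
    if (index < 0 || index >= valueCount) {
      throw new IndexOutOfBoundsException(
          "index " + index + " is outside [0, " + valueCount + ")");
    }
    // Reading offsets[index] and offsets[index + 1] needs (index + 2) offsets' worth of bytes.
    long requiredBytes = (index + 2L) * offsetWidth;
    if (offsetCapacityBytes < requiredBytes) {
      throw new IndexOutOfBoundsException(
          "offset buffer has " + offsetCapacityBytes + " bytes, need " + requiredBytes);
    }
    return true;
  }

  public static void main(String[] args) {
    // Post-IPC empty list: zero rows, one 4-byte offset in the buffer.
    System.out.println(canReadOffsets(0, 0, 4, 4));
    // One-row list with two 4-byte offsets.
    System.out.println(canReadOffsets(0, 1, 8, 4));
  }
}
```

Keying the decision on valueCount rather than on buffer capacity means the result is the same whether or not the empty vector went through an IPC round trip.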

Closes #1125.


bodduv commented May 6, 2026

I don't think this is a breaking-change; it should be labeled bug-fix.

Comment thread vector/src/main/java/org/apache/arrow/vector/complex/impl/UnionListReader.java Outdated
Comment thread vector/src/test/java/org/apache/arrow/vector/ipc/TestRoundTrip.java
@jbonofre added the bug-fix label (PRs that fix a bug.) and removed the breaking-change label on May 7, 2026
@jbonofre added this to the 20.0.0 milestone on May 7, 2026

@jbonofre jbonofre left a comment

I guess UnionListViewReader needs the same fix, right?

@bodduv bodduv requested review from jbonofre and jhrotko May 7, 2026 13:19

@jhrotko jhrotko left a comment

LGTM

Labels

bug-fix: PRs that fix a bug.

Development

Successfully merging this pull request may close these issues.

UnionListReader.setPosition throws IOOBE on a post-IPC empty List

3 participants