
[python] Push down IndexedSplit row ranges to Lance reader#7666

Open
chenghuichen wants to merge 3 commits into apache:master from chenghuichen:lance-fix

Conversation

@chenghuichen
Contributor

@chenghuichen chenghuichen commented Apr 17, 2026

Purpose

Previously, after a global index search returned a sparse set of matching row IDs, the data fetch path for Lance used RowIdFilterRecordBatchReader, which read the entire Lance file and discarded non-matching rows in memory.

This change converts the global row ID ranges to local file offsets and pushes them down to lance.file.LanceFileReader.take_rows(), so only the matched rows are physically read from disk, leveraging Lance's native random-access capability.
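
To illustrate the conversion described above, here is a minimal sketch of mapping global row ID ranges to local offsets within one file. It is not the code from this PR: the function name `global_ranges_to_local_offsets`, the `first_row_id` and `row_count` parameters, and the assumption that ranges are inclusive `(start, end)` pairs are all hypothetical and chosen for illustration only.

```python
from typing import List, Tuple


def global_ranges_to_local_offsets(
    ranges: List[Tuple[int, int]],
    first_row_id: int,
    row_count: int,
) -> List[int]:
    """Map inclusive global row-ID ranges to sorted local offsets in one file.

    Assumption (not confirmed by the PR): the file holds the contiguous
    global row IDs first_row_id .. first_row_id + row_count - 1.
    """
    file_start = first_row_id
    file_end = first_row_id + row_count - 1  # inclusive upper bound
    offsets = set()
    for start, end in ranges:
        # Clip the global range to the part that falls inside this file.
        lo = max(start, file_start)
        hi = min(end, file_end)
        if lo > hi:
            continue  # the range does not touch this file
        # Shift from global row IDs to zero-based offsets within the file.
        offsets.update(range(lo - first_row_id, hi - first_row_id + 1))
    return sorted(offsets)


# A file covering global row IDs 100..149, queried with three ranges.
offsets = global_ranges_to_local_offsets(
    [(95, 102), (110, 110), (500, 600)],
    first_row_id=100,
    row_count=50,
)
```

The resulting list of local offsets is the kind of input that would then be handed to `lance.file.LanceFileReader.take_rows()`, so that only those rows are read rather than the whole file.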

Tests

  • paimon-python/pypaimon/tests/data_evolution_test.py

@XiaoHongbo-Hope
Contributor

+1

@chenghuichen chenghuichen changed the title from "[python] Push down SlicedSplit row range to Lance native" to "[WIP][python] Push down SlicedSplit row range to Lance native" Apr 19, 2026
@chenghuichen chenghuichen changed the title from "[WIP][python] Push down SlicedSplit row range to Lance native" to "[python] Push down IndexedSplit row range to Lance native" Apr 19, 2026
@chenghuichen chenghuichen changed the title from "[python] Push down IndexedSplit row range to Lance native" to "[python] Push down IndexedSplit row ranges to Lance native" Apr 19, 2026
@chenghuichen chenghuichen changed the title from "[python] Push down IndexedSplit row ranges to Lance native" to "[python] Push down IndexedSplit row ranges to Lance reader" Apr 19, 2026

2 participants