
[python] Push down shard range to Vortex and Lance format readers#7673

Open
chenghuichen wants to merge 1 commit into apache:master from chenghuichen:vortex-fix


@chenghuichen (Contributor) commented Apr 19, 2026

Purpose

Previously, SlicedSplit row ranges were applied after the read by ShardBatchReader: the full file was read first, and rows outside the range were then discarded.

This change pushes the shard range directly into format readers that support native range reads:

  • Lance: uses LanceFileReader.read_range(start, num_rows) so only the requested rows are read from disk
  • Vortex: converts the shard range to indices for VortexFile.scan(indices=...), leveraging native row selection
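To make the two conversions above concrete, here is a minimal sketch. It assumes the shard range is a half-open interval `[start, end)` of row offsets within the file, which this description does not state. The helper names `to_read_range_args` and `to_scan_indices` are hypothetical and are not taken from the patch.

```python
from typing import List, Tuple


def to_read_range_args(start: int, end: int) -> Tuple[int, int]:
    """Convert a half-open row range [start, end) into (start, num_rows).

    This is the argument shape described for LanceFileReader.read_range.
    The half-open convention is an assumption made for this sketch.
    """
    if start < 0 or end < start:
        raise ValueError(f"invalid shard range: [{start}, {end})")
    return start, end - start


def to_scan_indices(start: int, end: int) -> List[int]:
    """Expand a half-open row range [start, end) into explicit row indices.

    A plain Python list is used here for illustration; the real call would
    presumably pass whatever array type VortexFile.scan(indices=...) accepts.
    """
    if start < 0 or end < start:
        raise ValueError(f"invalid shard range: [{start}, {end})")
    return list(range(start, end))
```

For example, a shard covering rows 10 through 24 would become `(10, 15)` for a range read and the indices `10..24` for an index-based scan.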

Other formats (Parquet/ORC/Avro/Blob) fall back to ShardBatchReader as before.
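The dispatch between native pushdown and the post-read fallback can be sketched as follows. This is an illustration only: `FakeFile` is a hypothetical in-memory stand-in that counts how many rows it touches, `read_shard` is a hypothetical name, and the half-open `[start, end)` range is an assumption. None of this uses the real Lance, Vortex, or pypaimon APIs.

```python
from typing import Callable, Dict, List


class FakeFile:
    """Hypothetical in-memory file that records how many rows were read."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.rows_read = 0

    def read_all(self) -> List[int]:
        # Full read: every row in the file is touched.
        self.rows_read += len(self.rows)
        return list(self.rows)

    def read_range(self, start: int, num_rows: int) -> List[int]:
        # Lance-style range read: only the requested rows are touched.
        chunk = self.rows[start:start + num_rows]
        self.rows_read += len(chunk)
        return chunk

    def scan(self, indices: List[int]) -> List[int]:
        # Vortex-style selection: only the listed rows are touched.
        self.rows_read += len(indices)
        return [self.rows[i] for i in indices]


# Formats assumed to support native range reads, keyed by format name.
NATIVE_PUSHDOWN: Dict[str, Callable[[FakeFile, int, int], List[int]]] = {
    "lance": lambda f, start, end: f.read_range(start, end - start),
    "vortex": lambda f, start, end: f.scan(list(range(start, end))),
}


def read_shard(fmt: str, f: FakeFile, start: int, end: int) -> List[int]:
    """Read rows [start, end), pushing the range down when the format allows."""
    pushdown = NATIVE_PUSHDOWN.get(fmt)
    if pushdown is not None:
        return pushdown(f, start, end)
    # Fallback (what ShardBatchReader does conceptually): read, then slice.
    return f.read_all()[start:end]
```

With a 100-row file and the range `[10, 20)`, both paths return the same ten rows, but the fallback path touches all 100 rows while the pushdown paths touch only ten.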

Tests

  • paimon-python/pypaimon/tests/reader_append_only_test.py

