Specialize filter for list-like arrays (List/LargeList/FixedSizeList/Map, …)#10236
Jeadie wants to merge 2 commits
`FilterPredicate::filter` previously fell back to the generic `MutableArrayData` path for `List`/`LargeList`/`FixedSizeList`/`Map`. This adds specialized kernels that map each retained run of parent rows to a contiguous range of child elements and reuse the already-vectorized per-type child filter kernels, instead of the generic byte-copy fallback. Child handling is selectivity-aware (work is proportional to retained runs and elements, not the full child length) and streams ranges without an intermediate `Vec`: byte children go straight to `FilterBytes`, nested lists recurse, and others use a `Slices` predicate. A child-type allowlist keeps types that can't beat the fallback (dense `Union`, `RunEndEncoded`) on `MutableArrayData`, and a cheap selectivity guard routes dense `Map` filters to the fallback too. Adds benchmarks for the affected types in `arrow/benches/filter_kernels.rs`.
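The routing described above can be pictured with a small, self-contained sketch. Everything in it is an assumption made for illustration: `Child`, `Path`, `choose_path`, and `map_uses_fallback` are hypothetical names rather than arrow-rs APIs, the enum covers only a few child kinds, and the density threshold is left as a parameter because the description does not state the value or exact form of the guard.

```rust
/// Hypothetical, simplified stand-in for a list's child type
/// (this is not arrow's `DataType`).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Child {
    Primitive,
    Bytes,
    NestedList,
    Struct,
    DenseUnion,
    RunEndEncoded,
}

/// Hypothetical label for the code path a child would take.
#[derive(Debug, PartialEq)]
enum Path {
    FilterBytes, // byte children are handed straight to the bytes kernel
    Recurse,     // nested lists go through the list kernel again
    Slices,      // other allowlisted children use a slices-driven predicate
    Fallback,    // everything else stays on the generic fallback
}

/// Allowlist-style dispatch: types that cannot beat the fallback stay on it.
fn choose_path(child: Child) -> Path {
    match child {
        Child::DenseUnion | Child::RunEndEncoded => Path::Fallback,
        Child::Bytes => Path::FilterBytes,
        Child::NestedList => Path::Recurse,
        Child::Primitive | Child::Struct => Path::Slices,
    }
}

/// Assumed shape of a cheap selectivity guard: a filter counts as dense
/// when the kept fraction reaches `dense_threshold` (a caller-supplied,
/// hypothetical value).
fn map_uses_fallback(kept: usize, total: usize, dense_threshold: f64) -> bool {
    total > 0 && (kept as f64 / total as f64) >= dense_threshold
}
```

For example, `choose_path(Child::RunEndEncoded)` yields `Path::Fallback`, while `choose_path(Child::Bytes)` yields `Path::FilterBytes`. Keeping the decision in one `match` makes the allowlist easy to audit when a new child type gains a slices-driven kernel.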
run benchmark filter_kernel
🤖 Arrow criterion benchmark running (GKE). Comparing jeadie/filter-list-specialization (da21a5a) to 7616e10 (merge-base).
Benchmark for this request failed.
run benchmark filter_kernels
🤖 Arrow criterion benchmark running (GKE). Comparing jeadie/filter-list-specialization (da21a5a) to 7616e10 (merge-base).
🤖 Arrow criterion benchmark completed (GKE).
Which issue does this PR close?
Rationale for this change
This PR improves the performance of `FilterPredicate::filter` for array-based data types, specifically `List<T>`, `FixedSizeList<T>` and `Map<T>`.
This optimisation is based on one idea: translate retained (i.e. filter = true) parent-row runs into child element ranges (trivially contiguous due to how list/fixed/map layouts work), then hand those ranges to the already-fast child kernels rather than copying element by element.
`filter` is one of the most-executed kernels in Apache DataFusion, and these list/nested types now have a fast path. Several common DataFusion uses are especially impacted:
- `FixedSizeList` as embeddings or vectors
- `array_agg`, `Unnest`
- `GROUP BY` operations

What changes are included in this PR?
- Specialised `FilterPredicate::filter` for `DataType::FixedSizeList`, `DataType::Map`, `DataType::List` and `DataType::LargeList` (the latter two are specialised for most, but not all, child types).
- Benchmarks in `arrow/benches/filter_kernels.rs`.

Changes Explained
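For the fixed-size case, the run-to-range translation is plain arithmetic: a run of rows `[start, end)` in a list whose rows each hold `size` elements covers child elements `[start * size, end * size)`. The sketch below shows this with the standard library only. The names `child_range` and `filter_fixed_size` are hypothetical, the `f32` element type is chosen just to suggest embeddings, and the code is an illustration rather than the kernel added in this PR.

```rust
use std::ops::Range;

/// Map a half-open run of retained parent rows to the child element range
/// it covers when every row holds exactly `size` elements.
fn child_range(rows: Range<usize>, size: usize) -> Range<usize> {
    rows.start * size..rows.end * size
}

/// Copy the child elements of each retained run with one slice copy per run.
/// `runs` must be sorted, non-overlapping, and within bounds.
fn filter_fixed_size(values: &[f32], size: usize, runs: &[Range<usize>]) -> Vec<f32> {
    // Total number of retained child elements, used to allocate once.
    let total: usize = runs.iter().map(|r| r.len() * size).sum();
    let mut out = Vec::with_capacity(total);
    for run in runs {
        out.extend_from_slice(&values[child_range(run.clone(), size)]);
    }
    out
}
```

With four rows of three elements each and retained runs `0..1` and `2..4`, the second run maps to child elements `6..12`, so two slice copies replace three per-row copies.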
Before
Prior to this PR, `List<T>` used the `MutableArrayData` fallback.
Example
`MutableArrayData` walks the full child buffer, copying by range for each retained row.
After
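A minimal sketch of the run-to-range idea for an offset-based list, in plain Rust with the standard library only. It is not the arrow-rs implementation: `retained_runs` and `filter_list` are hypothetical names, the filter is a `&[bool]` rather than a bitmap, the child is a flat `i64` slice standing in for any child array, and the runs are collected into a `Vec` for readability, whereas the PR describes streaming them.

```rust
use std::ops::Range;

/// Coalesce consecutive `true` entries of a row filter into half-open
/// runs of row indices.
fn retained_runs(filter: &[bool]) -> Vec<Range<usize>> {
    let mut runs = Vec::new();
    let mut start: Option<usize> = None;
    for (i, &keep) in filter.iter().enumerate() {
        match (keep, start) {
            (true, None) => start = Some(i),
            (false, Some(s)) => {
                runs.push(s..i);
                start = None;
            }
            _ => {}
        }
    }
    if let Some(s) = start {
        runs.push(s..filter.len());
    }
    runs
}

/// Filter a list stored as `offsets` (one more entry than rows) plus a
/// flat `values` child. Returns the new offsets and the new child values.
fn filter_list(offsets: &[i32], values: &[i64], filter: &[bool]) -> (Vec<i32>, Vec<i64>) {
    assert_eq!(offsets.len(), filter.len() + 1);
    let mut out_offsets = vec![0i32];
    let mut out_values = Vec::new();
    for run in retained_runs(filter) {
        // Rows run.start..run.end occupy one contiguous child range,
        // so the child is copied once per run instead of once per row.
        let child = offsets[run.start] as usize..offsets[run.end] as usize;
        out_values.extend_from_slice(&values[child]);
        // Rebuild offsets from the retained row lengths.
        for row in run {
            let len = offsets[row + 1] - offsets[row];
            let last = *out_offsets.last().unwrap();
            out_offsets.push(last + len);
        }
    }
    (out_offsets, out_values)
}
```

In a real kernel the slice copy would be replaced by a call into the child's own filter kernel with the computed range, which is where the reuse of existing vectorized code comes from.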
Non-specialised child types
`List<T>` is not specialised for some child types `T` (and similarly for the other array types mentioned). This PR specialises only when the child type `T` has a fast, vectorized kernel that is driven solely by the predicate's `Slices` (it never reads `filter` directly). Everything else uses the well-tuned, correct `MutableArrayData` fallback:
- dense `Union`
- `RunEndEncoded`: its kernel (`filter_run_end_array`) reads `predicate.filter` directly. The specialization streams ranges via a `Slices` predicate whose `filter` is intentionally empty.
Every other list child is specialized: primitives, boolean, null, `Utf8`/`LargeUtf8`/`Binary`/`LargeBinary`, `Utf8View`/`BinaryView`, `FixedSizeBinary`, `FixedSizeList`, `Dictionary`, `Struct`, sparse `Union`, `ListView`/`LargeListView`, and nested `List`/`LargeList`.

Are these changes tested?
Tests in `arrow-select/src/filter.rs`.

Benchmark results
`size = 65536`; before → after (speedup), where the baseline is the `MutableArrayData` fallback.
`List<T>` by child type
`List<List>`
Direct kernels (new): `filter FixedSizeList`, `filter Map`
Value-length sweep: `List<Utf8>` @ kept ½

Regressions / caveats
All sub-1.0 results occur only at the dense `kept 1023/1024` end (rare for selective predicates), plus the nested-list `½` tie:
- `filter FixedSizeList`: 0.96× (memcpy-bound; the fallback is already tight there).
- `List<List>` @ ½: 0.99× (offset-dominated; ties the fallback, then wins 1.09×/2.27× at the other selectivities).
No remaining regression exceeds ~4%. Every `kept 1/1024` (highly selective) case is a 2.2–4.8× win.

Are there any user-facing changes?
N/A.