[SPARK-57704][PYTHON][TESTS] Add ASV microbenchmark for SQL_TRANSFORM_WITH_STATE_PANDAS_INIT_STATE_UDF by Yicong-Huang · Pull Request #56794 · apache/spark

Yicong-Huang · 2026-06-25T23:36:15Z

What changes were proposed in this pull request?

Add ASV microbenchmarks for the SQL_TRANSFORM_WITH_STATE_PANDAS_INIT_STATE_UDF eval type in python/benchmarks/bench_eval_type.py, with both time_* and peakmem_* variants. They cover the same scenario grid as the plain SQL_TRANSFORM_WITH_STATE_PANDAS_UDF benchmark, plus a small seeded initial-state dataset per group.

The benchmark reconstructs the worker wire protocol for transformWithStateInPandas with initial state. The input is a single Arrow stream whose top-level schema is struct<inputData, initState>, matching TransformWithStateInPySparkPythonInitialStateRunner. All initial-state batches are emitted first, followed by all data batches (the JVM initData ++ data ordering). The inactive side of each batch is written as an all-null struct, so TransformWithStateInPandasInitStateSerializer never sees a mixed batch and regroups rows by the leading key.
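For readers unfamiliar with the stream layout, the ordering and the "one active side per batch" rule can be modeled in plain Python. This is only an illustrative sketch, not the benchmark code: it uses dicts and lists instead of Arrow record batches, `None` stands in for the all-null struct, and the helper names `build_stream` and `regroup_by_key` are hypothetical. How the real serializer regroups rows is assumed here to be "consecutive rows sharing the first column".

```python
from itertools import groupby


def build_stream(init_batches, data_batches):
    """Model the combined stream: all initial-state batches, then all data batches.

    Each batch has the two top-level fields; the inactive one is None,
    standing in for the all-null struct (an assumption of this sketch).
    """
    stream = [{"inputData": None, "initState": rows} for rows in init_batches]
    stream += [{"inputData": rows, "initState": None} for rows in data_batches]
    return stream


def regroup_by_key(stream):
    """Yield (key, input_rows, init_rows) per run of rows sharing a leading key."""
    for batch in stream:
        # A batch is never mixed: exactly one side carries rows.
        assert (batch["inputData"] is None) != (batch["initState"] is None)
        side = "initState" if batch["inputData"] is None else "inputData"
        for key, rows in groupby(batch[side], key=lambda row: row[0]):
            rows = list(rows)
            if side == "inputData":
                yield key, rows, []
            else:
                yield key, [], rows


# Hypothetical toy data: the first element of each row is the grouping key.
init = [[("a", 1), ("b", 2)]]
data = [[("a", 10), ("a", 11)], [("b", 20)]]
stream = build_stream(init, data)
groups = list(regroup_by_key(stream))
```

In this model every group is seen either with initial-state rows or with input rows, never both in the same step, which is the property the all-null inactive side is meant to guarantee.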

Why are the changes needed?

This is the last transformWithState Pandas eval type without benchmark coverage. The eval type is slated for the serializer/eval-type refactor, and a microbenchmark establishes the baseline needed to prove the refactor introduces no regression.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests. Test-only addition; no behavior change.

Ran locally with COLUMNS=120 asv run --python=same --bench TransformWithStatePandasInitState -a repeat=3. Results are stable across repeated runs; one representative run below.

[time] TransformWithStatePandasInitStateUDFTimeBench.time_worker
================ ============== ============ ============
--                                 udf
---------------- ----------------------------------------
    scenario      identity_udf    sort_udf    count_udf
================ ============== ============ ============
 few_groups_sm      810±4ms       833±3ms      835±20ms
 few_groups_lg     7.48±0.1s     7.70±0.3s    7.28±0.2s
 many_groups_sm    7.93±0.3s     7.95±0.1s    8.87±0.05s
 many_groups_lg    4.04±0.05s    4.10±0.02s   4.27±0.04s
   wide_cols       8.29±0.3s     8.20±0.2s    7.60±0.04s
   mixed_cols      3.42±0.05s    3.45±0.02s   3.25±0.03s
 nested_struct     7.99±0.2s     7.91±0.02s   5.67±0.03s
================ ============== ============ ============

[peakmem] TransformWithStatePandasInitStateUDFPeakmemBench.peakmem_worker
================ ============== ========== ===========
--                                udf
---------------- -------------------------------------
    scenario      identity_udf   sort_udf   count_udf
================ ============== ========== ===========
 few_groups_sm        116M         115M        106M
 few_groups_lg        248M         248M        248M
 many_groups_sm       176M         177M        161M
 many_groups_lg       151M         151M        151M
   wide_cols          364M         367M        342M
   mixed_cols         182M         182M        182M
 nested_struct        210M         210M        210M
================ ============== ========== ===========
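The two tables above correspond to a pair of ASV benchmark classes parameterized over (scenario, udf). As a rough sketch of that layout, assuming only ASV's naming conventions (`params`, `param_names`, `setup`, and the `time_`/`peakmem_` method prefixes): the class names, the two-scenario grid, the row counts, and the toy UDFs below are hypothetical stand-ins, not the contents of bench_eval_type.py.

```python
# Toy stand-ins for the three UDF variants in the tables (hypothetical).
UDFS = {
    "identity_udf": lambda rows: rows,
    "sort_udf": sorted,
    "count_udf": len,
}


class _InitStateSketchBase:
    # ASV runs every combination of these lists; each combination becomes
    # one cell in the result table (rows = scenario, columns = udf).
    params = [
        ["few_groups_sm", "few_groups_lg"],
        ["identity_udf", "sort_udf", "count_udf"],
    ]
    param_names = ["scenario", "udf"]

    def setup(self, scenario, udf):
        # Input is built outside the measured method so only the work is timed.
        n = 1_000 if scenario.endswith("_sm") else 10_000  # hypothetical sizes
        self.rows = [(i % 10, n - i) for i in range(n)]
        self.func = UDFS[udf]


class InitStateTimeSketch(_InitStateSketchBase):
    def time_worker(self, scenario, udf):
        # Methods prefixed time_ are measured for wall-clock time.
        self.func(self.rows)


class InitStatePeakmemSketch(_InitStateSketchBase):
    def peakmem_worker(self, scenario, udf):
        # Methods prefixed peakmem_ are measured for peak resident memory.
        self.func(self.rows)
```

Keeping the shared grid and setup in a base class without any benchmark methods means the time and peakmem classes report over identical inputs without either one inheriting the other's measured method.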

Was this patch authored or co-authored using generative AI tooling?

No.

uros-b

Thank you @Yicong-Huang!

test: add ASV microbenchmark for SQL_TRANSFORM_WITH_STATE_PANDAS_INIT_STATE_UDF

c95aab3

uros-b approved these changes Jun 26, 2026


Yicong-Huang wants to merge 1 commit into apache:master from Yicong-Huang:SPARK-57704
