fix: osw to parquet export with global peptide/protein scores by singjc · Pull Request #206 · PyProphet/pyprophet

singjc · 2026-04-26T23:53:50Z

This pull request improves the handling of score tables and joins in the pyprophet/io/export/osw.py module, specifically addressing how global and non-global contexts are managed in SQL queries and how joins are constructed when RUN_ID may be NULL. The main focus is to ensure correct merging and selection of scores for both global and run-specific contexts, especially in cases where some data may lack a RUN_ID.

Score table querying and merging improvements:

Refactored the construction of pivot columns and queries in _get_peptide_protein_score_table to separately track non-global and global context columns, and to handle cases where either or both types of context exist. This includes building the merged query with appropriate FULL OUTER JOIN logic and ensuring correct column selection and grouping.
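
For readers who want to see the shape of this split, here is a minimal sketch. It is not the code from `_get_peptide_protein_score_table`: the table layout, the context names `run-specific` and `global`, and the output column names are assumptions made for illustration. It also swaps DuckDB for the standard-library `sqlite3` module and emulates the FULL OUTER JOIN with a LEFT JOIN plus a UNION ALL so it runs on older SQLite builds.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory score table (hypothetical layout, not the real OSW schema)."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE SCORE_PEPTIDE (
            CONTEXT TEXT, RUN_ID INTEGER, PEPTIDE_ID INTEGER, SCORE REAL
        );
        INSERT INTO SCORE_PEPTIDE VALUES
            ('run-specific', 1, 10, 2.5),
            ('run-specific', 2, 10, 1.5),
            ('global', NULL, 10, 3.0),
            ('global', NULL, 11, 4.0);
        """
    )
    return con


# Non-global contexts are pivoted per (PEPTIDE_ID, RUN_ID); the global context
# is pivoted per PEPTIDE_ID only, because its RUN_ID is NULL.
MERGED_QUERY = """
WITH non_global AS (
    SELECT PEPTIDE_ID, RUN_ID,
           MAX(CASE WHEN CONTEXT = 'run-specific' THEN SCORE END) AS RUN_SPECIFIC_SCORE
    FROM SCORE_PEPTIDE
    WHERE CONTEXT <> 'global'
    GROUP BY PEPTIDE_ID, RUN_ID
),
global_scores AS (
    SELECT PEPTIDE_ID,
           MAX(CASE WHEN CONTEXT = 'global' THEN SCORE END) AS GLOBAL_SCORE
    FROM SCORE_PEPTIDE
    WHERE CONTEXT = 'global'
    GROUP BY PEPTIDE_ID
)
-- Rows that have a run-scoped score, with the global score attached by ID.
SELECT n.PEPTIDE_ID, n.RUN_ID, n.RUN_SPECIFIC_SCORE, g.GLOBAL_SCORE
FROM non_global n
LEFT JOIN global_scores g ON g.PEPTIDE_ID = n.PEPTIDE_ID
UNION ALL
-- Peptides that only have a global score keep a NULL RUN_ID.
SELECT g.PEPTIDE_ID, NULL, NULL, g.GLOBAL_SCORE
FROM global_scores g
WHERE NOT EXISTS (SELECT 1 FROM non_global n WHERE n.PEPTIDE_ID = g.PEPTIDE_ID)
ORDER BY 1, 2
"""


def merged_scores(con):
    """Return the merged projection as a list of tuples."""
    return con.execute(MERGED_QUERY).fetchall()
```

Calling `merged_scores(build_demo_db())` yields one row per run for peptide 10, each carrying the global score, plus a single row with a NULL `RUN_ID` for peptide 11, which only has a global score.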

Join logic enhancements for handling NULL RUN_ID:

Updated the join conditions in _build_score_column_selection_and_joins to allow joining score views where RUN_ID is either matching or NULL, improving robustness when global scores (without a RUN_ID) are present. This applies to both peptide and protein score joins.
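
The effect of the relaxed condition can be reproduced on a toy schema. The sketch below is an illustration only: the `FEATURE` columns, the `peptide_score_view` table, and the `join_scores` helper are hypothetical, and it uses `sqlite3` rather than DuckDB. The point it shows is that `f.RUN_ID = v.RUN_ID` evaluates to NULL when the view row has no `RUN_ID`, so a strict join silently drops global scores.

```python
import sqlite3


def build_demo_db():
    """Two features from different runs and one global score row (RUN_ID is NULL)."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE FEATURE (FEATURE_ID INTEGER, RUN_ID INTEGER, PEPTIDE_ID INTEGER);
        CREATE TABLE peptide_score_view (PEPTIDE_ID INTEGER, RUN_ID INTEGER, GLOBAL_SCORE REAL);
        INSERT INTO FEATURE VALUES (100, 1, 10), (101, 2, 10);
        INSERT INTO peptide_score_view VALUES (10, NULL, 3.0);
        """
    )
    return con


def join_scores(con, tolerate_null_run_id):
    """Join features to the score view, optionally accepting rows without a RUN_ID."""
    run_condition = "f.RUN_ID = v.RUN_ID"
    if tolerate_null_run_id:
        # Same shape as the condition described in the PR.
        run_condition = "(f.RUN_ID = v.RUN_ID OR v.RUN_ID IS NULL)"
    query = f"""
        SELECT f.FEATURE_ID, v.GLOBAL_SCORE
        FROM FEATURE f
        LEFT JOIN peptide_score_view v
          ON v.PEPTIDE_ID = f.PEPTIDE_ID
         AND {run_condition}
        ORDER BY f.FEATURE_ID
    """
    return con.execute(query).fetchall()


con = build_demo_db()
strict = join_scores(con, tolerate_null_run_id=False)
tolerant = join_scores(con, tolerate_null_run_id=True)
```

With the strict condition both features come back with a NULL score; with the tolerant condition both pick up the global score of 3.0.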

- Introduced a new test file `test_osw_export_score_views.py` to validate the export of score views from OSW files.
- Implemented helper functions to create test OSW databases and read joined scores using DuckDB.
- Added tests to ensure global and experiment-wide scores are correctly handled when run IDs are null.
- Enhanced `test_pyprophet_export.py` by adding a sorting function for exported parquet frames to ensure deterministic snapshots.
- Updated existing tests to utilize the new sorting function for parquet exports.

Copilot

Pull request overview

This PR fixes OSW→Parquet export behavior for peptide/protein score tables by improving how global vs run-scoped contexts are queried/merged and by making joins tolerant to RUN_ID being NULL, ensuring global scores are retained during export.

Changes:

Refactors _get_peptide_protein_score_table to separately build/merge non-global (keyed by (ID, RUN_ID)) and global (keyed by ID) score projections.
Updates score-view join conditions to allow matching on (FEATURE.RUN_ID = view.RUN_ID OR view.RUN_ID IS NULL) so global-score rows without RUN_ID still join.
Stabilizes parquet export regression snapshots by sorting exported parquet frames prior to printing.
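
As a rough illustration of why sorting stabilizes a snapshot, the sketch below orders rows by a fixed list of key columns before they would be printed. The PR applies the idea to exported parquet frames; this version uses plain dictionaries, and the helper name `sort_rows_for_snapshot` and the column names are made up for the example.

```python
def sort_rows_for_snapshot(rows, keys):
    """Return rows in a deterministic order, placing missing (None) key values last."""

    def sort_key(row):
        # Each key contributes an (is_missing, value) pair so that None is never
        # compared directly against a number or string.
        parts = []
        for key in keys:
            value = row.get(key)
            parts.append((value is None, 0 if value is None else value))
        return tuple(parts)

    return sorted(rows, key=sort_key)


rows_a = [
    {"peptide_id": 11, "run_id": None},
    {"peptide_id": 10, "run_id": 2},
    {"peptide_id": 10, "run_id": 1},
]
# The same rows arriving in a different order, as an unordered export might produce.
rows_b = list(reversed(rows_a))

snapshot_a = sort_rows_for_snapshot(rows_a, ["peptide_id", "run_id"])
snapshot_b = sort_rows_for_snapshot(rows_b, ["peptide_id", "run_id"])
```

Both inputs produce the same ordered list, so a printed snapshot no longer depends on the order in which rows were written.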

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `pyprophet/io/export/osw.py` | Refactors peptide/protein score view generation and adjusts join logic to handle global scores where `RUN_ID` may be `NULL`. |
| `tests/test_osw_export_score_views.py` | Adds focused tests validating that global peptide/protein scores are preserved when `RUN_ID` is `NULL`. |
| `tests/test_pyprophet_export.py` | Adds deterministic sorting before regtest snapshot output for parquet export tests. |
| `tests/_regtest_outputs/test_pyprophet_export.test_parquet_export_scored_osw.out` | Updates expected regtest snapshot after introducing deterministic sorting. |
| `tests/_regtest_outputs/test_pyprophet_export.test_parquet_export_no_transition_data.out` | Updates expected regtest snapshot after introducing deterministic sorting. |

Comments suppressed due to low confidence (1)

pyprophet/io/export/osw.py:2579

Using DuckDB ANY_VALUE() to collapse potentially multiple rows per (context, ID, RUN_ID) can yield nondeterministic results if duplicates exist (it may pick any row). Prefer a deterministic aggregate (e.g., MIN/MAX) or enforce uniqueness (e.g., assert/count duplicates) so exports don’t silently vary across runs/files.

                    [
                        f"ANY_VALUE(CASE WHEN context = '{context}' THEN SCORE END) as {score_table}_{safe_context}_SCORE",
                        f"ANY_VALUE(CASE WHEN context = '{context}' THEN PVALUE END) as {score_table}_{safe_context}_PVALUE",
                        f"ANY_VALUE(CASE WHEN context = '{context}' THEN QVALUE END) as {score_table}_{safe_context}_QVALUE",
                        f"ANY_VALUE(CASE WHEN context = '{context}' THEN PEP END) as {score_table}_{safe_context}_PEP",
                    ]
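
One way to act on this suggestion is to check for duplicate keys before pivoting and to use a deterministic aggregate in the pivot itself. The sketch below is an assumption-laden illustration, not code from the PR: it runs on `sqlite3` instead of DuckDB, the table layout is simplified, and `find_duplicate_score_rows` and `max_context_score` are hypothetical helpers.

```python
import sqlite3


def build_demo_db():
    """Score table with a duplicated global row for peptide 10 (simplified layout)."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE SCORE_PEPTIDE (
            CONTEXT TEXT, RUN_ID INTEGER, PEPTIDE_ID INTEGER, SCORE REAL
        );
        INSERT INTO SCORE_PEPTIDE VALUES
            ('global', NULL, 10, 3.0),
            ('global', NULL, 10, 3.5),
            ('run-specific', 1, 10, 2.5);
        """
    )
    return con


def find_duplicate_score_rows(con, table="SCORE_PEPTIDE", id_col="PEPTIDE_ID"):
    """List (context, id, run_id, count) groups holding more than one row.

    Table and column names are interpolated into the SQL, so they must come
    from trusted code, never from user input.
    """
    query = f"""
        SELECT CONTEXT, {id_col}, RUN_ID, COUNT(*) AS N_ROWS
        FROM {table}
        GROUP BY CONTEXT, {id_col}, RUN_ID
        HAVING COUNT(*) > 1
        ORDER BY CONTEXT, {id_col}
    """
    return con.execute(query).fetchall()


def max_context_score(con, context, peptide_id):
    """Deterministic pivot cell: MAX always picks the same row among duplicates."""
    query = """
        SELECT MAX(CASE WHEN CONTEXT = ? THEN SCORE END)
        FROM SCORE_PEPTIDE
        WHERE PEPTIDE_ID = ?
    """
    return con.execute(query, (context, peptide_id)).fetchone()[0]


con = build_demo_db()
duplicates = find_duplicate_score_rows(con)
global_score = max_context_score(con, "global", 10)
```

GROUP BY treats NULL `RUN_ID` values as one group, so duplicated global rows are reported together. Whether MIN, MAX, or a hard failure is the right policy depends on what duplicates mean in a given OSW file.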

…ted before validation

- Introduced _stabilize_regtest_float function to ensure deterministic float rendering across platforms.
- Updated _normalize_regtest_frame to utilize the new stabilization function for better consistency in test outputs.
- Adjusted _normalize_peakgroup_regtest_frame to call the generalized normalization function.
- Improved handling of tiny floating-point values and ensured zero values are consistently represented.
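
The body of `_stabilize_regtest_float` is not shown on this page, so the following is only a guess at the general technique: flush tiny magnitudes to zero, normalize negative zero, and round to a fixed number of significant digits. The function name `stabilize_float`, the threshold `tiny`, and the default of six significant digits are all assumptions.

```python
import math


def stabilize_float(value, sig_digits=6, tiny=1e-12):
    """Return a float that renders identically across platforms (illustrative sketch)."""
    # Leave missing and non-finite values alone so they stay visible in snapshots.
    if value is None or math.isnan(value) or math.isinf(value):
        return value
    # Flush near-zero noise (and negative zero) to a plain positive 0.0.
    if abs(value) < tiny:
        return 0.0
    # Round-trip through a fixed-precision format to drop trailing noise digits.
    return float(f"{value:.{sig_digits}g}")
```

For example, `stabilize_float(0.1 + 0.2)` gives `0.3`, and both `-0.0` and `1e-15` become `0.0`, which keeps snapshot text from differing on low-order digits or on the sign of zero.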

singjc and others added 2 commits April 26, 2026 19:52

Merge branch 'PyProphet:master' into master

8d36acb

Copilot AI review requested due to automatic review settings April 26, 2026 23:53

Copilot started reviewing on behalf of singjc April 26, 2026 23:54

Copilot AI reviewed Apr 26, 2026

Comment thread pyprophet/io/export/osw.py

singjc added 3 commits April 26, 2026 21:15

Refactor test for parquet export to ensure precursor DataFrame is sor…

185a55a

…ted before validation

Merge branch 'master' of github.com:singjc/pyprophet

97f3398

singjc enabled auto-merge April 27, 2026 04:38

singjc merged commit 5d4406d into PyProphet:master Apr 27, 2026
1 check passed

singjc merged 5 commits into PyProphet:master from singjc:master
