fix: matrix exports with 0 data #207
Merged
singjc merged 5 commits into PyProphet:master on Apr 27, 2026
Co-authored-by: Copilot <copilot@github.com>
Pull request overview
This PR aims to make quantification matrix export/summarization fail fast with actionable errors when filtering or aggregation produces empty datasets, improving user feedback instead of producing silent/unclear failures.
Changes:
- Added empty-dataset validation in `export_quant_matrix` and multiple summarization levels (precursor/peptide/protein/gene) with descriptive `ValueError`s.
- Refactored peptide-level matrix creation to add guarded unstacking/error messaging for reshape issues.
- Added additional validation checkpoints after mapping/summarization steps to catch empty intermediate results earlier.
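The guarded unstacking described in the list above could look roughly like the following. This is a minimal sketch, not the PR's actual code: the function name `build_peptide_matrix` and the column names `peptide`, `run`, and `intensity` are hypothetical and chosen only for illustration.

```python
import pandas as pd


def build_peptide_matrix(long_df: pd.DataFrame) -> pd.DataFrame:
    """Pivot long-format intensities into a peptide x run matrix.

    Hypothetical helper: the column names are assumptions, not pyprophet's.
    """
    # Fail fast with an actionable message instead of writing an empty file.
    if long_df.empty:
        raise ValueError(
            "No peptide-level rows remain after filtering; "
            "check the q-value thresholds."
        )

    indexed = long_df.set_index(["peptide", "run"])["intensity"]
    try:
        # unstack raises ValueError when the index contains duplicate entries.
        wide = indexed.unstack("run")
    except ValueError as exc:
        raise ValueError(
            "Could not reshape peptide data into a matrix; "
            "duplicate (peptide, run) pairs are the usual cause. "
            f"Original error: {exc}"
        ) from exc

    return wide.reset_index()
```

Re-raising with `from exc` keeps the original pandas traceback attached, so the friendlier message does not hide the underlying cause.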
Comments suppressed due to low confidence (1)
pyprophet/io/_base.py:779
`export_quant_matrix` is annotated to return `pd.DataFrame`, but it never returns the created matrix (it only writes it to disk). This makes the type hint misleading and can break callers that expect a return value. Either return `matrix` at the end or change the return annotation/docstring to `-> None` (and adjust any usages accordingly).
def export_quant_matrix(self, data: pd.DataFrame) -> pd.DataFrame:
    """
    Export quantification matrix at specified level with optional normalization.
    Args:
        data: Input DataFrame with quantification data
    """
    cfg = self.config
    # Check if data is empty
    if data.empty:
        raise ValueError(
            "No identification results passed the filtering criteria. "
            "The filtered dataset is empty. Please check your filter settings: "
            f"max_rs_peakgroup_qvalue={cfg.max_rs_peakgroup_qvalue}, "
            f"max_global_peptide_qvalue={cfg.max_global_peptide_qvalue}, "
            f"max_global_protein_qvalue={cfg.max_global_protein_qvalue}"
        )
    sep = "," if cfg.out_type == "csv" else "\t"
    level = self.level
    normalization = self.config.normalization
    # Validate input
    if level not in ["precursor", "peptide", "protein", "gene"]:
        raise ValueError(
            "Invalid level. Choose from: precursor, peptide, protein, gene"
        )
    if normalization not in self.normalization_methods:
        raise ValueError(
            f"Invalid normalization. Choose from: {list(self.normalization_methods.keys())}"
        )
    # Get the appropriate summarization method
    summarizer = getattr(self, f"_summarize_{level}_level")
    matrix = summarizer(data, self.config.top_n, self.config.consistent_top)
    # Apply normalization
    if normalization != "none":
        # Set non-numeric columns as index
        non_numeric_cols = list(matrix.select_dtypes(exclude="number").columns)
        if len(non_numeric_cols) > 0:
            matrix = matrix.set_index(non_numeric_cols)
        matrix = self.normalization_methods[normalization](matrix)
        matrix = matrix.reset_index()
    matrix.to_csv(self.config.outfile, sep=sep, index=False)
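One way to resolve the annotation mismatch raised in the comment is to return the matrix after writing it. The sketch below shows that option in isolation; `write_matrix` and its parameters are hypothetical stand-ins, not part of pyprophet. The alternative is to keep the body as is and annotate `-> None`.

```python
import io

import pandas as pd


def write_matrix(matrix: pd.DataFrame, outfile, out_type: str = "tsv") -> pd.DataFrame:
    """Write the matrix to `outfile` and return it, so the annotation holds.

    Hypothetical helper; `outfile` may be a path or a file-like object.
    """
    sep = "," if out_type == "csv" else "\t"
    matrix.to_csv(outfile, sep=sep, index=False)
    # Returning the frame lets callers inspect or reuse what was written.
    return matrix


# Usage: write to an in-memory buffer instead of a real file.
buffer = io.StringIO()
written = write_matrix(pd.DataFrame({"a": [1], "b": [2]}), buffer, out_type="csv")
```

Returning the frame is the less disruptive choice if any caller already relies on a return value; switching to `-> None` is cleaner if none do.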
…cation matrix creation Co-authored-by: Copilot <copilot@github.com>
…g fill_value to None Co-authored-by: Copilot <copilot@github.com>
This pull request adds extensive error handling and data validation to the quantification matrix export and summarization pipeline in `pyprophet/io/_base.py`. The main goal is to provide clear, actionable error messages when filtering or aggregation steps result in empty datasets, preventing silent failures and making debugging easier for users.
The most important changes are:
Error handling for empty datasets:
- `export_quant_matrix`, `_summarize_precursor_level`, `_summarize_peptide_level`, `_summarize_protein_level`, and `_summarize_gene_level` now raise descriptive `ValueError`s if the input or intermediate dataframes are empty after filtering or grouping steps. These messages guide users to adjust filter settings or check their data annotations.
Robustness improvements in peptide-level summarization:
These changes make the quantification pipeline more robust and user-friendly, ensuring that issues are caught early with clear explanations.
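As an illustration of catching empty intermediate results early, the sketch below places a checkpoint after each step of a simplified protein-level summarization. It is written under assumptions: `require_non_empty`, `summarize_protein_level`, and the columns `qvalue`, `protein`, `run`, and `intensity` are all hypothetical and do not mirror the real `_summarize_protein_level`.

```python
import pandas as pd


def require_non_empty(df: pd.DataFrame, step: str, hint: str) -> pd.DataFrame:
    """Raise a descriptive ValueError if `df` is empty after `step`."""
    if df.empty:
        raise ValueError(f"No data left after {step}. {hint}")
    return df


def summarize_protein_level(data: pd.DataFrame, max_qvalue: float) -> pd.DataFrame:
    """Hypothetical summarizer with a checkpoint after every step."""
    filtered = require_non_empty(
        data[data["qvalue"] <= max_qvalue],
        "q-value filtering",
        f"Consider relaxing max_qvalue={max_qvalue}.",
    )
    annotated = require_non_empty(
        filtered.dropna(subset=["protein"]),
        "protein mapping",
        "Check that peptides carry protein annotations.",
    )
    # Sum intensities per protein and run.
    return annotated.groupby(["protein", "run"], as_index=False)["intensity"].sum()
```

Naming the failing step in the message tells the user which setting or annotation to inspect, which is the point of validating after each stage rather than only at the end.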