
fix: matrix exports with 0 data #207

Merged
singjc merged 5 commits into PyProphet:master from singjc:master on Apr 27, 2026

singjc commented on Apr 27, 2026

This pull request adds extensive error handling and data validation to the quantification matrix export and summarization pipeline in pyprophet/io/_base.py. The main goal is to provide clear, actionable error messages when filtering or aggregation steps result in empty datasets, preventing silent failures and making debugging easier for users.

The most important changes are:

Error handling for empty datasets:

  • Added checks in export_quant_matrix, _summarize_precursor_level, _summarize_peptide_level, _summarize_protein_level, and _summarize_gene_level that raise descriptive ValueErrors if the input or intermediate DataFrames are empty after filtering or grouping steps. These messages guide users to adjust their filter settings or check their data annotations.
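A minimal sketch of the empty-dataset check described above, assuming plain pandas DataFrames. The helper name require_non_empty, the column names, and the message text are hypothetical illustrations, not code taken from pyprophet/io/_base.py.

```python
import pandas as pd


def require_non_empty(df: pd.DataFrame, step: str, hint: str) -> pd.DataFrame:
    """Raise a descriptive ValueError if df has no rows (hypothetical helper)."""
    if df.empty:
        # Name the step that emptied the data and suggest what to adjust.
        raise ValueError(f"No data left after {step}. {hint}")
    return df


# Example: a q-value filter that removes every row.
data = pd.DataFrame({"peptide": ["PEPTIDEA", "PEPTIDEB"], "qvalue": [0.2, 0.3]})
filtered = data[data["qvalue"] <= 0.01]

try:
    require_non_empty(
        filtered,
        step="q-value filtering",
        hint="Consider relaxing max_global_peptide_qvalue.",
    )
except ValueError as err:
    print(err)
```

Returning the DataFrame unchanged lets a check like this be chained inline after each filtering or grouping step.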

Robustness improvements in peptide-level summarization:

  • Improved the peptide-level summarization logic to handle index conflicts during unstacking. If duplicate column names are detected, a clear error is raised to help users identify malformed data or inconsistent annotations.
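One way to guard an unstack step is sketched below. It assumes a long-format table with hypothetical columns peptide, run, and intensity. It checks for duplicate (peptide, run) pairs before reshaping, because pandas refuses to unstack an index with duplicate entries. It also checks for run names that collide with the index column name, because that collision would make reset_index fail. This is an illustration of the idea, not the PR's implementation.

```python
import pandas as pd


def peptide_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Pivot long-format peptide intensities into a peptide x run matrix.

    Hypothetical sketch: the column names 'peptide', 'run', 'intensity' are assumptions.
    """
    series = df.set_index(["peptide", "run"])["intensity"]

    # unstack() cannot reshape an index with repeated (peptide, run) pairs.
    dupes = series.index[series.index.duplicated()]
    if len(dupes) > 0:
        raise ValueError(
            "Cannot build the peptide matrix: duplicate (peptide, run) entries "
            f"found, e.g. {list(dupes[:3])}. Check for inconsistent annotations."
        )

    matrix = series.unstack("run")

    # A run named like an index column would clash when the index is restored.
    clash = set(matrix.columns) & set(matrix.index.names)
    if clash:
        raise ValueError(
            f"Run name(s) {sorted(clash)} collide with index column names; "
            "rename the runs before exporting."
        )
    return matrix.reset_index()


long_df = pd.DataFrame(
    {
        "peptide": ["PEPA", "PEPA", "PEPB"],
        "run": ["run1", "run2", "run1"],
        "intensity": [1.0, 2.0, 3.0],
    }
)
print(peptide_matrix(long_df))
```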

These changes make the quantification pipeline more robust and user-friendly, ensuring that issues are caught early with clear explanations.

Copilot AI review requested due to automatic review settings on April 27, 2026 14:15

Copilot AI left a comment

Pull request overview

This PR aims to make quantification matrix export and summarization fail fast with actionable errors when filtering or aggregation produces an empty dataset, rather than failing silently or with unclear errors.

Changes:

  • Added empty-dataset validation in export_quant_matrix and multiple summarization levels (precursor/peptide/protein/gene) with descriptive ValueErrors.
  • Refactored peptide-level matrix creation to guard the unstacking step and raise descriptive errors when the reshape fails.
  • Added additional validation checkpoints after mapping/summarization steps to catch empty intermediate results earlier.
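An intermediate checkpoint after a mapping step might look like the sketch below. It assumes the mapping is an inner join on a hypothetical peptide column. The PR's actual mapping logic is not shown in this thread, so the function name and schema are illustrative only.

```python
import pandas as pd


def map_peptides_to_proteins(
    peptides: pd.DataFrame, mapping: pd.DataFrame
) -> pd.DataFrame:
    """Attach protein IDs to peptide rows and fail fast if nothing maps.

    Hypothetical sketch: the 'peptide' join key and the inner join are assumptions.
    """
    mapped = peptides.merge(mapping, on="peptide", how="inner")
    # Checkpoint: an empty join usually means the identifiers do not match.
    if mapped.empty:
        raise ValueError(
            "No peptides could be mapped to proteins after the join. Check that "
            "peptide identifiers in the quantification data match the protein "
            "annotation."
        )
    return mapped


peptides = pd.DataFrame({"peptide": ["PEPA", "PEPB"], "intensity": [10.0, 20.0]})
mapping = pd.DataFrame({"peptide": ["PEPA"], "protein": ["P1"]})
print(map_peptides_to_proteins(peptides, mapping))
```

Placing the check right after the join reports the failure at the step that caused it, instead of at a later aggregation that merely receives an empty frame.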
Comments suppressed due to low confidence (1)

pyprophet/io/_base.py:779

  • export_quant_matrix is annotated to return pd.DataFrame, but it never returns the created matrix (it only writes it to disk). The type hint is therefore misleading, and callers that use the return value will silently receive None. Either return matrix at the end or change the return annotation and docstring to -> None (and adjust any usages accordingly).
    def export_quant_matrix(self, data: pd.DataFrame) -> pd.DataFrame:
        """
        Export quantification matrix at specified level with optional normalization.

        Args:
            data: Input DataFrame with quantification data

        """
        cfg = self.config

        # Check if data is empty
        if data.empty:
            raise ValueError(
                "No identification results passed the filtering criteria. "
                "The filtered dataset is empty. Please check your filter settings: "
                f"max_rs_peakgroup_qvalue={cfg.max_rs_peakgroup_qvalue}, "
                f"max_global_peptide_qvalue={cfg.max_global_peptide_qvalue}, "
                f"max_global_protein_qvalue={cfg.max_global_protein_qvalue}"
            )

        sep = "," if cfg.out_type == "csv" else "\t"
        level = self.level
        normalization = self.config.normalization
        # Validate input
        if level not in ["precursor", "peptide", "protein", "gene"]:
            raise ValueError(
                "Invalid level. Choose from: precursor, peptide, protein, gene"
            )

        if normalization not in self.normalization_methods:
            raise ValueError(
                f"Invalid normalization. Choose from: {list(self.normalization_methods.keys())}"
            )

        # Get the appropriate summarization method
        summarizer = getattr(self, f"_summarize_{level}_level")
        matrix = summarizer(data, self.config.top_n, self.config.consistent_top)

        # Apply normalization
        if normalization != "none":
            # Set non-numeric columns as index
            non_numeric_cols = list(matrix.select_dtypes(exclude="number").columns)

            if len(non_numeric_cols) > 0:
                matrix = matrix.set_index(non_numeric_cols)
            matrix = self.normalization_methods[normalization](matrix)
            matrix = matrix.reset_index()

        matrix.to_csv(self.config.outfile, sep=sep, index=False)
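The reviewer's first option (return matrix at the end) can be sketched as a standalone function. The sketch mirrors the normalize-then-write flow of the quoted method. median_normalize is a hypothetical stand-in for one entry of normalization_methods, whose real implementations are not shown in this thread, and export_matrix is likewise a hypothetical name.

```python
import io

import pandas as pd


def median_normalize(matrix: pd.DataFrame) -> pd.DataFrame:
    """Scale each run (column) so its median equals the median of all run medians.

    Hypothetical stand-in for one entry of normalization_methods.
    """
    medians = matrix.median()
    return matrix * (medians.median() / medians)


def export_matrix(
    matrix: pd.DataFrame, outfile, sep: str = "\t", normalize: bool = True
) -> pd.DataFrame:
    """Normalize numeric columns, write the matrix, and return it (sketch)."""
    if normalize:
        # Move identifier (non-numeric) columns into the index so only intensities are scaled.
        non_numeric_cols = list(matrix.select_dtypes(exclude="number").columns)
        if non_numeric_cols:
            matrix = matrix.set_index(non_numeric_cols)
        matrix = median_normalize(matrix)
        if non_numeric_cols:
            # Restore only the columns moved above.
            matrix = matrix.reset_index()
    matrix.to_csv(outfile, sep=sep, index=False)
    # Returning the matrix makes a -> pd.DataFrame annotation truthful.
    return matrix


raw = pd.DataFrame(
    {"protein": ["P1", "P2", "P3"], "run1": [1.0, 2.0, 3.0], "run2": [2.0, 4.0, 6.0]}
)
buffer = io.StringIO()
result = export_matrix(raw, buffer)
print(result)
```

In this sketch reset_index is called only when set_index was applied. On a frame with a default RangeIndex, pandas' reset_index would insert an extra "index" column into the written output.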


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Review comment threads on pyprophet/io/_base.py: 4 (2 marked Outdated)
singjc enabled auto-merge on April 27, 2026 15:01
…g fill_value to None

Co-authored-by: Copilot <copilot@github.com>
singjc merged commit d4ea1c3 into PyProphet:master on Apr 27, 2026
1 check passed

2 participants