Skip to content

Refactor xenium reader#386

Merged
LucaMarconato merged 7 commits intomainfrom
refactor/xenium
Apr 30, 2026
Merged

Refactor xenium reader#386
LucaMarconato merged 7 commits intomainfrom
refactor/xenium

Conversation

@LucaMarconato
Copy link
Copy Markdown
Member

Code refactoring of the xenium.py to improve readability and maintainability.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Mar 25, 2026

Codecov Report

❌ Patch coverage is 81.25000% with 30 lines in your changes missing coverage. Please review.
✅ Project coverage is 62.15%. Comparing base (433014e) to head (e79c453).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
src/spatialdata_io/readers/xenium.py 81.25% 30 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #386      +/-   ##
==========================================
+ Coverage   61.65%   62.15%   +0.49%     
==========================================
  Files          27       27              
  Lines        3161     3181      +20     
==========================================
+ Hits         1949     1977      +28     
+ Misses       1212     1204       -8     
Files with missing lines Coverage Δ
src/spatialdata_io/readers/xenium.py 74.55% <81.25%> (+3.50%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

LucaMarconato and others added 3 commits April 29, 2026 14:01
Rename _CellsZarr → _XeniumCells to reflect that it aggregates cell
identity from multiple on-disk sources (zarr + parquet), not just zarr.

Key changes:
- nucleus_indices_mapping and cell_indices_mapping are now computed once
  in __post_init__ and stored on the instance; no caller recomputes them.
- get_cell_metadata() added: reads cells.parquet, decodes cell_id to str,
  and cross-checks it against the zarr cell_id array. No other code
  reads the parquet cell_id independently.
- get_cell_summary() returns None for v < 1.3.0 instead of asserting.
- seg_mask_value branch in get_indices_mapping now validates the expected
  contiguous 1..N range and warns if violated.
- _enrich_table() removed: cell_summary enrichment moved into
  _get_tables_and_circles(); cell_labels instance-key mapping moved into
  xenium() where it belongs conceptually.
- _get_tables_and_circles() simplified (drops cells_labels and
  cells_as_circles parameters).
- Docstrings updated to accurately document raster label semantics
  (label = p+1, not cell_index[p]+1), version differences, and
  the reasoning behind each branch.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_get_tables_and_circles always builds the circles object (needed for
obsm["spatial"] and radii), but they should only appear in the output
SpatialData when the user requests cells_as_circles=True.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@LucaMarconato LucaMarconato marked this pull request as ready for review April 29, 2026 15:26
@LucaMarconato LucaMarconato requested a review from Copilot April 29, 2026 15:26
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors the Xenium reader (xenium.py) to centralize Xenium cell-ID/mapping logic and simplify the main xenium() control flow.

Changes:

  • Introduces _XeniumCells dataclass to encapsulate cells.zarr.zip access, label-index mappings, and cell-summary/metadata helpers.
  • Refactors xenium() to use the shared _XeniumCells context, streamlining table enrichment and boundary/label reading.
  • Extracts morphology-focus handling into _get_morphology_focus() to separate version-specific image logic.
Comments suppressed due to low confidence (1)

src/spatialdata_io/readers/xenium.py:500

  • The docstring says that when indices_mapping is None _get_polygons() "falls back to cell_id-based grouping". However, when the parquet contains label_id (v2.0+), the grouping key becomes label_id (see columns_to_read.append("label_id" if has_label_id else cell_id)), so the fallback index will be derived from label IDs, not cell IDs. Please clarify the docstring (and/or force indices_mapping to be provided when label_id is present, since otherwise the resulting shapes cannot be keyed by cell_id).
    indices_mapping
        When provided (from ``_XeniumCells``),
        contains ``cell_id`` and ``label_index`` columns. The parquet ``label_id`` column is used
        for fast integer-based change detection (to locate all the vertices of each polygon).
        When None, falls back to cell_id-based grouping from the parquet (Xenium < 2.0).
    is_nucleus

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/spatialdata_io/readers/xenium.py
Comment thread src/spatialdata_io/readers/xenium.py
Comment thread src/spatialdata_io/readers/xenium.py Outdated
Comment thread src/spatialdata_io/readers/xenium.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@LucaMarconato
Copy link
Copy Markdown
Member Author

Claude Code (local) and Copilot review passed. Ready to merge.

_decode_cell_id_column was unconditionally calling astype(str), converting
plain-integer cell_ids (v1.0.x datasets) to strings. This broke merges
against externally-generated CSVs that still hold int64 cell_ids. Restore
the original behaviour: only decode bytes->str, leave other types unchanged.
Also fix the cross-check assertion in _get_tables_and_circles to compare
cell_ids as strings when the parquet column is numeric, matching what the
old code did explicitly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@LucaMarconato LucaMarconato merged commit 79e5f89 into main Apr 30, 2026
5 checks passed
@LucaMarconato LucaMarconato deleted the refactor/xenium branch April 30, 2026 11:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants