v0.1.0: monorepo merge — unified training + inference, canonical model, archive-free inference, vendored baselines by xuefei-wang · Pull Request #41 · vanvalenlab/deepcell-types

xuefei-wang · 2026-05-30T16:48:37Z

Summary

Merges the separate training repository (deepcelltypes-cell-type-assignment-pytorch)
into this repo and replaces the legacy CellTypeCLIPModel inference path with the
current canonical model. This is the v0.1.0 release cut.

Before this PR, vanvalenlab/deepcell-types was inference-only — it shipped
CellTypeCLIPModel, the dct_kit/ helpers, and a top-level __init__ that
exported just predict. After it, a single package covers training and
inference: inference stays a plain pip install deepcell-types, the full
training pipeline lives behind a [train] extra, and the four paper comparison
baselines are vendored behind per-baseline extras.

⚠️ Breaking changes — see below.

Canonical model

model.py is rewritten around CellTypeAnnotator; CellTypeCLIPModel /
CellTypeDataEncoder are removed. Canonical training defaults (scripts/train.py,
click-based CLI): --resnet_channels 48, --domain_weight 0.1,
--best_metric macro_f1.

- Mean-intensity injection: per-cell mean marker intensity is scattered into a marker-position vector and injected as a CLS residual. The output projection is zero-init, so warm-starting from a checkpoint preserves predictions at step 0.
- DANN domain adaptation via a gradient-reversal head, on by default (--domain_weight 0.1; 0 disables it).
- Adapter-style fine-tuning: --freeze_backbone trains only the mean-intensity branches on top of an existing checkpoint; --unfreeze_ct_head additionally co-adapts the CT head / CLS token / final norm without unfreezing the transformer backbone.
- Padding-channel positions are explicitly zeroed (masked_fill) through the channel encoder, fusion, and mean-intensity paths, so masked tokens contribute exactly zero rather than leaking bias/spatial_feat into the transformer.
- Self-describing checkpoints: scripts/train.py bundles ct2idx, n_heads, and compat_marker0_zero into the checkpoint, and inference asserts that the vocabulary ordering matches (a permuted vocabulary previously passed the count-only check and silently mislabeled cells).
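A minimal sketch of the ordering check, assuming the checkpoint stores `ct2idx` as a plain name → index dict. `check_vocab_matches` is a hypothetical name, not the function in `deepcell_types`. Comparing every name's index (not just the vocabulary size) is what catches a permuted vocabulary.

```python
def check_vocab_matches(ckpt_ct2idx: dict[str, int], runtime_ct2idx: dict[str, int]) -> None:
    """Raise ValueError unless both vocabularies map every name to the same index."""
    if len(ckpt_ct2idx) != len(runtime_ct2idx):
        raise ValueError(
            f"vocabulary size mismatch: checkpoint has {len(ckpt_ct2idx)} cell types, "
            f"runtime vocabulary has {len(runtime_ct2idx)}"
        )
    # A count-only check would stop here; a permuted vocabulary has the same
    # size, so compare each name's index as well.
    mismatched = sorted(
        name for name, idx in ckpt_ct2idx.items() if runtime_ct2idx.get(name) != idx
    )
    if mismatched:
        raise ValueError(f"cell-type index mismatch for: {mismatched}")
```

For example, `{"B cell": 0, "T cell": 1}` against `{"B cell": 1, "T cell": 0}` passes a length check but raises here.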

Canonical-only inference

- Archive-free by default: the marker / cell-type registry ships as a small packaged vocab.json snapshot, so pip install deepcell-types + download_model() is enough to run predict(); the multi-GB TissueNet zarr archive is no longer required (pass zarr_path= / set DEEPCELL_TYPES_ZARR_PATH only if you need it). Verified identical predictions with vs. without the archive on the paper checkpoint.
- Post-hoc abstention on by default (ct_abstention_k=0.2), computed per FOV everywhere (CLI, Python API, library): cells below an IQR fence on the FOV confidence distribution are relabeled to the "Unknown" sentinel (skipped when k is disabled or the FOV has <4 cells).
- Custom preprocessing hook: predict(..., preprocess=...) overrides the per-FOV normalization without retraining, backed by a bounded op library (apply_config, make_preprocessor, DEFAULT_CONFIG) and a composition-guided adaptation loop (skills/preproc-adapt/).
- The bright-spot clip percentile (DCTConfig.PERCENTILE_THRESHOLD) is now 99.9, matching the recipe the training archive was built with (was 99.0, a carryover from the original packaging).
- predict(return_probabilities=True) returns a PredictionResult dataclass with the full per-cell softmax matrix, cell indices, and the pre-abstention argmax labels (cell_types_raw).
- _torch_load_weights loads with weights_only=True and emits a loud warning if it has to fall back to unsafe pickle on an older torch; a missing checkpoint raises a clear FileNotFoundError pointing at download_model().
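The loading behavior can be sketched with the loader injected as a callable, so the example runs without torch. `load_weights` is hypothetical; `loader` stands in for `torch.load`, and the sketch assumes an older torch rejects the unknown `weights_only` keyword with a `TypeError`.

```python
import os
import warnings
from typing import Any, Callable


def load_weights(path: str, loader: Callable[..., Any]) -> Any:
    """Load a checkpoint with weights_only=True, warning loudly on unsafe fallback."""
    if not os.path.exists(path):
        raise FileNotFoundError(
            f"checkpoint not found at {path!r}; run download_model() first"
        )
    try:
        return loader(path, weights_only=True)
    except TypeError:
        # Older loaders do not accept weights_only; full unpickling can run
        # arbitrary code from an untrusted checkpoint, so say so loudly.
        warnings.warn(
            "weights_only=True is not supported by this torch version; falling back "
            "to unsafe pickle loading. Upgrade torch.",
            RuntimeWarning,
            stacklevel=2,
        )
        return loader(path)
```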

New public API

predict, DCTConfig, PredictionResult, preprocess_fov, apply_config,
make_preprocessor, and DEFAULT_CONFIG are importable from deepcell_types
directly. preprocess_fov(raw, mask, native_mpp, channel_names) → PreprocessedFov is the standalone preprocessing entry point.

Monorepo: training pipeline

- deepcell_types.training ships from this repo behind pip install "deepcell-types[train]": config.py, dataset.py, archive.py, annotations.py, baseline_features.py, gold_metadata.py, losses.py, metrics.py, patch.py, utils.py, abstention.py.
- Scripts under scripts/: train.py, pretrain.py, predict.py, generate_openai_embeddings.py, generate_splits.py, split_val_for_test.py, plus the release-archive gate (validate_archive_contract.py, check_release_archive.sh).
- Canonical split manifests committed under splits/ (fov_split{,_valsubset,_test}.json + README), so the published train/val/test partition is reproducible from the repo.
- Experiment logging is plain Python logging, with no Weights & Biases dependency anywhere (--enable_wandb is gone; confusion matrices save locally as PNGs).
- zarr>=3.1 pulls the Python floor up to 3.11 for the train extra.

Baselines

- Four paper comparison baselines vendored under deepcell_types/baselines/ (cellsighter, maps, nimbus, xgb), invoked through the unified runner python -m deepcell_types.baselines <name>, each with a self-contained install extra (baseline-cellsighter, baseline-maps, baseline-nimbus, baseline-xgboost).
- Each baseline ships a README documenting every deviation from its upstream source; third-party licenses are tracked in deepcell_types/baselines/NOTICE.
- extract_features_from_zarr(missing_value=...) lets each baseline choose its absent-marker sentinel: MAPS / CellSighter keep 0.0; XGBoost can pass np.nan so absent markers route through XGBoost's learned missing direction instead of being conflated with "present, intensity 0.0". The feature matrix records a present_markers mask, and the cache stays missing-value-agnostic.
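A sketch of a missing-value-agnostic feature builder, assuming per-cell mean intensities arrive as dicts keyed by marker name (`build_feature_matrix` and that input format are hypothetical). The presence mask is computed before the sentinel is applied, so the same cached values can be rendered with 0.0 for MAPS / CellSighter or np.nan for XGBoost, which treats NaN as missing by default.

```python
import numpy as np


def build_feature_matrix(cell_means, marker_names, missing_value=0.0):
    """Return a (n_cells, n_markers) feature matrix and a boolean present-marker mask.

    cell_means: one dict per cell mapping marker name -> mean intensity; a marker
    absent from the dict was not measured for that cell.
    """
    n_cells, n_markers = len(cell_means), len(marker_names)
    values = np.zeros((n_cells, n_markers), dtype=np.float32)
    present = np.zeros((n_cells, n_markers), dtype=bool)
    for i, means in enumerate(cell_means):
        for j, name in enumerate(marker_names):
            if name in means:
                values[i, j] = means[name]
                present[i, j] = True
    # The sentinel is applied last, so (values, present) can be cached once
    # and re-rendered for any missing_value.
    features = np.where(present, values, np.float32(missing_value))
    return features, present
```

A measured intensity of 0.0 stays 0.0 in both renderings; only unmeasured markers change.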

Breaking changes

- CellTypeCLIPModel removed. No shim; use from deepcell_types import predict, DCTConfig.
- All predict() arguments after mpp are keyword-only, preventing accidental transposition of the adjacent string arguments. device= is the preferred spelling (device_num= remains a deprecated alias).
- predict(num_workers=...) default is now 0 (was 24); 24 workers OOM'd machines with <64 GB RAM.
- Abstention on by default changes returned labels vs. the unfiltered argmax of prior releases; pass ct_abstention_k=0 to recover raw argmax.
- Clip percentile 99.0 → 99.9 shifts ~5% of predicted labels; on a held-out test-split sample it reproduces the canonical predictions slightly better (92.5% vs. 91.9% argmax agreement).
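The keyword-only boundary and the deprecated alias can be sketched as follows. `predict_sketch` is hypothetical: its positional parameters are illustrative rather than the real `predict()` signature, it returns the resolved settings instead of running a model, and rejecting conflicting `device=` / `device_num=` values is an assumed policy.

```python
import warnings
from typing import Optional


def predict_sketch(image, mask, channel_names, mpp, *,
                   device: Optional[str] = None,
                   device_num: Optional[str] = None,
                   num_workers: int = 0,
                   ct_abstention_k: float = 0.2) -> dict:
    """Everything after `mpp` must be passed by keyword (enforced by the bare `*`)."""
    if device_num is not None:
        warnings.warn("device_num= is deprecated; use device=", DeprecationWarning, stacklevel=2)
        if device is None:
            device = device_num
        elif device != device_num:
            raise ValueError("pass only one of device= and device_num=")
    return {
        "device": device if device is not None else "cpu",  # assumed default
        "num_workers": num_workers,
        "ct_abstention_k": ct_abstention_k,
    }
```

Passing a fifth positional argument now raises `TypeError` instead of silently binding to the wrong parameter.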

Packaging / infra

- Package data now ships vocab.json, channel_mapping.yaml, and training/config/*.yaml (incl. combined_celltypes.yaml), which were previously outside the package tree and absent after pip install.
- tifffile declared in the [train] extra.
- CI workflow added (.github/workflows/ci.yml); inference vs. [train] test boundary enforced.
- LICENSE text matches the OSI Apache 2.0 text exactly (#42, "LIC: Revert licence text to exactly match OSI Apache 2"); NOTICE aligned to the vanvalenlab convention.

Tests

35 test modules under tests/ (plus tests/baselines/) covering canonical
inference, abstention CLI, checkpoint round-trip, dataset/split/sampler
behavior, preprocessing + the preprocess hook, losses, hierarchical eval,
archive-contract validation, baseline feature splits, and vendored-baseline
equivalence against upstream.

See CHANGELOG.md
for the full 0.1.0 entry and migration notes.

Two surviving issues from the cross-repo audit (deepcelltypes-cell-type-assignment-pytorch reviews/2026-05-10-0850/deepcell-types/SYNTHESIS.md) that PR #1 ("feat: support canonical annotator inference") did not address. The other 3 findings (channel KeyError fallback, marker-embedding always-normalize, marker_embeddings allocation shape) are already fixed on this branch.

predict.py:
- `_torch_load_weights` previously caught `TypeError` from a too-old torch and silently fell back to unsafe pickle deserialization. It now emits a loud warning when the fallback fires, recommending an upgrade. Untrusted checkpoints can execute arbitrary code at unsafe `torch.load` time, so this fallback should be the rare exception, not silent.

model.py:
- Legacy `CellTypeDataEncoder.forward` (used for the older CLIP checkpoints via the `_is_canonical_checkpoint() == False` route) had `aug_mask = nn.functional.pad(mask.long(), (1, 0), mode="reflect")`, which fills the CLS slot by copying a neighboring mask bit. Reflect padding mirrors around the edge element without repeating it, so the prepended bit is a copy of channel 1's mask bit (not channel 0's), and CLS is visible only when that channel is real (not padding). Replace with explicit `torch.cat([torch.zeros(B, 1, dtype=bool), mask], dim=1)` to make CLS-always-visible the structural invariant. The canonical `annotator_model.py` already uses this pattern (lines 409-410); this brings legacy parity.

Smoke test: `CellTypeDataEncoder(...)` constructs and forwards without error. No regression risk for canonical-checkpoint loads (those go through `annotator_model.py`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
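The difference between the legacy and fixed CLS-slot construction is easy to check with NumPy, whose `mode="reflect"` padding mirrors around the edge element the same way `torch.nn.functional.pad(..., mode="reflect")` does. The mask convention here (1 = padded channel) is an assumption for illustration.

```python
import numpy as np

# Padding mask for 2 cells x 3 channels (1 = padded channel), as an integer
# array to mirror the legacy code's mask.long().
mask = np.array([[0, 0, 1],
                 [0, 1, 1]])

# Legacy: reflect-pad one column on the left. Reflect mirrors around the edge
# element without repeating it, so the new CLS column copies channel 1.
legacy = np.pad(mask, ((0, 0), (1, 0)), mode="reflect")

# Fix: prepend an explicit all-zero (never padded) column for the CLS slot.
fixed = np.concatenate([np.zeros((mask.shape[0], 1), dtype=mask.dtype), mask], axis=1)

print(legacy[:, 0])  # [0 1]
print(fixed[:, 0])   # [0 0]
```

The second cell has a real channel 0 but a padded channel 1, and the legacy rule masks its CLS token.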

In preparation for merging the training pipeline (currently in a separate repo) into this package, collapse to a single supported architecture. The legacy `CellTypeCLIPModel` path and its DCTConfig "legacy" profile were carrying ~1.8k lines of config blobs and dual-mode branching that would otherwise have to be ported into the training side as well.

Removes:
- `model.py` (CellTypeCLIPModel) and `loss.py` (CLIP/contrastive losses)
- `dct_kit/utils.py` (all four helpers had no remaining callers)
- 8 dead config blobs in `dct_kit/config/`: both deepseek-r1 and text-embedding-3-large JSON dumps, plus the legacy and (already-dead) `canonical_*.yaml` mirrors and the `tissue_celltype_mapping_merged` YAML

Simplifies:
- `predict.py`: drop `_is_canonical_checkpoint` routing, the legacy model/dataloader branches, and `_load_legacy_embeddings`
- `dct_kit/config.py::DCTConfig`: remove the `profile=` kwarg, the legacy package-bundled init path, and the embedding-loader methods (`get_channel_embedding`, `get_celltype_embedding`)
- `dataset.py::PatchDataset`: drop the `output_mode` parameter and the legacy `_combine_masks` / `_pad_images` / `_calcualte_marker_positivity` helpers; every batch is now canonical
- `tests/test_canonical_inference.py`: drop the two legacy-arm tests; the remaining 6 unit tests still pass
- `docs/index.md`: trim the legacy `master_channels.yaml` reference from the Limitations section

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

With the legacy CLIP model.py removed in the previous commit, the canonical CellTypeAnnotator can reclaim the obvious filename. Updates the two import sites (predict.py, tests/test_canonical_inference.py) to match. This also lines the import path up with the training repo (deepcelltypes-cell-type-assignment-pytorch), which has been using `deepcelltypes.model.CellTypeAnnotator` all along — easing the upcoming training-pipeline merge. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Sets up the structure for absorbing the training pipeline currently maintained in the deepcelltypes-cell-type-assignment-pytorch repo, while preserving the lean inference-only install that today's users rely on.

- New empty package ``deepcell_types.training`` with an explanatory docstring; will be populated in subsequent phases (losses, dataset, annotations, ...).
- pyproject extras:
  - ``train``: wandb / zarr (pinned >=3.1, <4 per the alpha metadata-cache bug) / torchvision / torchinfo / torchmetrics / pandas / scikit-learn / click / matplotlib
  - ``baselines``: xgboost / optuna
  - ``analysis``: plotly / seaborn / openpyxl / kaleido (pinned to skip the broken 0.2.1.post1)
  - ``all``: fan-in convenience target
- Mirrored the [tool.pytest.ini_options] block from the training repo.

CI guard: tests/test_inference_deps.py imports the inference entry points in a fresh subprocess and asserts that none of {wandb, zarr, sklearn, pandas, torchvision, torchinfo, torchmetrics, matplotlib} ends up in sys.modules. Future leaks from the training side into the inference path will fail this test loudly. Subprocess isolation prevents pytest's own imports from poisoning the check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
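The subprocess-based guard can be sketched with the standard library alone. `leaked_modules` is a hypothetical helper, not the code in tests/test_inference_deps.py.

```python
import subprocess
import sys


def leaked_modules(entry_module: str, forbidden: set[str]) -> set[str]:
    """Import `entry_module` in a fresh interpreter; return forbidden top-level packages it loaded.

    A subprocess keeps this process's (and pytest's) own imports out of the check.
    """
    code = (
        "import importlib, sys\n"
        f"importlib.import_module({entry_module!r})\n"
        "print(' '.join(sorted(sys.modules)))\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        check=True, capture_output=True, text=True,
    )
    # Reduce submodules to their top-level package, e.g. "pandas.core" -> "pandas".
    top_level = {name.split(".")[0] for name in result.stdout.split()}
    return forbidden & top_level
```

A test would then assert, for example, `not leaked_modules("deepcell_types.predict", {"wandb", "zarr", "pandas"})`.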

Copies three self-contained modules from the training repo (deepcelltypes-cell-type-assignment-pytorch) into ``deepcell_types/training/``:

- ``losses.py``: FocalLoss (referenced from upstream pytorch-multi-class-focal-loss) and the dormant HierarchicalLoss (coarse-grained CT loss driven by a YAML fine→coarse mapping). HierarchicalLoss is kept at ``weight=0`` in the canonical recipe but is part of the released training surface area for follow-on experiments.
- ``annotations.py``: zarr-archive annotation extraction with KDTree centroid matching and the duplicate-label collapse / conflict-drop semantics the training pipeline depends on. Lazy-imports scipy and numcodecs so it stays cheap to import.
- ``gold_metadata.py``: Pan-M Gold-Standard subset → (tissue, modality) canonicalization, including the non-direct mappings (decidua → uterus, Vectra/Opal → cycif) used at evaluation time.

All three have zero cross-imports into the training-side ``config.py`` or ``utils.py``, so they land cleanly without waiting on Phase 6's config reconciliation. The remaining training surfaces with config dependencies (FullImageDataset, FOVGroupedSampler, augmentations, create_dataloader, and the training portion of utils.py) are deferred to Phase 6.

The CI guard (tests/test_inference_deps.py) still passes: importing ``deepcell_types.predict`` does not transitively reach ``deepcell_types.training``.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
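For reference, a NumPy sketch of the standard multi-class focal loss, -(1 - p_t)^γ · log(p_t). This is the textbook formulation, not necessarily the exact code vendored from pytorch-multi-class-focal-loss; the optional per-class α weighting is omitted.

```python
import numpy as np


def focal_loss(logits: np.ndarray, targets: np.ndarray, gamma: float = 2.0) -> float:
    """Mean multi-class focal loss over a batch.

    logits: (N, K) unnormalized scores; targets: (N,) integer class labels.
    With gamma=0 this reduces to ordinary cross-entropy.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    log_pt = log_probs[np.arange(len(targets)), targets]  # log-prob of the true class
    pt = np.exp(log_pt)
    # (1 - p_t)**gamma down-weights well-classified examples.
    return float(np.mean(-((1.0 - pt) ** gamma) * log_pt))
```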

Migrates the canonical raw-FOV → archive preprocessing recipe from the training repo (deepcelltypes-cell-type-assignment-pytorch:preprocessing.py) and promotes it to the top-level public API. The function is the single source of truth for transforming an ingested raw FOV (``(C, H, W)`` intensity at a native MPP) into the format the model consumes:

1. resample to ``TissueNetConfig.STANDARD_MPP_RESOLUTION`` (0.5 µm/px)
2. per-channel p99.9 clip (over nonzero pixels, matching the recovered production recipe from ``hubmap-to-zarr@origin/deepcell-types:preprocess_for_training.py``)
3. per-channel min-max normalize to [0, 1]
4. cast mask to uint32 and compute centroids in resampled coordinates

Lives at the top level (``deepcell_types/preprocessing.py``), not under ``training/``, because public inference users need it too: running ``predict()`` against an arbitrary FOV requires this exact preprocessing upstream. Re-exports the function from ``deepcell_types.__init__`` so ``from deepcell_types import preprocess_fov`` works. Only numpy + skimage dependencies (both already in the base install); the inference-deps guard still passes.

The snapshot test from the training repo (``tests/test_preprocessing.py::test_snapshot_against_production``) will follow in Phase 9 when ``tests/`` is migrated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
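Steps 2-4 can be sketched in NumPy as below; resampling (step 1) is omitted. Two details are assumptions, not taken from `preprocess_fov`: min-max uses the clipped channel's own min and max, and an all-zero channel is left at zero. The function names are hypothetical.

```python
import numpy as np


def clip_and_normalize(raw: np.ndarray, percentile: float = 99.9) -> np.ndarray:
    """Per-channel percentile clip + min-max for a (C, H, W) image already at 0.5 µm/px."""
    out = np.empty(raw.shape, dtype=np.float32)
    for c, chan in enumerate(raw.astype(np.float32)):
        nonzero = chan[chan != 0]
        # Percentile over nonzero pixels only, so large empty regions don't
        # drag the clip threshold toward zero.
        hi = np.percentile(nonzero, percentile) if nonzero.size else 0.0
        chan = np.clip(chan, None, hi)
        lo, top = chan.min(), chan.max()
        out[c] = (chan - lo) / (top - lo) if top > lo else 0.0
    return out


def centroids(mask: np.ndarray) -> dict[int, tuple[float, float]]:
    """(row, col) centroid of every nonzero label in the mask, after casting to uint32."""
    mask = mask.astype(np.uint32)
    result = {}
    for label in np.unique(mask):
        if label == 0:  # background
            continue
        rows, cols = np.nonzero(mask == label)
        result[int(label)] = (float(rows.mean()), float(cols.mean()))
    return result
```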

Copies the training-side configuration module from B (deepcelltypes-cell-type-assignment-pytorch:config.py) into ``deepcell_types/training/config.py`` verbatim. B is the up-to-date source per the canonical-only-monorepo merge directive. The migrated surface (1343 lines) includes:

- ``TissueNetConfig``: heavy training-side config that opens the zarr archive directly (vs. the inference-side ``DCTConfig``, which reads root attrs via JSON). Exposes ct2idx, marker2idx, domain2idx, tissue2idx, dataset_keys, tumor_datasets, tissue_celltype_mapping, domain_mapping, celltype_mapping, marker_positivity, plus the lazy per-dataset MP DataFrame loader.
- ``LazyMarkerPositivityDict``: dict-shaped lazy loader for per-dataset marker-positivity DataFrames; avoids walking ~1.9k datasets at init when only ~285 carry MP data.
- ``archive_metadata_fingerprint`` / ``cached_archive_metadata_fingerprint`` / ``archive_array_fingerprint``: stable hashes used to invalidate cell-data and baseline-feature caches after in-place archive repairs.
- ``_discover_fov_keys``: detects v7 (flat) vs. v8 (5-level ``modality/tissue/cohort/sample/fov``) layouts and returns slash-joined leaf FOV keys that zarr and the filesystem both resolve.
- ``_patch_zarr_v3_alpha_metadata``: workaround for zarr 3.0.0a* metadata-cache bugs; the ``[train]`` extra pins zarr>=3.1 to avoid needing this in fresh installs, but the patch stays for dev envs still on the alpha.
- ``extract_patch`` / ``extract_patch_from_zarr`` / ``compute_distance_transform``: patch-extraction utilities consumed by training/dataset.py (next commit).

Zero ``deepcelltypes.*`` cross-imports: the file is fully self-contained at the package level, so the copy lands without rewiring. Inference-deps guard still passes: ``deepcell_types.predict`` does not transitively reach ``deepcell_types.training.config``, so zarr and pandas stay out of the base install.

The DCTConfig (inference) vs. TissueNetConfig (training) behavioral audit follows in a subsequent commit. The merge directive says B wins where they diverge, and a couple of spots need aligning (the domain2idx derivation, the ct2idx defensive casting). For now the two coexist and the inference path is unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
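The lazy-loading idea behind `LazyMarkerPositivityDict` can be sketched as a read-only `Mapping` that defers each load until first access. `LazyDict` is a hypothetical stand-in, not the migrated class. `__contains__` is overridden because the `Mapping` default implements membership via `__getitem__`, which would trigger a load.

```python
from collections.abc import Mapping
from typing import Any, Callable, Iterable, Iterator


class LazyDict(Mapping):
    """Keys are known up front; each value is produced by `loader(key)` on first access."""

    def __init__(self, keys: Iterable[str], loader: Callable[[str], Any]):
        self._keys = list(keys)
        self._key_set = set(self._keys)
        self._loader = loader
        self._cache: dict[str, Any] = {}

    def __getitem__(self, key: str) -> Any:
        if key not in self._key_set:
            raise KeyError(key)
        if key not in self._cache:
            self._cache[key] = self._loader(key)  # load once, then reuse
        return self._cache[key]

    def __contains__(self, key: object) -> bool:
        # Membership from the key set, so `in` never triggers a load.
        return key in self._key_set

    def __iter__(self) -> Iterator[str]:
        return iter(self._keys)

    def __len__(self) -> int:
        return len(self._keys)
```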

Copies B's dataset.py (1706 lines) to ``deepcell_types/training/dataset.py``. Drop-in migration: B's relative imports (``from .config import ...``, ``from .annotations import ...``) resolve cleanly inside the new ``training/`` package since config.py and annotations.py already live there from the previous commits. Brings over:

- ``FullImageDataset``: the canonical zarr-backed training Dataset. Returns ``(C_max, 1, H, W)`` raw*self_mask, ``(3, H, W)`` spatial context (self_mask, neighbor_mask, distance_transform), and the "?" marker-positivity validity mask required by the marker positivity loss.
- ``AugmentedDataset`` + ``DropOutChannels``: train-time augmentations (horizontal/vertical flips, random channel dropout).
- ``FOVGroupedSampler``: keeps samples from the same FOV together within a batch to amortize zarr open cost / preserve neighborhood context.
- ``create_fov_splits`` / ``save_fov_splits`` / ``load_fov_splits``: stratified-by-modality FOV partitioning with sole-source detection so a single-FOV cell type stays in one split.
- ``compute_sample_weights``: class-balanced sampling weights.
- ``create_dataloader``: top-level factory the training scripts call.

Inline ``_Compose`` / ``_RandomHorizontalFlip`` / ``_RandomVerticalFlip`` are intentional re-implementations of the torchvision transforms; B uses them to avoid a hard torchvision import at module-load time (the [train] extra pulls torchvision in, but the pattern matches the rest of the package's lazy-deps discipline).

Inference-deps guard still green: importing deepcell_types.predict does not transitively reach deepcell_types.training.dataset.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
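One way to get FOV-grouped batches, sketched with the standard library: every batch is drawn from a single FOV, and both FOV order and within-FOV order are shuffled with a seeded RNG. `fov_grouped_batches` is hypothetical; the real `FOVGroupedSampler` may group or mix FOVs differently.

```python
import random
from collections import defaultdict
from typing import Iterator, Sequence


def fov_grouped_batches(fov_names: Sequence[str], batch_size: int,
                        seed: int = 0) -> Iterator[list[int]]:
    """Yield lists of sample indices; each list comes from exactly one FOV."""
    rng = random.Random(seed)
    by_fov: dict[str, list[int]] = defaultdict(list)
    for idx, fov in enumerate(fov_names):
        by_fov[fov].append(idx)
    fovs = sorted(by_fov)  # deterministic base order before shuffling
    rng.shuffle(fovs)
    for fov in fovs:
        indices = by_fov[fov]
        rng.shuffle(indices)
        # Consecutive batches reuse the same FOV, so it is opened once per pass.
        for start in range(0, len(indices), batch_size):
            yield indices[start:start + batch_size]
```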

Copies B's utils.py (1462 lines) to ``deepcell_types/training/utils.py`` verbatim and rewrites three lazy package imports from ``from deepcelltypes.X`` to ``from deepcell_types.training.X``. Surface migrated:

- ``BatchData``: dataclass collecting per-batch inputs (sample, spatial_context, ch_idx, padding_mask, marker_pos_mask, ct_label, domain_label, dataset_name, fov_name, cell_index, ...).
- ``LossesAndMetrics``: per-epoch loss and metric accumulator.
- ``MPMetricsTracker``: marker-positivity per-marker counters and threshold sweeps.
- ``PredLogger``: atomic-write CSV predictions logger (5-field: labels, probs, cell_index, dataset_name, fov_name). Name collides with ``deepcell_types.predict.PredLogger``, but they live in different namespaces; A's is a 2-field inference result buffer with a different interface. Leaving both: B is the training-authoritative version, and A's stays as the inference API contract.
- ``get_tissue_ct_exclude``: per-sample tissue/dataset-aware ct exclusion list builder for training-time masking. Different function from A's ``_excluded_celltype_indices`` (which is the per-tissue public-API affordance for ``predict(tissue_exclude=...)``); both retained.
- Seed / dataloader hygiene: ``seed_everything``, ``worker_init_fn``, ``make_generator``.
- Label compaction: ``build_label_remap``, ``adjust_conf_mat_hierarchy``.
- Wandb logging: ``log_epoch_metrics``, ``log_confusion_matrix`` (wandb is a lazy import inside the functions; module load works without it).
- Feature extraction: ``extract_features_from_zarr``, ``_extract_all_dataset_features``, ``compute_baseline_metrics``, ``save_baseline_predictions``.
- Atomic file utilities and a cache-metadata fingerprint helper used by the cell-data caching layer.

Inference-deps guard still green: importing ``deepcell_types.predict`` does not transitively reach ``deepcell_types.training.utils`` (pandas, the heaviest dep here, stays out of the base install).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
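Atomic file writes like the ones `PredLogger` relies on are commonly implemented as write-to-temp-then-rename. A minimal sketch (`atomic_write_csv` is hypothetical, not the migrated helper): the temp file is created in the target's directory so `os.replace` stays on one filesystem, where the rename is atomic on POSIX.

```python
import csv
import os
import tempfile


def atomic_write_csv(path, header, rows):
    """Write a CSV so readers see either the old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(rows)
            f.flush()
            os.fsync(f.fileno())  # data on disk before the rename publishes it
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # never leave a half-written temp file behind
        raise
```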

Copies B's abstention.py into ``deepcell_types/training/abstention.py``. Placement decision: training-side, not top-level public API. The ``apply_abstention`` function takes a ``pandas.DataFrame`` as its input, so promoting the module to ``deepcell_types/abstention.py`` would either force pandas into the inference base install (breaks Phase 3's [train] dep split) or require a pandas-to-numpy refactor of the public surface. Neither is justified now — the module's existing callers in B all live in scripts and notebooks that will be moved under training-side surfaces in Phase 9. If the public release wants abstention as a first-class inference feature later, two options: (a) refactor apply_abstention to take arrays + group keys instead of a DataFrame and move it to deepcell_types/abstention.py; (b) accept pandas in the inference deps and move both the file and the [train] guard. Defer to Phase 10. Updates one internal docstring reference from ``deepcelltypes.utils.adjust_conf_mat_hierarchy`` to the new ``deepcell_types.training.utils.adjust_conf_mat_hierarchy`` path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
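Option (a) might look roughly like the sketch below: plain arrays plus a group key (e.g. FOV name) instead of a DataFrame. `abstain_by_group` is hypothetical, and the fence Q1 - k·IQR is an assumed reading of "IQR fence"; the default k=0.2 and the 4-cell minimum come from the PR description.

```python
import numpy as np

UNKNOWN = "Unknown"


def abstain_by_group(labels, confidences, group_keys, k=0.2, min_cells=4):
    """Relabel cells whose confidence falls below Q1 - k * IQR of their group.

    Groups with fewer than `min_cells` cells are left untouched; k <= 0 disables abstention.
    """
    labels = np.asarray(labels, dtype=object).copy()
    confidences = np.asarray(confidences, dtype=float)
    group_keys = np.asarray(group_keys)
    if k <= 0:
        return labels
    for key in np.unique(group_keys):
        idx = np.flatnonzero(group_keys == key)
        if idx.size < min_cells:
            continue
        q1, q3 = np.percentile(confidences[idx], [25, 75])
        fence = q1 - k * (q3 - q1)
        labels[idx[confidences[idx] < fence]] = UNKNOWN
    return labels
```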

Intended to land in the previous commit but the Edit was rejected because the file hadn't been Read in-session yet. Updates the ``hierarchical_correct`` docstring's cross-reference from ``deepcelltypes.utils.adjust_conf_mat_hierarchy`` to the migrated ``deepcell_types.training.utils.adjust_conf_mat_hierarchy`` path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Final bulk migration from deepcelltypes-cell-type-assignment-pytorch (B) into the canonical-only monorepo:

- ``scripts/`` (16 entry points, 528KB): train.py, pretrain.py, predict.py (script form, distinct from the library ``deepcell_types.predict``), benchmark_gold_standard.py, run_gold_standard_nimbus.py, generate_openai_embeddings{,_v8}.py, generate_manifest_index.py, generate_splits.py, ingest_gold_to_zarr.py, split_val_for_test.py, refine_mp_labels_with_intensity_v2.py, validate_archive_contract.py, _combine_v3_and_C.py, download_gold_standard.sh.
- ``tests/`` (22 new test modules merged with A's existing ``test_canonical_inference.py`` and ``test_inference_deps.py``). No filename collisions; the A tests stayed in place.
- ``config/combined_celltypes.yaml``: small (a few KB) cell-type group taxonomy used by TissueNetConfig.combined_celltype_mapping.

Skipped the 30MB ``marker_embeddings-deepseek-r1-70b.json`` (training artifact, not part of the public release surface; users regenerate via scripts/generate_openai_embeddings.py).

Import rewrites applied via sed across the migrated files AND across ``deepcell_types/training/`` itself (caught two lazy ``from deepcelltypes.utils import ...`` imports inside training/dataset.py that the verbatim copy preserved):

- deepcelltypes.model -> deepcell_types.model (top-level)
- deepcelltypes.preprocessing -> deepcell_types.preprocessing (top-level)
- deepcelltypes.abstention -> deepcell_types.training.abstention
- deepcelltypes.annotations -> deepcell_types.training.annotations
- deepcelltypes.config -> deepcell_types.training.config
- deepcelltypes.dataset -> deepcell_types.training.dataset
- deepcelltypes.losses -> deepcell_types.training.losses
- deepcelltypes.utils -> deepcell_types.training.utils
- deepcelltypes.gold_metadata -> deepcell_types.training.gold_metadata
- from deepcelltypes import -> from deepcell_types.training import

Path fix in training/config.py: ``CONFIG_DIR`` now resolves three parents up (deepcell_types/training/config.py -> deepcell_types/training/ -> deepcell_types/ -> repo root -> config/), one ``.parent`` deeper than B's original two-segment ``deepcelltypes/`` layout.

Test results: 239/245 passing. The 6 failures are all env/data dependent, not migration bugs:
- 4 × test_v2.py::TestLossesAndMetricsCompute: needs torchmetrics (in the [train] extra, not in the base install). Pass when [train] is installed.
- 1 × test_preprocessing.py::test_snapshot_against_production: needs the production zarr archive at PRODUCTION_ARCHIVE AND zarr>=3.1 (the [train] pin); dev env has zarr 3.0.0a5 and no archive.
- 1 × test_refine_mp_labels_v2.py::test_stage7_synthetic_gold_validation: imports from ``analysis/``, which was explicitly deferred from this migration (research cruft triage is a separate exercise).

Deferred from this migration: ``output/`` (62GB), ``models/`` (48GB), ``features/`` (8.9GB), ``baselines/`` (7.3GB), ``data/`` (4.9GB), ``wandb_tmp/``, ``embeddings/``, ``figures/``, ``splits/``, ``logs/``, ``analysis/`` (~400KB), ``experiments/`` (~400KB). The big ones are training artifacts that should never be in git regardless; the small ones (analysis/, experiments/) are research code to triage separately before deciding whether they belong in the public release.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
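The sed rewrite table maps onto an ordered list of regex substitutions; a Python equivalent is sketched below. The patterns are a reconstruction, not the exact sed commands used. Specific module paths are rewritten before the bare `from deepcelltypes import` form, and `\b` keeps a pattern such as `deepcelltypes.model` from matching inside a longer identifier like `deepcelltypes.model_utils`.

```python
import re

# Order matters: dotted module paths first, the bare package import last.
REWRITES = [
    (r"\bdeepcelltypes\.model\b", "deepcell_types.model"),
    (r"\bdeepcelltypes\.preprocessing\b", "deepcell_types.preprocessing"),
    (r"\bdeepcelltypes\.(abstention|annotations|config|dataset|losses|utils|gold_metadata)\b",
     r"deepcell_types.training.\1"),
    (r"\bfrom deepcelltypes import\b", "from deepcell_types.training import"),
]


def rewrite_imports(source: str) -> str:
    """Apply the deepcelltypes -> deepcell_types import rewrites to Python source text."""
    for pattern, replacement in REWRITES:
        source = re.sub(pattern, replacement, source)
    return source
```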

Brings the four baseline comparison repos into A as submodules, matching B's layout under ``baselines/``. Each sub-repo is owned at github.com/xuefei-wang/deepcelltypes-<name>.git and is now tracked on ``main``.

Pre-flight: in each baseline repo, ``paper-faithfulness-alignment`` was 2-4 commits ahead of ``main`` with 0 behind. Fast-forwarded ``main`` to ``paper-faithfulness-alignment`` and pushed for each repo before adding the submodule here, so A's pin lands on the same commit that B's pin pointed to:

- cellsighter: 79c79aa..e8c078d (paper -> main FF)
- maps: c50c0eb..5b59f46
- nimbus: f3f65e9..9bfe11d
- xgboost: e4db5ed..b227380

A's submodule branch tracking points at ``main`` for all four; ``paper-faithfulness-alignment`` remains as a historical reference in each sub-repo but is no longer the active branch.

The baseline source code is small (~200KB total tracked); B's local ``baselines/maps`` working tree had ~7GB of model artifacts that are not git-tracked and stay in B. The fresh clone in A contains only the tracked source.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Addresses the two BLOCKER findings from the deep-review:

1. ``tifffile`` is a top-level import in ``scripts/ingest_gold_to_zarr.py`` (and a lazy import in 4 other scripts) but was absent from ``pyproject.toml``, so a clean ``pip install deepcell-types[train]`` could not run the ingest pipeline. Added to the ``[train]`` extra.

2. ``training/config.py::CONFIG_DIR`` resolved three ``.parent``s up to a repo-root ``config/`` directory that does not exist after ``pip install`` (it would land at ``site-packages/config/``). The YAML file ``combined_celltypes.yaml`` was therefore unreachable from any non-editable install, and ``combined_celltype_mapping`` silently returned ``{}``: group-level cell-type logic invisibly broke for installed users. Fix: move ``config/combined_celltypes.yaml`` into the package at ``deepcell_types/training/config/combined_celltypes.yaml``, shorten ``CONFIG_DIR`` to ``Path(__file__).parent / "config"``, and extend ``[tool.setuptools.package-data]`` to include ``training/config/*.yaml`` so the wheel actually ships the file.

Verified: ``yaml.safe_load(CONFIG_DIR / "combined_celltypes.yaml")`` loads 48 entries from the new location.

Test suite after fix: 115 passed, 1 skipped, 1 failed. The single failure is ``test_snapshot_against_production``, which needs zarr>=3.1 (the ``[train]`` extra pins it) plus the production archive available at ``$PRODUCTION_ARCHIVE_PATH``; it is a pre-existing env-dependent failure, not introduced by this commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Themes addressed in one batch (see reviews/2026-05-10-2345/SYNTHESIS.md):

- J (errors H1): mp_macro_precision/recall used np.mean over arrays containing np.nan for vacuous markers, poisoning wandb dashboards with NaN every epoch. Switch to np.nanmean with an all-NaN guard, matching the existing macro_f1 treatment.
- A (API/simplification H3): rename predict.PredLogger to _InferenceResultBuffer (private) to remove the collision with the richer training-side PredLogger; same-name, incompatible-signature classes were a future-bug magnet.
- B (API/perf H2): num_workers default 24 → 0 in predict(). The docstring already warned "only safe with >64 GB RAM"; 24 workers each hold a full FOV in memory and re-run preprocessing.
- C (multiple): drop stale deepcelltypes-kit fallback paths in get_channel_embedding / get_celltype_embedding (path didn't exist post-merge → silent {} return); rewrite the training/config.py module docstring; fix the broken "from utils import download_training_data" import in docs/site/API-key.md.
- D (API H1, M4): TissueNetConfig default zarr_path is now None with DEEPCELL_TYPES_ZARR_PATH env-var fallback (was hard-coded /data2/... NFS path). Fix create_model docstring to name DCTConfig.
- I (errors H2/H3, M1, M5): narrow three broad `except Exception` blocks in dataset.py (_load_tissuenet_archive: cache build, modality attr, tissue attr) to (KeyError, AttributeError, TypeError, ValueError, OSError, json.JSONDecodeError, GroupNotFoundError). Add a >1% drop-rate guard so schema regressions can no longer silently lose hundreds of datasets. Narrow the zarr-v3-alpha shim's except to ImportError. Catch UnicodeDecodeError in _read_dataset_metadata.

Also removed two vestigial "see MEMORY.md" cross-references in LossesAndMetrics warning text (MEMORY.md never existed in this repo).

Tests: 243 passed, 1 skipped, 1 pre-existing env-dependent failure (test_stage7_synthetic_gold_validation needs analysis/ on path).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
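A sketch of the NaN-safe macro average, assuming per-marker scores arrive as a float array with NaN for vacuous markers. `macro_mean` is hypothetical, and returning 0.0 for the all-NaN case is an assumed convention; the PR does not say what the guard returns.

```python
import numpy as np


def macro_mean(per_marker, empty_value: float = 0.0) -> float:
    """Macro-average over markers, skipping NaN entries (vacuous markers).

    np.mean propagates a single NaN to the result; np.nanmean skips NaNs but
    warns and returns NaN when every entry is NaN, so that case is guarded.
    """
    values = np.asarray(per_marker, dtype=float)
    if values.size == 0 or np.isnan(values).all():
        return empty_value
    return float(np.nanmean(values))
```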

From the deep review's tests.md HIGH findings + the API/tests M1 cross-config agreement gap:

- _build_model n_markers mismatch → ValueError
- _build_model n_celltypes mismatch → ValueError
- _excluded_celltype_indices on unknown tissue → ValueError
- _excluded_celltype_indices positive case: returned exclusion rows contain every non-allowed index for the tissue
- _excluded_celltype_indices(tissue=None) passthrough
- PatchDataset with channel_names matching nothing → ValueError ("No input channels matched")
- DCTConfig and TissueNetConfig built from the same archive must agree on MAX_NUM_CHANNELS, CROP_SIZE, STANDARD_MPP_RESOLUTION, marker2idx, ct2idx (importorskip("zarr") gates the test on the training extra).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

From reviews/2026-05-10-2345/simplification.md H1+H2 and complexity.md H2:

- Delete _zarr_group_filesystem_path and _read_v3_1d_array from training/utils.py. Both were verbatim copies of annotations.py's group_filesystem_path / read_v3_1d_array with zero callers across the repo (verified by grep). The annotations.py versions are the canonical ones imported by training/dataset.py.
- Delete the three pass-through static shim methods on FullImageDataset (_group_filesystem_path, _read_v3_1d_array, _centroid_to_cell_idx_fast). None were called anywhere; they added zero value and only obscured that the real helpers live in annotations.py. Note: _build_centroid_tree is kept (also flagged but not in the HIGH list).
- Backport the zstd-level-aware codec read from dct_kit/config.py into annotations.py:read_v3_1d_array. The old training-side copy hardcoded Zstd(level=0) while the inference side correctly reads level from the codec config. With archives written at a non-zero compression level the training-side read would silently produce garbage. Both paths now share the level-aware contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…es (Theme F)

config.py and utils.py had grown to 1.3k and 1.5k LOC, mixing archive fingerprinting, patch extraction, metric trackers, baseline IO, and the core TissueNetConfig/RNG/log helpers in one place each. Carve four focused modules out (verbatim, no logic changes):
- training/archive.py: zarr v3 alpha metadata patch, archive metadata / array fingerprinting, FOV-key discovery, and the per-process caches.
- training/patch.py: per-cell patch extraction (compute_distance_transform, extract_patch_from_zarr, extract_patch).
- training/metrics.py: confusion-matrix hierarchy adjustment, MP per-marker reduction, MPMetricsTracker, LossesAndMetrics, build_label_remap.
- training/baseline_features.py: baseline classifier feature extraction pipeline (_conf_mat_summary, compute_baseline_metrics, save_baseline_predictions, _extract_all_dataset_features, extract_features_from_zarr, _get_cell_data_from_ds).

Re-exports at the bottom of config.py and utils.py keep all tests/scripts working unchanged (230 passed, 1 skipped, matching the pre-split baseline). dataset.py is updated to import directly from the new homes for cached_archive_metadata_fingerprint and extract_patch.

Two non-mechanical touches were required to keep monkey-patch-based tests green:
- baseline_features.extract_features_from_zarr looks up _discover_fov_keys and _extract_all_dataset_features via the config / utils modules at call time, so tests that monkeypatch those symbols on the legacy modules still take effect after the split. _FINGERPRINT_CACHE / _FOV_KEYS_CACHE dicts are re-exported from config.py for the same reason (test_dataset_cache mutates them).
- metrics.LossesAndMetrics.compute defers import of _conf_mat_summary to method-call time to avoid a metrics <-> baseline_features import cycle (baseline_features needs adjust_conf_mat_hierarchy at module load).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

From reviews/2026-05-10-2345/docs.md HIGH findings:
- README: add a "Training" section describing the [train] extra and the four main entry points under scripts/. Move "Download the model" after "Installation" (was non-executable in reading order).
- docs/index.md: add a "Training" section explaining that training-only code lives under deepcell_types.training, gated behind the [train] extra, with pointers to scripts/{train,predict,pretrain,benchmark_gold_standard,ingest_gold_to_zarr}.py. Fix the long-standing "sorce" typo.
- docs/site/tutorial.md: bump the example archive placeholder from tissuenet-v8.zarr → tissuenet-v9.zarr to match DCTConfig's probe order (v9 is the canonical contemporary archive).

The docs.md HIGH for the broken `from utils import download_training_data` import in docs/site/API-key.md was fixed in 88b95f9.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Five MEDIUM/HIGH findings from reviews/2026-05-10-2345 in one batch:
- complexity H1: TissueNetConfig.get_marker_positivity() and marker_positivity_labels[] now share a single LazyMarkerPositivityDict. Previously the plain-dict cache populated by get_marker_positivity() was discarded the first time marker_positivity_labels was accessed (the property replaced the field), causing wasted I/O and divergent caches. _marker_positivity_cache is now Optional[LazyMP...] and lazily constructed on first access; get_marker_positivity routes through marker_positivity_labels for a single source of truth.
- numerical M1: MarkerEmbeddingLayer.forward zeros output for padding positions (ch_idx == -1). Without this, F.normalize(proj(0)) yielded a unit-norm direction equal to F.normalize(proj.bias), a non-trivial embedding flowing into the transformer for tokens that should be invisible.
- numerical M2: CellTypeAnnotator.forward zeros spatial features for padding positions BEFORE the fusion concat. Otherwise padding tokens enter self.fusion with [0, spatial_feat] and emerge as W_spatial @ spatial_feat + bias.
- API M1: rename predict(tissue_exclude=...) → predict(tissue_filter=...). The old name was inverted: "tissue_exclude='colon'" actually meant "filter TO colon-associated cell types". The deprecated alias stays (keyword-only) and emits DeprecationWarning; passing both raises TypeError.
- API M3: predict(return_probabilities=True) returns a PredictionResult dataclass with cell_types, probabilities (full per-cell softmax matrix), and cell_indices. Default behaviour unchanged (returns list[str]). PredictionResult and DCTConfig are now hoisted to top-level so `from deepcell_types import PredictionResult, DCTConfig` works.

Tests: 233 passed, 1 skipped. Added 3 new tests covering return_probabilities, tissue_exclude DeprecationWarning, and the both-args TypeError.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
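The two API changes (API M1 and M3) can be illustrated with a small, self-contained sketch: a keyword-only deprecated alias that forwards `tissue_exclude` to `tissue_filter`, plus an opt-in `return_probabilities` path that returns a `PredictionResult`. `predict_sketch`, its `logits`/`ct_names` arguments, and the `ALLOWED_BY_TISSUE` table are hypothetical stand-ins; the real `predict()` takes image data and runs the model. This sketch applies the tissue filter only to the argmax and leaves the softmax rows unfiltered, which may not match the actual implementation.

```python
import math
import warnings
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Set, Union


@dataclass
class PredictionResult:
    """Field names follow the commit message; the types are assumptions."""
    cell_types: List[str]
    probabilities: List[List[float]]  # one softmax row per cell
    cell_indices: List[int]


# Hypothetical tissue -> allowed cell types table (illustration only).
ALLOWED_BY_TISSUE: Dict[str, Set[str]] = {"colon": {"Epithelial", "Tcell"}}


def _softmax(row: Sequence[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def predict_sketch(
    logits: Sequence[Sequence[float]],
    ct_names: Sequence[str],
    cell_indices: Sequence[int],
    *,
    tissue_filter: Optional[str] = None,
    tissue_exclude: Optional[str] = None,  # deprecated keyword-only alias
    return_probabilities: bool = False,
) -> Union[List[str], PredictionResult]:
    if tissue_exclude is not None:
        if tissue_filter is not None:
            raise TypeError("pass tissue_filter or tissue_exclude, not both")
        warnings.warn(
            "tissue_exclude is deprecated; use tissue_filter",
            DeprecationWarning,
            stacklevel=2,
        )
        tissue_filter = tissue_exclude  # the old name actually meant "filter to"

    allowed = ALLOWED_BY_TISSUE[tissue_filter] if tissue_filter else set(ct_names)
    probs = [_softmax(row) for row in logits]
    cell_types = []
    for row in probs:
        # Restrict the argmax to cell types allowed for this tissue.
        best = max(
            (i for i, name in enumerate(ct_names) if name in allowed),
            key=lambda i: row[i],
        )
        cell_types.append(ct_names[best])

    if return_probabilities:
        return PredictionResult(cell_types, probs, list(cell_indices))
    return cell_types  # default: unchanged list[str] return
```

With `logits=[[2.0, 1.0, 0.5], [0.1, 0.2, 3.0]]` and `ct_names=["Epithelial", "Tcell", "Bcell"]`, the unfiltered call returns `["Epithelial", "Bcell"]`, while `tissue_filter="colon"` (or the deprecated `tissue_exclude="colon"`, which also warns) returns `["Epithelial", "Tcell"]`.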

- tests M3: add a regression anchor in test_train_loop_smoke.py that asserts scripts/train.py still contains the AMP scheduler-gate predicate. The 2-line _run_gated_step helper is faithful to the production behavior, but a silent drift would otherwise let the emulator tests pass while real training desynchronizes OneCycleLR.
- tests M2: same idea for test_zero_channel_masking.py. The unit-test helper is a verbatim copy of __getitem__'s masking block; a refactor could let the copy drift. New test asserts training/dataset.py still contains _zero_channel_cache and fov_zero_mask.
- docs M4: add CHANGELOG.md documenting the 0.0.1 → 0.1.0 release (canonical-only refactor, training subpackage, breaking removal of CellTypeCLIPModel, deprecated tissue_exclude alias, num_workers=0 default, TissueNetConfig env-var default). Bump version in pyproject.toml.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

complexity H8: replace FullImageDataset.indices' positional 8-tuple with a CellIndexRecord NamedTuple. Named fields make grep / refactor safe (no more record[6] / record[5] magic numbers across 10+ call sites). NamedTuple IS a tuple, so positional access still works for backward compat with serialized caches that stored raw 8-tuples. Production call sites in dataset.py now use .ct_label_standard, .dataset_name, .fov_name, .ds_idx, .domain accessors. Mock-index constructors in tests/{test_v2,test_samplers,test_stratified_splits,test_dataset_splits}.py updated to build CellIndexRecord instances.

complexity H7: introduce DataLoaderConfig dataclass + matching create_dataloader_from_config(zarr_dir, dct_config, cfg) wrapper. Lets new callers pass a single discoverable object instead of 20+ keyword arguments. The legacy keyword signature of create_dataloader is preserved verbatim so train.py / predict.py / tests don't need any change. Field defaults mirror create_dataloader's defaults exactly, so DataLoaderConfig() is equivalent to no-override.

Tests: 235 passed, 1 skipped (analysis-only env failure unchanged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
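Both refactors can be sketched in a few lines. Only five `CellIndexRecord` field names (`ct_label_standard`, `dataset_name`, `fov_name`, `ds_idx`, `domain`) come from the commit; the other three fields and the field order are hypothetical. `create_dataloader` below is a stub that just echoes its keyword arguments, and the `DataLoaderConfig` fields are an illustrative subset, not the real 20+ arguments. The `defaults_in_sync` helper is a hypothetical drift guard for the "DataLoaderConfig() is equivalent to no-override" contract.

```python
import dataclasses
import inspect
from typing import NamedTuple, Optional


class CellIndexRecord(NamedTuple):
    # Field order and the cell_id / ct_label_raw / centroid_yx names are hypothetical.
    dataset_name: str
    fov_name: str
    cell_id: int
    ct_label_raw: str
    ct_label_standard: str
    ds_idx: int
    domain: int
    centroid_yx: tuple


def upgrade_cached_indices(raw_records):
    """Turn raw 8-tuples from an old cache into named records."""
    return [CellIndexRecord._make(r) for r in raw_records]


# Stub standing in for the legacy keyword-based create_dataloader.
def create_dataloader(zarr_dir, dct_config, *, batch_size=256, num_workers=0,
                      inner_val_ratio=0.0, inner_val_seed=0,
                      class_balance: Optional[str] = None):
    return dict(zarr_dir=zarr_dir, dct_config=dct_config, batch_size=batch_size,
                num_workers=num_workers, inner_val_ratio=inner_val_ratio,
                inner_val_seed=inner_val_seed, class_balance=class_balance)


@dataclasses.dataclass
class DataLoaderConfig:
    # Defaults must mirror create_dataloader's so DataLoaderConfig() == no override.
    batch_size: int = 256
    num_workers: int = 0
    inner_val_ratio: float = 0.0
    inner_val_seed: int = 0
    class_balance: Optional[str] = None


def create_dataloader_from_config(zarr_dir, dct_config, cfg: DataLoaderConfig):
    # Field names equal keyword names, so a shallow dict splats straight through.
    kwargs = {f.name: getattr(cfg, f.name) for f in dataclasses.fields(cfg)}
    return create_dataloader(zarr_dir, dct_config, **kwargs)


def defaults_in_sync() -> bool:
    """Drift guard: every config field default equals the legacy kwarg default."""
    params = inspect.signature(create_dataloader).parameters
    return all(params[f.name].default == f.default
               for f in dataclasses.fields(DataLoaderConfig))
```

Because a NamedTuple compares equal to a plain tuple with the same elements, records upgraded from an old cache still satisfy positional code (`rec[1]`) while new code reads `rec.fov_name`.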

…ne submodule rebase

Three independent bugs surfaced when running training against the current master HEAD from a fresh workspace install:

1. tissue_idx kwarg mismatch (scripts/train.py:121, scripts/predict.py:208 + 334)

The scripts pass `tissue_idx=batch_data.tissue_idx` to `CellTypeAnnotator.forward(...)`, but the model's forward signature is `(sample, spatial_context, ch_idx, padding_mask, ct_exclude=None, return_attn_weights=False, domain_idx=None)`, with no `tissue_idx`. The tissue-FiLM MP head experiment was rolled back (see memory `v10_mp_expansion_tissue_negative.md`) and the model dropped the parameter, but the scripts kept passing it. Result: every training / prediction run dies at the first forward pass with `TypeError: ...got an unexpected keyword argument 'tissue_idx'`.

Fix: drop the kwarg at all three call sites. `batch_data.tissue_idx` is still populated by the dataloader and remains available to anyone who needs it downstream; the model just doesn't consume it.

2. Circular import between training/utils.py and training/baseline_features.py

utils.py re-exports four symbols from baseline_features.py at module level for backward compat. baseline_features.py also imports private helpers (`_atomic_np_savez` etc.) from utils.py. When utils.py is imported first (training path) the cycle resolves fine, but when baseline_features.py is imported first (baseline path, e.g. `import xgb.run`), the partially-initialized utils.py reaches back to `baseline_features._extract_all_dataset_features` before that name is defined, and ImportError fires.

Fix: convert the re-exports to a module-level `__getattr__` so the lookup is deferred until actual access, by which point both modules have finished initializing. Existing callers (`from deepcell_types.training.utils import save_baseline_predictions`, verified in tests/test_v2.py) keep working.

3. Submodule rebase (baselines/{maps,cellsighter,xgboost,nimbus})

Each baseline's pyproject.toml listed `deepcelltypes @ git+... deepcelltypes-cell-type-assignment-pytorch.git` as a dep; that URL now resolves to the renamed research workspace (no longer a Python package) and `uv pip install` fails with a metadata-name mismatch. Each baseline also imported from `deepcelltypes.{config,utils,dataset}`, the pre-refactor flat layout.

Companion commits on each submodule's `fix/post-refactor-imports` branch replace the dep URL with a plain `deepcell-types` and rebase imports onto `deepcell_types.training.{config,utils,dataset,metrics,baseline_features}`. This parent commit bumps the submodule pointers to those branch tips.

End-to-end verification: with the three fixes, a fresh workspace `uv sync` + smoke training (`scripts/train.py` with the v10 split + svd_512_v6 embeddings) gets through model build, GPU allocation, and reaches batch 0 of epoch 0. The xgboost baseline imports cleanly after `uv pip install -e baselines/xgboost`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
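The circular-import fix in item 2 relies on PEP 562 module-level `__getattr__`. The self-contained sketch below builds a throwaway two-module package in a temporary directory, using hypothetical stand-ins for `utils.py` and `baseline_features.py`, and imports `baseline_features` first (the baseline path). With an eager re-export at the bottom of `utils.py`, the import is expected to fail with an `ImportError` from the partially initialized module; with the lazy `__getattr__` version it succeeds, because the re-exported name is only resolved on first attribute access.

```python
import importlib
import sys
import tempfile
import textwrap
import uuid
from pathlib import Path

# Hypothetical stand-ins for training/utils.py and training/baseline_features.py.
UTILS_HEAD = textwrap.dedent('''
    def _atomic_np_savez(path, **arrays):
        """Private helper that baseline_features imports at module load."""
        return ("saved", path, sorted(arrays))
''')

# Old style: eager re-export at the bottom of utils.py.
EAGER_TAIL = "from .baseline_features import save_baseline_predictions\n"

# New style: PEP 562 module-level __getattr__, resolved on first access.
LAZY_TAIL = textwrap.dedent('''
    _LAZY_REEXPORTS = {"save_baseline_predictions"}

    def __getattr__(name):
        # Only called when normal attribute lookup fails, i.e. for re-exported names.
        if name in _LAZY_REEXPORTS:
            from . import baseline_features
            return getattr(baseline_features, name)
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
''')

BASELINE_SRC = textwrap.dedent('''
    from .utils import _atomic_np_savez

    def save_baseline_predictions(path):
        return _atomic_np_savez(path, preds=[1, 2, 3])
''')


def import_baseline_first(lazy: bool) -> str:
    """Import baseline_features before utils (the baseline path); report the outcome."""
    name = f"demo_{'lazy' if lazy else 'eager'}_{uuid.uuid4().hex[:8]}"  # unique per call
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / name
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "utils.py").write_text(UTILS_HEAD + (LAZY_TAIL if lazy else EAGER_TAIL))
        (pkg / "baseline_features.py").write_text(BASELINE_SRC)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()  # make the freshly written files visible
        try:
            importlib.import_module(f"{name}.baseline_features")
            utils = importlib.import_module(f"{name}.utils")
            return utils.save_baseline_predictions("out.npz")[0]
        except ImportError as exc:
            return f"ImportError: {exc}"
        finally:
            sys.path.remove(tmp)
```

`import_baseline_first(lazy=False)` is expected to return a string starting with `ImportError` (the eager re-export runs while `baseline_features` is still half-initialized), while `import_baseline_first(lazy=True)` returns `"saved"`.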

…mmit -> CSV) Pins each reported test number to its checkpoint (sha256), the train/eval code commit (d13fd54 for the MLP-head run, b598710 for the resMLP run; both ancestors of PR #41), and the prediction CSV (sha256). Notes the self-pinning gap (configs don't record git_commit) addressed separately on PR #41. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

CKPT_CONFIG now stores the code commit the run executed under (git rev-parse HEAD of the checkout owning train.py — the pinned worktree's HEAD when run via a pin), so any checkpoint/result traces to an exact code snapshot. 'unknown' when not a git repo. Closes the traceability gap noted in the recipe-ablation manifest. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
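A minimal sketch of this self-pinning, assuming `git` is on `PATH`. The helper name `current_git_commit` is hypothetical; only the use of `git rev-parse HEAD` and the `'unknown'` fallback come from the commit message.

```python
import subprocess
from pathlib import Path
from typing import Union


def current_git_commit(repo_dir: Union[str, Path]) -> str:
    """Return `git rev-parse HEAD` for the checkout containing repo_dir, else 'unknown'."""
    try:
        out = subprocess.run(
            ["git", "-C", str(repo_dir), "rev-parse", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
            timeout=10,
        )
    except (OSError, subprocess.CalledProcessError, subprocess.TimeoutExpired):
        # git not installed, directory missing, or not inside a git repository
        return "unknown"
    return out.stdout.strip()
```

A training script could then record `current_git_commit(Path(__file__).resolve().parent)` under a `git_commit` key in its checkpoint config. Running `git -C <dir>` instead of relying on the process's working directory ties the result to the checkout that owns the script, even when training is launched from somewhere else.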

…e reported set

MAPS and CellSighter chose their best checkpoint by evaluating on the same set they then reported (MAPS: data["X_val"] each epoch -> best-by-val-loss; CellSighter: test_loader each epoch -> best-by-macro-accuracy). Selecting on the reported set is leakage. XGBoost was already correct (FOV-grouped inner-val for early stopping). Both baselines now mirror XGBoost: select on a FOV-grouped inner-validation set carved from the TRAIN FOVs (10%), and report once on the untouched test set.
- dataloader.create_dataloader: additive, default-off `inner_val_ratio`/`inner_val_seed`. When >0, carves a FOV-grouped inner-val from train_indices, trains only on inner-train, and returns the inner-val loader via metadata["inner_val_loader"]. Default 0.0 leaves the main-model path unchanged.
- maps/run.py: GroupShuffleSplit(test_size=0.1) on train FOVs; normalization stats, sampler, and per-epoch val-loss selection all from inner-train/inner-val.
- cellsighter/run.py: inner_val_ratio=0.1; selection on metadata["inner_val_loader"].
- READMEs: record the deviation from upstream selection protocol.
- test_maps_cellsighter_equivalence: drop the run.py byte-equivalence pin (logic now intentionally deviates from upstream) and replace with a behavioral inner-val check.

Consequence: changes published MAPS/CellSighter numbers (now train on ~90% of train cells); requires a full re-run on the v10 archive to regenerate the headline table. Does not touch the abstention asymmetry (scoped out). CellSighter still selects by macro-accuracy (separate finding, left unchanged).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

CellSighter selected its best checkpoint by macro-accuracy while the main model (scripts/train.py -> val_macro_f1) and the headline comparison use macro-F1. When accuracy and F1 diverge (systematic in imbalanced multi-class settings) the returned checkpoint was not the macro-F1-optimal one, depressing the reported CellSighter macro-F1. Switch selection (on the held-out inner-val) to macro-F1; update the saved checkpoint key and logging. Reported test metrics unchanged. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
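To see why the selection metric matters, the sketch below scores two hypothetical checkpoints on the same imbalanced toy labels. It treats "macro-accuracy" as the mean per-class recall (balanced accuracy), which is an assumption about CellSighter's metric; all names are illustrative. Checkpoint A wins on mean recall while checkpoint B wins on macro-F1, so the two criteria select different checkpoints.

```python
from typing import Dict, Sequence


def _per_class_counts(y_true, y_pred):
    classes = sorted(set(y_true) | set(y_pred))
    counts = {c: {"tp": 0, "fp": 0, "fn": 0} for c in classes}
    for t, p in zip(y_true, y_pred):
        if t == p:
            counts[t]["tp"] += 1
        else:
            counts[p]["fp"] += 1  # predicted class gets a false positive
            counts[t]["fn"] += 1  # true class gets a false negative
    return counts


def macro_f1(y_true, y_pred) -> float:
    counts = _per_class_counts(y_true, y_pred)
    f1s = []
    for c in counts.values():
        denom = 2 * c["tp"] + c["fp"] + c["fn"]
        f1s.append(2 * c["tp"] / denom if denom else 0.0)  # F1 = 2TP / (2TP + FP + FN)
    return sum(f1s) / len(f1s)


def macro_recall(y_true, y_pred) -> float:
    """Mean per-class recall over classes present in y_true."""
    counts = _per_class_counts(y_true, y_pred)
    recalls = [c["tp"] / (c["tp"] + c["fn"]) for c in counts.values() if c["tp"] + c["fn"]]
    return sum(recalls) / len(recalls)


def select_checkpoint(y_true, preds_by_ckpt: Dict[str, Sequence[int]], metric) -> str:
    return max(preds_by_ckpt, key=lambda name: metric(y_true, preds_by_ckpt[name]))


# Imbalanced toy inner-val labels: six cells of class 0, two each of classes 1 and 2.
Y_TRUE = [0] * 6 + [1] * 2 + [2] * 2
PREDS = {
    "A": [0, 0, 0, 0, 1, 1, 1, 1, 2, 2],  # recovers all of class 1 but over-predicts it
    "B": [0] * 6 + [1, 0] + [2, 2],        # precise on class 1 but misses one cell
}
# mean recall: A = 0.889, B = 0.833; macro-F1: A = 0.822, B = 0.863
```

`select_checkpoint(Y_TRUE, PREDS, macro_recall)` picks `"A"`, whereas `select_checkpoint(Y_TRUE, PREDS, macro_f1)` picks `"B"`: recall ignores the false positives A piles onto class 1, while F1 penalizes them.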

…isjointness Extract the inner-validation carve from create_dataloader into a module-level _carve_inner_val_fovs helper (behavior-preserving; the default no-op path returns train_indices unchanged) so the leakage-critical FOV-grouping is unit-testable. Add regression tests asserting whole-FOV grouping, train/inner-val disjointness, a clean index partition, the no-op path, and the >=1-inner-train-FOV cap. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
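The helper's contract as listed in that commit (whole-FOV grouping, train/inner-val disjointness, a clean partition, a no-op path, and at least one inner-train FOV) can be sketched in plain Python. The name `carve_inner_val_fovs`, its `fov_of` callback, and the rounding rule for the number of held-out FOVs are assumptions; the real `_carve_inner_val_fovs` may differ. Grouping by FOV rather than by cell presumably matters because cells from one field of view share acquisition conditions, so a per-cell split would let near-duplicates reach the selection set.

```python
import random
from typing import Callable, Hashable, List, Sequence, Tuple


def carve_inner_val_fovs(
    train_indices: Sequence[int],
    fov_of: Callable[[int], Hashable],
    ratio: float,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Split train_indices into (inner_train, inner_val) by whole FOVs.

    Hypothetical stand-in for _carve_inner_val_fovs; the real signature may differ.
    """
    if ratio <= 0:
        return list(train_indices), []  # no-op path: training set unchanged
    fovs = sorted({fov_of(i) for i in train_indices}, key=str)
    n_val = max(1, round(ratio * len(fovs)))
    n_val = min(n_val, len(fovs) - 1)  # cap: keep at least one inner-train FOV
    if n_val <= 0:
        return list(train_indices), []  # a single FOV cannot be split
    shuffled = fovs[:]
    random.Random(seed).shuffle(shuffled)  # deterministic per seed
    val_fovs = set(shuffled[:n_val])
    inner_train = [i for i in train_indices if fov_of(i) not in val_fovs]
    inner_val = [i for i in train_indices if fov_of(i) in val_fovs]
    return inner_train, inner_val
```

With 20 cells spread over 5 FOVs and `ratio=0.4`, two whole FOVs (8 cells) land in inner-val; with `ratio=1.0` the cap still leaves one FOV for inner-train.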

…t_strict knobs

Thread optional per-loader knobs through extract_patch, FullImageDataset, and create_dataloader, each defaulting to current behavior so DCT/MAPS inputs stay byte-identical:
- mask_intensities (default True): when False, return the full crop including neighbor intensities instead of raw*self_mask (single-cell input).
- crop_size/output_size (default dct_config): per-loader patch-size override.
- train_transform (default H/V flips): custom train-time spatial augmentation.
- split_strict (default True): downgrade split fingerprint mismatch to a warning when all split FOVs are present in the current archive.

Enables the faithful CellSighter baseline without altering the shared single-cell path. Full suite: 313 passed, 1 skipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Reimplement the CellSighter baseline to follow the Amitay et al. (Nat Commun 2023) training recipe inside our cross-tissue harness:
- ImageNet ResNet50 stem (7x7/s2 + maxpool) for 60x60 crops; --cifar_stem keeps the 32x32 CIFAR stem for ablation.
- Unmasked neighbor intensities (mask_intensities=False); --mask_self ablation restores the single-cell input.
- Geometric augmentation module (rotation, vectorized per-channel shift, mask dilation, flips@0.75). Poisson resampling omitted: preprocessed/raw is [0,1] min-max normalized, not photon counts, so Poisson would corrupt the signal.
- New flags: --crop_size (default 60), --seed (ensemble diversity), --test_split_file (final eval on a held-out split), --allow_split_mismatch.

Re-pin the cellsighter freeze snapshots as a drift guard (no longer an upstream-identical port) and re-freeze its CLI option set. maps stays frozen.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
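A quick worked calculation shows how the stem choice interacts with crop size. It assumes torchvision-style ResNet50 geometry (7x7/s2/p3 conv1, 3x3/s2/p1 maxpool, and a stride-2 3x3 convolution opening each of layer2 to layer4) and, for the CIFAR variant, the common 3x3/s1/p1 conv with no maxpool; the baseline's exact CIFAR stem is not specified here. Each layer's output side length is floor((n + 2p - k) / s) + 1.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output side length of a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def resnet50_spatial_sizes(crop: int, cifar_stem: bool) -> dict:
    """Feature-map side length after the stem and after each ResNet50 stage."""
    if cifar_stem:
        stem = conv_out(crop, kernel=3, stride=1, padding=1)  # assumed 3x3/s1 conv, no maxpool
    else:
        stem = conv_out(crop, kernel=7, stride=2, padding=3)  # 7x7/s2 conv1
        stem = conv_out(stem, kernel=3, stride=2, padding=1)  # 3x3/s2 maxpool
    sizes = {"stem": stem, "layer1": stem}  # layer1 keeps the resolution
    n = stem
    for name in ("layer2", "layer3", "layer4"):
        n = conv_out(n, kernel=3, stride=2, padding=1)  # stride-2 3x3 at each stage start
        sizes[name] = n
    return sizes
```

Under these assumptions a 60x60 crop through the ImageNet stem ends at a 2x2 map after layer4 (60 → 15 after the stem), the CIFAR stem on a 32x32 crop ends at 4x4, and a 32x32 crop through the ImageNet stem would collapse to 1x1.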

…n docs b598710 (resMLP-head default) and ef1229f (checkpoint git_commit self-pinning) are independent xuefei/master commits, not part of PR #41. Only d13fd54 is in PR #41's lineage. Correct the three attribution claims in TRACEABILITY.md and REPORT.md; numerical claims unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add an explicit cross-file FOV disjointness check between --split_file ('train') and --test_split_file ('val'): load_fov_splits only checks overlap within a single file, so a mismatched pair could silently leak training FOVs into the reported number. Also warn loudly when --test_split_file is omitted, since the final eval then reuses the checkpoint-selection val loader (selection-on-the-eval-set, not a held-out number). Re-pin the cellsighter run.py drift-guard SHA. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… + full-inv-freq)

The CellSighter baseline silently inherited DCT's sqrt-inverse-frequency WeightedRandomSampler (1000-count floor) via create_dataloader's default, not the original CellSighter's equal-proportion balancing (research-workspace issue #96). Make the baseline genuinely faithful on the class-balancing axis.

Faithfully reproduce KerenLab/CellSighter's recipe:
- subsample_indices_per_class: caps the TRAIN pool to <=size_data cells/class (subsample_const_size; paper size_data=1000), deterministic per seed, val/test untouched.
- compute_sample_weights_equal: full-inverse-frequency weights weight=total/count (define_sampler with sample_batch=true).

Wire a `class_balance` {equal|sqrt|none} + `size_data` knob through create_dataloader; the CellSighter baseline now defaults to the faithful equal-proportion scheme (--class_balance equal --size_data 1000), with sqrt and none as ablations. --no_weighted_sampler kept as a deprecated alias for --class_balance none. Legacy use_weighted_sampler still honored when class_balance is None (main DCT model and other callers unaffected).

Docs: README documents the now-faithful default + remaining hierarchy_match deviation.
Tests: 6 new unit tests for the weight law + size_data cap; updated the cellsighter option-freeze set.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
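The two weighting laws and the per-class cap can be compared on a toy label list. The full-inverse-frequency rule `weight = total/count` comes from the commit; the exact form of the sqrt rule's 1000-count floor (`1/sqrt(max(count, 1000))`) and all function names below are assumptions, not the repository's `compute_sample_weights_equal` / `subsample_indices_per_class`.

```python
import math
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def weights_equal(labels: Sequence[str]) -> List[float]:
    """Full inverse frequency: weight = total / count, so every class gets equal mass."""
    counts = Counter(labels)
    total = len(labels)
    return [total / counts[y] for y in labels]


def weights_sqrt(labels: Sequence[str], floor: int = 1000) -> List[float]:
    """Sqrt inverse frequency with a count floor (assumed form: 1/sqrt(max(count, floor)))."""
    counts = Counter(labels)
    return [1.0 / math.sqrt(max(counts[y], floor)) for y in labels]


def class_mass(labels: Sequence[str], weights: Sequence[float]) -> Dict[str, float]:
    """Total sampling weight per class, proportional to its expected share of draws."""
    mass: Dict[str, float] = defaultdict(float)
    for y, w in zip(labels, weights):
        mass[y] += w
    return dict(mass)


def subsample_per_class(labels: Sequence[str], size_data: int, seed: int) -> List[int]:
    """Cap each class at size_data indices, deterministically for a given seed."""
    by_class: Dict[str, List[int]] = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    keep: List[int] = []
    for y in sorted(by_class):  # fixed class order keeps the RNG stream reproducible
        idx = by_class[y]
        keep.extend(idx if len(idx) <= size_data else rng.sample(idx, size_data))
    return sorted(keep)
```

On 4,000 cells of one class and 100 of another, the equal scheme gives both classes the same total weight (4,100 each), whereas the sqrt scheme still leaves the majority class with 20x the mass of the minority class.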

A three-round faithfulness audit of the baselines against their original papers/codebases found the baseline READMEs accurately documented architecture and training mechanics but omitted several data-pipeline deviations. This adds the verified, source-cited disclosures:
- CellSighter: neighbor-intensity self-mask (training/patch.py:176) vs upstream's raw neighbor intensities; [0,1] min-max normalization vs upstream raw counts; flips-only augmentation vs upstream's seven; 32x32 context vs 60px; sqrt- vs full-inverse-frequency sampler. Most are shared DeepCell Types preprocessing (fairness-neutral across models) but still deviate from how upstream CellSighter was trained. Self-mask impact was empirically tested (feat/faithful-cellsighter): the ranking did not change.
- Nimbus: prediction resize uses INTER_LINEAR vs upstream 0.0.5 INTER_NEAREST; mpp-based rescale vs magnification-ratio. Verified against the installed nimbus-inference==0.0.5 wheel; core primitives (sigmoid, prepare_binary_mask, cross-FOV normalization) confirmed faithful.
- XGBoost: no cellSize feature and no class balancing (conservative vs the neural baselines); tuning budget not matched (Optuna only for XGBoost).

Documentation only; no code or behavior changes.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Remove pre-existing F401 (pandas, typing.Dict/Any, torch.nn.functional) and F541 (placeholder-less f-string) in run.py, surfaced by ruff --fix during the baseline integration. No behavior change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…sole balancer

The DCT backbone training now uses the WeightedRandomSampler (compute_sample_weights in dataset.py) as the SOLE rare-class balancer. The redundant per-class FocalLoss alpha weighting is removed entirely (cleaner than an interlock), making double-weighting structurally impossible. The focal term (gamma, via --focal_gamma) is kept unchanged.

Concretely:
- Delete the compute_class_weights() helper in scripts/train.py (its only caller; MAPS keeps its own separate compute_class_weights in baselines/maps/run.py, untouched).
- Delete the call site plus the plumbing that fed it only (the dataset-layout isinstance checks + train_dataset_ref/train_indices unwrapping, and the now-unused AugmentedDataset/FullImageDataset imports). label_remap is retained; it is used elsewhere.
- FocalLoss alpha is now hard-coded to None for backbone training.
- Remove the --no_class_weights Click flag, its main() parameter, and its "no_class_weights" key from the checkpoint config dict. Removing the config key is safe: extra/missing config keys do not break checkpoint loading.
- Update the LossesAndMetrics double-weighting warning in metrics.py to drop the stale --no_class_weights reference.
- CHANGELOG: note the schema change and the default change.

This makes the no-flags default reproduce the released-checkpoint recipe, which was trained with --no_class_weights. It CHANGES scripts/train.py's no-flag default versus v0.1.0 development builds (which applied class weights by default); the released checkpoint and the stage-2 head retrain (scripts/retrain_head.py, plain CrossEntropyLoss) are unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
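The sketch below, assuming the standard focal-loss form with an optional per-class `alpha`, shows what keeping gamma while hard-coding `alpha=None` means, and why stacking a loss-side class weight on top of a frequency-based sampler compounds the reweighting. The counts and the 1/sqrt(count) weighting law are hypothetical, chosen only to make the compounding easy to see; the function names are illustrative.

```python
import math
from typing import Dict, Optional


def focal_loss(p_true: float, gamma: float, alpha: Optional[float] = None) -> float:
    """Per-sample focal loss: -alpha * (1 - p)^gamma * log(p).

    alpha=None applies no loss-side class weight; with gamma=0 as well,
    this reduces to plain cross-entropy.
    """
    weight = 1.0 if alpha is None else alpha
    return -weight * (1.0 - p_true) ** gamma * math.log(p_true)


def class_loss_mass(count: int, sampler_weight: float, alpha: Optional[float]) -> float:
    """Relative loss mass of a class: cells x sampling weight x loss-side weight."""
    return count * sampler_weight * (1.0 if alpha is None else alpha)


# Hypothetical counts and a hypothetical 1/sqrt(count) weighting law.
COUNTS: Dict[str, int] = {"common": 10_000, "rare": 100}
W = {c: 1.0 / math.sqrt(n) for c, n in COUNTS.items()}

sampler_only = {c: class_loss_mass(n, W[c], None) for c, n in COUNTS.items()}
sampler_plus_alpha = {c: class_loss_mass(n, W[c], W[c]) for c, n in COUNTS.items()}
# sampler_only: common about 100, rare about 10 (rare at a tenth of common)
# sampler_plus_alpha: both about 1; the second weight multiplies the first
```

With one balancer the rare class keeps a tenth of the common class's mass; adding an alpha built from the same law silently squares the correction. Keeping the sampler as the only balancer rules that product out by construction.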

Add model version 2026-06-15 (deepcell-types_2026-06-15_resmlp.pt, md5 704616a1...) and set it as _latest, so download_model() / predict.py default to the two-stage residual-MLP head-retrain model (80.27 hier macro-F1 on the held-out 129-FOV test split, vs tuned XGBoost 79.03 and the prior Frozen-CLS 74.20). The resMLP head is auto-detected by predict._build_model via ct_head.inp.0.weight, so no caller change is needed. The prior Frozen-CLS release (2026-05-17) is retained in the registry for reproducibility. NOTE: the asset deepcell-types_2026-06-15_resmlp.pt must be uploaded to users.deepcell.org/models/ before this pin resolves for end users. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
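Because the registry pins each release by md5 and `predict._build_model` picks the head variant from the checkpoint's parameter names, a downloader could verify and inspect a checkpoint along these lines. This is a sketch: `file_md5`, `verify_download`, and `detect_ct_head` are hypothetical helpers, the full expected md5 is elided in the note above and is not filled in here, and only the `ct_head.inp.0.weight` key comes from the commit.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Mapping, Union


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Hex MD5 over a stream of byte chunks."""
    h = hashlib.md5()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def file_md5(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large checkpoints need little memory."""
    with open(path, "rb") as fh:
        return md5_of_chunks(iter(lambda: fh.read(chunk_size), b""))


def verify_download(path: Union[str, Path], expected_md5: str) -> None:
    actual = file_md5(path)
    if actual != expected_md5:
        raise ValueError(f"{path}: md5 {actual} does not match expected {expected_md5}")


def detect_ct_head(state_dict: Mapping[str, object]) -> str:
    """'resmlp' if the residual-MLP head's first input layer is present, else 'other'."""
    return "resmlp" if "ct_head.inp.0.weight" in state_dict else "other"
```

Checking the digest before loading catches truncated or stale downloads, and keying the head type off a parameter name lets one loader accept both the resMLP release and the retained Frozen-CLS release without a caller-side flag.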

…add headline number The pr-31 archive-free-README merge took its README wholesale (--theirs), which reverted master's richer Training section (retrain_head.py stage-2 recipe, evaluate_on_test.sh, the leakage-free-test-split headline sentence) because pr-31 branched before that landed. Restore it as a union (pr-31's archive-free inference sections + master's Training section) and state the headline number: two-stage resMLP 80.27 hierarchical macro-F1 vs tuned XGBoost 79.03. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

xuefei-wang and others added 30 commits April 22, 2026 23:54

feat: support canonical annotator inference

e4656b1

chore: ignore local data, model, and output dirs

0a77ddf

feat(canonical): source inference metadata from archive

0d8f21c

Merge pull request #39 from xuefei-wang/refactor/canonical-only-monorepo

3e178b7

Collapse training pipeline into deepcell-types (canonical-only)

Merge pull request #4 from xuefei-wang/fix/tissue-idx-and-circular-im…

ede32a2

…port fix(train,predict,utils): tissue_idx kwarg + circular import + baseline submodule rebase

chore(gitignore): ignore uv.lock and /reviews/ artifacts

8df93c6

uv.lock is regenerated on every branch switch when pyproject.toml shapes differ (master vs training), so keeping it tracked produces constant churn. /reviews/ holds local /deep-review outputs.

chore: drop orphan Dockerfile

bfdb9ae

Untouched since Oct 2024 and broken since the kit was inlined in 0a8108e: it COPYs a non-existent top-level requirements.txt and pip-installs the deleted deepcelltypes-kit/ directory. No CI, docs, or scripts reference it.

xuefei-wang and others added 30 commits June 17, 2026 13:51

docs(reviews): self-pinning implemented (ef1229f) — update traceabili…

4b8accd

…ty gap note Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

experiment: CellSighter sampler/per-modality-norm ablation flags + ma…

ed92d0c

…tched-budget uniform sampler

fix: reap fit-pass dataloader workers (gc) + --num_workers/--max_samp…

de3cc96

…les_per_epoch flags

Merge pr-43 into integration-infra-v2

4796c2d

Merge pr-34 into integration-infra-v2

de0f5bd

Merge pr-38 into integration-infra-v2

1a2b8fa

Merge pr-39 into integration-infra-v2

7bd0e90

Merge pr-31 into integration-infra-v2

565c74c

# Conflicts: # README.md # tests/test_preprocess_hook.py

Merge pr-30 into integration-infra-v2

34b30d8

# Conflicts: # README.md

Merge pr-41 into integration-infra-v2

682c203

Merge pull request #46 from xuefei-wang/landing/track-a-infra-recipe

41d2baf

Track A: infra fixes + drop-class-weighting recipe (consolidates #43,#34,#38,#39,#31,#30,#41)

Merge pull request #48 from xuefei-wang/feat/release-resmlp-checkpoint

52ac417

Promote two-stage resMLP to the headline released model

Merge pull request #49 from xuefei-wang/landing/pr44-attribution-fix

4a7ac7d

Recipe-ablation review docs with corrected commit attributions (was #44)

Merge remote-tracking branch 'xuefei/master' into HEAD

bf7ff95

# Conflicts: # deepcell_types/training/dataloader.py

Merge pull request #47 from xuefei-wang/landing/track-b-baselines

597e2e2

Track B: faithful CellSighter + leakage-free baseline selection (consolidates #35,#33,#42,#37)

v0.1.0: monorepo merge — unified training + inference, canonical model, archive-free inference, vendored baselines #41

xuefei-wang wants to merge 228 commits into vanvalenlab:master from xuefei-wang:master

xuefei-wang commented May 30, 2026 (edited)


Summary

Canonical model

Canonical-only inference

New public API

Monorepo: training pipeline

Baselines

Breaking changes

Packaging / infra

Tests
