v0.1.0: monorepo merge — unified training + inference, canonical model, archive-free inference, vendored baselines #41
Draft
xuefei-wang wants to merge 228 commits into
Two surviving issues from the cross-repo audit (deepcelltypes-cell-type-assignment-pytorch reviews/2026-05-10-0850/deepcell-types/SYNTHESIS.md) that PR #1 ("feat: support canonical annotator inference") did not address. The other 3 findings (channel KeyError fallback, marker-embedding always-normalize, marker_embeddings allocation shape) are already fixed on this branch.
predict.py:
- `_torch_load_weights` previously caught `TypeError` from a too-old torch and silently fell back to unsafe pickle deserialization. Now emits a loud warning when the fallback fires, recommending an upgrade. Untrusted checkpoints can execute arbitrary code at unsafe `torch.load` time, so this fallback should be the rare exception, not silent.
model.py:
- Legacy `CellTypeDataEncoder.forward` (used for the older CLIP checkpoints via the `_is_canonical_checkpoint() == False` route) had:
    aug_mask = nn.functional.pad(mask.long(), (1, 0), mode="reflect")
  which prepends a copy of the channel-0 mask bit into the CLS slot. This is correct only when channel 0 is always real (not padding). Replace with explicit `torch.cat([torch.zeros(B, 1, dtype=bool), mask], dim=1)` to make CLS-always-visible the structural invariant. The canonical `annotator_model.py` already uses this pattern (line 409-410); this brings legacy parity.
Smoke test: `CellTypeDataEncoder(...)` constructs and forwards without error. No regression risk for canonical-checkpoint loads (those go through `annotator_model.py`).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
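The warn-on-fallback behavior described for `_torch_load_weights` might look roughly like the sketch below. This is not the code from `predict.py`: the name `load_weights_with_warning`, the injectable `loader` argument, and the `old_loader` stand-in are all hypothetical, introduced so the example runs without torch installed. With real torch, `loader` would be `torch.load`, whose `weights_only` keyword is rejected by older releases.

```python
import warnings


def load_weights_with_warning(path, loader):
    """Prefer the safe `weights_only=True` load; warn loudly on fallback.

    `loader` stands in for `torch.load` (an assumption for this sketch).
    """
    try:
        return loader(path, weights_only=True)
    except TypeError:
        # Older loaders do not accept the `weights_only` keyword at all.
        warnings.warn(
            "weights_only=True is unsupported by this torch version; "
            "falling back to unsafe pickle deserialization. Only load "
            "trusted checkpoints, and consider upgrading torch.",
            RuntimeWarning,
            stacklevel=2,
        )
        return loader(path)


def old_loader(path):
    # Hypothetical stand-in for a torch.load without `weights_only` support.
    return {"path": path}
```

One caveat of this shape: a `TypeError` raised inside the loader for an unrelated reason would also trigger the fallback, so a stricter variant could inspect the loader's signature instead.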
In preparation for merging the training pipeline (currently in a separate repo) into this package, collapse to a single supported architecture. The legacy `CellTypeCLIPModel` path and its DCTConfig "legacy" profile were carrying ~1.8k lines of config blobs and dual-mode branching that would otherwise have to be ported into the training side as well.
Removes:
- `model.py` (CellTypeCLIPModel) and `loss.py` (CLIP/contrastive losses)
- `dct_kit/utils.py` (all four helpers had no remaining callers)
- 8 dead config blobs in `dct_kit/config/`: both deepseek-r1 and text-embedding-3-large JSON dumps, plus the legacy and (already-dead) `canonical_*.yaml` mirrors and the `tissue_celltype_mapping_merged` YAML
Simplifies:
- `predict.py`: drop `_is_canonical_checkpoint` routing, the legacy model/dataloader branches, and `_load_legacy_embeddings`
- `dct_kit/config.py::DCTConfig`: remove the `profile=` kwarg, the legacy package-bundled init path, and the embedding-loader methods (`get_channel_embedding`, `get_celltype_embedding`)
- `dataset.py::PatchDataset`: drop the `output_mode` parameter and the legacy `_combine_masks` / `_pad_images` / `_calcualte_marker_positivity` helpers; every batch is now canonical
- `tests/test_canonical_inference.py`: drop the two legacy-arm tests; the remaining 6 unit tests still pass
- `docs/index.md`: trim the legacy `master_channels.yaml` reference from the Limitations section
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
With the legacy CLIP model.py removed in the previous commit, the canonical CellTypeAnnotator can reclaim the obvious filename. Updates the two import sites (predict.py, tests/test_canonical_inference.py) to match.
This also lines the import path up with the training repo (deepcelltypes-cell-type-assignment-pytorch), which has been using `deepcelltypes.model.CellTypeAnnotator` all along, easing the upcoming training-pipeline merge.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sets up the structure for absorbing the training pipeline currently
maintained in the deepcelltypes-cell-type-assignment-pytorch repo, while
preserving the lean inference-only install that today's users rely on.
- New empty package ``deepcell_types.training`` with an explanatory
docstring; will be populated in subsequent phases (losses, dataset,
annotations, ...).
- pyproject extras:
- ``train`` — wandb / zarr (pinned >=3.1, <4 per the alpha
metadata-cache bug) / torchvision / torchinfo / torchmetrics /
pandas / scikit-learn / click / matplotlib
- ``baselines`` — xgboost / optuna
- ``analysis`` — plotly / seaborn / openpyxl / kaleido (pinned to
skip the broken 0.2.1.post1)
- ``all`` — fan-in convenience target
- Mirrored the [tool.pytest.ini_options] block from the training repo.
CI guard: tests/test_inference_deps.py imports the inference entry
points in a fresh subprocess and asserts that none of
{wandb, zarr, sklearn, pandas, torchvision, torchinfo, torchmetrics,
matplotlib} ends up in sys.modules. Future leaks from the training
side into the inference path will fail this test loudly. Subprocess
isolation prevents pytest's own imports from poisoning the check.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
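The inference-deps guard described in the commit above could be sketched as follows. This is a minimal illustration, not the contents of `tests/test_inference_deps.py`: the helper name `leaked_modules` is hypothetical, and the check only looks at top-level module names in `sys.modules`.

```python
import json
import subprocess
import sys


def leaked_modules(entry_point, forbidden):
    """Import `entry_point` in a fresh interpreter and return, sorted,
    the forbidden top-level modules that ended up in sys.modules."""
    probe = (
        "import importlib, json, sys\n"
        f"importlib.import_module({entry_point!r})\n"
        f"forbidden = set({sorted(forbidden)!r})\n"
        "print(json.dumps(sorted(forbidden & set(sys.modules))))\n"
    )
    # A subprocess keeps the test runner's own imports out of the check.
    result = subprocess.run(
        [sys.executable, "-c", probe],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```

In the real guard the entry point would presumably be `deepcell_types.predict` and the forbidden set the one listed in the commit message, with the test asserting that the returned list is empty.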
Copies three self-contained modules from the training repo (deepcelltypes-cell-type-assignment-pytorch) into ``deepcell_types/training/``:
- ``losses.py``: FocalLoss (referenced from upstream pytorch-multi-class-focal-loss) and the dormant HierarchicalLoss (coarse-grained CT loss driven by a YAML fine→coarse mapping). HierarchicalLoss is kept ``weight=0`` in the canonical recipe but is part of the released training surface area for follow-on experiments.
- ``annotations.py``: zarr-archive annotation extraction with KDTree centroid matching and the duplicate-label collapse / conflict-drop semantics the training pipeline depends on. Lazy-imports scipy and numcodecs so it stays cheap to import.
- ``gold_metadata.py``: Pan-M Gold-Standard subset → (tissue, modality) canonicalization, including the non-direct mappings (decidua → uterus, Vectra/Opal → cycif) used at evaluation time.
All three have zero cross-imports into the training-side ``config.py`` or ``utils.py``, so they land cleanly without waiting on Phase 6's config reconciliation. The remaining training surfaces with config dependencies (FullImageDataset, FOVGroupedSampler, augmentations, create_dataloader, and the training portion of utils.py) are deferred to Phase 6.
The CI guard (tests/test_inference_deps.py) still passes: importing ``deepcell_types.predict`` does not transitively reach ``deepcell_types.training``.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Migrates the canonical raw-FOV → archive preprocessing recipe from the
training repo (deepcelltypes-cell-type-assignment-pytorch:preprocessing.py)
and promotes it to the top-level public API.
The function is the single source of truth for transforming an ingested
raw FOV (``(C, H, W)`` intensity at a native MPP) into the format the
model consumes:
1. resample to ``TissueNetConfig.STANDARD_MPP_RESOLUTION`` (0.5 µm/px)
2. per-channel p99.9 clip (over nonzero pixels, matching the recovered
production recipe from
``hubmap-to-zarr@origin/deepcell-types:preprocess_for_training.py``)
3. per-channel min-max normalize to [0, 1]
4. cast mask to uint32 and compute centroids in resampled coordinates
Lives at the top level (``deepcell_types/preprocessing.py``), not under
``training/``, because public inference users need it too — running
``predict()`` against an arbitrary FOV requires this exact preprocessing
upstream. Re-exports the function from ``deepcell_types.__init__`` so
``from deepcell_types import preprocess_fov`` works.
Only numpy + skimage dependencies (both already in the base install) —
the inference-deps guard still passes.
The snapshot test from the training repo
(``tests/test_preprocessing.py::test_snapshot_against_production``)
will follow in Phase 9 when ``tests/`` is migrated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
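Steps 2 and 3 of the recipe above (per-channel p99.9 clip over nonzero pixels, then min-max normalization) can be illustrated with a small pure-Python sketch. It is not the `preprocess_fov` implementation: flat lists stand in for image arrays, the helper names are hypothetical, and the linear-interpolation percentile and the handling of empty or constant channels are assumptions.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty sequence."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def clip_and_normalize(channel, q=99.9):
    """Clip one channel at its q-th percentile over nonzero pixels,
    then rescale the clipped values to [0, 1]."""
    nonzero = [v for v in channel if v != 0]
    if not nonzero:
        # Assumption: an all-zero channel stays all zero.
        return [0.0 for _ in channel]
    cap = percentile(nonzero, q)
    clipped = [min(v, cap) for v in channel]
    lo, hi = min(clipped), max(clipped)
    if hi == lo:
        # Assumption: a constant channel maps to zeros to avoid dividing by 0.
        return [0.0 for _ in channel]
    return [(v - lo) / (hi - lo) for v in clipped]
```

Computing the percentile over nonzero pixels only keeps large empty background regions from dragging the clip threshold down.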
Copies the training-side configuration module from B (deepcelltypes-cell-type-assignment-pytorch:config.py) into ``deepcell_types/training/config.py`` verbatim. B is the up-to-date source per the canonical-only-monorepo merge directive.
The migrated surface (1343 lines) includes:
- ``TissueNetConfig``: heavy training-side config that opens the zarr archive directly (vs the inference-side ``DCTConfig`` which reads root attrs via JSON). Exposes ct2idx, marker2idx, domain2idx, tissue2idx, dataset_keys, tumor_datasets, tissue_celltype_mapping, domain_mapping, celltype_mapping, marker_positivity, plus the lazy per-dataset MP DataFrame loader.
- ``LazyMarkerPositivityDict``: dict-shaped lazy loader for per-dataset marker-positivity DataFrames; avoids walking ~1.9k datasets at init when only ~285 carry MP data.
- ``archive_metadata_fingerprint`` / ``cached_archive_metadata_fingerprint`` / ``archive_array_fingerprint``: stable hashes used to invalidate cell-data and baseline-feature caches after in-place archive repairs.
- ``_discover_fov_keys``: detects v7 (flat) vs v8 (5-level ``modality/tissue/cohort/sample/fov``) layouts and returns slash-joined leaf FOV keys that zarr and the filesystem both resolve.
- ``_patch_zarr_v3_alpha_metadata``: workaround for zarr 3.0.0a* metadata-cache bugs; the ``[train]`` extra pins zarr>=3.1 to avoid needing this in fresh installs, but the patch stays for dev envs still on the alpha.
- ``extract_patch`` / ``extract_patch_from_zarr`` / ``compute_distance_transform``: patch-extraction utilities consumed by training/dataset.py (next commit).
Zero ``deepcelltypes.*`` cross-imports: the file is fully self-contained at the package level, so the copy lands without rewiring.
Inference-deps guard still passes: ``deepcell_types.predict`` does not transitively reach ``deepcell_types.training.config``, so zarr and pandas stay out of the base install.
The DCTConfig (inference) vs TissueNetConfig (training) behavioral audit follows in a subsequent commit. The merge directive says B wins where they diverge, and a couple of spots need aligning (the domain2idx derivation, the ct2idx defensive casting). For now the two coexist and the inference path is unaffected.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
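The idea behind the fingerprint helpers, a stable hash that changes whenever archive metadata changes so dependent caches can be invalidated, can be sketched as below. What the real `archive_metadata_fingerprint` actually hashes is not stated in the commit message; the function name `metadata_fingerprint`, the JSON canonicalization, and the choice of SHA-256 are all assumptions for illustration.

```python
import hashlib
import json


def metadata_fingerprint(metadata):
    """Return a hex digest that is stable across key ordering.

    `metadata` is assumed to be a JSON-serializable dict.
    """
    # sort_keys plus fixed separators give a canonical byte representation.
    payload = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cache_is_stale(cached_fingerprint, metadata):
    """A cache is stale when its stored fingerprint no longer matches."""
    return cached_fingerprint != metadata_fingerprint(metadata)
```

A cache written alongside the fingerprint can then be rebuilt whenever `cache_is_stale` returns True, for example after an in-place archive repair.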
Copies B's dataset.py (1706 lines) to ``deepcell_types/training/dataset.py``. Drop-in migration: B's relative imports (``from .config import ...``, ``from .annotations import ...``) resolve cleanly inside the new ``training/`` package since config.py and annotations.py already live there from the previous commits.
Brings over:
- ``FullImageDataset``: the canonical zarr-backed training Dataset. Returns ``(C_max, 1, H, W)`` raw*self_mask, ``(3, H, W)`` spatial context (self_mask, neighbor_mask, distance_transform), and the "?" marker-positivity validity mask required by the marker positivity loss.
- ``AugmentedDataset`` + ``DropOutChannels``: train-time augmentations (horizontal/vertical flips, random channel dropout).
- ``FOVGroupedSampler``: keeps samples from the same FOV together within a batch to amortize zarr open cost / preserve neighborhood context.
- ``create_fov_splits`` / ``save_fov_splits`` / ``load_fov_splits``: stratified-by-modality FOV partitioning with sole-source detection so a single-FOV cell type stays in one split.
- ``compute_sample_weights``: class-balanced sampling weights.
- ``create_dataloader``: top-level factory the training scripts call.
Inline ``_Compose`` / ``_RandomHorizontalFlip`` / ``_RandomVerticalFlip`` are intentional re-implementations of the torchvision transforms; B uses them to avoid a hard torchvision import at module-load time (the [train] extra pulls torchvision in, but the pattern matches the rest of the package's lazy-deps discipline).
Inference-deps guard still green: importing deepcell_types.predict does not transitively reach deepcell_types.training.dataset.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
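The grouping idea behind ``FOVGroupedSampler`` (keeping samples from the same FOV adjacent so they land in the same batch) might be sketched as follows. This is only an illustration under stated assumptions: the function name `fov_grouped_batches` is hypothetical, the real sampler's shuffling and weighting rules are not described in the commit message, and here batches are simple consecutive chunks of the FOV-ordered index list.

```python
import random
from collections import defaultdict


def fov_grouped_batches(fov_names, batch_size, seed=0):
    """Return batches of sample indices ordered so that samples from
    one FOV are contiguous. `fov_names[i]` is the FOV of sample i."""
    by_fov = defaultdict(list)
    for idx, fov in enumerate(fov_names):
        by_fov[fov].append(idx)
    # Shuffle the FOV order (not the samples within a FOV).
    fovs = list(by_fov)
    random.Random(seed).shuffle(fovs)
    ordered = [idx for fov in fovs for idx in by_fov[fov]]
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]
```

When a FOV holds more samples than `batch_size`, it spans several consecutive batches; when it holds fewer, a batch can mix the tail of one FOV with the head of the next.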
Copies B's utils.py (1462 lines) to ``deepcell_types/training/utils.py`` verbatim and rewrites three lazy package imports from ``from deepcelltypes.X`` to ``from deepcell_types.training.X``.
Surface migrated:
- ``BatchData``: dataclass collecting per-batch inputs (sample, spatial_context, ch_idx, padding_mask, marker_pos_mask, ct_label, domain_label, dataset_name, fov_name, cell_index, ...).
- ``LossesAndMetrics``: per-epoch loss and metric accumulator.
- ``MPMetricsTracker``: marker-positivity per-marker counters and threshold sweeps.
- ``PredLogger``: atomic-write CSV predictions logger (5-field: labels, probs, cell_index, dataset_name, fov_name). Name collides with ``deepcell_types.predict.PredLogger`` but they live in different namespaces; A's is a 2-field inference result buffer with a different interface. Leaving both: B is the training-authoritative version, A's stays as the inference API contract.
- ``get_tissue_ct_exclude``: per-sample tissue/dataset-aware ct exclusion list builder for training-time masking. Different function from A's ``_excluded_celltype_indices`` (which is the per-tissue public-API affordance for ``predict(tissue_exclude=...)``); both retained.
- Seed / dataloader hygiene: ``seed_everything``, ``worker_init_fn``, ``make_generator``.
- Label compaction: ``build_label_remap``, ``adjust_conf_mat_hierarchy``.
- Wandb logging: ``log_epoch_metrics``, ``log_confusion_matrix`` (wandb is a lazy import inside the functions; module load works without it).
- Feature extraction: ``extract_features_from_zarr``, ``_extract_all_dataset_features``, ``compute_baseline_metrics``, ``save_baseline_predictions``.
- Atomic file utilities and a cache-metadata fingerprint helper used by the cell-data caching layer.
Inference-deps guard still green: importing ``deepcell_types.predict`` does not transitively reach ``deepcell_types.training.utils`` (pandas, the heaviest dep here, stays out of the base install).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
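The "atomic-write CSV" behavior mentioned for the training-side ``PredLogger`` usually means writing to a temporary file and renaming it into place, so readers never see a half-written file. The sketch below shows that general pattern with the standard library; the function name `atomic_write_csv` is hypothetical and this is not the migrated implementation.

```python
import csv
import os
import tempfile


def atomic_write_csv(path, header, rows):
    """Write a CSV so that `path` is either the old file or the complete
    new file, never a partial one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for os.replace to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(rows)
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A call with the five fields named in the commit message would pass `header=["labels", "probs", "cell_index", "dataset_name", "fov_name"]`.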
Copies B's abstention.py into ``deepcell_types/training/abstention.py``.
Placement decision: training-side, not top-level public API. The ``apply_abstention`` function takes a ``pandas.DataFrame`` as its input, so promoting the module to ``deepcell_types/abstention.py`` would either force pandas into the inference base install (breaks Phase 3's [train] dep split) or require a pandas-to-numpy refactor of the public surface. Neither is justified now: the module's existing callers in B all live in scripts and notebooks that will be moved under training-side surfaces in Phase 9.
If the public release wants abstention as a first-class inference feature later, two options:
(a) refactor apply_abstention to take arrays + group keys instead of a DataFrame and move it to deepcell_types/abstention.py;
(b) accept pandas in the inference deps and move both the file and the [train] guard.
Defer to Phase 10.
Updates one internal docstring reference from ``deepcelltypes.utils.adjust_conf_mat_hierarchy`` to the new ``deepcell_types.training.utils.adjust_conf_mat_hierarchy`` path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Intended to land in the previous commit but the Edit was rejected because the file hadn't been Read in-session yet. Updates the ``hierarchical_correct`` docstring's cross-reference from ``deepcelltypes.utils.adjust_conf_mat_hierarchy`` to the migrated ``deepcell_types.training.utils.adjust_conf_mat_hierarchy`` path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Final bulk migration from
deepcelltypes-cell-type-assignment-pytorch (B) into the canonical-only
monorepo:
- ``scripts/`` (16 entry points, 528KB): train.py, pretrain.py,
predict.py (script form, distinct from the library
``deepcell_types.predict``), benchmark_gold_standard.py,
run_gold_standard_nimbus.py, generate_openai_embeddings{,_v8}.py,
generate_manifest_index.py, generate_splits.py, ingest_gold_to_zarr.py,
split_val_for_test.py, refine_mp_labels_with_intensity_v2.py,
validate_archive_contract.py, _combine_v3_and_C.py,
download_gold_standard.sh.
- ``tests/`` (22 new test modules merged with A's existing
``test_canonical_inference.py`` and ``test_inference_deps.py``).
No filename collisions; the A tests stayed in place.
- ``config/combined_celltypes.yaml``: small (a couple of KB) cell-type
group taxonomy used by TissueNetConfig.combined_celltype_mapping.
Skipped the 30MB ``marker_embeddings-deepseek-r1-70b.json`` (training
artifact, not part of the public release surface — users regenerate
via scripts/generate_openai_embeddings.py).
Import rewrites applied via sed across the migrated files AND across
``deepcell_types/training/`` itself (caught two lazy ``from
deepcelltypes.utils import ...`` imports inside training/dataset.py
that the verbatim copy preserved):
deepcelltypes.model -> deepcell_types.model (top-level)
deepcelltypes.preprocessing -> deepcell_types.preprocessing (top-level)
deepcelltypes.abstention -> deepcell_types.training.abstention
deepcelltypes.annotations -> deepcell_types.training.annotations
deepcelltypes.config -> deepcell_types.training.config
deepcelltypes.dataset -> deepcell_types.training.dataset
deepcelltypes.losses -> deepcell_types.training.losses
deepcelltypes.utils -> deepcell_types.training.utils
deepcelltypes.gold_metadata -> deepcell_types.training.gold_metadata
from deepcelltypes import -> from deepcell_types.training import
Path fix in training/config.py: ``CONFIG_DIR`` now resolves three
parents up (deepcell_types/training/config.py ->
deepcell_types/training/ -> deepcell_types/ -> repo root -> config/),
one ``.parent`` deeper than B's original two-segment ``deepcelltypes/``
layout.
Test results: 239/245 passing. The 6 failures are all env/data
dependent, not migration bugs:
- 4 × test_v2.py::TestLossesAndMetricsCompute — needs torchmetrics
(in the [train] extra, not in the base install). Pass when [train]
is installed.
- 1 × test_preprocessing.py::test_snapshot_against_production — needs
the production zarr archive at PRODUCTION_ARCHIVE AND zarr>=3.1
(the [train] pin); dev env has zarr 3.0.0a5 and no archive.
- 1 × test_refine_mp_labels_v2.py::test_stage7_synthetic_gold_validation
— imports from ``analysis/`` which was explicitly deferred from
this migration (research cruft triage is a separate exercise).
Deferred from this migration: ``output/`` (62GB), ``models/`` (48GB),
``features/`` (8.9GB), ``baselines/`` (7.3GB), ``data/`` (4.9GB),
``wandb_tmp/``, ``embeddings/``, ``figures/``, ``splits/``, ``logs/``,
``analysis/`` (~400KB), ``experiments/`` (~400KB). The big ones are
training artifacts that should never be in git regardless; the small
ones (analysis/, experiments/) are research code to triage separately
before deciding whether they belong in the public release.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Brings the four baseline comparison repos into A as submodules, matching B's layout under ``baselines/``. Each sub-repo is owned at github.com/xuefei-wang/deepcelltypes-<name>.git and is now tracked on ``main``.
Pre-flight: in each baseline repo, ``paper-faithfulness-alignment`` was 2-4 commits ahead of ``main`` with 0 behind. Fast-forwarded ``main`` to ``paper-faithfulness-alignment`` and pushed for each repo before adding the submodule here, so A's pin lands on the same commit that B's pin pointed to:
  cellsighter  79c79aa..e8c078d  (paper -> main FF)
  maps         c50c0eb..5b59f46
  nimbus       f3f65e9..9bfe11d
  xgboost      e4db5ed..b227380
A's submodule branch tracking points at ``main`` for all four; ``paper-faithfulness-alignment`` remains as a historical reference in each sub-repo but is no longer the active branch.
The baseline source code is small (~200KB total tracked); B's local ``baselines/maps`` working tree had ~7GB of model artifacts that are not git-tracked and stay in B. The fresh clone in A contains only the tracked source.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses the two BLOCKER findings from the deep-review:
1. ``tifffile`` is a top-level import in ``scripts/ingest_gold_to_zarr.py``
(and lazy at 4 other scripts) but was absent from ``pyproject.toml``.
Any clean ``pip install deepcell-types[train]`` could not run the
ingest pipeline. Added to the ``[train]`` extra.
2. ``training/config.py::CONFIG_DIR`` resolved three ``.parent``s up
to a repo-root ``config/`` directory that does not exist after
``pip install`` (it would land at ``site-packages/config/``). The
YAML file ``combined_celltypes.yaml`` therefore was unreachable
from any non-editable install, and ``combined_celltype_mapping``
silently returned ``{}`` — group-level cell-type logic invisibly
broke for installed users.
Fix: move ``config/combined_celltypes.yaml`` into the package at
``deepcell_types/training/config/combined_celltypes.yaml``, shorten
``CONFIG_DIR`` to ``Path(__file__).parent / "config"``, and extend
``[tool.setuptools.package-data]`` to include
``training/config/*.yaml`` so the wheel actually ships the file.
Verified: ``yaml.safe_load(CONFIG_DIR / "combined_celltypes.yaml")``
loads 48 entries from the new location.
Test suite after fix: 115 passed, 1 skipped, 1 failed. The single
failure is ``test_snapshot_against_production`` which needs zarr>=3.1
(``[train]`` extra pins it) plus the production archive available
at ``$PRODUCTION_ARCHIVE_PATH``. It is a pre-existing env-dependent
failure, not introduced by this commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
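The ``CONFIG_DIR`` problem in item 2 comes down to path arithmetic, which the following sketch makes concrete. The install prefix `/venv/lib/site-packages` is a made-up example; only the `.parent` chains mirror what the commit message describes.

```python
from pathlib import PurePosixPath

# Hypothetical location of the module after a non-editable `pip install`.
module_file = PurePosixPath(
    "/venv/lib/site-packages/deepcell_types/training/config.py"
)

# Old behaviour: three parents up, which escapes the package entirely.
old_config_dir = module_file.parent.parent.parent / "config"

# Fixed behaviour: a directory next to the module, inside the package.
new_config_dir = module_file.parent / "config"

print(old_config_dir)  # /venv/lib/site-packages/config
print(new_config_dir)  # /venv/lib/site-packages/deepcell_types/training/config
```

In an editable checkout the old expression happened to resolve to the repo-root ``config/`` directory, which is why the breakage only showed up for installed users.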
Themes addressed in one batch (see reviews/2026-05-10-2345/SYNTHESIS.md):
- J (errors H1): mp_macro_precision/recall used np.mean over arrays
containing np.nan for vacuous markers — poisoning wandb dashboards
with NaN every epoch. Switch to np.nanmean with an all-NaN guard,
matching the existing macro_f1 treatment.
- A (API/simplification H3): rename predict.PredLogger to
_InferenceResultBuffer (private) to remove the collision with the
richer training-side PredLogger; same-name, incompatible-signature
classes were a future-bug magnet.
- B (API/perf H2): num_workers default 24 → 0 in predict(). The
docstring already warned "only safe with >64 GB RAM"; 24 workers each
hold a full FOV in-memory and re-run preprocessing.
- C (multiple): drop stale deepcelltypes-kit fallback paths in
get_channel_embedding / get_celltype_embedding (path didn't exist
post-merge → silent {} return); rewrite the training/config.py
module docstring; fix docs/site/API-key.md broken
"from utils import download_training_data" import.
- D (API H1, M4): TissueNetConfig default zarr_path is now None with
DEEPCELL_TYPES_ZARR_PATH env-var fallback (was hard-coded /data2/...
NFS path). Fix create_model docstring to name DCTConfig.
- I (errors H2/H3, M1, M5): narrow three broad `except Exception`
blocks in dataset.py (_load_tissuenet_archive: cache build,
modality attr, tissue attr) to (KeyError, AttributeError, TypeError,
ValueError, OSError, json.JSONDecodeError, GroupNotFoundError).
Add a >1% drop-rate guard so schema regressions can no longer
silently lose hundreds of datasets. Narrow zarr-v3-alpha shim
except to ImportError. Catch UnicodeDecodeError in
_read_dataset_metadata.
Also removed two vestigial "see MEMORY.md" cross-references in
LossesAndMetrics warning text (MEMORY.md never existed in this repo).
Tests: 243 passed, 1 skipped, 1 pre-existing env-dependent failure
(test_stage7_synthetic_gold_validation needs analysis/ on path).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
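Theme J's fix (a NaN-aware mean with an all-NaN guard) can be illustrated without numpy. The helper below is a pure-Python stand-in for the `np.nanmean` call, not the code in the metrics module; its name is hypothetical, and returning NaN when every entry is NaN is an assumption about what the guard does.

```python
import math


def nan_safe_mean(values):
    """Mean over the non-NaN entries of `values`.

    Returns NaN when there is nothing to average (assumed guard behaviour).
    """
    finite = [v for v in values if not math.isnan(v)]
    if not finite:
        return float("nan")
    return sum(finite) / len(finite)


# A vacuous marker contributes NaN; a plain mean would be poisoned by it.
per_marker_precision = [0.5, float("nan"), 1.0]
plain_mean = sum(per_marker_precision) / len(per_marker_precision)
macro_precision = nan_safe_mean(per_marker_precision)
```

Here `plain_mean` is NaN while `macro_precision` averages only the two defined markers.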
From the deep review's tests.md HIGH findings + the API/tests M1
cross-config agreement gap:
- _build_model n_markers mismatch → ValueError
- _build_model n_celltypes mismatch → ValueError
- _excluded_celltype_indices on unknown tissue → ValueError
- _excluded_celltype_indices positive case: returned exclusion rows
contain every non-allowed index for the tissue
- _excluded_celltype_indices(tissue=None) passthrough
- PatchDataset with channel_names matching nothing → ValueError
("No input channels matched")
- DCTConfig and TissueNetConfig built from the same archive must
agree on MAX_NUM_CHANNELS, CROP_SIZE, STANDARD_MPP_RESOLUTION,
marker2idx, ct2idx (importorskip("zarr") gates the test on the
training extra).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
From reviews/2026-05-10-2345/simplification.md H1+H2 and complexity.md H2:
- Delete _zarr_group_filesystem_path and _read_v3_1d_array from training/utils.py. Both were verbatim copies of annotations.py's group_filesystem_path / read_v3_1d_array with zero callers across the repo (verified by grep). The annotations.py versions are the canonical ones imported by training/dataset.py.
- Delete the three pass-through static shim methods on FullImageDataset (_group_filesystem_path, _read_v3_1d_array, _centroid_to_cell_idx_fast). None were called anywhere; they added zero value and only obscured that the real helpers live in annotations.py. Note: _build_centroid_tree is kept (also flagged but not in the HIGH list).
- Backport the zstd-level-aware codec read from dct_kit/config.py into annotations.py:read_v3_1d_array. The old training-side copy hardcoded Zstd(level=0) while the inference side correctly reads level from the codec config. With archives written at a non-zero compression level the training-side read would silently produce garbage. Both paths now share the level-aware contract.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…es (Theme F)
config.py and utils.py had grown to 1.3k and 1.5k LOC, mixing archive fingerprinting, patch extraction, metric trackers, baseline IO, and the core TissueNetConfig/RNG/log helpers in one place each. Carve four focused modules out (verbatim, no logic changes):
- training/archive.py: zarr v3 alpha metadata patch, archive metadata / array fingerprinting, FOV-key discovery, and the per-process caches.
- training/patch.py: per-cell patch extraction (compute_distance_transform, extract_patch_from_zarr, extract_patch).
- training/metrics.py: confusion-matrix hierarchy adjustment, MP per-marker reduction, MPMetricsTracker, LossesAndMetrics, build_label_remap.
- training/baseline_features.py: baseline classifier feature extraction pipeline (_conf_mat_summary, compute_baseline_metrics, save_baseline_predictions, _extract_all_dataset_features, extract_features_from_zarr, _get_cell_data_from_ds).
Re-exports at the bottom of config.py and utils.py keep all tests/scripts working unchanged (230 passed, 1 skipped, matching the pre-split baseline). dataset.py is updated to import directly from the new homes for cached_archive_metadata_fingerprint and extract_patch.
Two non-mechanical touches required to keep monkey-patch-based tests green:
- baseline_features.extract_features_from_zarr looks up _discover_fov_keys and _extract_all_dataset_features via the config / utils modules at call time, so tests that monkeypatch those symbols on the legacy modules still take effect after the split. _FINGERPRINT_CACHE / _FOV_KEYS_CACHE dicts are re-exported from config.py for the same reason (test_dataset_cache mutates them).
- metrics.LossesAndMetrics.compute defers import of _conf_mat_summary to method-call time to avoid a metrics <-> baseline_features import cycle (baseline_features needs adjust_conf_mat_hierarchy at module load).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
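Why the call-time lookup matters for monkeypatch-based tests can be shown with a toy example. Everything here is hypothetical: `legacy` is a `SimpleNamespace` standing in for a module such as the legacy config.py, and `discover` stands in for a helper like `_discover_fov_keys`.

```python
import types

# Stand-in for a legacy module exposing a helper function.
legacy = types.SimpleNamespace(discover=lambda: ["fov_a", "fov_b"])

# Early binding, as `from legacy import discover` would do at import time:
# the name keeps pointing at the original function object.
bound_early = legacy.discover


def extract_early():
    return bound_early()


def extract_late():
    # Late binding: resolve the attribute on the module at call time,
    # so a later monkeypatch of `legacy.discover` is picked up.
    return legacy.discover()


# A test monkeypatches the helper on the legacy module.
legacy.discover = lambda: ["patched"]

early_result = extract_early()  # still the original helper
late_result = extract_late()    # sees the patched helper
```

The early-bound caller silently ignores the patch, which is the failure mode the call-time lookup in the commit above avoids.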
From reviews/2026-05-10-2345/docs.md HIGH findings:
- README: add a "Training" section describing the [train] extra and the
four main entry points under scripts/. Move "Download the model"
after "Installation" (was non-executable in reading order).
- docs/index.md: add a "Training" section explaining that training-only
code lives under deepcell_types.training, gated behind the [train]
extra, with pointers to scripts/{train,predict,pretrain,
benchmark_gold_standard,ingest_gold_to_zarr}.py. Fix the long-standing
"sorce" typo.
- docs/site/tutorial.md: bump the example archive placeholder from
tissuenet-v8.zarr → tissuenet-v9.zarr to match DCTConfig's probe
order (v9 is the canonical contemporary archive).
The docs.md HIGH for the broken `from utils import download_training_data`
import in docs/site/API-key.md was fixed in 88b95f9.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five MEDIUM/HIGH findings from reviews/2026-05-10-2345 in one batch:
- complexity H1: TissueNetConfig.get_marker_positivity() and marker_positivity_labels[] now share a single LazyMarkerPositivityDict. Previously the plain-dict cache populated by get_marker_positivity() was discarded the first time marker_positivity_labels was accessed (the property replaced the field), causing wasted I/O and divergent caches. _marker_positivity_cache is now Optional[LazyMP...] and lazily constructed on first access; get_marker_positivity routes through marker_positivity_labels for a single source of truth.
- numerical M1: MarkerEmbeddingLayer.forward zeros output for padding positions (ch_idx == -1). Without this, F.normalize(proj(0)) yielded a unit-norm direction equal to F.normalize(proj.bias), a non-trivial embedding flowing into the transformer for tokens that should be invisible.
- numerical M2: CellTypeAnnotator.forward zeros spatial features for padding positions BEFORE the fusion concat. Otherwise padding tokens enter self.fusion with [0, spatial_feat] and emerge as W_spatial @ spatial_feat + bias.
- API M1: rename predict(tissue_exclude=...) → predict(tissue_filter=...). The old name was inverted: "tissue_exclude='colon'" actually meant "filter TO colon-associated cell types". The deprecated alias stays (keyword-only) and emits DeprecationWarning; passing both raises TypeError.
- API M3: predict(return_probabilities=True) returns a PredictionResult dataclass with cell_types, probabilities (full per-cell softmax matrix), and cell_indices. Default behaviour unchanged (returns list[str]). PredictionResult and DCTConfig are now hoisted to top-level so `from deepcell_types import PredictionResult, DCTConfig` works.
Tests: 233 passed, 1 skipped. Added 3 new tests covering return_probabilities, tissue_exclude DeprecationWarning, and the both-args TypeError.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
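The `tissue_exclude` → `tissue_filter` rename with a deprecated keyword-only alias (API M1) follows a common pattern, sketched below. The helper `resolve_tissue_filter` is hypothetical and isolates only the argument handling; the real `predict()` signature has other parameters that are not shown in the commit message.

```python
import warnings


def resolve_tissue_filter(tissue_filter=None, *, tissue_exclude=None):
    """Return the effective tissue filter, honouring the deprecated alias."""
    if tissue_exclude is not None:
        if tissue_filter is not None:
            # Both spellings at once is ambiguous, so refuse outright.
            raise TypeError(
                "Pass only one of 'tissue_filter' or 'tissue_exclude'."
            )
        warnings.warn(
            "'tissue_exclude' is deprecated; use 'tissue_filter' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return tissue_exclude
    return tissue_filter
```

Making the alias keyword-only means no positional call can bind to the old name by accident.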
- tests M3: add a regression anchor in test_train_loop_smoke.py that asserts scripts/train.py still contains the AMP scheduler-gate predicate. The 2-line _run_gated_step helper is faithful to the production behavior but a silent drift would otherwise let the emulator tests pass while real training desynchronizes OneCycleLR.
- tests M2: same idea for test_zero_channel_masking.py. The unit-test helper is a verbatim copy of __getitem__'s masking block; a refactor could let the copy drift. New test asserts training/dataset.py still contains _zero_channel_cache and fov_zero_mask.
- docs M4: add CHANGELOG.md documenting the 0.0.1 → 0.1.0 release (canonical-only refactor, training subpackage, breaking removal of CellTypeCLIPModel, deprecated tissue_exclude alias, num_workers=0 default, TissueNetConfig env-var default). Bump version in pyproject.toml.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
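A "regression anchor" of the kind described in tests M2/M3 is a test that checks the production source still contains the strings a copied helper depends on. A minimal sketch, with the hypothetical helper name `missing_anchors`, might be:

```python
from pathlib import Path


def missing_anchors(source_path, anchors):
    """Return the anchor strings that no longer appear in the source file."""
    text = Path(source_path).read_text(encoding="utf-8")
    return [anchor for anchor in anchors if anchor not in text]
```

A test for the M2 case would then assert that `missing_anchors("deepcell_types/training/dataset.py", ["_zero_channel_cache", "fov_zero_mask"])` is an empty list (the path is inferred from the commit message). This catches renames and deletions, though not behavioral changes that keep the same identifiers.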
complexity H8: replace FullImageDataset.indices' positional 8-tuple
with a CellIndexRecord NamedTuple. Named fields make grep / refactor
safe (no more record[6] / record[5] magic numbers across 10+ call
sites). NamedTuple IS a tuple, so positional access still works for
backward compat with serialized caches that stored raw 8-tuples.
Production call sites in dataset.py now use .ct_label_standard,
.dataset_name, .fov_name, .ds_idx, .domain accessors. Mock-index
constructors in tests/{test_v2,test_samplers,test_stratified_splits,
test_dataset_splits}.py updated to build CellIndexRecord instances.
complexity H7: introduce DataLoaderConfig dataclass + matching
create_dataloader_from_config(zarr_dir, dct_config, cfg) wrapper.
Lets new callers pass a single discoverable object instead of 20+
keyword arguments. The legacy keyword signature of create_dataloader
is preserved verbatim so train.py / predict.py / tests don't need
any change. Field defaults mirror create_dataloader's defaults
exactly — DataLoaderConfig() is equivalent to no-override.
Tests: 235 passed, 1 skipped (analysis-only env failure unchanged).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Collapse training pipeline into deepcell-types (canonical-only)
…ne submodule rebase
Three independent bugs surfaced when running training against the current
master HEAD from a fresh workspace install:
1. tissue_idx kwarg mismatch (scripts/train.py:121, scripts/predict.py:208 + 334)
scripts pass `tissue_idx=batch_data.tissue_idx` to
`CellTypeAnnotator.forward(...)`, but the model's forward signature is
`(sample, spatial_context, ch_idx, padding_mask, ct_exclude=None,
return_attn_weights=False, domain_idx=None)` — no `tissue_idx`. The
tissue-FiLM MP head experiment was rolled back (see memory
`v10_mp_expansion_tissue_negative.md`) and the model dropped the
parameter, but the scripts kept passing it. Result: every training /
prediction run dies at the first forward pass with
`TypeError: ...got an unexpected keyword argument 'tissue_idx'`.
Fix: drop the kwarg at all three call sites. `batch_data.tissue_idx`
is still populated by the dataloader and remains available to anyone
who needs it downstream — the model just doesn't consume it.
2. Circular import between training/utils.py and training/baseline_features.py
utils.py re-exports four symbols from baseline_features.py at module
level for backward compat. baseline_features.py also imports private
helpers (`_atomic_np_savez` etc.) from utils.py. When utils.py is
imported first (training path) the cycle resolves fine, but when
baseline_features.py is imported first (baseline path — e.g.
`import xgb.run`), the partially-initialized utils.py reaches back to
`baseline_features._extract_all_dataset_features` before that name is
defined, and ImportError fires.
Fix: convert the re-exports to a module-level `__getattr__` so the
lookup is deferred until actual access, by which point both modules
have finished initializing. Existing callers
(`from deepcell_types.training.utils import save_baseline_predictions`,
verified in tests/test_v2.py) keep working.
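The deferred re-export in fix 2 relies on module-level `__getattr__` (PEP 562). The sketch below builds two in-memory modules to show the mechanism in isolation; the module names and the `_LAZY_EXPORTS` set are hypothetical, and the real fix lives in `training/utils.py`.

```python
import types

# Hypothetical stand-in for baseline_features.py.
baseline_features = types.ModuleType("baseline_features_demo")
baseline_features.save_baseline_predictions = lambda: "saved"

# Hypothetical stand-in for utils.py, which re-exports lazily.
utils = types.ModuleType("utils_demo")
_LAZY_EXPORTS = {"save_baseline_predictions"}


def _lazy_getattr(name):
    # Called only when normal attribute lookup on the module fails, so the
    # other module is touched at access time rather than at import time.
    if name in _LAZY_EXPORTS:
        return getattr(baseline_features, name)
    raise AttributeError(f"module 'utils_demo' has no attribute {name!r}")


# In a real module this is simply `def __getattr__(name): ...` at top level.
utils.__getattr__ = _lazy_getattr
```

Here `utils.save_baseline_predictions()` resolves through the hook, while unknown names still raise `AttributeError` as usual.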
3. Submodule rebase (baselines/{maps,cellsighter,xgboost,nimbus})
Each baseline's pyproject.toml listed `deepcelltypes @ git+...
deepcelltypes-cell-type-assignment-pytorch.git` as a dep; that URL
now resolves to the renamed research workspace (no longer a Python
package) and `uv pip install` fails with a metadata-name mismatch.
Each baseline also imported from `deepcelltypes.{config,utils,dataset}`
— the pre-refactor flat layout. Companion commits on each submodule's
`fix/post-refactor-imports` branch replace the dep URL with a plain
`deepcell-types` and rebase imports onto
`deepcell_types.training.{config,utils,dataset,metrics,baseline_features}`.
This parent commit bumps the submodule pointers to those branch tips.
End-to-end verification: with the three fixes, a fresh workspace `uv sync`
+ smoke training (`scripts/train.py` with the v10 split + svd_512_v6
embeddings) gets through model build, GPU allocation, and reaches batch 0
of epoch 0. The xgboost baseline imports cleanly after
`uv pip install -e baselines/xgboost`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…port fix(train,predict,utils): tissue_idx kwarg + circular import + baseline submodule rebase
uv.lock is regenerated on every branch switch when pyproject.toml shapes differ (master vs training), so keeping it tracked produces constant churn. /reviews/ holds local /deep-review outputs.
Untouched since Oct 2024 and broken since the kit was inlined in 0a8108e: it COPYs a non-existent top-level requirements.txt and pip-installs the deleted deepcelltypes-kit/ directory. No CI, docs, or scripts reference it.
…mmit -> CSV) Pins each reported test number to its checkpoint (sha256), the train/eval code commit (d13fd54 for the MLP-head run, b598710 for the resMLP run; both ancestors of PR #41), and the prediction CSV (sha256). Notes the self-pinning gap (configs don't record git_commit) addressed separately on PR #41. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CKPT_CONFIG now stores the code commit the run executed under (git rev-parse HEAD of the checkout owning train.py — the pinned worktree's HEAD when run via a pin), so any checkpoint/result traces to an exact code snapshot. 'unknown' when not a git repo. Closes the traceability gap noted in the recipe-ablation manifest. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
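One way to record the commit as described, falling back to `'unknown'` outside a git repo, is sketched below. The helper name `current_git_commit` is hypothetical, and the exact error handling in `train.py` is an assumption.

```python
import subprocess
from pathlib import Path


def current_git_commit(script_path):
    """Return HEAD of the checkout that owns script_path, or 'unknown'."""
    repo_dir = Path(script_path).resolve().parent
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is missing, the directory does not exist, or it is not a repo.
        return "unknown"
    return result.stdout.strip() or "unknown"


# The script path here is illustrative; a training script would pass __file__.
commit = current_git_commit(Path.cwd() / "train.py")
```

Running git with `cwd` set to the script's directory is what makes a pinned worktree report its own HEAD rather than that of the caller's shell.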
…ty gap note Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e reported set
MAPS and CellSighter chose their best checkpoint by evaluating on the same set they then reported (MAPS: data["X_val"] each epoch -> best-by-val-loss; CellSighter: test_loader each epoch -> best-by-macro-accuracy). Selecting on the reported set is leakage. XGBoost was already correct (FOV-grouped inner-val for early stopping). Both baselines now mirror XGBoost: select on a FOV-grouped inner-validation set carved from the TRAIN FOVs (10%), and report once on the untouched test set.
- dataloader.create_dataloader: additive, default-off `inner_val_ratio`/`inner_val_seed`. When >0, carves a FOV-grouped inner-val from train_indices, trains only on inner-train, and returns the inner-val loader via metadata["inner_val_loader"]. Default 0.0 leaves the main-model path unchanged.
- maps/run.py: GroupShuffleSplit(test_size=0.1) on train FOVs; normalization stats, sampler, and per-epoch val-loss selection all from inner-train/inner-val.
- cellsighter/run.py: inner_val_ratio=0.1; selection on metadata["inner_val_loader"].
- READMEs: record the deviation from upstream selection protocol.
- test_maps_cellsighter_equivalence: drop the run.py byte-equivalence pin (logic now intentionally deviates from upstream) and replace with a behavioral inner-val check.
Consequence: changes published MAPS/CellSighter numbers (now train on ~90% of train cells); requires a full re-run on the v10 archive to regenerate the headline table. Does not touch the abstention asymmetry (scoped out). CellSighter still selects by macro-accuracy (separate finding, left unchanged).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
CellSighter selected its best checkpoint by macro-accuracy while the main model (scripts/train.py -> val_macro_f1) and the headline comparison use macro-F1. When accuracy and F1 diverge (systematic in imbalanced multi-class settings) the returned checkpoint was not the macro-F1-optimal one, depressing the reported CellSighter macro-F1. Switch selection (on the held-out inner-val) to macro-F1; update the saved checkpoint key and logging. Reported test metrics unchanged. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…isjointness Extract the inner-validation carve from create_dataloader into a module-level _carve_inner_val_fovs helper (behavior-preserving; the default no-op path returns train_indices unchanged) so the leakage-critical FOV-grouping is unit-testable. Add regression tests asserting whole-FOV grouping, train/inner-val disjointness, a clean index partition, the no-op path, and the >=1-inner-train-FOV cap. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
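A FOV-grouped carve with the properties the regression tests assert (whole-FOV grouping, disjointness, clean partition, no-op path, at least one inner-train FOV) might look like the sketch below. The function name, its signature, and the rounding rule are hypothetical; this is not the `_carve_inner_val_fovs` implementation.

```python
import random


def carve_inner_val_by_fov(train_indices, fov_of, ratio, seed=0):
    """Split cell indices into (inner_train, inner_val), keeping FOVs whole.

    fov_of maps a cell index to its FOV name.
    """
    if ratio <= 0:
        # No-op path: train indices come back unchanged.
        return list(train_indices), []
    fovs = sorted({fov_of[i] for i in train_indices})
    random.Random(seed).shuffle(fovs)
    n_val = max(1, round(len(fovs) * ratio))  # rounding rule is an assumption
    n_val = min(n_val, len(fovs) - 1)         # always keep >= 1 inner-train FOV
    val_fovs = set(fovs[:n_val])
    inner_train = [i for i in train_indices if fov_of[i] not in val_fovs]
    inner_val = [i for i in train_indices if fov_of[i] in val_fovs]
    return inner_train, inner_val


# Toy data: 50 cells spread over 10 FOVs of 5 cells each.
fov_of = {i: f"fov{i // 5}" for i in range(50)}
inner_train, inner_val = carve_inner_val_by_fov(list(range(50)), fov_of, ratio=0.1)
```

Splitting on FOV names rather than on cells is the leakage-critical part: cells from one field of view never land on both sides.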
…t_strict knobs
Thread optional per-loader knobs through extract_patch, FullImageDataset, and create_dataloader, each defaulting to current behavior so DCT/MAPS inputs stay byte-identical:
- mask_intensities (default True): when False, return the full crop including neighbor intensities instead of raw*self_mask (single-cell input).
- crop_size/output_size (default dct_config): per-loader patch-size override.
- train_transform (default H/V flips): custom train-time spatial augmentation.
- split_strict (default True): downgrade split fingerprint mismatch to a warning when all split FOVs are present in the current archive.
Enables the faithful CellSighter baseline without altering the shared single-cell path. Full suite: 313 passed, 1 skipped.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reimplement the CellSighter baseline to follow Amitay et al. (Nat Commun 2023) training recipe inside our cross-tissue harness:
- ImageNet ResNet50 stem (7x7/s2 + maxpool) for 60x60 crops; --cifar_stem keeps the 32x32 CIFAR stem for ablation.
- Unmasked neighbor intensities (mask_intensities=False); --mask_self ablation restores the single-cell input.
- Geometric augmentation module (rotation, vectorized per-channel shift, mask dilation, flips@0.75). Poisson resampling omitted: preprocessed/raw is [0,1] min-max normalized, not photon counts, so Poisson would corrupt the signal.
- New flags: --crop_size (default 60), --seed (ensemble diversity), --test_split_file (final eval on a held-out split), --allow_split_mismatch.
Re-pin the cellsighter freeze snapshots as a drift guard (no longer an upstream-identical port) and re-freeze its CLI option set. maps stays frozen.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…n docs b598710 (resMLP-head default) and ef1229f (checkpoint git_commit self-pinning) are independent xuefei/master commits, not part of PR #41. Only d13fd54 is in PR #41's lineage. Correct the three attribution claims in TRACEABILITY.md and REPORT.md; numerical claims unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add an explicit cross-file FOV disjointness check between --split_file
('train') and --test_split_file ('val'): load_fov_splits only checks
overlap within a single file, so a mismatched pair could silently leak
training FOVs into the reported number. Also warn loudly when
--test_split_file is omitted, since the final eval then reuses the
checkpoint-selection val loader (selection-on-the-eval-set, not a
held-out number). Re-pin the cellsighter run.py drift-guard SHA.
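The cross-file check can be sketched as a set intersection over the two split files' FOV lists. The function name and the assumed JSON layout (a dict with `"train"` and `"val"` lists of FOV names) are hypothetical; only the key names are taken from the commit message.

```python
def assert_cross_file_disjoint(split, test_split):
    """Raise if any training FOV also appears in the held-out eval FOVs.

    split is the parsed --split_file (its 'train' list is used);
    test_split is the parsed --test_split_file (its 'val' list is used).
    """
    overlap = sorted(set(split["train"]) & set(test_split["val"]))
    if overlap:
        raise ValueError(
            f"{len(overlap)} FOV(s) appear in both train and test splits, "
            f"e.g. {overlap[:5]}"
        )


# Toy split contents standing in for the two parsed JSON files.
split = {"train": ["fov_a", "fov_b"], "val": ["fov_c"]}
test_split = {"val": ["fov_d", "fov_e"]}
assert_cross_file_disjoint(split, test_split)  # disjoint, so no error
```

A per-file overlap check cannot catch this case, because each file is internally consistent on its own.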
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tched-budget uniform sampler
…les_per_epoch flags
… + full-inv-freq)
The CellSighter baseline silently inherited DCT's sqrt-inverse-frequency
WeightedRandomSampler (1000-count floor) via create_dataloader's default,
not the original CellSighter's equal-proportion balancing (research-workspace
issue #96). Make the baseline genuinely faithful on the class-balancing axis.
Faithfully reproduce KerenLab/CellSighter's recipe:
- subsample_indices_per_class: caps the TRAIN pool to <=size_data cells/class
(subsample_const_size; paper size_data=1000), deterministic per seed, val/test
untouched.
- compute_sample_weights_equal: full-inverse-frequency weights weight=total/count
(define_sampler with sample_batch=true).
Wire a `class_balance` {equal|sqrt|none} + `size_data` knob through
create_dataloader; the CellSighter baseline now defaults to the faithful
equal-proportion scheme (--class_balance equal --size_data 1000), with sqrt and
none as ablations. --no_weighted_sampler kept as a deprecated alias for
--class_balance none. Legacy use_weighted_sampler still honored when
class_balance is None (main DCT model and other callers unaffected).
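The two weighting laws and the per-class cap can be compared in a small sketch. The `equal` law (`weight = total / count`) is stated in the commit message; the exact form of the `sqrt` law with its 1000-count floor is an assumption here, and all three function names are hypothetical.

```python
import math
import random
from collections import Counter


def equal_weights(labels):
    # Full inverse frequency: every class gets the same total sampling mass.
    counts, total = Counter(labels), len(labels)
    return [total / counts[y] for y in labels]


def sqrt_weights(labels, floor=1000):
    # Assumed form of sqrt-inverse-frequency: counts below the floor are
    # clamped so very rare classes are not boosted without limit.
    counts = Counter(labels)
    return [1.0 / math.sqrt(max(counts[y], floor)) for y in labels]


def subsample_per_class(indices, labels, size_data, seed):
    # Cap the training pool at size_data cells per class, deterministically.
    rng = random.Random(seed)
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    kept = []
    for cls in sorted(by_class):
        pool = by_class[cls]
        kept.extend(rng.sample(pool, size_data) if len(pool) > size_data else pool)
    return sorted(kept)


labels = ["a"] * 90 + ["b"] * 10
weights = equal_weights(labels)
kept = subsample_per_class(range(100), labels, size_data=20, seed=0)
```

Under the `equal` law the 90 majority cells and the 10 minority cells carry the same summed weight, which is what makes sampled batches class-balanced in expectation.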
Docs: README documents the now-faithful default + remaining hierarchy_match
deviation. Tests: 6 new unit tests for the weight law + size_data cap; updated
the cellsighter option-freeze set.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A three-round faithfulness audit of the baselines against their original papers/codebases found the baseline READMEs accurately documented architecture and training mechanics but omitted several data-pipeline deviations. This adds the verified, source-cited disclosures:
- CellSighter: neighbor-intensity self-mask (training/patch.py:176) vs upstream's raw neighbor intensities; [0,1] min-max normalization vs upstream raw counts; flips-only augmentation vs upstream's seven; 32x32 context vs 60px; sqrt- vs full-inverse-frequency sampler. Most are shared DeepCell Types preprocessing (fairness-neutral across models) but still deviate from how upstream CellSighter was trained. Self-mask impact was empirically tested (feat/faithful-cellsighter): the ranking did not change.
- Nimbus: prediction resize uses INTER_LINEAR vs upstream 0.0.5 INTER_NEAREST; mpp-based rescale vs magnification-ratio. Verified against the installed nimbus-inference==0.0.5 wheel; core primitives (sigmoid, prepare_binary_mask, cross-FOV normalization) confirmed faithful.
- XGBoost: no cellSize feature and no class balancing (conservative vs the neural baselines); tuning budget not matched (Optuna only for XGBoost).
Documentation only; no code or behavior changes.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Remove pre-existing F401 (pandas, typing.Dict/Any, torch.nn.functional) and F541 (placeholder-less f-string) in run.py, surfaced by ruff --fix during the baseline integration. No behavior change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# Conflicts: # README.md # tests/test_preprocess_hook.py
# Conflicts: # README.md
…sole balancer
The DCT backbone training now uses the WeightedRandomSampler (compute_sample_weights in dataset.py) as the SOLE rare-class balancer. The redundant per-class FocalLoss alpha weighting is removed entirely (cleaner than an interlock), making double-weighting structurally impossible. The focal term (gamma, via --focal_gamma) is kept unchanged. Concretely:
- Delete the compute_class_weights() helper in scripts/train.py (its only caller; MAPS keeps its own separate compute_class_weights in baselines/maps/run.py, untouched).
- Delete the call site plus the plumbing that fed it only (the dataset-layout isinstance checks + train_dataset_ref/train_indices unwrapping, and the now-unused AugmentedDataset/FullImageDataset imports). label_remap is retained; it is used elsewhere.
- FocalLoss alpha is now hard-coded to None for backbone training.
- Remove the --no_class_weights Click flag, its main() parameter, and its "no_class_weights" key from the checkpoint config dict. Removing the config key is safe: extra/missing config keys do not break checkpoint loading.
- Update the LossesAndMetrics double-weighting warning in metrics.py to drop the stale --no_class_weights reference.
- CHANGELOG: note the schema change and the default change.
This makes the no-flags default reproduce the released-checkpoint recipe, which was trained with --no_class_weights. It CHANGES scripts/train.py's no-flag default versus v0.1.0 development builds (which applied class weights by default); the released checkpoint and the stage-2 head retrain (scripts/retrain_head.py, plain CrossEntropyLoss) are unaffected.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add model version 2026-06-15 (deepcell-types_2026-06-15_resmlp.pt, md5 704616a1...) and set it as _latest, so download_model() / predict.py default to the two-stage residual-MLP head-retrain model (80.27 hier macro-F1 on the held-out 129-FOV test split, vs tuned XGBoost 79.03 and the prior Frozen-CLS 74.20). The resMLP head is auto-detected by predict._build_model via ct_head.inp.0.weight, so no caller change is needed. The prior Frozen-CLS release (2026-05-17) is retained in the registry for reproducibility. NOTE: the asset deepcell-types_2026-06-15_resmlp.pt must be uploaded to users.deepcell.org/models/ before this pin resolves for end users. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…add headline number The pr-31 archive-free-README merge took its README wholesale (--theirs), which reverted master's richer Training section (retrain_head.py stage-2 recipe, evaluate_on_test.sh, the leakage-free-test-split headline sentence) because pr-31 branched before that landed. Restore it as a union (pr-31's archive-free inference sections + master's Training section) and state the headline number: two-stage resMLP 80.27 hierarchical macro-F1 vs tuned XGBoost 79.03. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Promote two-stage resMLP to the headline released model
Recipe-ablation review docs with corrected commit attributions (was #44)
# Conflicts: # deepcell_types/training/dataloader.py
Summary
Merges the separate training repository (`deepcelltypes-cell-type-assignment-pytorch`) into this repo and replaces the legacy `CellTypeCLIPModel` inference path with the current canonical model. This is the `v0.1.0` release cut.

Before this PR, `vanvalenlab/deepcell-types` was inference-only: it shipped `CellTypeCLIPModel`, the `dct_kit/` helpers, and a top-level `__init__` that exported just `predict`. After it, a single package covers training and inference: inference stays a plain `pip install deepcell-types`, the full training pipeline lives behind a `[train]` extra, and the four paper comparison baselines are vendored behind per-baseline extras.
Canonical model
- `model.py` is rewritten around `CellTypeAnnotator`; `CellTypeCLIPModel` / `CellTypeDataEncoder` are removed. Canonical training defaults (`scripts/train.py`, click-based CLI): `--resnet_channels 48`, `--domain_weight 0.1`, `--best_metric macro_f1`.
- …into a marker-position vector and injected as a CLS residual. The output projection is zero-init, so warm-starting from a checkpoint preserves predictions at step 0.
- …(`--domain_weight 0.1`; `0` disables it).
- `--freeze_backbone` trains only the mean-intensity branches on top of an existing checkpoint; `--unfreeze_ct_head` additionally co-adapts the CT head / CLS token / final norm without unfreezing the transformer backbone.
- …`masked_fill`) through the channel encoder, fusion, and mean-intensity paths so masked tokens contribute exactly zero rather than leaking `bias`/`spatial_feat` into the transformer.
- `scripts/train.py` bundles `ct2idx`, `n_heads`, and `compat_marker0_zero` into the checkpoint, and inference asserts the vocabulary ordering matches (a permuted vocabulary previously passed the count-only check and silently mislabeled cells).
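The difference between a count-only check and an ordering check on the bundled vocabulary can be illustrated with a small sketch. The function name `check_vocabulary` and the error messages are hypothetical; only the `ct2idx` mapping (cell-type name to class index) comes from the description above.

```python
def check_vocabulary(ckpt_ct2idx, runtime_ct2idx):
    """Raise unless both vocabularies map every cell type to the same index."""
    if len(ckpt_ct2idx) != len(runtime_ct2idx):
        raise ValueError("vocabulary size differs from the checkpoint")
    # The count check alone is not enough: a permutation has the same size
    # but shifts which output logit corresponds to which cell type.
    mismatched = sorted(
        name for name, idx in ckpt_ct2idx.items()
        if runtime_ct2idx.get(name) != idx
    )
    if mismatched:
        raise ValueError(f"vocabulary ordering differs for: {mismatched}")


ckpt = {"B cell": 0, "T cell": 1, "Macrophage": 2}
permuted = {"T cell": 0, "B cell": 1, "Macrophage": 2}
check_vocabulary(ckpt, dict(ckpt))  # identical mapping passes
```

With `permuted`, the sizes match but the check raises, which is the silent-mislabeling case the description refers to.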
Canonical-only inference
- …packaged `vocab.json` snapshot, so `pip install deepcell-types` + `download_model()` is enough to run `predict()`; the multi-GB TissueNet zarr archive is no longer required (pass `zarr_path=` / set `DEEPCELL_TYPES_ZARR_PATH` only if you need it). Verified identical predictions with vs. without the archive on the paper checkpoint.
- …`ct_abstention_k=0.2`), bucketed per-FOV everywhere (CLI, Python API, library): cells below an IQR fence on the FOV confidence distribution are relabeled to the `"Unknown"` sentinel (skipped when `k` is disabled or the FOV has <4 cells).
- `predict(..., preprocess=...)` overrides the per-FOV normalization without retraining, backed by a bounded op library (`apply_config`, `make_preprocessor`, `DEFAULT_CONFIG`) and a composition-guided adaptation loop (`skills/preproc-adapt/`).
- …`DCTConfig.PERCENTILE_THRESHOLD`) is now `99.9`, matching the recipe the training archive was built with (was `99.0`, a carryover from the original packaging).
- `predict(return_probabilities=True)` returns a `PredictionResult` dataclass with the full per-cell softmax matrix, cell indices, and the pre-abstention argmax labels (`cell_types_raw`).
- `_torch_load_weights` loads with `weights_only=True` and emits a loud warning if it has to fall back to unsafe pickle on an older torch; a missing checkpoint raises a clear `FileNotFoundError` pointing at `download_model()`.

New public API
- `predict`, `DCTConfig`, `PredictionResult`, `preprocess_fov`, `apply_config`, `make_preprocessor`, and `DEFAULT_CONFIG` are importable from `deepcell_types` directly.
- `preprocess_fov(raw, mask, native_mpp, channel_names) → PreprocessedFov` is the standalone preprocessing entry point.

Monorepo: training pipeline
- `deepcell_types.training` ships from this repo behind `pip install "deepcell-types[train]"`: `config.py`, `dataset.py`, `archive.py`, `annotations.py`, `baseline_features.py`, `gold_metadata.py`, `losses.py`, `metrics.py`, `patch.py`, `utils.py`, `abstention.py`.
- `scripts/`: `train.py`, `pretrain.py`, `predict.py`, `generate_openai_embeddings.py`, `generate_splits.py`, `split_val_for_test.py`, plus the release-archive gate (`validate_archive_contract.py`, `check_release_archive.sh`).
- …`splits/` (`fov_split{,_valsubset,_test}.json` + README), so the published train/val/test partition is reproducible from the repo.
- …anywhere (`--enable_wandb` is gone; confusion matrices save locally as PNGs).
- `zarr>=3.1` pulls the Python floor up to 3.11 for the train extra.

Baselines
- …`deepcell_types/baselines/` (`cellsighter`, `maps`, `nimbus`, `xgb`), invoked through the unified runner `python -m deepcell_types.baselines <name>`, each with a self-contained install extra (`baseline-cellsighter`, `baseline-maps`, `baseline-nimbus`, `baseline-xgboost`).
- …source; third-party licenses are tracked in `deepcell_types/baselines/NOTICE`.
- `extract_features_from_zarr(missing_value=...)` lets each baseline choose its absent-marker sentinel: MAPS / CellSighter keep `0.0`; XGBoost can pass `np.nan` so absent markers route through XGBoost's learned `missing` direction instead of being conflated with "present, intensity 0.0". The feature matrix records a `present_markers` mask and the cache stays missing-value-agnostic.

Breaking changes
- `CellTypeCLIPModel` removed. No shim: use `from deepcell_types import predict, DCTConfig`.
- `predict()` arguments after `mpp` are keyword-only, preventing accidental transposition of the adjacent string arguments. `device=` is the preferred spelling (`device_num=` remains a deprecated alias).
- `predict(num_workers=...)` default is now `0` (was `24`); 24 workers OOM'd machines with <64 GB RAM.
- …of prior releases; pass `ct_abstention_k=0` to recover raw argmax.
- `99.0 → 99.9` shifts ~5% of predicted labels; on a held-out test-split sample it reproduces the canonical predictions slightly better (92.5% vs 91.9% argmax agreement).
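For readers deciding whether to pass `ct_abstention_k=0`, the per-FOV abstention rule can be sketched as below. The fence formula (Q1 minus `k` times the IQR, a Tukey-style lower fence) and the quartile method are assumptions; the description only states that cells below an IQR fence become `"Unknown"` and that FOVs with fewer than 4 cells are skipped. The function name is hypothetical.

```python
import statistics

UNKNOWN = "Unknown"


def abstain_in_fov(labels, confidences, k=0.2, min_cells=4):
    """Relabel low-confidence cells of ONE field of view to the sentinel."""
    if not k or len(labels) < min_cells:
        # Abstention disabled, or too few cells for a stable quartile estimate.
        return list(labels)
    q1, _, q3 = statistics.quantiles(confidences, n=4, method="inclusive")
    fence = q1 - k * (q3 - q1)  # assumed form of the IQR fence
    return [UNKNOWN if c < fence else lab for lab, c in zip(labels, confidences)]


labels = ["T cell", "T cell", "B cell", "B cell", "T cell"]
confidences = [0.90, 0.92, 0.94, 0.96, 0.10]
relabeled = abstain_in_fov(labels, confidences)
```

Because the fence is computed from each FOV's own confidence distribution, a uniformly low-confidence FOV is not wiped out wholesale; only its outliers are relabeled.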
Packaging / infra
- …`vocab.json`, `channel_mapping.yaml`, and `training/config/*.yaml` (incl. `combined_celltypes.yaml`), which were previously outside the package tree and absent after `pip install`.
- `tifffile` declared in the `[train]` extra.
- …`.github/workflows/ci.yml`); inference vs. `[train]` test boundary enforced.
- `LICENSE` text matches the OSI Apache 2.0 text exactly (LIC: Revert licence text to exactly match OSI Apache 2 #42); `NOTICE` aligned to the vanvalenlab convention.
Tests
35 test modules under `tests/` (plus `tests/baselines/`) covering canonical inference, abstention CLI, checkpoint round-trip, dataset/split/sampler behavior, preprocessing + the preprocess hook, losses, hierarchical eval, archive-contract validation, baseline feature splits, and vendored-baseline equivalence against upstream.

See `CHANGELOG.md` for the full `0.1.0` entry and migration notes.