From ced54b3d8dae0c9a5c0945c66ba8c9efce93a727 Mon Sep 17 00:00:00 2001 From: igerber Date: Sat, 16 May 2026 06:29:00 -0400 Subject: [PATCH 1/3] SpilloverDiD: ring-indicator spillover-aware DiD (Butts 2021) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit New standalone estimator at `diff_diff/spillover.py` implementing two-stage Gardner (2022) DiD with ring-indicator covariates that identify, alongside the direct effect on treated (`tau_total`), per-ring spillover effects on near-control units (`delta_j`). Reference: Butts, K. (2023, originally 2021) "Difference-in-Differences with Spatial Spillovers" arXiv:2105.03737v3; Gardner, J. (2022) "Two-stage differences in differences" arXiv:2207.05943. Handles panel non-staggered (paper Eqs 5/6/8) and Section 5 staggered timing in one estimator. Stage-2 regressor uses the time-varying `(1 - D_it) * Ring_{it,j}` form. Stage-1 subsample is Butts' STRICTER `Omega_0 = {D_it = 0 AND S_it = 0}` (untreated AND unexposed). Identification-check policy: period strict, unit warn-and-drop, plus connected-component check on the Omega_0 bipartite graph. SE clamps negative vcov diagonals before sqrt (sibling-estimator convention). `coefficients` exposes all (1+K) stage-2 entries keyed to vcov columns plus an "ATT" alias. `rank_deficient_action` validated at `__init__`. Variance: stage-2 OLS via `solve_ols` (HC1 / Conley / cluster). Gardner GMM first-stage uncertainty correction NOT applied in this PR (documented limitation; tracked in TODO). Deferred features (planned follow-ups, all in TODO): `event_study=True` per-event-time × ring coefficients, `survey_design=` integration, `ring_method="count"`, data-driven `d_bar`, GMM correction at stage 2, sparse staggered ring-distance path, TwoStageDiD-Conley first-class. Tests: `tests/test_spillover.py` (153 tests) + DGP factories at `tests/_dgp_utils.py`. 
Includes 20-seed Gardner identity bit-identity test (`SpilloverDiD.att` matches single-stage TWFE ring regression at `atol=1e-10` on non-staggered DGPs — the reported non-staggered `tau_total` IS the Butts Eqs. 4-6 estimator). Non-staggered MC at 50 seeds + 200-seed slow variant recovers both `tau_total` and `delta_1`; staggered MC at 30 seeds anchors `tau_total` only. Docs: REGISTRY section, API rst, `llms.txt` + `llms-full.txt`, README catalog entry, references, `doc-deps.yaml`, TODO follow-up rows. Co-Authored-By: Claude Opus 4.7 (1M context) --- CHANGELOG.md | 1 + README.md | 1 + TODO.md | 8 + diff_diff/__init__.py | 6 + diff_diff/guides/llms-full.txt | 63 + diff_diff/guides/llms.txt | 1 + diff_diff/results.py | 111 + diff_diff/spillover.py | 1957 ++++++++++++ .../_autosummary/diff_diff.SpilloverDiD.rst | 21 + .../diff_diff.SpilloverDiDResults.rst | 61 + docs/api/index.rst | 3 + docs/api/spillover.rst | 234 ++ docs/doc-deps.yaml | 22 + docs/index.rst | 2 + docs/methodology/REGISTRY.md | 79 + docs/references.rst | 4 + tests/_dgp_utils.py | 250 ++ tests/test_spillover.py | 2610 +++++++++++++++++ 18 files changed, 5434 insertions(+) create mode 100644 diff_diff/spillover.py create mode 100644 docs/api/_autosummary/diff_diff.SpilloverDiD.rst create mode 100644 docs/api/_autosummary/diff_diff.SpilloverDiDResults.rst create mode 100644 docs/api/spillover.rst create mode 100644 tests/_dgp_utils.py create mode 100644 tests/test_spillover.py diff --git a/CHANGELOG.md b/CHANGELOG.md index 4e17a329..b45f1ae1 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] ### Added +- **`SpilloverDiD` — ring-indicator spillover-aware DiD (Butts 2021).** New standalone estimator at `diff_diff/spillover.py` implementing two-stage Gardner methodology with ring-indicator covariates that identify direct effect on treated (`tau_total`) alongside per-ring spillover effects on 
near-control units (`delta_j`). Documented synthesis of ingredients (no single published software covers the exact recipe — `did2s` implements Gardner two-stage without rings; the Butts ring estimator has no R/Stata package): Butts (2021) Section 5 / Table 2 identification, Gardner (2022) two-stage residualize-then-fit, and the Conley spatial-HAC vcov shipped in 3.3.3. Handles both panel non-staggered (Equations 5/6/8) and Section 5 staggered timing in one estimator — non-staggered is the special case where all treated units share an onset time. **API:** `SpilloverDiD(rings=[0, 50, 100, 200], conley_coords=("lat","lon"), ...).fit(data, outcome="y", unit="unit", time="t", treatment="D")` (binary D auto-converted to `first_treat`) or `.fit(..., first_treat="first_treat")` (Gardner convention). Result: `SpilloverDiDResults(DiDResults)` with `.att` = `tau_total`, `.spillover_effects` (per-ring `pd.DataFrame` with `coef`/`se`/`t_stat`/`p_value`/`ci_low`/`ci_high`), `.ring_breakpoints`, `.d_bar`, `.n_units_ever_in_ring`, `.n_far_away_obs`, `.is_staggered`. `.coefficients` exposes all `(1+K)` stage-2 entries (`"treatment"` + `"_spillover_"`) plus an `"ATT"` alias keyed to vcov columns. **Methodology spec (committed):** stage-2 regressor is the time-varying `(1 - D_it) * Ring_{it,j}` form (paper page 12's `S_it = S_i * 1{t >= t_treat}` notation; Section 5 Table 2's `S^k_{it}` / `Ring^k_{it,j}`). Reading the literal unit-static `(1 - D_it) * S_i` from Equation 5 is algebraically rank-deficient under TWFE (`(1-D_it) * S_i = S_i - D_it`, with `S_i` absorbed by `mu_i`, leaving `-D_it`); only the time-varying form supports the paper's identification (Proposition 2.3). Stage-1 subsample uses Butts' STRICTER `Omega_0 = {D_it = 0 AND S_it = 0}` (untreated AND unexposed), not TwoStageDiD's `{D_it = 0}` alone — this prevents spillover-contaminated near-controls in pre/post periods from biasing the time FE. 
**Gardner identity (non-staggered):** a 20-seed deterministic regression test pins `SpilloverDiD.att` against a direct single-stage TWFE ring regression on the full sample (`y ~ mu_i + lambda_t + tau * D_it + sum_j delta_j * (1 - D_it) * Ring_{it,j}`) at `atol=1e-10` — empirically bit-identical, so the reported non-staggered `tau_total` IS the Butts Eqs. 4-6 estimator. **Identification-check policy (period strict, unit warn-and-drop, plus connectivity):** every period must have at least one Omega_0 row (hard `ValueError` — dropping a period removes all units' cross-time identification). Units lacking Omega_0 rows (e.g. baseline-treated units with `D_it = 1` at every observed `t`) are warned-and-dropped: their unit FE is NaN, residualization writes NaN on their rows, and the downstream finite-mask path excludes them from stage 2 — mirrors `TwoStageDiD`'s always-treated convention. Additionally, the supported-units bipartite graph (units linked by shared Omega_0 periods) must form a single connected component; `K > 1` components raise `ValueError` because the FE solver would return only component-specific constants and residualization would silently mix them across components (defense-in-depth — under absorbing treatment the disconnected case may be unreachable through the upstream validators, but the check future-proofs Wave B follow-ups). **Public API restrictions (Wave B MVP):** `covariates=` raises `NotImplementedError` because Gardner-style two-stage requires covariate effects estimated on the untreated-and-unexposed subsample at stage 1 (appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates); non-absorbing / reversible treatment patterns (e.g. 
`[0, 1, 0]`) raise `ValueError` rather than being silently coerced into "treated from first 1 onward"; non-constant `first_treat` values across rows of the same unit raise `ValueError`; `conley_coords` is required on every fit path (not just `vcov_type="conley"`) because ring construction always uses it. **Far-away control identification:** uses CURRENT-period untreated status (`D_it = 0`) rather than never-treated-only, so all-eventually-treated staggered designs (no never-treated units) can identify the counterfactual via not-yet-treated far-away rows. **Variance (Wave B MVP):** stage-2 OLS variance via `solve_ols` (HC1 / Conley / cluster paths all flow through). The Gardner GMM first-stage uncertainty correction is NOT applied at stage 2 in this PR (documented limitation; planned follow-up extends `two_stage.py::_compute_gmm_variance` to accept a Conley kernel matrix in place of HC1's identity at the influence-function outer-product step). **Deferred features (planned follow-ups):** `event_study=True` per-event-time × ring coefficients (Butts Table 2), `survey_design=` integration, `ring_method="count"` (count-of-treated-in-ring), data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight), Gardner GMM first-stage correction at stage 2, sparse staggered ring-distance path. **Tests:** `tests/test_spillover.py` (153 tests across ring-construction primitives, validators, fit integration, raw-data invariant, identification MC — non-staggered DGP at 50 seeds + 200-seed `@pytest.mark.slow` variant recovers both `tau_total` and `delta_1`; staggered DGP at 30 seeds anchors `tau_total` only — Conley wiring, Gardner identity bit-identity, coefficients-vs-vcov alignment, warn-and-drop, rank_deficient_action validation, Omega_0 bipartite-graph connectivity). DGP factories `tests/_dgp_utils.py::generate_butts_nonstaggered_dgp` / `generate_butts_staggered_dgp` satisfy Butts Assumptions 1/3/5/7 by construction. 
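  The rank-deficiency algebra behind the committed stage-2 spec can be checked numerically. A minimal numpy sketch on a hypothetical 4-unit × 3-period toy panel (illustrative values only, not the estimator's code path): the literal unit-static regressor adds no rank to a TWFE design that already contains `D_it`, while the time-varying form does.

  ```python
  import numpy as np

  # Hypothetical toy panel (illustrative only): 4 units x 3 periods.
  # Unit 0 is treated from t = 2; units 0 and 1 fall inside the ring (S_i = 1).
  units = np.repeat(np.arange(4), 3)
  times = np.tile(np.arange(3), 4)
  D = ((units == 0) & (times >= 2)).astype(float)       # D_it
  S_i = np.isin(units, [0, 1]).astype(float)            # unit-static ring exposure

  U = (units[:, None] == np.arange(4)).astype(float)    # unit dummies (mu_i)
  T = (times[:, None] == np.arange(3)).astype(float)    # time dummies (lambda_t)
  base = np.column_stack([U, T, D])

  # Literal Eq. 5 reading: (1 - D_it) * S_i = S_i - D_it, a combination of
  # the unit dummies and D_it already in the design -> no rank gained.
  x_static = (1.0 - D) * S_i
  assert np.linalg.matrix_rank(np.column_stack([base, x_static])) == np.linalg.matrix_rank(base)

  # Time-varying form S_it = S_i * 1{t >= t_treat}: a genuinely new column.
  x_tv = (1.0 - D) * S_i * (times >= 2)
  assert np.linalg.matrix_rank(np.column_stack([base, x_tv])) == np.linalg.matrix_rank(base) + 1
  ```

  Only the time-varying column survives the two-way fixed effects, which is the algebraic point the Proposition 2.3 note above makes.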
- **`ChaisemartinDHaultfoeuille.predict_het` × `placebo`: R-parity on both global and per-path surfaces.** R-verified — `did_multiplegt_dyn(predict_het, placebo)` emits heterogeneity OLS results on backward (placebo) horizons via R's `DIDmultiplegtDYN:::did_multiplegt_main` placebo block (`effect = matrix(-i, ...)` rbind site); the same block runs per-by_level under `did_multiplegt_dyn(by_path, predict_het, placebo)`, so both global `res$results$predict_het` and per-by_level `res$by_level_i$results$predict_het` slots emit backward rows. R's predict_het syntax with `placebo > 0` requires the `c(-1)` sentinel in the horizon vector to trigger "compute heterogeneity for ALL forward (1..effects) AND ALL placebo (1..placebo) positions" — passing positive-only horizons errors with "specified numbers in predict_het that exceed the number of placebos". Python mirrors via `_compute_heterogeneity_test(..., placebo=L_max)` (set automatically from `self.placebo` at both global and per-path call sites in `fit()`) — the function iterates forward (1..L_max) and backward (-1..-L_max) horizons in a single loop with an explicit `out_idx < 0` eligibility guard for backward horizons whose `F_g` is too small (would otherwise silently misread `N_mat` via numpy negative indexing). `results.heterogeneity_effects` uses negative-int keys for backward horizons; `path_heterogeneity_effects` does the same per path. Placebo rows in `to_dataframe(level="by_path")` have non-NaN `het_*` columns when `placebo=True` and `heterogeneity=` are both set. **Survey gate (warn + skip):** `survey_design + placebo + heterogeneity` emits a `UserWarning` at fit-time and falls back to forward-horizon-only heterogeneity on both surfaces — the Binder TSL cell-period allocator's REGISTRY justification is tied to **post-period** attribution; backward-horizon attribution puts ψ_g mass on a pre-period cell, a separate library-extension claim that needs its own derivation. 
Forward-horizon `predict_het + survey_design` continues to work unchanged on both global and per-path surfaces. The function-level `_compute_heterogeneity_test` keeps a per-iteration `NotImplementedError` backstop for direct callers that bypass fit(). Pre-period allocator derivation deferred to a follow-up methodology PR (tracked in TODO.md). R parity confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityHeterogeneityWithPlacebo` (scenario 23, `multi_path_reversible_predict_het_with_placebo_global`, `placebo=2, effects=3, no by_path`) and `::TestDCDHDynRParityByPathHeterogeneityWithPlacebo` (scenario 22, same DGP plus `by_path=3`); pinned at `BETA_RTOL=1e-6` / `SE_RTOL=1e-5` for `beta` / `se` / `t_stat` / `n_obs` and `INFERENCE_RTOL=1e-4` for `p_value` / `conf_int` across 3 paths × (3 forward + 2 placebo) = 15 horizons + 1 global × 5 horizons. Cross-surface invariants regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPredictHetPlacebo` (placebo het column population, survey-gate warn+skip behavior, forward+survey anti-regression, `out_idx<0` eligibility guard, single-path telescope `path_heterogeneity_effects[(only_path,)] == heterogeneity_effects` bit-exactly, summary rendering, direct-call `NotImplementedError` backstop). Closes TODO #422. ### Changed diff --git a/README.md b/README.md index b54d8aa0..ba9cdf8b 100644 --- a/README.md +++ b/README.md @@ -106,6 +106,7 @@ Full guide: `diff_diff.get_llm_guide("practitioner")`. 
- [SunAbraham](https://diff-diff.readthedocs.io/en/stable/api/staggered.html) - Sun & Abraham (2021) interaction-weighted estimator for heterogeneity-robust event studies - [ImputationDiD](https://diff-diff.readthedocs.io/en/stable/api/imputation.html) - Borusyak, Jaravel & Spiess (2024) imputation estimator, most efficient under homogeneous effects - [TwoStageDiD](https://diff-diff.readthedocs.io/en/stable/api/two_stage.html) - Gardner (2022) two-stage estimator with GMM sandwich variance +- [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html) - Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover on near-control units; handles non-staggered and staggered timing - [SyntheticDiD](https://diff-diff.readthedocs.io/en/stable/api/estimators.html) - Synthetic DiD combining standard DiD and synthetic control for few treated units - [TripleDifference](https://diff-diff.readthedocs.io/en/stable/api/triple_diff.html) - triple difference (DDD) estimator for designs requiring two criteria for treatment eligibility - [ContinuousDiD](https://diff-diff.readthedocs.io/en/stable/api/continuous_did.html) - Callaway, Goodman-Bacon & Sant'Anna (2024) continuous treatment DiD with dose-response curves diff --git a/TODO.md b/TODO.md index 42ddd3a5..43f0d6d2 100644 --- a/TODO.md +++ b/TODO.md @@ -132,6 +132,14 @@ Deferred items from PR reviews that were not addressed before merge. | SyntheticDiD: bootstrap cross-language parity anchor against R's default `synthdid::vcov(method="bootstrap")` (refit; rebinds `opts` per draw) or Julia `Synthdid.jl::src/vcov.jl::bootstrap_se` (refit by construction). Same-library validation (placebo-SE tracking, AER §6.3 MC truth) is in place; a cross-language anchor is desirable to bolster the methodology contract. Julia is the cleanest target — minimal wrapping work and refit-native vcov. Tolerance target: 1e-6 on Monte Carlo samples (different BLAS + RNG paths preclude 1e-10). 
The R-parity fixture from the previous release was deleted because it pinned the now-removed fixed-weight path. | `benchmarks/R/`, `benchmarks/julia/`, `tests/` | follow-up | Low | | Conley + survey weights / `survey_design`. Score-reweighted meat `s_i = w_i · X_i · ε_i` is mechanical, but PSU clustering interaction with the spatial kernel and replicate-weights variance under spatial correlation are non-trivial (Bertanha-Imbens 2014 covers cluster-sample but not the explicit Conley case). Phase 5 of the spillover-conley initiative; paper review prerequisite. Currently raises `NotImplementedError` at the linalg validator. | `linalg.py::_validate_vcov_args` | Phase 5 (spillover-conley) | Medium | | `SyntheticDiD(vcov_type="conley")` support. Currently raises `TypeError` at `__init__` because SyntheticDiD uses `variance_method ∈ {bootstrap, jackknife, placebo}` rather than the analytical sandwich that Conley plugs into. Wiring would require either reimplementing an analytical sandwich path for SyntheticDiD or designing a spatial-block bootstrap (new methodology, Politis-Romano 1994 territory). | `synthetic_did.py::SyntheticDiD` | follow-up (spillover-conley) | Low | +| `SpilloverDiD` Gardner GMM first-stage uncertainty correction at stage 2. Wave B MVP uses standard `solve_ols` variance (HC1 / Conley / cluster) without the influence-function adjustment for stage-1 FE estimation. Extending `two_stage.py::_compute_gmm_variance` to accept a Conley kernel matrix in place of HC1's identity at the IF outer-product step gives the full Butts (2021) Section 3.1 + Gardner (2022) Section 4 composition. See plan Risks #2 for the IF formula. | `spillover.py::SpilloverDiD.fit`, `two_stage.py::_compute_gmm_variance` | follow-up (Wave B) | Medium | +| `SpilloverDiD(event_study=True)` per-event-time × ring decomposition (Butts Section 5 / Table 2 `S^k_{it}` / `Ring^k_{it,j}`). Currently raises `NotImplementedError`. 
The implementation adds event-time dummies × ring covariates to the stage-2 design and emits a MultiIndex on `spillover_effects`. | `spillover.py::SpilloverDiD.fit` | follow-up (Wave B) | Medium | +| `SpilloverDiD(survey_design=...)` integration. Currently raises `NotImplementedError`. Requires threading survey weights through the inline stage 1 + stage 2 and lifting `two_stage.py`'s survey path patterns. | `spillover.py::SpilloverDiD.fit` | follow-up (Wave B) | Low | +| `SpilloverDiD(ring_method="count")` extension. Currently only the nearest-treated-ring specification is exposed. Count-of-treated-in-ring (paper Section 3.2 end) is methodologically supported by Butts but re-introduces functional-form dependence; expose with an explicit kwarg gate and documentation warning. | `spillover.py::SpilloverDiD.fit` | follow-up | Low | +| `SpilloverDiD` data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight cross-validation). | `spillover.py::SpilloverDiD` | follow-up | Low | +| `SpilloverDiD` T22 TVA tutorial (`docs/tutorials/22_spillover_did.ipynb`): synthetic TVA-style DGP reproducing Butts (2021) Section 4 Table 1 Panel A bias-correction direction (~40% understatement). Split from the methodology PR per user-confirmed scope split (2026-05-15). | `docs/tutorials/`, `tests/test_t22_*_drift.py` | follow-up (Wave B) | Medium | +| Extend `TwoStageDiD` with Conley vcov as a first-class feature (mirrors Wave A's TWFE/MPD/DiD extension). Currently `TwoStageDiD.__init__` lacks `vcov_type` / `conley_*` kwargs; `SpilloverDiD` works around this by threading Conley directly via `solve_ols` at stage 2. Promoting Conley to TwoStageDiD's API removes the workaround and lets non-spillover users access Conley + Gardner two-stage. | `diff_diff/two_stage.py` | follow-up | Medium | +| `SpilloverDiD` sparse cKDTree path for the staggered nearest-treated-distance helper (mirrors the static helper's sparse branch). 
Currently `_compute_nearest_treated_distance_staggered` always builds dense `(n_units, n_treated_by_onset)` pairwise distance matrices per cohort; on large staggered panels with many cohorts this is avoidable memory/runtime. Add a sparse k-d-tree branch analogous to `_compute_nearest_treated_distance_sparse`, gated on `n > _CONLEY_SPARSE_N_THRESHOLD`. | `spillover.py::_compute_nearest_treated_distance_staggered` | follow-up (Wave B) | Low | #### Performance diff --git a/diff_diff/__init__.py b/diff_diff/__init__.py index aa48ff77..cee94043 100644 --- a/diff_diff/__init__.py +++ b/diff_diff/__init__.py @@ -171,6 +171,10 @@ TwoStageDiDResults, two_stage_did, ) +from diff_diff.spillover import ( + SpilloverDiD, +) +from diff_diff.results import SpilloverDiDResults # re-export from diff_diff.stacked_did import ( StackedDiD, StackedDiDResults, @@ -300,6 +304,7 @@ "SunAbraham", "ImputationDiD", "TwoStageDiD", + "SpilloverDiD", "TripleDifference", "TROP", "StackedDiD", @@ -341,6 +346,7 @@ "TwoStageDiDResults", "TwoStageBootstrapResults", "two_stage_did", + "SpilloverDiDResults", "TripleDifferenceResults", "triple_difference", "StaggeredTripleDifference", diff --git a/diff_diff/guides/llms-full.txt b/diff_diff/guides/llms-full.txt index 5bbe74c2..14a66e76 100644 --- a/diff_diff/guides/llms-full.txt +++ b/diff_diff/guides/llms-full.txt @@ -461,6 +461,69 @@ results = est.fit(data, outcome='outcome', unit='unit', results.print_summary() ``` +### SpilloverDiD + +Butts (2021) ring-indicator spillover-aware DiD. Augments two-stage Gardner with ring-indicator covariates that identify direct effect on treated (`tau_total`) and per-ring spillover effects on near-control units (`delta_j`). Handles non-staggered and staggered timing in one estimator. Recommends `vcov_type="conley"` with cutoff = `d_bar` (paper Section 3.1). 
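+
+The `rings` breakpoints bin each unit's nearest-treated distance into half-open intervals. A minimal sketch with toy distances (hypothetical values, plain numpy rather than the estimator's internals):
+
+```python
+import numpy as np
+
+# Toy sketch (hypothetical distances; not the library's internal code path).
+rings = [0, 50, 100, 200]                       # K+1 breakpoints -> K = 3 rings
+dist = np.array([10.0, 75.0, 150.0, 400.0])     # nearest-treated distance per unit
+
+# np.digitize assigns each distance to its half-open ring [rings[j], rings[j+1]);
+# index K (here 3) means the unit lies beyond max(rings) = the far-away control group.
+ring_idx = np.digitize(dist, rings) - 1
+labels = [f"[{lo}, {hi})" for lo, hi in zip(rings[:-1], rings[1:])]
+for d, j in zip(dist, ring_idx):
+    print(d, labels[j] if j < len(labels) else "far away (beyond d_bar)")
+```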
+ +```python +SpilloverDiD( + rings: list[float], # K+1 sorted breakpoints; K rings + d_bar: float | None = None, # Far-away cutoff (defaults to max(rings)) + vcov_type: str = "hc1", # "hc1", "conley", or default cluster + conley_coords: tuple[str, str] | None = None, # (lat_col, lon_col), required + conley_metric: str = "haversine", # or "euclidean" / callable + conley_cutoff_km: float | None = None, + conley_lag_cutoff: int | None = None, + cluster: str | None = None, + alpha: float = 0.05, + anticipation: int = 0, + event_study: bool = False, # Deferred: raises NotImplementedError if True + horizon_max: int | None = None, # Deferred (event-study mode) + rank_deficient_action: str = "warn", +) +``` + +**fit() parameters:** + +```python +sp.fit( + data: pd.DataFrame, + outcome: str, + unit: str, + time: str, + treatment: str | None = None, # binary D_it; auto-converted to first_treat + first_treat: str | None = None, # OR onset time per unit (Gardner) + covariates: list[str] | None = None, # Deferred: NotImplementedError if non-None + survey_design: object = None, # Deferred: NotImplementedError if non-None +) -> SpilloverDiDResults +``` + +**Restrictions (Wave B MVP — planned follow-ups):** + +- `covariates=` raises `NotImplementedError`. Gardner two-stage requires covariate effects estimated on the untreated-and-unexposed Omega_0 subsample at stage 1; appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates. Planned follow-up. 
+- `survey_design=` raises `NotImplementedError` (planned: SurveyDesign integration) +- `event_study=True` raises `NotImplementedError` (planned: per-event-time × ring decomposition per Butts Table 2) +- `horizon_max=` raises `NotImplementedError` (used only with event_study) +- Stage-2 variance is `solve_ols` HC1 / Conley / cluster — Gardner GMM first-stage uncertainty correction NOT applied (planned follow-up; SE is biased downward / too small, CIs too narrow, p-values too small — treat reported significance conservatively until the GMM correction lands) +- Only nearest-treated rings supported; `ring_method="count"` (count of treated neighbors in ring) not yet exposed + +**Usage:** + +```python +from diff_diff import SpilloverDiD + +est = SpilloverDiD( + rings=[0, 50, 100, 200], + conley_coords=("lat", "lon"), + vcov_type="conley", + conley_cutoff_km=200.0, + conley_lag_cutoff=0, +) +results = est.fit(data, outcome="y", unit="unit", time="time", treatment="D") +print(f"tau_total = {results.att:.4f}") +print(results.spillover_effects) # per-ring DataFrame +``` + ### SyntheticDiD Synthetic Difference-in-Differences (Arkhangelsky et al. 2021). Combines DiD with synthetic control by re-weighting control units. 
diff --git a/diff_diff/guides/llms.txt b/diff_diff/guides/llms.txt index df0a880a..a310d621 100644 --- a/diff_diff/guides/llms.txt +++ b/diff_diff/guides/llms.txt @@ -58,6 +58,7 @@ Full practitioner guide: call `diff_diff.get_llm_guide("practitioner")` - [SunAbraham](https://diff-diff.readthedocs.io/en/stable/api/staggered.html): Sun & Abraham (2021) interaction-weighted estimator for heterogeneity-robust event studies - [ImputationDiD](https://diff-diff.readthedocs.io/en/stable/api/imputation.html): Borusyak, Jaravel & Spiess (2024) imputation estimator — most efficient under homogeneous effects - [TwoStageDiD](https://diff-diff.readthedocs.io/en/stable/api/two_stage.html): Gardner (2022) two-stage estimator with GMM sandwich variance +- [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html): Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover-on-control; reuses `conley_coords` for ring construction; handles non-staggered and staggered timing - [SyntheticDiD](https://diff-diff.readthedocs.io/en/stable/api/estimators.html): Synthetic DiD combining standard DiD and synthetic control methods for few treated units - [TripleDifference](https://diff-diff.readthedocs.io/en/stable/api/triple_diff.html): Triple difference (DDD) estimator for designs requiring two criteria for treatment eligibility - [ContinuousDiD](https://diff-diff.readthedocs.io/en/stable/api/continuous_did.html): Callaway, Goodman-Bacon & Sant'Anna (2024) continuous treatment DiD with dose-response curves diff --git a/diff_diff/results.py b/diff_diff/results.py index d7f4a3d8..02b7e5a8 100644 --- a/diff_diff/results.py +++ b/diff_diff/results.py @@ -354,6 +354,117 @@ def _get_significance_stars(p_value: float) -> str: return "" +@dataclass +class SpilloverDiDResults(DiDResults): + """Results from a ring-indicator spillover-aware DiD estimation (Butts 2021). 
+ + Extends :class:`DiDResults` with per-ring spillover effect estimates and + diagnostic counts for the identifying control population. + + Attributes + ---------- + att : float + Total effect on the treated, ``tau_total`` (inherited from DiDResults). + spillover_effects : pd.DataFrame, optional + Per-ring spillover-on-control estimates. Columns: ``["coef", "se", + "t_stat", "p_value", "ci_low", "ci_high"]``. Index: ring label + like ``"[0, 50)"``; for event-study output the index is a + ``MultiIndex`` over ``(ring_label, event_time)``. + ring_breakpoints : list of float + Sorted distance breakpoints used to construct the rings. + d_bar : float + Far-away cutoff (Butts Assumption 5). Equals ``max(ring_breakpoints)`` + unless explicitly overridden. + n_units_ever_in_ring : dict[str, int] + Counts of UNIQUE units that appear in each ring AT LEAST ONCE + across observed periods. Under staggered timing the ring + membership is time-varying, so a unit can be counted in + multiple rings (once in each ring it visits). Under non-staggered + timing the rings are unit-static, so this is a clean partition + (each unit appears in exactly one ring or the far-away group). + Includes treated units in Ring_1 by geometric construction + even though the ``(1 - D_it)`` factor zeros their stage-2 + contribution. + n_far_away_obs : int + Number of observations with ``D_it = 0`` AND ``d_it > d_bar``; + these observations identify the counterfactual trend (Butts + Assumption 5(ii)). + is_staggered : bool + True if multiple distinct treatment-onset times were detected. + event_study : bool + True if per-event-time ring coefficients were emitted (i.e., + ``event_study=True`` was set on the estimator). + stage1_n_obs : int + Number of observations in the stage-1 untreated-and-unexposed + subsample (``Omega_0_butts``). 
+ """ + + spillover_effects: Optional[pd.DataFrame] = field(default=None) + ring_breakpoints: Optional[List[float]] = field(default=None) + d_bar: Optional[float] = field(default=None) + n_units_ever_in_ring: Optional[Dict[str, int]] = field(default=None) + n_far_away_obs: Optional[int] = field(default=None) + is_staggered: Optional[bool] = field(default=None) + event_study: Optional[bool] = field(default=None) + stage1_n_obs: Optional[int] = field(default=None) + anticipation: Optional[int] = field(default=None) + + def summary(self, alpha: Optional[float] = None) -> str: + """Extended summary with ATT row plus per-ring rows.""" + base = super().summary(alpha=alpha) + if self.spillover_effects is None or self.spillover_effects.empty: + return base + lines = base.split("\n") + # Find the closing separator line and inject ring rows before it. + ring_rows = ["", "Spillover Effects (ring-indicator, Butts 2021)".center(70), "-" * 70] + header = ( + f"{'Ring':<15} {'Estimate':>12} {'Std. Err.':>12} " + f"{'t-stat':>10} {'P>|t|':>10} {'':>5}" + ) + ring_rows.append(header) + ring_rows.append("-" * 70) + for label, row in self.spillover_effects.iterrows(): + coef = row.get("coef", np.nan) + se = row.get("se", np.nan) + t_stat = row.get("t_stat", np.nan) + p_value = row.get("p_value", np.nan) + stars = _get_significance_stars(p_value) + label_str = str(label) if not isinstance(label, tuple) else f"{label[0]} k={label[1]}" + ring_rows.append( + f"{label_str[:15]:<15} {coef:>12.4f} {se:>12.4f} " + f"{t_stat:>10.3f} {p_value:>10.4f} {stars:>5}" + ) + ring_rows.append("-" * 70) + # Insert ring block before the final "==..." line (last row of base). 
+ for idx in range(len(lines) - 1, -1, -1): + if lines[idx].startswith("="): + lines = lines[:idx] + ring_rows + lines[idx:] + break + return "\n".join(lines) + + def to_dict(self) -> Dict[str, Any]: + """Override to serialize ``spillover_effects`` DataFrame as list-of-dicts.""" + base = super().to_dict() + base.update( + { + "spillover_effects": ( + self.spillover_effects.reset_index().to_dict(orient="records") + if self.spillover_effects is not None + else None + ), + "ring_breakpoints": self.ring_breakpoints, + "d_bar": self.d_bar, + "n_units_ever_in_ring": self.n_units_ever_in_ring, + "n_far_away_obs": self.n_far_away_obs, + "is_staggered": self.is_staggered, + "event_study": self.event_study, + "stage1_n_obs": self.stage1_n_obs, + "anticipation": self.anticipation, + } + ) + return base + + @dataclass class PeriodEffect: """ diff --git a/diff_diff/spillover.py b/diff_diff/spillover.py new file mode 100644 index 00000000..b8895e9d --- /dev/null +++ b/diff_diff/spillover.py @@ -0,0 +1,1957 @@ +""" +SpilloverDiD — Butts (2021) ring-indicator spillover-aware DiD. + +Augments a two-stage Gardner (2022) DiD with ring-indicator covariates that +identify the spillover effect on near-control units alongside the direct +effect on treated units. Handles both panel non-staggered and Section 5 +staggered timing in a single estimator. + +References +---------- +Butts, K. (2023). Difference-in-Differences with Spatial Spillovers. + arXiv:2105.03737v3 (originally posted 2021). +Gardner, J. (2022). Two-stage differences in differences. arXiv:2207.05943. + +Notes +----- +The paper's notation in Equation 5/6 is ``(1 - D_it) * Ring_{ij}`` with +``S_i`` unit-static. Reading that literally under a two-way fixed effects +specification yields a rank-deficient design (``(1 - D_it) * S_i = S_i - +D_it``; ``S_i`` is absorbed by ``mu_i``, leaving ``-D_it``). 
The paper +defines ``S_it = S_i * 1{t >= t_treat}`` (page 12, just above Equation 5) +and Section 5's Table 2 makes the time-varying form explicit +(``S^k_{it}``, ``Ring^k_{it,j}``). This implementation uses the +time-varying form, which is the spec that supports the paper's +identification argument (Proposition 2.3 + Section 3.1 subsample logic). +""" + +import warnings +from typing import Any, Callable, Dict, List, Literal, Optional, Tuple, Union + +import numpy as np +import pandas as pd + +from diff_diff.conley import ( + _CONLEY_EARTH_RADIUS_KM, + _CONLEY_SPARSE_N_THRESHOLD, + _haversine_km, + _validate_callable_metric_result, +) +from diff_diff.linalg import solve_ols +from diff_diff.results import SpilloverDiDResults +from diff_diff.utils import safe_inference + +# Type alias mirroring diff_diff.conley.ConleyMetric so callers can supply +# any of the built-in identifiers or a user callable returning a pairwise +# distance matrix. +SpilloverMetric = Union[ + Literal["haversine", "euclidean"], + Callable[[np.ndarray, np.ndarray], np.ndarray], +] + + +# ============================================================================= +# Ring construction helpers (Step 1) +# ============================================================================= + + +def _haversine_km_pairwise( + coords_a: np.ndarray, + coords_b: np.ndarray, +) -> np.ndarray: + """Vectorized pairwise great-circle distance (km) between two coord sets. + + Parameters + ---------- + coords_a : ndarray of shape (n_a, 2) + ``(lat, lon)`` in DEGREES for the first set of points. + coords_b : ndarray of shape (n_b, 2) + ``(lat, lon)`` in DEGREES for the second set of points. + + Returns + ------- + ndarray of shape (n_a, n_b) + Great-circle distances in km. Matches the ``_haversine_km`` Earth + radius convention (6371.01 km, mirroring R ``conleyreg``). 
+ """ + lat_a = coords_a[:, 0][:, None] + lon_a = coords_a[:, 1][:, None] + lat_b = coords_b[:, 0][None, :] + lon_b = coords_b[:, 1][None, :] + return _haversine_km(lat_a, lon_a, lat_b, lon_b) + + +def _euclidean_pairwise( + coords_a: np.ndarray, + coords_b: np.ndarray, +) -> np.ndarray: + """Vectorized pairwise Euclidean distance between two coord sets. + + Coordinates are treated as planar; no unit conversion. Matches the + ``_pairwise_distance_matrix`` Euclidean branch of ``conley.py``. + """ + diffs = coords_a[:, None, :] - coords_b[None, :, :] + return np.sqrt(np.einsum("ijk,ijk->ij", diffs, diffs)) + + +def _apply_callable_metric_pairwise( + metric: Callable[[np.ndarray, np.ndarray], np.ndarray], + coords_a: np.ndarray, + coords_b: np.ndarray, +) -> np.ndarray: + """Apply a user-supplied callable metric to two coord sets. + + Unlike :func:`_validate_callable_metric_result` which checks square + ``(n, n)`` symmetry on a single coord set, this helper accepts a + rectangular ``(n_a, n_b)`` result. The validator is therefore relaxed: + we only require finiteness, non-negativity, and correct shape. The + zero-diagonal / symmetry checks apply only when the same coord set is + passed on both sides; ring-construction usage passes a treated-only + subset on side B, so the diagonal of the rectangular result is not + meaningful. + """ + result = metric(coords_a, coords_b) + arr = np.asarray(result, dtype=np.float64) + expected_shape = (coords_a.shape[0], coords_b.shape[0]) + if arr.shape != expected_shape: + raise ValueError( + "conley_metric callable returned shape " + f"{arr.shape} for pairwise ring distance, expected {expected_shape}." + ) + if not np.isfinite(arr).all(): + raise ValueError( + "conley_metric callable returned non-finite entries for pairwise " + "ring distance; all distances must be finite." 
+ ) + if (arr < 0.0).any(): + raise ValueError( + "conley_metric callable returned negative entries for pairwise " + "ring distance; all distances must be non-negative." + ) + return arr + + +def _pairwise_ring_distances( + coords_units: np.ndarray, + coords_treated: np.ndarray, + metric: SpilloverMetric, +) -> np.ndarray: + """Compute (n_units, n_treated) pairwise distances under the chosen metric.""" + if callable(metric): + return _apply_callable_metric_pairwise(metric, coords_units, coords_treated) + if metric == "haversine": + return _haversine_km_pairwise(coords_units, coords_treated) + if metric == "euclidean": + return _euclidean_pairwise(coords_units, coords_treated) + raise ValueError( + f"Unknown conley_metric: {metric!r}. Expected 'haversine', 'euclidean', " + "or a callable returning a pairwise distance matrix." + ) + + +def _compute_nearest_treated_distance_static( + data: pd.DataFrame, + *, + unit: str, + coords: Tuple[str, str], + metric: SpilloverMetric, + treated_unit_ids: np.ndarray, + cutoff_km: Optional[float] = None, +) -> Tuple[np.ndarray, np.ndarray]: + """Return per-unit nearest-treated distance for the non-staggered case. + + The set of treated units is fixed (ever-treated), so distances are + unit-level constants and don't vary across periods. Caller broadcasts + to per-row when assembling ring covariates. + + Parameters + ---------- + data : pd.DataFrame + Panel data with one row per (unit, period). Used to extract + per-unit coords via :meth:`DataFrame.drop_duplicates` on ``unit``. + unit : str + Column name of the unit identifier. + coords : tuple of (str, str) + ``(lat_col, lon_col)``. + metric : "haversine" | "euclidean" | callable + Distance metric. For ``"haversine"``, ``coords`` is interpreted as + ``(lat, lon)`` in degrees. For ``"euclidean"``, ``coords`` is + planar. Callable receives two ``(n, 2)`` arrays and must return an + ``(n_a, n_b)`` finite non-negative distance matrix. 
treated_unit_ids : ndarray
+        IDs of ever-treated units (used as side B of pairwise distance).
+    cutoff_km : float, optional
+        If set, ``len(unit_index) > _CONLEY_SPARSE_N_THRESHOLD``, and the
+        metric is a built-in string ("haversine" / "euclidean"), the
+        sparse cKDTree path is used to find treated neighbors within
+        ``cutoff_km`` per unit; otherwise (including for any callable
+        metric) the dense (n_units × n_treated) matrix is built. Units
+        with no treated neighbor within ``cutoff_km`` receive
+        ``d_i = inf`` (they fall outside any ring and into the far-away
+        control group, matching dense-path behavior for a unit whose
+        nearest treated unit is infinitely far away).
+
+    Returns
+    -------
+    d_i : ndarray of shape (n_unique_units,)
+        ``d_i = min_{k in treated_unit_ids} d(i, k)`` per unique unit.
+    unit_index : ndarray of shape (n_unique_units,)
+        Unit identifiers in the same order as ``d_i``.
+    """
+    unit_coords_df = (
+        data[[unit, coords[0], coords[1]]]
+        .drop_duplicates(subset=[unit])
+        .set_index(unit)
+        .sort_index()
+    )
+    unit_index = np.asarray(unit_coords_df.index.values)
+    all_coords = np.asarray(unit_coords_df[[coords[0], coords[1]]].values, dtype=np.float64)
+    treated_set = set(treated_unit_ids.tolist())
+    treated_mask = np.array([uid in treated_set for uid in unit_index], dtype=bool)
+    treated_coords = all_coords[treated_mask]
+    if treated_coords.shape[0] == 0:
+        raise ValueError(
+            "_compute_nearest_treated_distance_static: no treated units present "
+            "in `data` matching `treated_unit_ids`."
+        )
+
+    n_units = all_coords.shape[0]
+    is_builtin_metric = metric in ("haversine", "euclidean")
+    if cutoff_km is not None and n_units > _CONLEY_SPARSE_N_THRESHOLD and is_builtin_metric:
+        d_i = _compute_nearest_treated_distance_sparse(
+            all_coords=all_coords,
+            treated_coords=treated_coords,
+            metric=metric,  # type: ignore[arg-type]
+            cutoff_km=float(cutoff_km),
+        )
+    else:
+        # Dense path: full pairwise matrix, then row-min.
+ dists = _pairwise_ring_distances(all_coords, treated_coords, metric) + d_i = dists.min(axis=1) + return d_i.astype(np.float64), unit_index + + +def _compute_nearest_treated_distance_sparse( + *, + all_coords: np.ndarray, + treated_coords: np.ndarray, + metric: Literal["haversine", "euclidean"], + cutoff_km: float, +) -> np.ndarray: + """Sparse cKDTree path for nearest-treated-distance computation. + + Used when ``n_units > _CONLEY_SPARSE_N_THRESHOLD`` AND the metric is a + built-in string. The tree is built on the treated subset (small) and + queried with each unit row. Units with no treated neighbor inside + ``cutoff_km`` get ``d_i = inf``, which places them in the far-away + control group on the downstream ring-membership step. + + For haversine: lat/lon are projected to 3-D unit-sphere Cartesian + coordinates; the chord-distance query radius is + ``2 * sin(arc / (2 * R_earth))`` with arc clamped at ``pi * R_earth`` + so cutoffs beyond a hemisphere don't shrink. Exact great-circle + distances are then recomputed via :func:`_haversine_km` for the in- + range matches and the per-row minimum is taken. + + For euclidean: planar L2 directly in cKDTree. + + Parameters + ---------- + all_coords : ndarray of shape (n_units, 2) + Coordinates for all units. + treated_coords : ndarray of shape (n_treated, 2) + Coordinates for ever-treated units. + metric : 'haversine' or 'euclidean' + Built-in metric only; callables fall back to the dense path. + cutoff_km : float + Maximum considered distance. Units beyond this get ``d_i = inf``. + + Returns + ------- + ndarray of shape (n_units,) + Nearest-treated distance per unit (inf when no neighbor in range). + """ + # Imported lazily to mirror conley.py's lazy-scipy pattern and keep + # module import cheap when the sparse path isn't exercised. + from scipy.spatial import cKDTree # noqa: WPS433 (deferred import) + + n_units = all_coords.shape[0] + if metric == "haversine": + # Project lat/lon (degrees) to 3-D unit-sphere Cartesian. 
+ lat_rad_all = np.radians(all_coords[:, 0]) + lon_rad_all = np.radians(all_coords[:, 1]) + unit_xyz = np.column_stack( + [ + np.cos(lat_rad_all) * np.cos(lon_rad_all), + np.cos(lat_rad_all) * np.sin(lon_rad_all), + np.sin(lat_rad_all), + ] + ) + lat_rad_tr = np.radians(treated_coords[:, 0]) + lon_rad_tr = np.radians(treated_coords[:, 1]) + tree_xyz = np.column_stack( + [ + np.cos(lat_rad_tr) * np.cos(lon_rad_tr), + np.cos(lat_rad_tr) * np.sin(lon_rad_tr), + np.sin(lat_rad_tr), + ] + ) + # Chord-distance radius for the query; clamp arc at pi (a half-revolution) + # so cutoffs > pi * R_earth do not shrink chord radius below the true reach. + arc_radians = min(cutoff_km / _CONLEY_EARTH_RADIUS_KM, np.pi) + query_r = 2.0 * np.sin(arc_radians / 2.0) + query_r *= 1.0 + 1e-12 # numerical safety margin + tree = cKDTree(tree_xyz) + # Query in chord space, recompute exact great-circle distance for matches. + neighbors = tree.query_ball_point(unit_xyz, r=query_r, p=2.0) + d_i = np.full(n_units, np.inf, dtype=np.float64) + for i, idxs in enumerate(neighbors): + if not idxs: + continue + # Exact great-circle distance for the in-range treated neighbors. + arr_idxs = np.asarray(idxs, dtype=np.intp) + d_subset = _haversine_km( + all_coords[i, 0], + all_coords[i, 1], + treated_coords[arr_idxs, 0], + treated_coords[arr_idxs, 1], + ) + d_i[i] = float(d_subset.min()) + return d_i + + # Euclidean: cKDTree handles directly. 
+ tree = cKDTree(treated_coords) + d_i = np.full(n_units, np.inf, dtype=np.float64) + neighbors = tree.query_ball_point(all_coords, r=cutoff_km, p=2.0) + for i, idxs in enumerate(neighbors): + if not idxs: + continue + arr_idxs = np.asarray(idxs, dtype=np.intp) + d_subset = _euclidean_pairwise(all_coords[i : i + 1], treated_coords[arr_idxs]) + d_i[i] = float(d_subset.min()) + return d_i + + +def _compute_nearest_treated_distance_staggered( + data: pd.DataFrame, + *, + unit: str, + time: str, + coords: Tuple[str, str], + metric: SpilloverMetric, + first_treat_by_unit: Dict[Any, Any], +) -> Tuple[np.ndarray, np.ndarray, np.ndarray]: + """Return per-row nearest-treated distance for the staggered case. + + For each (unit, period) observation, find the minimum distance to any + unit that is treated BY THE END of that period (``first_treat_k <= + t``). Ring membership in the staggered case is therefore unit-time + varying. + + Parameters + ---------- + data : pd.DataFrame + Panel data (one row per unit-period). + unit : str + Unit identifier column name. + time : str + Time period column name. + coords : tuple of (str, str) + ``(lat_col, lon_col)``. + metric : "haversine" | "euclidean" | callable + Distance metric. + first_treat_by_unit : dict + Mapping from unit identifier to onset time (or ``np.inf`` for + never-treated). Generated by :func:`_extract_treatment_onsets`. + + Notes + ----- + The staggered helper currently always uses dense pairwise distance per + cohort. A sparse cKDTree branch (mirroring the static helper) is queued + as a follow-up — see TODO.md. + + Returns + ------- + d_it : ndarray of shape (n_rows,) + Per-row nearest-treated distance, with ``inf`` for rows where no + unit has been treated yet by time t (early periods). + row_unit : ndarray of shape (n_rows,) + Aligned unit identifier per row (for downstream broadcasting). + row_time : ndarray of shape (n_rows,) + Aligned time identifier per row. 
+ """ + unit_coords_df = ( + data[[unit, coords[0], coords[1]]].drop_duplicates(subset=[unit]).set_index(unit) + ) + unit_index = np.asarray(unit_coords_df.index.values) + all_coords = np.asarray(unit_coords_df[[coords[0], coords[1]]].values, dtype=np.float64) + unit_to_pos = {uid: pos for pos, uid in enumerate(unit_index)} + + row_unit = np.asarray(data[unit].values) + row_time = np.asarray(data[time].values) + n_rows = len(row_unit) + d_it = np.full(n_rows, np.inf, dtype=np.float64) + + # Determine the cohort onset times that exist in the data (excluding never-treated). + unique_onsets = sorted({ft for ft in first_treat_by_unit.values() if np.isfinite(ft)}) + if not unique_onsets: + # Degenerate: no treated units. Caller should have rejected this + # in `_validate_spillover_inputs`, but defensively return inf. + return d_it, row_unit, row_time + + # Row's unit position. Invariant across cohort iterations — compute + # once outside the loop. + row_pos = np.array([unit_to_pos[uid] for uid in row_unit], dtype=np.intp) + + # For each unique onset time, compute (n_units, n_treated_by_then) pairwise + # distances ONCE, then assign to rows whose t >= that onset (carrying forward + # the minimum across cohorts). + for onset in unique_onsets: + treated_by_onset_ids = [uid for uid, ft in first_treat_by_unit.items() if ft <= onset] + treated_positions = np.array( + [unit_to_pos[uid] for uid in treated_by_onset_ids if uid in unit_to_pos], + dtype=np.intp, + ) + if treated_positions.size == 0: + continue + treated_coords = all_coords[treated_positions] + # Compute per-unit nearest distance to this cohort's treated set. + dists_to_cohort = _pairwise_ring_distances(all_coords, treated_coords, metric).min(axis=1) + # Update rows whose period t >= onset: take min of current d_it and the + # newly-available cohort distance. 
+ affected_rows = row_time >= onset + if not affected_rows.any(): + continue + row_cohort_dist = dists_to_cohort[row_pos] + # Only update rows where this cohort's distance is smaller than the + # current d_it (carries the running minimum across cohorts). + update_mask = affected_rows & (row_cohort_dist < d_it) + d_it[update_mask] = row_cohort_dist[update_mask] + + return d_it, row_unit, row_time + + +def _build_ring_indicators( + d_values: np.ndarray, + rings: List[float], +) -> np.ndarray: + """Build K boolean ring masks from distances and breakpoints. + + Convention (per Butts Equation 6 + plan Risks #2): half-open at the + top of each interior ring, CLOSED at the outermost upper edge so units + exactly at ``d_bar`` belong to the last ring (not the far-away group). + Far-away controls use a strict ``d_i > d_bar`` check (handled + elsewhere). Treated units have ``d_i = 0`` and fall in Ring_1 by + construction; their ring contribution is later zeroed by the + ``(1 - D_i)`` factor. + + Parameters + ---------- + d_values : ndarray + Distances (per-unit for non-staggered or per-row for staggered). + rings : list of float + Sorted breakpoints with ``len(rings) >= 2``. ``K = len(rings) - 1`` + rings are constructed. + + Returns + ------- + masks : ndarray of shape (len(d_values), K), bool + ``masks[i, j] = True`` if ``d_values[i]`` falls in ring ``j``. + + Raises + ------ + ValueError + ``rings`` has fewer than 2 elements, or is not strictly increasing. + """ + rings_arr = np.asarray(rings, dtype=np.float64) + if rings_arr.ndim != 1 or rings_arr.size < 2: + raise ValueError( + "rings must be a sorted list of at least 2 breakpoints " + f"(got shape {rings_arr.shape})." 
+ ) + if (np.diff(rings_arr) <= 0).any(): + raise ValueError("rings must be strictly increasing; got " f"{rings_arr.tolist()}.") + if (rings_arr < 0).any(): + raise ValueError("rings must be non-negative; got " f"{rings_arr.tolist()}.") + + n = d_values.shape[0] + K = rings_arr.size - 1 + masks = np.zeros((n, K), dtype=bool) + for j in range(K): + lo = rings_arr[j] + hi = rings_arr[j + 1] + if j == K - 1: + # Outermost ring: closed at d_bar so units at the boundary + # belong to this ring (not the far-away group). + masks[:, j] = (d_values >= lo) & (d_values <= hi) + else: + # Interior rings: half-open at top so the breakpoint between + # adjacent rings unambiguously falls in the next ring. + masks[:, j] = (d_values >= lo) & (d_values < hi) + return masks + + +def _ring_label(rings: List[float], j: int) -> str: + """Render the human-readable ring label for index ``j``. + + Convention matches :func:`_build_ring_indicators`: half-open at the + top of interior rings, closed at the outermost upper edge. + """ + K = len(rings) - 1 + lo = rings[j] + hi = rings[j + 1] + if j == K - 1: + return f"[{lo:g}, {hi:g}]" + return f"[{lo:g}, {hi:g})" + + +# ============================================================================= +# Treatment-timing helpers (Step 2) +# ============================================================================= + + +def _extract_treatment_onsets( + data: pd.DataFrame, + first_treat_col: str, + unit_col: str, + *, + treat_zero_as_never_treated: bool = True, +) -> Dict[Any, float]: + """Return a dict mapping each unit to its treatment onset time. + + Parameters + ---------- + treat_zero_as_never_treated : bool, default True + When True (default, matching Gardner / TwoStageDiD user convention), + ``first_treat = 0`` is treated as a never-treated sentinel + equivalent to ``np.inf``. 
Set to False for INTERNAL onset columns + produced by :func:`_convert_treatment_to_first_treat` from a + binary ``D`` column — there, ``0`` may legitimately be the + onset time on 0-indexed panels (a unit treated at the first + observed period gets ``first_treat = 0``). The auto-generated + column writes ``np.inf`` for never-treated, so the 0-as-sentinel + collision is avoided. + + Notes + ----- + If a unit has non-constant ``first_treat`` values across its rows, + ``ValueError`` is raised — SpilloverDiD requires the + absorbing-treatment assumption (one onset per unit). Mirrors + :class:`TwoStageDiD`'s warning behaviour, but escalates to a hard error + because the spillover identification math depends on each unit having a + single well-defined ``S_it`` trajectory. + """ + + def _normalize(v: float) -> float: + if np.isinf(v): + return np.inf + if treat_zero_as_never_treated and v == 0: + return np.inf + return float(v) + + onsets: Dict[Any, float] = {} + non_constant_units: List[Any] = [] + for unit_id, group in data.groupby(unit_col): + ft_unique = group[first_treat_col].dropna().unique().tolist() + normalized = {_normalize(v) for v in ft_unique} + if len(normalized) > 1: + non_constant_units.append(unit_id) + continue + if not normalized: + # All rows are NaN → treat as never-treated. + onsets[unit_id] = np.inf + continue + # Use the unique value, not iloc[0], to avoid being fooled by a + # leading-NaN row when the rest of the unit is consistently treated. + ft = next(iter(normalized)) + onsets[unit_id] = ft # already normalized: np.inf for never-treated, float otherwise + if non_constant_units: + sample = non_constant_units[:5] + suffix = f" (and {len(non_constant_units) - 5} more)" if len(non_constant_units) > 5 else "" + raise ValueError( + f"{len(non_constant_units)} unit(s) have non-constant " + f"'{first_treat_col}' values across rows (e.g. {sample}{suffix}). 
" + "SpilloverDiD requires the absorbing-treatment assumption " + "(one onset per unit, treatment never reverses). For " + "non-absorbing / reversible treatments, see " + "ChaisemartinDHaultfoeuille." + ) + return onsets + + +def _convert_treatment_to_first_treat( + data: pd.DataFrame, + treatment: str, + time: str, + unit: str, +) -> Tuple[pd.DataFrame, str]: + """Auto-convert a binary ``D_it`` column to a per-unit ``first_treat`` column. + + Returns a defensive-copy frame augmented with a new + ``"_spillover_first_treat"`` column whose value per unit is + ``min{t : D_it = 1}`` for ever-treated units and ``np.inf`` for + never-treated. The original ``treatment`` column is preserved. + + **Absorbing-treatment validation:** after extracting ``first_treat``, + each ever-treated unit's ``D_it`` is verified to be 1 at all rows with + ``t >= first_treat[unit]`` (treatment never reverses). Non-absorbing + patterns like ``[0, 1, 0]`` raise ``ValueError`` rather than being + silently coerced into ``first_treat = min(t | D_it = 1)``. + + Raises + ------ + ValueError + ``data`` does not contain a numeric ``treatment`` column or + ``time`` / ``unit`` columns; ``treatment`` has values outside + ``{0, 1}``; or treatment is non-absorbing for some unit. + """ + if treatment not in data.columns: + raise ValueError(f"treatment column '{treatment}' not in data.") + # NaN in treatment is not silently coerced — it would later be rebuilt + # from `first_treat` and could flip a row from "unknown" to "treated" + # or "control" with no warning. + nan_mask = data[treatment].isna() + if bool(nan_mask.any()): + n_nan = int(nan_mask.sum()) + raise ValueError( + f"treatment column '{treatment}' contains {n_nan} NaN value(s). 
" + "SpilloverDiD requires explicit 0/1 status on every row; " + "missing-treatment rows must be either imputed or dropped " + "before fitting (the auto-conversion path cannot silently " + "reclassify them since that would change tau_total and " + "delta_j without warning)." + ) + treat_vals = data[treatment].unique() + # Exact binary check — NOT `int(v) in (0, 1)` (which would accept 0.9, + # 1.1, etc. by rounding-down semantics and silently misclassify + # fractional rows into the control group). + if not all(v in (0, 0.0, 1, 1.0) for v in treat_vals): + raise ValueError( + f"treatment column '{treatment}' must contain only exact 0/1 " + f"values; got unique values: {sorted(treat_vals)}. Fractional " + "values (e.g. 0.9 or 1.1) are NOT silently coerced — fix the " + "data or thresholding upstream before passing to SpilloverDiD." + ) + + out = data.copy(deep=False) + treated_rows = out[out[treatment] == 1] + if treated_rows.empty: + out["_spillover_first_treat"] = np.inf + return out, "_spillover_first_treat" + + onset_by_unit = treated_rows.groupby(unit)[time].min() + + # Verify absorbing: for each ever-treated unit, D_it must be EXACTLY 1 + # at every row with t >= first_treat[unit]. NOT merely "not equal to 0" + # — that would silently accept e.g. NaN or other non-binary values that + # slipped past the binary check above (defense in depth). + reversing_units: List[Any] = [] + for u in onset_by_unit.index: + onset = onset_by_unit.loc[u] + unit_rows = out[(out[unit] == u) & (out[time] >= onset)] + if (unit_rows[treatment] != 1).any(): + reversing_units.append(u) + if reversing_units: + sample = reversing_units[:5] + suffix = f" (and {len(reversing_units) - 5} more)" if len(reversing_units) > 5 else "" + raise ValueError( + f"{len(reversing_units)} unit(s) have non-absorbing treatment " + f"patterns (treatment reverses to 0 after onset; e.g. units " + f"{sample}{suffix}). 
SpilloverDiD requires the absorbing-" + "treatment assumption — once a unit is treated, it stays " + "treated. For non-absorbing / reversible treatments, see " + "ChaisemartinDHaultfoeuille." + ) + + onset_lookup: Dict[Any, float] = { + uid: float(onset_by_unit.loc[uid]) if uid in onset_by_unit.index else np.inf + for uid in out[unit].unique() + } + out["_spillover_first_treat"] = out[unit].map(onset_lookup).astype(np.float64).values + return out, "_spillover_first_treat" + + +# ============================================================================= +# Two-stage Gardner inline (Step 3) +# ============================================================================= + +# Convergence tolerance for the iterative alternating-projection FE solver +# (Gauss-Seidel style; mirrors `TwoStageDiD._iterative_fe`). +_FE_ITER_MAX = 100 +_FE_ITER_TOL = 1e-10 + + +def _check_omega_0_connectivity( + *, + omega_0_mask: np.ndarray, + unit_codes_arr: np.ndarray, + time_codes_arr: np.ndarray, + units_in_omega_0: set, + n_times: int, + unit_uniques: List[Any], +) -> None: + """Raise ``ValueError`` if the Omega_0 bipartite graph is disconnected. + + Stage 1's iterative FE solver identifies ``(mu_i, lambda_t)`` only up to + component-specific constants per connected component of the bipartite + graph (supported units on one side, periods on the other; edge = + Omega_0 row at that (unit, period) cell). If the graph splits into + K > 1 unit-bearing components, residualization later combines + ``mu_i`` from one component with ``lambda_t`` from another, silently + corrupting ``y_tilde`` and downstream ``tau_total`` / ``delta_j``. + + Balanced panel + per-unit/per-period Omega_0 coverage is NECESSARY + but not SUFFICIENT — connectivity is the load-bearing identification + condition. 
Under the current absorbing-treatment + period-strict + + unit-warn-drop regime this case may be unreachable in practice (we + were unable to construct an example that survives the upstream + validators), but the check is defense-in-depth and future-proofs + Wave B extensions (event-study, survey-design integration, possible + reversible-treatment relaxations). + """ + from scipy.sparse import csr_matrix + from scipy.sparse.csgraph import connected_components + + supported_units_sorted = sorted(units_in_omega_0) + n_supp = len(supported_units_sorted) + if n_supp <= 1: + # No multi-component case possible with 0 or 1 supported units. + return + + supp_unit_to_idx = {code: i for i, code in enumerate(supported_units_sorted)} + + omega_unit_codes = unit_codes_arr[omega_0_mask] + omega_time_codes = time_codes_arr[omega_0_mask] + + # Every Omega_0 row's unit is by definition in `units_in_omega_0`, so + # all rows contribute edges to the supported subgraph. + edge_unit_idx = np.array( + [supp_unit_to_idx[int(c)] for c in omega_unit_codes], + dtype=np.int64, + ) + edge_time_offset = n_supp + np.asarray(omega_time_codes, dtype=np.int64) + + # Symmetric adjacency: nodes 0..n_supp-1 are units, n_supp..n_supp+n_times-1 + # are periods. Edge weights are 1 (presence only). + rows = np.concatenate([edge_unit_idx, edge_time_offset]) + cols = np.concatenate([edge_time_offset, edge_unit_idx]) + data_ones = np.ones(len(rows), dtype=np.int8) + adj = csr_matrix( + (data_ones, (rows, cols)), + shape=(n_supp + n_times, n_supp + n_times), + ) + + _, component_labels = connected_components(adj, directed=False) + + # Count components that contain at least one supported UNIT node. + # (Period nodes unreachable from any unit form trivial singletons but + # those would already be caught by the period-level Omega_0 check + # upstream; here we only fail when there are 2+ unit-bearing + # components.) 
+ unit_component_ids = set(int(c) for c in component_labels[:n_supp]) + n_unit_components = len(unit_component_ids) + + if n_unit_components <= 1: + return + + # Build informative error: name first few units per component. + component_units: Dict[int, List[Any]] = {} + for unit_pos, unit_code in enumerate(supported_units_sorted): + comp = int(component_labels[unit_pos]) + component_units.setdefault(comp, []).append(unit_uniques[unit_code]) + component_summary = "; ".join( + f"component {comp_id}: {list(units[:3])}" + + (f" (+{len(units) - 3} more)" if len(units) > 3 else "") + for comp_id, units in list(sorted(component_units.items()))[:3] + ) + raise ValueError( + f"Stage-1 fixed effects unidentified: the Omega_0 bipartite " + f"graph (supported units linked by shared untreated-and-" + f"unexposed periods) splits into {n_unit_components} " + f"disconnected components. Balanced panel and per-unit/per-" + f"period Omega_0 coverage are NECESSARY but not SUFFICIENT for " + f"joint identification — the iterative FE solver returns FE only " + f"up to component-specific constants, and residualization " + f"combines mu from one component with lambda from another, " + f"silently corrupting tau_total and delta_j. Examples: " + f"{component_summary}. To fix, ensure all supported units share " + f"at least one common Omega_0 period (e.g., add a far-away " + f"never-treated unit that observes the full time range)." + ) + + +def _iterative_fe_subset( + y_full: np.ndarray, + unit_codes_full: np.ndarray, + time_codes_full: np.ndarray, + omega_0_mask: np.ndarray, + *, + max_iter: int = _FE_ITER_MAX, + tol: float = _FE_ITER_TOL, +) -> Tuple[np.ndarray, np.ndarray, bool]: + """Stage-1 iterative-alternating-projection FE solver on the Butts subsample. + + Fits ``y[Omega_0] = mu_i + lambda_t + u`` on the untreated-and-unexposed + rows (``Omega_0_mask`` True). 
Returns FE arrays indexed by code, with + ``NaN`` at positions whose unit / time is not represented in the + subsample (rank-deficient cells). + + Mirrors ``TwoStageDiD._iterative_fe`` structurally but operates on + integer-coded factors via ``np.bincount`` for speed and skips the + survey-weights branch (Wave B MVP). + + Parameters + ---------- + y_full : ndarray of shape (n_rows,) + Outcome vector for ALL observations (Omega_0 + treated/exposed). + unit_codes_full : ndarray of shape (n_rows,) + Integer factor codes per row in ``[0, n_units)``. + time_codes_full : ndarray of shape (n_rows,) + Integer factor codes per row in ``[0, n_times)``. + omega_0_mask : ndarray of shape (n_rows,), bool + True for rows in the stage-1 fit subsample (D_it=0 AND S_it=0). + + Returns + ------- + unit_fe_arr : ndarray of shape (n_units,) + Unit FE indexed by code. ``NaN`` for units absent from Omega_0. + time_fe_arr : ndarray of shape (n_times,) + Time FE indexed by code. ``NaN`` for periods absent from Omega_0. + converged : bool + Whether the iterative solver reached ``tol`` within ``max_iter``. + """ + if omega_0_mask.sum() == 0: + raise ValueError( + "_iterative_fe_subset: Omega_0 (untreated-and-unexposed subsample) " + "is empty. Cannot fit stage-1 fixed effects. Check that some " + "control units have d_it > d_bar (Butts Assumption 5(ii))." + ) + + n_units = int(unit_codes_full.max()) + 1 + n_times = int(time_codes_full.max()) + 1 + + # Operate on the subset only (faster than masking each iteration). 
+ y_sub = y_full[omega_0_mask] + unit_sub = unit_codes_full[omega_0_mask] + time_sub = time_codes_full[omega_0_mask] + n_sub = len(y_sub) + + alpha = np.zeros(n_sub) + beta = np.zeros(n_sub) + converged = False + for _ in range(max_iter): + # beta[t] = mean over rows in time-group t of (y - alpha) + resid = y_sub - alpha + time_sums = np.bincount(time_sub, weights=resid, minlength=n_times) + time_counts = np.bincount(time_sub, minlength=n_times) + time_means = np.where(time_counts > 0, time_sums / np.maximum(time_counts, 1), 0.0) + beta_new = time_means[time_sub] + + # alpha[i] = mean over rows in unit-group i of (y - beta_new) + resid = y_sub - beta_new + unit_sums = np.bincount(unit_sub, weights=resid, minlength=n_units) + unit_counts = np.bincount(unit_sub, minlength=n_units) + unit_means = np.where(unit_counts > 0, unit_sums / np.maximum(unit_counts, 1), 0.0) + alpha_new = unit_means[unit_sub] + + max_change = max( + float(np.max(np.abs(alpha_new - alpha))) if n_sub > 0 else 0.0, + float(np.max(np.abs(beta_new - beta))) if n_sub > 0 else 0.0, + ) + alpha = alpha_new + beta = beta_new + if max_change < tol: + converged = True + break + + # Build FE arrays indexed by code; NaN for unseen units/periods. + unit_fe_arr = np.full(n_units, np.nan, dtype=np.float64) + time_fe_arr = np.full(n_times, np.nan, dtype=np.float64) + # For each code present in the subset, take any row's converged value + # (constant within group at convergence). Sort-by-code to make access + # deterministic. 
+ seen_unit_codes = np.unique(unit_sub) + for u_code in seen_unit_codes: + idx = np.flatnonzero(unit_sub == u_code)[0] + unit_fe_arr[u_code] = alpha[idx] + seen_time_codes = np.unique(time_sub) + for t_code in seen_time_codes: + idx = np.flatnonzero(time_sub == t_code)[0] + time_fe_arr[t_code] = beta[idx] + return unit_fe_arr, time_fe_arr, converged + + +def _residualize_butts( + y_full: np.ndarray, + unit_codes_full: np.ndarray, + time_codes_full: np.ndarray, + unit_fe_arr: np.ndarray, + time_fe_arr: np.ndarray, +) -> np.ndarray: + """Compute ``y_tilde = y - mu_hat[i] - lambda_hat[t]`` for ALL rows. + + Rows whose unit or period has ``NaN`` FE (rank-deficient cells from + stage 1) get ``NaN`` y_tilde and are masked out of stage 2. + """ + mu_per_row = unit_fe_arr[unit_codes_full] + lambda_per_row = time_fe_arr[time_codes_full] + return y_full - mu_per_row - lambda_per_row + + +# ============================================================================= +# Public estimator (skeleton — fit() implemented in Step 3) +# ============================================================================= + + +class SpilloverDiD: + """Ring-indicator spillover-aware DiD (Butts 2021). + + Standalone estimator implementing two-stage Gardner (2022) methodology + with ring-indicator covariates that identify the direct effect on + treated units (``tau_total``) alongside per-ring spillover effects on + near-control units (``delta_j``). Supports both panel non-staggered + timing and Section 5 staggered timing in a single ``fit()`` entry + point — non-staggered is the special case where all treated units + share an onset time. + + Parameters + ---------- + rings : list of float + Sorted distance breakpoints with at least 2 elements. ``K = + len(rings) - 1`` rings are constructed. + d_bar : float, optional + Far-away cutoff (Butts Assumption 5). Defaults to ``max(rings)``; + if explicitly set, must equal ``max(rings)``. 
Wave B MVP does not
+        support a ``d_bar`` strictly larger than the outermost ring edge
+        (units satisfying ``rings[-1] < d_i <= d_bar`` would fall in
+        neither a ring nor the far-away group, a "dead zone" with no
+        clean methodological interpretation). To use a smaller spillover
+        bandwidth, shrink the outermost ring edge instead.
+    vcov_type : str, default="hc1"
+        Variance estimator. Set to ``"conley"`` and supply
+        ``conley_coords``/``conley_cutoff_km``/``conley_lag_cutoff`` to
+        enable Conley spatial-HAC at stage 2 (recommended per paper
+        Section 3.1).
+    conley_coords : tuple of (str, str), optional
+        ``(lat_col, lon_col)`` column names. Used for ring construction
+        AND for the Conley vcov spatial kernel.
+    conley_metric : str or callable, default="haversine"
+        Distance metric used for both ring construction and the Conley
+        spatial kernel. See :mod:`diff_diff.conley` for the callable
+        contract.
+    conley_cutoff_km : float, optional
+        Conley spatial-HAC bandwidth. Required when ``vcov_type="conley"``.
+    conley_lag_cutoff : int, optional
+        Within-unit Bartlett max lag. Required when ``vcov_type="conley"``.
+        Use ``0`` to suppress the serial-component sandwich.
+    cluster : str, optional
+        Column name for cluster-robust variance; when paired with
+        ``vcov_type="conley"``, enables the combined Conley × cluster
+        product kernel.
+    alpha : float, default=0.05
+        Significance level for confidence intervals.
+    anticipation : int, default=0
+        Number of pre-treatment periods where effects may occur. Treatment
+        and ring-membership clocks both shift by ``-anticipation`` so the
+        stage-1 untreated-and-unexposed subsample correctly excludes
+        anticipation rows.
+    event_study : bool, default=False
+        If ``True``, emit per-event-time × ring coefficients (Butts Table
+        2 staggered specification). The result's ``spillover_effects``
+        DataFrame uses a ``MultiIndex`` over ``(ring, event_time)``.
+    horizon_max : int, optional
+        Maximum absolute event-study horizon.
Mirrors + :class:`diff_diff.two_stage.TwoStageDiD`. + rank_deficient_action : {"warn", "error", "silent"}, default="warn" + Action when the stage-2 design is rank-deficient. + + Attributes + ---------- + results_ : SpilloverDiDResults + Populated after :meth:`fit` completes. + is_fitted_ : bool + + Notes + ----- + The implementation uses two-stage Gardner methodology with the + time-varying ``S_it = S_i * 1{t >= t_treat}`` form (paper page 12, + just above Equation 5). Reading the literal unit-static ``(1 - D_it) * + S_i`` from Equation 5 yields a rank-deficient design under TWFE; + Section 5's Table 2 makes the time-varying form explicit. The + diff-diff implementation matches the paper's identification argument + once the ``S_it`` notation is read correctly. + + For non-staggered timing, Gardner identity → stage-2 point estimates + equal a single-stage TWFE with the time-varying spillover regressor. + """ + + def __init__( + self, + *, + rings: List[float], + d_bar: Optional[float] = None, + vcov_type: str = "hc1", + conley_coords: Optional[Tuple[str, str]] = None, + conley_metric: SpilloverMetric = "haversine", + conley_cutoff_km: Optional[float] = None, + conley_lag_cutoff: Optional[int] = None, + cluster: Optional[str] = None, + alpha: float = 0.05, + anticipation: int = 0, + event_study: bool = False, + horizon_max: Optional[int] = None, + rank_deficient_action: str = "warn", + ): + if rank_deficient_action not in ("warn", "error", "silent"): + raise ValueError( + f"rank_deficient_action must be 'warn', 'error', or 'silent', " + f"got '{rank_deficient_action}'" + ) + self.rings = rings + self.d_bar = d_bar + self.vcov_type = vcov_type + self.conley_coords = conley_coords + self.conley_metric = conley_metric + self.conley_cutoff_km = conley_cutoff_km + self.conley_lag_cutoff = conley_lag_cutoff + self.cluster = cluster + self.alpha = alpha + self.anticipation = anticipation + self.event_study = event_study + self.horizon_max = horizon_max + 
self.rank_deficient_action = rank_deficient_action + self.is_fitted_ = False + self.results_: Optional[Any] = None + + def get_params(self) -> Dict[str, Any]: + return { + "rings": self.rings, + "d_bar": self.d_bar, + "vcov_type": self.vcov_type, + "conley_coords": self.conley_coords, + "conley_metric": self.conley_metric, + "conley_cutoff_km": self.conley_cutoff_km, + "conley_lag_cutoff": self.conley_lag_cutoff, + "cluster": self.cluster, + "alpha": self.alpha, + "anticipation": self.anticipation, + "event_study": self.event_study, + "horizon_max": self.horizon_max, + "rank_deficient_action": self.rank_deficient_action, + } + + def set_params(self, **params: Any) -> "SpilloverDiD": + valid = set(self.get_params().keys()) + for key, value in params.items(): + if key not in valid: + raise ValueError( + f"Unknown parameter: {key!r}. Valid parameters: " f"{sorted(valid)}." + ) + setattr(self, key, value) + return self + + # ------------------------------------------------------------------------- + # Fit-time validators (Step 2) + # ------------------------------------------------------------------------- + + def _validate_spillover_inputs( + self, + data: pd.DataFrame, + treatment: Optional[str], + first_treat: Optional[str], + time: str, + unit: str, + outcome: str, + ) -> None: + """Front-door validation for SpilloverDiD.fit(). + + Runs BEFORE any stage-1 work. Catches malformed estimator state + (rings, d_bar), missing/conflicting timing kwargs (treatment XOR + first_treat), missing required columns, and Conley-specific + prerequisites. Resolves ``self._effective_d_bar`` as a side + effect so subsequent helpers can read it directly. + + Raises + ------ + ValueError + Any malformed input. Error messages name the offending kwarg + and (where applicable) the offending row count. + """ + # 1. rings: sorted list of >= 2 elements, non-negative, strictly increasing. 
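+        # Illustrative example (half-open ring intervals assumed here for
+        # exposition; see _build_ring_indicators for the exact boundary
+        # convention): rings=[0, 10, 30] passes validation and yields
+        # K = 2 rings over [0, 10) and [10, 30) with d_bar defaulting to
+        # 30, while rings=[10, 30] fails (rings[0] != 0) and
+        # rings=[0, 30, 10] fails (not strictly increasing).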
+ if not isinstance(self.rings, (list, tuple, np.ndarray)): + raise ValueError( + f"rings must be a list/tuple/array of distance breakpoints; " + f"got {type(self.rings).__name__}." + ) + rings_arr = np.asarray(self.rings, dtype=np.float64) + if rings_arr.ndim != 1 or rings_arr.size < 2: + raise ValueError( + "rings must contain at least 2 breakpoints; " + f"got {len(self.rings)} ({list(self.rings)})." + ) + if (rings_arr < 0).any(): + raise ValueError(f"rings must be non-negative; got {list(self.rings)}.") + if (np.diff(rings_arr) <= 0).any(): + raise ValueError(f"rings must be strictly increasing; got {list(self.rings)}.") + if rings_arr[0] != 0: + raise ValueError( + f"rings[0] must equal 0 to cover treated locations " + f"(d_it = 0 must belong to Ring 1); got rings[0] = " + f"{rings_arr[0]}. Rows with 0 <= d_it < rings[0] would " + "be flagged as exposed (S_it = 1) but receive zero " + "spillover regressors at stage 2, silently biasing the " + "estimator. To exclude very-close pairs, model that with " + "an explicit innermost ring covering [0, rings[0])." + ) + + # 2. d_bar: defaults to rings[-1]; if set explicitly must equal rings[-1] + # (avoid the dead zone where d_i in (rings[-1], d_bar] is neither + # in any ring nor far-away). + if self.d_bar is None: + self._effective_d_bar = float(rings_arr[-1]) + else: + if not np.isfinite(self.d_bar) or self.d_bar <= 0: + raise ValueError(f"d_bar must be positive and finite; got {self.d_bar}.") + if not np.isclose(self.d_bar, rings_arr[-1]): + raise ValueError( + f"d_bar ({self.d_bar}) must equal max(rings) ({rings_arr[-1]}); " + "to vary d_bar, vary the rings breakpoints (the outermost " + "edge is implicitly the spillover cutoff). Setting d_bar " + "different from rings[-1] would create a 'dead zone' " + "where units in (rings[-1], d_bar] are neither in any " + "ring nor in the far-away control group." + ) + self._effective_d_bar = float(self.d_bar) + + # 3. Exactly ONE of treatment / first_treat must be supplied. 
+ if treatment is None and first_treat is None: + raise ValueError( + "Exactly one of `treatment` (binary D_it column) or " + "`first_treat` (per-unit onset-time column) must be supplied." + ) + if treatment is not None and first_treat is not None: + raise ValueError( + "Provide either `treatment` or `first_treat`, not both. " + "`treatment` is auto-converted to `first_treat` internally." + ) + + # 4. Required columns exist in data (treat outcome the same way as + # other required columns — front-door error rather than late + # KeyError when `data[outcome]` is dereferenced). + required = [time, unit, outcome] + if treatment is not None: + required.append(treatment) + if first_treat is not None: + required.append(first_treat) + missing = [c for c in required if c not in data.columns] + if missing: + raise ValueError(f"Missing required columns in data: {missing}.") + + # 4a-bis. Outcome must be finite per-row. Non-finite outcomes + # propagate into stage-1 FE estimation and surface as non- + # convergence warnings + late solver failures rather than a + # targeted input error. Reject up front. + outcome_arr = np.asarray(data[outcome].values, dtype=np.float64) + if not np.isfinite(outcome_arr).all(): + n_bad = int((~np.isfinite(outcome_arr)).sum()) + raise ValueError( + f"outcome column '{outcome}' contains {n_bad} non-finite " + "value(s) (NaN / Inf). SpilloverDiD requires finite outcomes " + "for stage-1 FE estimation; impute or drop missing rows " + "before fitting." + ) + + # 4a-ter. Identifier columns (unit, time, optionally first_treat + # when user-supplied) must not contain NaN. Missing identifiers + # would fall through to opaque numpy / pandas errors (e.g. + # "negative elements" from np.bincount) rather than a targeted + # ValueError. Reject up front. 
+ for id_col in (unit, time): + id_nan_mask = data[id_col].isna() + if bool(id_nan_mask.any()): + n_nan = int(id_nan_mask.sum()) + raise ValueError( + f"identifier column '{id_col}' contains {n_nan} " + "missing value(s). SpilloverDiD requires valid " + "unit / time identifiers on every row; drop or " + "impute missing-identifier rows before fitting." + ) + # `first_treat` is checked only when user-supplied; the auto- + # generated path produces a clean column. + if first_treat is not None and first_treat in data.columns: + ft_nan_mask = data[first_treat].isna() + if bool(ft_nan_mask.any()): + n_nan = int(ft_nan_mask.sum()) + raise ValueError( + f"first_treat column '{first_treat}' contains {n_nan} " + "missing value(s). Use np.inf (or 0) for never-treated " + "units; do not leave NaN." + ) + + # 4b. One-row-per-(unit, time) cell panel contract. Duplicate cells + # would silently re-weight stage-1 FE estimation AND stage-2 OLS + # without any warning. Reject up front. + cell_counts = data.groupby([unit, time]).size() + dups = cell_counts[cell_counts > 1] + if len(dups) > 0: + sample = list(dups.index[:5]) + suffix = f" (and {len(dups) - 5} more)" if len(dups) > 5 else "" + raise ValueError( + f"{len(dups)} duplicate (unit, time) cell(s) detected " + f"(e.g. {sample}{suffix}). SpilloverDiD requires " + "one-row-per-(unit, time) panel data — duplicate cells " + "would silently re-weight both the stage-1 FE fit and the " + "stage-2 OLS. Aggregate to unique cells before fitting." + ) + + # 4c. Balanced-panel contract for the Wave B MVP. An unbalanced + # panel where the stage-1 (unit, time) FE bipartite graph induced + # by Omega_0 isn't connected produces unidentified residuals on + # treated rows. The exact-graph-connectivity check is queued as + # a follow-up; the MVP simply rejects panels where some unit + # doesn't observe every period. 
+ n_unique_times = data[time].nunique() + unit_period_counts = data.groupby(unit)[time].nunique() + underbalanced = unit_period_counts[unit_period_counts < n_unique_times] + if len(underbalanced) > 0: + sample = list(underbalanced.index[:5]) + suffix = f" (and {len(underbalanced) - 5} more)" if len(underbalanced) > 5 else "" + raise ValueError( + f"Unbalanced panel: {len(underbalanced)} unit(s) do not " + f"observe every period (panel has {n_unique_times} unique " + f"periods, affected units e.g. {sample}{suffix}). Wave B " + "MVP requires a balanced panel — an unbalanced (unit, time) " + "Omega_0 bipartite graph can produce unidentified residuals " + "for some treated rows even when every unit and every " + "period has at least one Omega_0 row. Balance the panel " + "(impute missing cells or drop affected units) before " + "fitting. Graph-connectivity-based identification is " + "queued as a follow-up extension." + ) + + # 5a. conley_coords is required ALWAYS — ring construction dereferences + # it on every fit() path, regardless of vcov_type. Validate up front + # rather than letting downstream code fail with AssertionError/KeyError. + if self.conley_coords is None: + raise ValueError( + "SpilloverDiD requires `conley_coords=(lat_col, lon_col)` " + "for ring construction, regardless of vcov_type." + ) + if not isinstance(self.conley_coords, (list, tuple)) or len(self.conley_coords) != 2: + raise ValueError( + "conley_coords must be a 2-tuple (lat_col, lon_col); " + f"got {self.conley_coords!r}." + ) + # Within-unit coord constancy: ring construction collapses coords to + # one row per unit via drop_duplicates(subset=[unit]). If a unit's + # lat/lon varies across rows the first observed value is silently + # used; reject up front rather than silently misclassify spillover + # exposure. 
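+        # Illustrative pre-fit remedy (hypothetical column names): collapse
+        # each unit's coordinates to a single centroid, e.g.
+        #     data[["lat", "lon"]] = (
+        #         data.groupby("county")[["lat", "lon"]].transform("mean")
+        #     )
+        # so the constancy check below passes with unit identity unchanged.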
+ coord_cols = list(self.conley_coords) + if unit in data.columns and all(c in data.columns for c in coord_cols): + per_unit_unique = data.groupby(unit)[coord_cols].nunique() + non_constant = per_unit_unique[(per_unit_unique > 1).any(axis=1)] + if len(non_constant) > 0: + sample = non_constant.index.tolist()[:5] + suffix = f" (and {len(non_constant) - 5} more)" if len(non_constant) > 5 else "" + raise ValueError( + f"{len(non_constant)} unit(s) have non-constant " + f"conley_coords ({coord_cols}) across rows (e.g. {sample}" + f"{suffix}). SpilloverDiD requires within-unit-constant " + "coordinates — ring construction collapses coords per " + "unit via drop_duplicates. Aggregate to a single (lat, " + "lon) per unit (e.g. via the unit's geographic centroid) " + "before fitting, or fix the data so coords are constant." + ) + for c in self.conley_coords: + if c not in data.columns: + raise ValueError(f"conley_coords column '{c}' not in data.") + # Coord finiteness check (per-row). + coord_vals = data[list(self.conley_coords)].values + coord_arr = np.asarray(coord_vals, dtype=np.float64) + if not np.isfinite(coord_arr).all(): + n_nonfinite = int((~np.isfinite(coord_arr)).any(axis=1).sum()) + raise ValueError( + f"conley_coords contain non-finite values in {n_nonfinite} row(s); " + "coordinates must be finite for distance computation." + ) + # Haversine lat/lon domain check: applies on EVERY vcov path (not just + # vcov_type='conley') because ring construction always uses + # conley_metric for distance computation. Out-of-range coords silently + # produce wrong ring assignment otherwise. + if self.conley_metric == "haversine": + lat_arr = coord_arr[:, 0] + lon_arr = coord_arr[:, 1] + if (lat_arr < -90.0).any() or (lat_arr > 90.0).any(): + bad_rows = int(((lat_arr < -90.0) | (lat_arr > 90.0)).sum()) + raise ValueError( + f"conley_coords latitude column '{coord_cols[0]}' contains " + f"{bad_rows} row(s) outside [-90, 90] degrees. 
Haversine " + "metric requires geographic lat/lon coords; if your coords " + "are already projected (planar), pass conley_metric='euclidean'." + ) + if (lon_arr < -180.0).any() or (lon_arr > 180.0).any(): + bad_rows = int(((lon_arr < -180.0) | (lon_arr > 180.0)).sum()) + raise ValueError( + f"conley_coords longitude column '{coord_cols[1]}' contains " + f"{bad_rows} row(s) outside [-180, 180] degrees. Haversine " + "metric requires geographic lat/lon coords; if your coords " + "are already projected (planar), pass conley_metric='euclidean'." + ) + + # 5b. cluster column existence + NaN check — applies on every vcov + # path, not just conley. Missing cluster ids would produce wrong + # SEs (NaN counted as its own cluster by np.unique() but dropped + # by pandas groupby() in the cluster meat). + if self.cluster is not None: + if self.cluster not in data.columns: + raise ValueError(f"cluster column '{self.cluster}' not in data.") + cluster_nan_mask = data[self.cluster].isna() + if bool(cluster_nan_mask.any()): + n_nan = int(cluster_nan_mask.sum()) + raise ValueError( + f"cluster column '{self.cluster}' contains {n_nan} " + "missing value(s). NaN cluster ids would silently " + "produce wrong clustered SEs (np.unique counts NaN as " + "its own cluster but pandas groupby drops it from the " + "cluster meat). Drop or impute missing cluster rows " + "before fitting." + ) + + # 5c. Conley-specific kwargs (only required when vcov_type='conley'). + if self.vcov_type == "conley": + if self.conley_cutoff_km is None or not ( + np.isfinite(self.conley_cutoff_km) and self.conley_cutoff_km > 0 + ): + raise ValueError( + "vcov_type='conley' requires conley_cutoff_km > 0 (finite); " + f"got {self.conley_cutoff_km}." + ) + if self.conley_lag_cutoff is None or self.conley_lag_cutoff < 0: + raise ValueError( + "vcov_type='conley' requires conley_lag_cutoff >= 0 (integer); " + f"got {self.conley_lag_cutoff}." + ) + + # 6. At least one treated unit must exist. 
+        if treatment is not None:
+            n_treated_obs = int((data[treatment] == 1).sum())
+            if n_treated_obs == 0:
+                raise ValueError(
+                    f"No treated observations found (column '{treatment}' "
+                    "is all 0/NaN). SpilloverDiD requires at least one treated unit."
+                )
+        else:
+            ft_vals = data[first_treat].astype(float).values  # type: ignore[index]
+            has_treated_units = bool((np.isfinite(ft_vals) & (ft_vals != 0)).any())
+            if not has_treated_units:
+                raise ValueError(
+                    f"No treated units found (column '{first_treat}' is "
+                    "all 0 / inf / NaN). SpilloverDiD requires at least one "
+                    "treated unit."
+                )
+
+    def _validate_far_away_exists(
+        self,
+        d_array: np.ndarray,
+        is_control_array: np.ndarray,
+    ) -> int:
+        """Verify Butts Assumption 5(ii): at least one (D=0, d > d_bar) observation.
+
+        Parameters
+        ----------
+        d_array : ndarray
+            Per-unit or per-row distances (caller chooses; the check is
+            count-based, not granularity-sensitive).
+        is_control_array : ndarray, bool
+            Aligned mask: True where the observation belongs to a control
+            unit (D_i = 0 for static, D_it = 0 for staggered).
+
+        Returns
+        -------
+        n_far_away : int
+            Number of far-away control observations.
+
+        Raises
+        ------
+        ValueError
+            No (D=0, d > d_bar) observations exist; Assumption 5(ii) fails.
+        """
+        d_bar = self._effective_d_bar
+        far_away_mask = (d_array > d_bar) & is_control_array
+        n_far_away = int(far_away_mask.sum())
+        if n_far_away < 1:
+            raise ValueError(
+                "No far-away control observations: every control unit has "
+                f"d_i <= d_bar = {d_bar}. Butts (2021) Assumption 5(ii) "
+                "requires the sample to contain control units strictly "
+                "further than d_bar from any treated unit. Either reduce "
+                "d_bar (via the outermost ring breakpoint), expand the sample, "
+                "or verify the coords/metric configuration."
+ ) + return n_far_away + + def fit( + self, + data: pd.DataFrame, + *, + outcome: str, + unit: str, + time: str, + treatment: Optional[str] = None, + first_treat: Optional[str] = None, + covariates: Optional[List[str]] = None, + survey_design: object = None, + ) -> SpilloverDiDResults: + """Fit the two-stage Gardner DiD with ring-indicator covariates. + + Methodology (Butts 2021 Section 5 + Gardner 2022): + + 1. Compute per-row spillover indicators from ``conley_coords``. + 2. Build stage-1 subsample ``Omega_0 = {D_it=0 AND S_it=0}`` + (untreated AND unexposed) — Butts' clean control group. + 3. Stage 1: fit ``Y_it = mu_i + lambda_t + u`` on ``Omega_0``. + 4. Residualize: ``Y_tilde = Y - mu_hat - lambda_hat`` for ALL rows. + 5. Stage 2: regress ``Y_tilde`` on ``[D_it, (1-D_it)*Ring_{it,j}]`` via + :func:`solve_ols`, threading the configured ``vcov_type``. + 6. Wrap as :class:`SpilloverDiDResults`. + + Notes + ----- + Wave B MVP: stage-2 variance is the standard solve_ols estimator + (HC1 / Conley / cluster). The Gardner GMM sandwich first-stage + uncertainty correction is NOT applied (planned follow-up; see + TODO and plan Risks #2). Variance is therefore approximate (likely + underestimated by a few percent in typical settings). + """ + if survey_design is not None: + raise NotImplementedError( + "SpilloverDiD does not yet support survey_design= ; planned " + "as a follow-up extension. See TODO.md." + ) + if self.event_study: + raise NotImplementedError( + "SpilloverDiD does not yet support event_study=True ; the " + "per-event-time × ring decomposition (Butts Table 2) is " + "planned as a follow-up extension. The base " + "(event_study=False) aggregate spec is fully supported and " + "handles both non-staggered and staggered timing." + ) + if self.horizon_max is not None: + raise NotImplementedError( + "SpilloverDiD does not yet support horizon_max= (used only " + "in event-study mode); planned as a follow-up extension." 
+ ) + # Validate `anticipation` up front: must be a non-negative integer. + # Accepting fractional or negative values would silently shift + # treatment timing and ring exposure beyond what the estimator's + # identification contract supports. + if not isinstance(self.anticipation, (int, np.integer)) or self.anticipation < 0: + raise ValueError( + f"anticipation must be a non-negative integer; got " + f"{self.anticipation!r} (type {type(self.anticipation).__name__})." + ) + if covariates is not None and len(covariates) > 0: + raise NotImplementedError( + "SpilloverDiD does not yet support covariates= in Wave B MVP. " + "The Gardner-style two-stage pattern requires covariate " + "effects to be estimated on the untreated-and-unexposed " + "subsample at stage 1 and subtracted from Y before stage 2 — " + "appending them only at stage 2 (without stage-1 " + "residualization) would silently bias tau_total / delta_j on " + "panels with time-varying covariates. The full covariate " + "path mirroring TwoStageDiD._fit_untreated_model is queued as " + "a follow-up extension. See TODO.md." + ) + if self.vcov_type in ("hc2", "hc2_bm"): + raise NotImplementedError( + f"SpilloverDiD does not yet support vcov_type='{self.vcov_type}'. " + "The current stage-2 inference uses a generic residual df " + "(n - effective_rank) for t-distribution lookups, but " + "hc2 / hc2_bm require per-coefficient Bell-McCaffrey / CR2 " + "degrees of freedom for correct p-values and CIs. Routing " + "stage 2 through LinearRegression (which supplies the " + "per-coefficient DOF metadata) is queued as a follow-up " + "extension. Use vcov_type='hc1', 'classical', 'conley', or " + "leave default; combine with cluster= for CR1." + ) + + # Step 0: defensive copy so the caller's DataFrame is never mutated. + data = data.copy(deep=False) + + # Step 0b: coerce `time` to numeric BEFORE any structural validation. 
+ # The validator's duplicate-cell and balanced-panel checks depend on + # period IDENTITY; mixed raw encodings like ['0', 0, '1', 1] would + # pass validation but collapse to duplicate periods after coercion. + # Coercing first ensures validation sees the actual numeric labels. + if time in data.columns: + try: + data = data.assign(**{time: pd.to_numeric(data[time])}) + except (TypeError, ValueError) as exc: + raise ValueError( + f"time column '{time}' must be numeric (or string-coercible " + f"to numeric). Got: {exc}. Encode periods as integers / " + "floats before passing to SpilloverDiD." + ) from exc + # User-supplied first_treat must also be coerced BEFORE validation + # so the NaN check and identity-based checks see the actual labels. + # Auto-generated `_spillover_first_treat` (from binary D) doesn't + # exist yet — it's created later by `_convert_treatment_to_first_treat`. + if first_treat is not None and first_treat in data.columns: + try: + data = data.assign(**{first_treat: pd.to_numeric(data[first_treat])}) + except (TypeError, ValueError) as exc: + raise ValueError( + f"first_treat column '{first_treat}' must be numeric (or " + f"string-coercible to numeric). Got: {exc}. Encode onset " + "times as integers / floats (or np.inf for never-treated) " + "before passing to SpilloverDiD." + ) from exc + + # Step 1: front-door validation (rings, d_bar, timing-kwargs XOR, + # coords, panel structure — all on COERCED time/first_treat labels). + self._validate_spillover_inputs(data, treatment, first_treat, time, unit, outcome) + + # Step 2: convert binary treatment to per-unit first_treat if needed. + # Track whether `first_treat` was AUTO-GENERATED (from a binary D + # column) vs USER-SUPPLIED (Gardner convention). The auto-generated + # column uses ONLY np.inf for never-treated (no 0-as-never-treated + # sentinel); preserving this distinction avoids silently + # reclassifying baseline-treated units (D=1 at t=0) as never-treated. 
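+        # Encoding sketch (illustrative): a binary D column with unit A
+        # treated from t=2 and unit B never treated auto-converts to
+        #     first_treat[A] = 2.0, first_treat[B] = np.inf
+        # whereas a USER-supplied first_treat column may mark never-treated
+        # units with either np.inf or 0 (Gardner convention); the
+        # auto-converted flag below switches off the 0-as-never-treated
+        # reading for generated columns.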
+ treatment_auto_converted = treatment is not None + if treatment is not None: + data, first_treat = _convert_treatment_to_first_treat(data, treatment, time, unit) + assert first_treat is not None # validator guarantees this + + # Step 3: factorize unit/time → integer codes (mirrors TwoStageDiD). + unit_vals = data[unit].values + time_vals = data[time].values + unit_codes_full, unit_uniques = pd.factorize(pd.Series(unit_vals), sort=True) + time_codes_full, time_uniques = pd.factorize(pd.Series(time_vals), sort=True) + + # Step 4: extract treatment onsets per unit; detect staggered. + first_treat_by_unit = _extract_treatment_onsets( + data, + first_treat, + unit, + treat_zero_as_never_treated=not treatment_auto_converted, + ) + finite_onsets = {ft for ft in first_treat_by_unit.values() if np.isfinite(ft)} + if not finite_onsets: + raise ValueError( + "No treated units found (all first_treat values are inf or 0). " + "SpilloverDiD requires at least one treated unit." + ) + is_staggered = len(finite_onsets) > 1 + + # Apply anticipation shift to onsets used for ring construction AND + # for the D_it indicator (treatment-effective onset). + effective_onsets = { + uid: (ft - self.anticipation if np.isfinite(ft) else ft) + for uid, ft in first_treat_by_unit.items() + } + + # Step 5: compute per-row d_it. For non-staggered (single common + # onset), use the cheaper static helper that builds the pairwise + # distance matrix once; for staggered, use the per-cohort helper + # that handles time-varying ring membership. + assert self.conley_coords is not None # validator-guaranteed + + # If conley_metric is a user callable, validate it against the full + # 6-check contract (shape / finite / non-negative / symmetric / + # zero-diagonal) on the per-unit (n, n) self-call BEFORE using it + # for ring construction. 
Without this, a callable with positive + # self-distance silently corrupts ring assignment (treated units + # at their own location should have d=0 → fall in Ring_1; positive + # self-distance pushes them out into a different ring). + if callable(self.conley_metric): + unit_coords_for_validation = ( + data[list(self.conley_coords)].drop_duplicates().values.astype(np.float64) + ) + _validate_callable_metric_result( + self.conley_metric(unit_coords_for_validation, unit_coords_for_validation), + unit_coords_for_validation.shape[0], + ) + + if is_staggered: + d_it_per_row, _, _ = _compute_nearest_treated_distance_staggered( + data, + unit=unit, + time=time, + coords=self.conley_coords, + metric=self.conley_metric, + first_treat_by_unit=effective_onsets, + ) + else: + # Non-staggered: single common onset. Build d_i per unit once, + # then broadcast to per-row AND zero out pre-treatment rows + # (matching the staggered helper's inf-at-pre-treatment + # convention so downstream ring + Omega_0 logic is timing- + # agnostic). + ever_treated_ids = np.array( + [uid for uid, ft in first_treat_by_unit.items() if np.isfinite(ft)], + dtype=object, + ) + d_i_per_unit, unit_index_static = _compute_nearest_treated_distance_static( + data, + unit=unit, + coords=self.conley_coords, + metric=self.conley_metric, + treated_unit_ids=ever_treated_ids, + # Pass `d_bar` as the cutoff so the cKDTree sparse path + # auto-activates when n_units > _CONLEY_SPARSE_N_THRESHOLD + # for built-in metrics. Units beyond d_bar get d_i = inf, + # which the downstream ring builder treats as far-away + # controls — same as the dense-path semantics. + cutoff_km=self._effective_d_bar, + ) + unit_to_d = {uid: float(d_i_per_unit[idx]) for idx, uid in enumerate(unit_index_static)} + d_it_per_row = np.array([unit_to_d.get(u, np.inf) for u in unit_vals]) + # Pre-treatment rows have d_it=inf (no unit treated yet). 
+ shared_onset = next(iter(finite_onsets)) + shared_effective_onset = shared_onset - self.anticipation + d_it_per_row = np.where( + np.asarray(time_vals, dtype=np.float64) < shared_effective_onset, + np.inf, + d_it_per_row, + ) + + # Step 6: build ring indicators per row (Butts Eq 6 time-varying form). + ring_masks = _build_ring_indicators(d_it_per_row, list(self.rings)) + K = ring_masks.shape[1] + + # Step 7: compute D_it per row (with anticipation shift). + D_it = np.zeros(len(data), dtype=np.float64) + for u_id, eff_ft in effective_onsets.items(): + if np.isfinite(eff_ft): + rows = (unit_vals == u_id) & (np.asarray(time_vals) >= eff_ft) + D_it[rows] = 1.0 + + # Step 7b: verify at least one observation is treated AFTER applying + # the anticipation shift. If all first_treat values are > max(time) + # in the panel (e.g. an "anticipation" of treatment that hasn't + # arrived yet), D_it is all zeros and the stage-2 design has no + # treatment variation. Fail fast with a clear identification error + # rather than crashing inside solve_ols. + if D_it.sum() == 0: + max_time = float(np.max(np.asarray(time_vals, dtype=np.float64))) + raise ValueError( + "No observation is treated in-sample after applying " + f"anticipation shift of {self.anticipation}. The earliest " + "effective onset is later than the latest observed period " + f"({max_time}), so D_it = 0 everywhere and tau_total is " + "unidentified. Either include post-onset periods in the " + "panel, reduce the anticipation lead, or verify the " + "first_treat column." + ) + + # Step 8: compute S_it = 1{d_it <= d_bar}. Treated-self rows have + # d_it=0 → S_it=1 (Omega_0 excludes them; they're treated anyway). + S_it = (d_it_per_row <= self._effective_d_bar).astype(np.float64) + + # Step 9: validate far-away controls (Butts Assumption 5(ii)). + # Use CURRENT-period untreated status, not never-treated-only. 
The + # paper defines Omega_0 row-wise as {D_it = 0 AND S_it = 0}, so + # not-yet-treated observations of eventually-treated units can also + # contribute to the far-away identifying group. This matters for + # all-eventually-treated staggered designs (no never-treated units). + is_control_row_now = D_it == 0 + n_far_away_obs = self._validate_far_away_exists(d_it_per_row, is_control_row_now) + + # Step 10: Butts Omega_0 mask = (D_it=0 AND S_it=0). + omega_0_mask = (D_it == 0) & (S_it == 0) + + # Step 10b: row-level Omega_0 identification check. + # + # Two regimes (round-16 codex review split): + # - PERIOD-level unsupported (no Omega_0 row at some t): time FE + # structurally unidentified. Dropping the period would remove + # ALL units' observations at that t, including the far-away + # rows needed for identification. Hard error. + # - UNIT-level unsupported (no Omega_0 row for some i): warn- + # and-drop. Unit FE for that i is NaN, residualization writes + # NaN on those rows, and the downstream finite_mask path at + # Step 14 excludes them from stage 2. Mirrors `TwoStageDiD`'s + # always-treated unit handling (`two_stage.py:294-336`) and + # Gardner's framework, which identifies effects from supported + # observations rather than requiring every unit estimable. 
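+        # Sketch of the two regimes (illustrative two-unit panel over
+        # periods t0, t1):
+        #   Omega_0 = {(A, t0), (B, t0), (B, t1)} -> every period is
+        #     supported; unit A's FE is identified from t0 alone, so the
+        #     fit proceeds.
+        #   Omega_0 = {(A, t0), (B, t0)} -> no Omega_0 row at t1, so
+        #     lambda_{t1} is unidentified -> hard error.
+        #   Omega_0 = {(B, t0), (B, t1)} -> unit A has no Omega_0 row ->
+        #     warn and exclude A's rows from stage 2.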
+ unit_codes_arr = np.asarray(unit_codes_full) + time_codes_arr = np.asarray(time_codes_full) + units_in_omega_0 = set(unit_codes_arr[omega_0_mask].tolist()) + times_in_omega_0 = set(time_codes_arr[omega_0_mask].tolist()) + all_unit_codes = set(unit_codes_arr.tolist()) + all_time_codes = set(time_codes_arr.tolist()) + unsupported_units = sorted(all_unit_codes - units_in_omega_0) + unsupported_periods = sorted(all_time_codes - times_in_omega_0) + + if unsupported_periods: + affected = [time_uniques[c] for c in unsupported_periods[:5]] + suffix = ( + f" (and {len(unsupported_periods) - 5} more)" + if len(unsupported_periods) > 5 + else "" + ) + raise ValueError( + f"Stage-1 fixed effects unidentified: " + f"{len(unsupported_periods)} period(s) have NO untreated-and-" + f"unexposed (Omega_0) rows — their time FE is unidentified. " + f"Examples: {affected}{suffix}. The Butts subsample " + "Omega_0 = {D_it = 0 AND S_it = 0} must contain at least one " + "row per period that appears in the data. Consider " + "tightening d_bar (so fewer rows are flagged as exposed " + "S_it = 1) or expanding the sample to include never-treated " + "or pre-treatment observations for the affected periods." + ) + + if unsupported_units: + affected = [unit_uniques[c] for c in unsupported_units[:5]] + suffix = ( + f" (and {len(unsupported_units) - 5} more)" if len(unsupported_units) > 5 else "" + ) + warnings.warn( + f"SpilloverDiD: {len(unsupported_units)} unit(s) have NO " + f"untreated-and-unexposed (Omega_0) rows — their unit FE " + f"is unidentified and their rows will be excluded from " + f"stage 2 estimation. Examples: {affected}{suffix}. To " + f"include these units, expand the sample to provide pre-" + f"treatment or untreated observations for them, or tighten " + f"d_bar so fewer rows are flagged as exposed (S_it = 1).", + UserWarning, + stacklevel=2, + ) + + # Step 10c: connected-component check on the Omega_0 bipartite graph. 
+ # + # Stage 1's iterative FE solver identifies (mu_i, lambda_t) only up + # to component-specific constants per connected component of the + # bipartite graph (supported units ↔ periods, edge = Omega_0 row). + # If the graph splits into K > 1 components, _residualize_butts then + # combines mu_i from one component with lambda_t from another, + # silently corrupting y_tilde and downstream tau_total / delta_j. + # Balanced panel + per-unit/per-period Omega_0 coverage is NECESSARY + # but not SUFFICIENT — connectivity is the load-bearing + # identification condition for stage 1. + _check_omega_0_connectivity( + omega_0_mask=omega_0_mask, + unit_codes_arr=unit_codes_arr, + time_codes_arr=time_codes_arr, + units_in_omega_0=units_in_omega_0, + n_times=len(time_uniques), + unit_uniques=unit_uniques, + ) + + # Step 11: stage 1 — fit FE on Omega_0. + y_full = np.asarray(data[outcome].values, dtype=np.float64) + unit_fe_arr, time_fe_arr, converged = _iterative_fe_subset( + y_full, + np.asarray(unit_codes_full), + np.asarray(time_codes_full), + omega_0_mask, + ) + if not converged: + warnings.warn( + "SpilloverDiD stage-1 iterative FE solver did not converge " + f"within {_FE_ITER_MAX} iterations (tol={_FE_ITER_TOL}). " + "Results may be unreliable.", + UserWarning, + stacklevel=2, + ) + stage1_n_obs = int(omega_0_mask.sum()) + + # Step 12: residualize ALL observations. + y_tilde = _residualize_butts( + y_full, + np.asarray(unit_codes_full), + np.asarray(time_codes_full), + unit_fe_arr, + time_fe_arr, + ) + + # Mask rank-deficient (NaN y_tilde) rows: rather than zero them out + # (which leaves them in the sample for HC1/CR1 n/(n-k) corrections), + # we SUBSET stage-2 arrays to the finite rows before solve_ols. This + # ensures the SE formulas use the actual estimation sample size. 
+ finite_mask = np.isfinite(y_tilde) + n_nan = int((~finite_mask).sum()) + if n_nan > 0: + warnings.warn( + f"SpilloverDiD: {n_nan} observation(s) excluded from stage 2 " + "due to rank-deficient stage-1 FE estimates (unit or period " + "absent from the untreated-and-unexposed subsample).", + UserWarning, + stacklevel=2, + ) + + # Step 13: build stage-2 design [D_it, (1-D_it)*Ring_{it,j}]. + ring_covariates = np.zeros((len(data), K), dtype=np.float64) + for j in range(K): + ring_covariates[:, j] = (1.0 - D_it) * ring_masks[:, j].astype(np.float64) + + X_2 = np.column_stack([D_it.reshape(-1, 1), ring_covariates]) + + ring_labels = [_ring_label(list(self.rings), j) for j in range(K)] + col_names_all = ["treatment"] + [f"_spillover_{lab}" for lab in ring_labels] + + # Step 14: subset arrays to the estimation sample (finite y_tilde rows). + # Apply to design, outcome, cluster ids, AND the Conley spatial/temporal + # auxiliary arrays so the HC1/CR1/Conley sample-size adjustments use the + # correct n. + cluster_ids_full = ( + np.asarray(data[self.cluster].values) if self.cluster is not None else None + ) + if n_nan > 0: + X_2_fit = X_2[finite_mask] + y_tilde_fit = y_tilde[finite_mask] + cluster_ids_fit = ( + cluster_ids_full[finite_mask] if cluster_ids_full is not None else None + ) + time_vals_fit = np.asarray(time_vals)[finite_mask] + unit_vals_fit = np.asarray(unit_vals)[finite_mask] + else: + X_2_fit = X_2 + y_tilde_fit = y_tilde + cluster_ids_fit = cluster_ids_full + time_vals_fit = np.asarray(time_vals) + unit_vals_fit = np.asarray(unit_vals) + + # Step 15: stage-2 OLS with configured vcov via solve_ols. 
+ solve_kwargs: Dict[str, Any] = { + "return_vcov": True, + "rank_deficient_action": self.rank_deficient_action, + "column_names": col_names_all, + "vcov_type": self.vcov_type, + "cluster_ids": cluster_ids_fit, + } + if self.vcov_type == "conley": + coord_array_full = np.asarray(data[list(self.conley_coords)].values, dtype=np.float64) + coord_array_fit = coord_array_full[finite_mask] if n_nan > 0 else coord_array_full + solve_kwargs.update( + { + "conley_coords": coord_array_fit, + "conley_cutoff_km": self.conley_cutoff_km, + "conley_metric": self.conley_metric, + "conley_time": time_vals_fit, + "conley_unit": unit_vals_fit, + "conley_lag_cutoff": self.conley_lag_cutoff, + } + ) + + coef, residuals, vcov = solve_ols(X_2_fit, y_tilde_fit, **solve_kwargs) # type: ignore[misc] + + # Step 16: extract coefficients and inference. + tau_total = float(coef[0]) + # Degrees of freedom for t-distribution inference. Use the EFFECTIVE + # rank after solve_ols drops rank-deficient (NaN) coefficient + # columns, NOT the raw column count. On rank-deficient stage-2 + # fits (e.g. an empty ring covariate dropped by solve_ols's QR), + # using raw `X_2_fit.shape[1]` would understate df_resid and + # silently inflate p-values and CI widths. + n_obs_eff = int(finite_mask.sum()) + k_effective = int(np.isfinite(coef).sum()) + df_resid = n_obs_eff - k_effective + if df_resid <= 0: + # Degenerate: no residual degrees of freedom. Force NaN + # inference by setting df_resid = 0 (safe_inference treats + # df = 0 as no usable degrees of freedom, returning NaN for + # t-stat / p-value / CI). Distinct from df_resid = None + # which would fall through to a normal-distribution + # approximation — misleading on a degenerate sample. + warnings.warn( + f"SpilloverDiD stage-2 residual df = {df_resid} (n_obs=" + f"{n_obs_eff}, effective_rank={k_effective}). 
Inference " + "(t-stat, p-value, CI) will be NaN.", + UserWarning, + stacklevel=2, + ) + df_resid = 0 + + # Clamp negative diagonals to 0 before sqrt: indefinite Conley or + # near-singular sandwich variances can produce numerically tiny + # negative values that would otherwise NaN the entire inference + # row. Matches the sibling-estimator convention + # (two_stage.py:1183, estimators.py:606, stacked_did.py:515). + tau_se = ( + float(np.sqrt(max(vcov[0, 0], 0.0))) + if vcov is not None and np.isfinite(vcov[0, 0]) + else float("nan") + ) + tau_t, tau_p, tau_ci = safe_inference(tau_total, tau_se, alpha=self.alpha, df=df_resid) + + # Per-ring inference. + ring_rows = [] + for j in range(K): + idx = 1 + j # 0 is treatment; rings follow. + coef_j = float(coef[idx]) + se_j = ( + float(np.sqrt(max(vcov[idx, idx], 0.0))) + if vcov is not None and np.isfinite(vcov[idx, idx]) + else float("nan") + ) + t_j, p_j, ci_j = safe_inference(coef_j, se_j, alpha=self.alpha, df=df_resid) + ring_rows.append( + { + "ring": ring_labels[j], + "coef": coef_j, + "se": se_j, + "t_stat": t_j, + "p_value": p_j, + "ci_low": ci_j[0], + "ci_high": ci_j[1], + } + ) + spillover_df = pd.DataFrame(ring_rows).set_index("ring") if ring_rows else None + + # Step 17: counts for the result class. + n_units_ever_in_ring: Dict[str, int] = {} + for j in range(K): + in_ring_units = data.loc[ring_masks[:, j], unit].nunique() + n_units_ever_in_ring[ring_labels[j]] = int(in_ring_units) + + # Step 18: assemble SpilloverDiDResults. n_obs / n_treated / n_control + # reflect the actual stage-2 estimation sample (after dropping NaN + # y_tilde rows), matching solve_ols's HC1/CR1 sample-size adjustments. + D_it_fit = D_it[finite_mask] if n_nan > 0 else D_it + + # Populate coefficients dict with ALL stage-2 coefficients (treatment + # + K rings) keyed by their col_names_all labels so consumers can + # align names to vcov rows/cols. 
"ATT" is exposed as an alias for the + # treatment slot to match the sibling-estimator convention. + coefficients_full: Dict[str, float] = {} + for i, name in enumerate(col_names_all): + val = float(coef[i]) if np.isfinite(coef[i]) else float("nan") + coefficients_full[name] = val + coefficients_full["ATT"] = tau_total + + result = SpilloverDiDResults( + att=tau_total, + se=tau_se, + t_stat=tau_t, + p_value=tau_p, + conf_int=tau_ci, + n_obs=n_obs_eff, + n_treated=int(D_it_fit.sum()), + n_control=int(len(D_it_fit) - D_it_fit.sum()), + alpha=self.alpha, + coefficients=coefficients_full, + vcov=vcov, + residuals=residuals, + r_squared=None, + inference_method="analytical", + n_bootstrap=None, + n_clusters=( + int(len(np.unique(cluster_ids_fit))) if cluster_ids_fit is not None else None + ), + vcov_type=self.vcov_type, + cluster_name=self.cluster, + conley_lag_cutoff=(self.conley_lag_cutoff if self.vcov_type == "conley" else None), + spillover_effects=spillover_df, + ring_breakpoints=list(self.rings), + d_bar=self._effective_d_bar, + n_units_ever_in_ring=n_units_ever_in_ring, + n_far_away_obs=int(n_far_away_obs), + is_staggered=is_staggered, + event_study=self.event_study, + stage1_n_obs=stage1_n_obs, + anticipation=self.anticipation, + ) + self.results_ = result + self.is_fitted_ = True + return result diff --git a/docs/api/_autosummary/diff_diff.SpilloverDiD.rst b/docs/api/_autosummary/diff_diff.SpilloverDiD.rst new file mode 100644 index 00000000..53e9ae1d --- /dev/null +++ b/docs/api/_autosummary/diff_diff.SpilloverDiD.rst @@ -0,0 +1,21 @@ +diff\_diff.SpilloverDiD +======================= + +.. currentmodule:: diff_diff + +.. autoclass:: SpilloverDiD + :no-members: + + + .. rubric:: Methods + + .. 
autosummary:: + + ~SpilloverDiD.__init__ + ~SpilloverDiD.fit + ~SpilloverDiD.get_params + ~SpilloverDiD.set_params + + + + diff --git a/docs/api/_autosummary/diff_diff.SpilloverDiDResults.rst b/docs/api/_autosummary/diff_diff.SpilloverDiDResults.rst new file mode 100644 index 00000000..905cfdc5 --- /dev/null +++ b/docs/api/_autosummary/diff_diff.SpilloverDiDResults.rst @@ -0,0 +1,61 @@ +diff\_diff.SpilloverDiDResults +============================== + +.. currentmodule:: diff_diff + +.. autoclass:: SpilloverDiDResults + :no-members: + + + .. rubric:: Methods + + .. autosummary:: + + ~SpilloverDiDResults.__init__ + ~SpilloverDiDResults.print_summary + ~SpilloverDiDResults.summary + ~SpilloverDiDResults.to_dataframe + ~SpilloverDiDResults.to_dict + + + + + .. rubric:: Attributes + + .. autosummary:: + + ~SpilloverDiDResults.alpha + ~SpilloverDiDResults.anticipation + ~SpilloverDiDResults.bootstrap_distribution + ~SpilloverDiDResults.cluster_name + ~SpilloverDiDResults.coef_var + ~SpilloverDiDResults.coefficients + ~SpilloverDiDResults.conley_lag_cutoff + ~SpilloverDiDResults.d_bar + ~SpilloverDiDResults.event_study + ~SpilloverDiDResults.fitted_values + ~SpilloverDiDResults.inference_method + ~SpilloverDiDResults.is_significant + ~SpilloverDiDResults.is_staggered + ~SpilloverDiDResults.n_bootstrap + ~SpilloverDiDResults.n_clusters + ~SpilloverDiDResults.n_far_away_obs + ~SpilloverDiDResults.n_units_ever_in_ring + ~SpilloverDiDResults.r_squared + ~SpilloverDiDResults.residuals + ~SpilloverDiDResults.ring_breakpoints + ~SpilloverDiDResults.significance_stars + ~SpilloverDiDResults.spillover_effects + ~SpilloverDiDResults.stage1_n_obs + ~SpilloverDiDResults.survey_metadata + ~SpilloverDiDResults.vcov + ~SpilloverDiDResults.vcov_type + ~SpilloverDiDResults.att + ~SpilloverDiDResults.se + ~SpilloverDiDResults.t_stat + ~SpilloverDiDResults.p_value + ~SpilloverDiDResults.conf_int + ~SpilloverDiDResults.n_obs + ~SpilloverDiDResults.n_treated + ~SpilloverDiDResults.n_control + 
diff --git a/docs/api/index.rst b/docs/api/index.rst index bdb1f086..7ff6b19d 100644 --- a/docs/api/index.rst +++ b/docs/api/index.rst @@ -27,6 +27,7 @@ Core estimator classes for DiD analysis: diff_diff.HeterogeneousAdoptionDiD diff_diff.EfficientDiD diff_diff.TwoStageDiD + diff_diff.SpilloverDiD diff_diff.WooldridgeDiD diff_diff.BaconDecomposition diff_diff.StaggeredTripleDifference @@ -64,6 +65,7 @@ Result containers returned by estimators: diff_diff.EDiDBootstrapResults diff_diff.TwoStageDiDResults diff_diff.TwoStageBootstrapResults + diff_diff.SpilloverDiDResults diff_diff.BaconDecompositionResults diff_diff.wooldridge_results.WooldridgeDiDResults diff_diff.Comparison2x2 @@ -311,6 +313,7 @@ Estimators had efficient_did two_stage + spillover wooldridge_etwfe bacon diff --git a/docs/api/spillover.rst b/docs/api/spillover.rst new file mode 100644 index 00000000..bb50f7d6 --- /dev/null +++ b/docs/api/spillover.rst @@ -0,0 +1,234 @@ +Spillover-Aware DiD (Butts 2021) +================================= + +Ring-indicator spillover-aware Difference-in-Differences estimator. + +This module implements the methodology from Butts, K. (2023, originally 2021), +"Difference-in-Differences with Spatial Spillovers" (arXiv:2105.03737v3). +The estimator augments two-stage Gardner (2022) DiD with ring-indicator +covariates that identify, alongside the direct effect on treated units +(``tau_total``), per-ring spillover effects on near-control units +(``delta_j``). + +The standard DiD estimator is biased for ``tau_total`` by +``-tau_spill(0)`` (paper Proposition 2.1 / Equations 1-2); the +ring-augmented regression removes this bias. The "far-away" cutoff +``d_bar`` defines the boundary beyond which spillovers vanish +(Assumption 5). 
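+The bias statement above can be written as a single display (a
+restatement of the sentence above in its own notation, not an
+additional result):
+
+.. math::
+
+   \operatorname{plim}\, \hat{\tau}_{DiD} = \tau_{total} - \tau_{spill}(0)
+
+so recovering ``tau_total`` requires estimating the spillover-on-control
+term explicitly, which is what the ring covariates do.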
+ +**When to use SpilloverDiD:** + +- Spatial-policy DiD settings where treatment may spill over onto nearby + control units (border-RD, place-based policy, neighborhood effects, + geographic policy boundaries) +- When the canonical DiD coefficient is suspected to be attenuated or + inflated by spillover bias and a far-away control group exists +- As a robustness check on TwoStageDiD / ImputationDiD when the + treatment has plausibly local geographic externalities + +**Reference:** Butts, K. (2023). Difference-in-Differences with Spatial +Spillovers. *arXiv:2105.03737v3*. Gardner, J. (2022). Two-stage +differences in differences. *arXiv:2207.05943*. + +.. module:: diff_diff.spillover + +SpilloverDiD +------------ + +Main estimator class. + +.. autoclass:: diff_diff.SpilloverDiD + :no-index: + :members: + :undoc-members: + :show-inheritance: + :inherited-members: + + .. rubric:: Methods + + .. autosummary:: + + ~SpilloverDiD.fit + ~SpilloverDiD.get_params + ~SpilloverDiD.set_params + +SpilloverDiDResults +------------------- + +Results container with per-ring spillover-effect table. + +.. autoclass:: diff_diff.SpilloverDiDResults + :no-index: + :members: + :undoc-members: + :show-inheritance: + + .. rubric:: Methods + + .. 
autosummary:: + + ~SpilloverDiDResults.summary + ~SpilloverDiDResults.to_dict + +Example Usage +------------- + +Non-staggered 2-period panel with synthetic spillovers:: + + from diff_diff import SpilloverDiD + + est = SpilloverDiD( + rings=[0, 50, 100, 200], # 3 rings: [0,50), [50,100), [100,200] + conley_coords=("lat", "lon"), + vcov_type="conley", + conley_cutoff_km=200.0, + conley_lag_cutoff=0, + ) + result = est.fit( + data=df, + outcome="y", + unit="unit", + time="time", + treatment="D", # binary; auto-converts to first_treat + ) + print(result.summary()) + # Total effect on treated (Butts tau_total): + print(f"tau_total = {result.att:.4f}") + # Per-ring spillover-on-control: + print(result.spillover_effects) + +Staggered timing (Gardner-convention first_treat column):: + + result = est.fit( + data=df, + outcome="y", + unit="unit", + time="time", + first_treat="first_treat", # 0 / inf = never-treated + ) + +Identification spec +------------------- + +The stage-2 regressor for ring j is the time-varying form + +.. math:: + + (1 - D_{it}) \cdot \mathrm{Ring}_{it,j} + +where ``Ring_{it,j}`` is the indicator that unit *i*'s nearest currently- +treated unit is in distance bin *j* at time *t*, and ``D_it`` is the +treatment indicator (1 if unit *i* is treated by time *t*). For +non-staggered timing, all treated units share one onset and ``Ring`` is +unit-static post-treatment / zero pre-treatment. For staggered timing, +``Ring_{it,j}`` is unit-time-varying (a unit's nearest treated neighbor +changes as more cohorts enter treatment). + +Stage 1 fits unit + time fixed effects on Butts' subsample +``Omega_0 = {D_it = 0 AND S_it = 0}`` (untreated AND unexposed) — the +clean far-away control group. Reading the literal unit-static +``(1 - D_it) * S_i`` from paper Equation 5 yields a rank-deficient +design under TWFE; the diff-diff implementation uses the time-varying +form (paper page 12's ``S_it`` notation, made explicit in Section 5 +Table 2). 
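The spec above can be made concrete with a toy construction (illustrative arrays only; ``loc``, ``first_treat``, and the loop are a sketch, not the library's internals):

```python
import numpy as np

# Toy sketch of the committed time-varying regressor (1 - D_it) * Ring_{it,j}
# under staggered timing. 4 units at 1-D locations; unit 0 treated at t=1,
# unit 1 at t=2, units 2-3 never treated.
loc = np.array([0.0, 60.0, 120.0, 400.0])        # one static location per unit
first_treat = np.array([1, 2, np.inf, np.inf])   # inf = never treated
rings = [0.0, 50.0, 100.0, 200.0]                # K = 3 bins; d_bar = 200
dist = np.abs(loc[:, None] - loc[None, :])       # pairwise distances

ring_reg = {}  # (t, j) -> regressor column
for t in range(3):
    D_t = (t >= first_treat).astype(float)       # D_it: treated by time t
    treated_now = np.flatnonzero(D_t)
    for j in range(len(rings) - 1):
        if treated_now.size == 0:
            # Pre-treatment period: no unit is exposed yet.
            col = np.zeros(len(loc))
        else:
            # d_it = distance to the NEAREST currently-treated unit.
            d_it = dist[:, treated_now].min(axis=1)
            in_ring = (d_it >= rings[j]) & (d_it < rings[j + 1])
            # Treated rows are zeroed by the (1 - D_it) factor.
            col = (1.0 - D_t) * in_ring
        ring_reg[(t, j)] = col

# Unit 2 (at 120) sits in ring [100, 200) at t=1 but moves to ring
# [50, 100) at t=2 once the second cohort (unit 1, at 60) enters.
```

Note how unit 2 moves inward across rings as the second cohort enters treatment; this is exactly the unit-time variation that the staggered specification relies on.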
+ +Estimator Comparison +-------------------- + +.. list-table:: SpilloverDiD vs. TwoStageDiD vs. TwoWayFixedEffects + :header-rows: 1 + :widths: 25 25 25 25 + + * - Feature + - SpilloverDiD + - TwoStageDiD + - TwoWayFixedEffects + * - Spillover handling + - Identifies per-ring delta_j + - None (assumes SUTVA) + - None (assumes SUTVA) + * - Methodology + - Two-stage Gardner + ring covariates + - Two-stage Gardner + - Single-stage TWFE + * - Staggered timing + - Yes + - Yes + - Biased under heterogeneity + * - Stage-1 subsample + - Butts: ``D=0 AND S=0`` (untreated AND unexposed) + - ``D=0`` (untreated) + - N/A (single stage) + * - Conley spatial-HAC SE + - Yes (via solve_ols at stage 2) + - Not yet supported + - Yes + * - Cluster-robust SE + - Yes (HC1 + CR1 via solve_ols) + - Yes (GMM sandwich + clusters) + - Yes + +Wave B MVP limitations +---------------------- + +The current implementation has the following documented limitations, +planned as follow-up enhancements: + +- **Gardner GMM first-stage correction at stage 2** — stage-2 variance + is the standard ``solve_ols`` HC1 / Conley / cluster estimator without + the influence-function adjustment for stage-1 FE estimation + uncertainty. The full GMM sandwich (Butts & Gardner 2022) is planned + as a follow-up; until then, reported SEs are biased downward + (underestimated by a few percent in typical settings) because they + omit the additional variance contribution from estimating the stage-1 + fixed effects. Confidence intervals are correspondingly too narrow + and p-values too small. Users should treat reported significance + conservatively until the GMM correction lands. +- **Event-study mode** — ``event_study=True`` raises + ``NotImplementedError``. The per-event-time × ring decomposition + (Butts Table 2 staggered specification) is queued for a follow-up PR. +- **Survey-design integration** — ``survey_design=`` raises + ``NotImplementedError``. 
+- **Count-of-treated-in-ring** — only the "nearest-treated ring" + specification is implemented. The "count" form re-introduces + functional-form dependence (paper Section 3.2 end) and is queued. +- **Data-driven d_bar selection** — Butts (2021b) / Butts (2023) JUE + Insight propose cross-validation under stronger parallel-trends + assumptions. Not in this PR. +- **HC2 / HC2_BM (Bell-McCaffrey / CR2) variance** — current stage-2 + inference uses a generic residual-df (n - effective rank) for + t-distribution lookups. ``vcov_type="hc2"`` / ``"hc2_bm"`` require + per-coefficient BM / CR2 DOF and raise ``NotImplementedError``. + Routing stage 2 through ``LinearRegression`` (which supplies the + per-coefficient DOF metadata) is queued. +- **Balanced panel required** — every unit must observe every period. + An unbalanced (unit, time) Ω₀ bipartite graph can produce disconnected + FE components and unidentified stage-1 residuals on treated rows. + Validator rejects unbalanced inputs. +- **Omega_0 row-level identification (period strict, unit warn-drop, plus connectivity)** — + Every period must have at least one Omega_0 row (else time FE for + that period is structurally unidentified; hard ``ValueError``). Units + lacking Omega_0 rows (e.g. baseline-treated units with ``D_it = 1`` at + every observed ``t``) are warned-and-dropped: their unit FE is NaN + and the downstream finite-mask path excludes them from stage 2. + Additionally, the supported-units bipartite graph (units linked by + shared Omega_0 periods) must form a single connected component; + ``K > 1`` components raise ``ValueError`` because the FE solver would + return only component-specific constants and residualization would + silently mix them across components. Mirrors ``TwoStageDiD``'s + always-treated unit handling on the warn-drop axis. +- **One row per (unit, time) cell required** — duplicate cells silently + re-weight both stage-1 FE estimation and stage-2 OLS. 
Validator + rejects duplicate cells; aggregate to unique cells before fitting. +- **rings[0] must equal 0** — the partition must cover treated + locations (``d_it = 0`` belongs to Ring 1). Validator rejects rings + that start at a nonzero inner edge to prevent a silent + exposed-but-unmodeled population in ``0 <= d_it < rings[0]``. +- **conley_coords must be constant within unit** — ring construction + uses a single (lat, lon) per unit (units are stationary point + locations). Validator rejects coordinates that vary across rows of + the same unit; aggregate to one location per unit (e.g. + ``df.groupby(unit)[["lat","lon"]].first()``) before fitting if your + raw data has moving coordinates. This requirement applies on every + fit path, not just ``vcov_type="conley"``, because the rings + themselves are always coordinate-derived. diff --git a/docs/doc-deps.yaml b/docs/doc-deps.yaml index 20770069..09d8aa6d 100644 --- a/docs/doc-deps.yaml +++ b/docs/doc-deps.yaml @@ -268,6 +268,28 @@ sources: - path: docs/choosing_estimator.rst type: user_guide + # ── SpilloverDiD (spillover group) ────────────────────────────────── + + diff_diff/spillover.py: + drift_risk: high + docs: + - path: docs/methodology/REGISTRY.md + section: "SpilloverDiD" + type: methodology + - path: docs/api/spillover.rst + type: api_reference + - path: README.md + section: "Estimators (one-line catalog entry)" + type: user_guide + - path: docs/references.rst + type: user_guide + - path: diff_diff/guides/llms-full.txt + section: "SpilloverDiD" + type: user_guide + - path: diff_diff/guides/llms.txt + section: "Estimators" + type: user_guide + # ── EfficientDiD (efficient_did group) ────────────────────────────── diff_diff/efficient_did.py: diff --git a/docs/index.rst b/docs/index.rst index 86755820..5c500530 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -177,6 +177,8 @@ Supported Estimators - Borusyak, Jaravel & Spiess (2024) imputation estimator * - :class:`~diff_diff.TwoStageDiD` - Gardner (2022) 
two-stage residualized estimator + * - :class:`~diff_diff.SpilloverDiD` + - Butts (2021) ring-indicator spillover-aware DiD * - :class:`~diff_diff.SyntheticDiD` - Synthetic DiD combining DiD and synthetic control * - :class:`~diff_diff.StackedDiD` diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index 099b8059..13995e77 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -25,6 +25,7 @@ This document provides the academic foundations and key implementation requireme - [StaggeredTripleDifference](#staggeredtripledifference) - [TROP](#trop) - [HeterogeneousAdoptionDiD](#heterogeneousadoptiondid) + - [SpilloverDiD](#spilloverdid) 4. [Diagnostics & Sensitivity](#diagnostics--sensitivity) - [PlaceboTests](#placebotests) - [BaconDecomposition](#bacondecomposition) @@ -2936,6 +2937,84 @@ should be a deliberate user choice. --- +## SpilloverDiD + +**Primary source:** Butts, K. (2023). Difference-in-Differences with Spatial Spillovers. *arXiv:2105.03737v3* (originally posted 2021). https://arxiv.org/abs/2105.03737 + +**Secondary sources:** +- Gardner, J. (2022). Two-stage differences in differences. *arXiv:2207.05943*. The stage-1+2 residualization pattern this estimator reuses. +- Conley, T. G. (1999). GMM Estimation with Cross-Sectional Dependence. *Journal of Econometrics* 92(1), 1-45. The spatial-HAC variance estimator recommended by Butts page 13 for inference with cutoff = `d_bar`. + +**Scope:** Spillover-aware Difference-in-Differences. Augments two-stage Gardner DiD with ring-indicator covariates that identify the direct effect on treated units (`tau_total`) alongside per-ring spillover effects on near-control units (`delta_j`). Handles both panel non-staggered timing and Section 5 staggered timing in a single estimator; non-staggered is the special case where all treated units share an onset time. 
+ +**Note:** This estimator is a **documented synthesis** of ingredients — no single published software package implements the exact recipe. `did2s` (R/Stata, Butts & Gardner 2022) implements the Gardner two-stage residualization but does NOT support ring covariates. The Butts (2021/2023) paper proposes the ring estimator in Equations 5/6/8 (non-staggered) and Section 5 / Table 2 (staggered) but does not ship reference software. The diff-diff implementation combines: (a) Butts (2021) Section 5 / Table 2 identification, (b) Gardner (2022) two-stage residualize-then-fit, (c) Wave A's Conley spatial-HAC vcov. + +**Identification spec (committed):** + +The stage-2 regressor for ring `j` is the **time-varying** form + +``` +(1 - D_it) * Ring_{it,j} +``` + +where `D_it` is the treatment indicator (1 if unit `i` is treated by time `t`) and `Ring_{it,j}` is the indicator that unit `i`'s nearest currently-treated unit (treated by time `t`) is in distance bin `j`. For non-staggered timing, `Ring_{it,j}` is unit-static post-treatment and zero pre-treatment (no unit treated yet); for staggered timing, `Ring_{it,j}` is unit-time-varying. + +**Note:** Reading the literal unit-static `(1 - D_it) * S_i` from paper Equation 5 yields a **rank-deficient design** under TWFE: `(1 - D_it) * S_i = S_i - D_it * S_i = S_i - D_it` (since `D_it = 1 ⇒ S_i = 1`), and `S_i` is absorbed by the unit FE `mu_i`, leaving `-D_it`. The two regressors are perfectly anti-collinear after FE absorption. The paper's identification argument (Proposition 2.3, Section 3.1 subsample language) only resolves when `S_it = S_i * 1{t >= first_treat}` is read as the **time-varying** form — which the paper defines on page 12 (the line directly above Equation 5) and which Section 5 Table 2 makes explicit (`S^k_{it}`, `Ring^k_{it,j}`). The implementation matches the paper's intent once the `S_it` notation is interpreted as time-varying. 
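A minimal numerical check of this collinearity claim (toy panel, illustrative names; not library code): with 4 units and 2 periods, the unit-static column adds no rank beyond the fixed effects, while the time-varying column does.

```python
import numpy as np

# Toy panel: 4 units x 2 periods; unit 0 treated at t=1, unit 1 exposed.
units = np.repeat(np.arange(4), 2)
times = np.tile(np.arange(2), 4)
D = ((units == 0) & (times == 1)).astype(float)   # D_it
S_i_row = (units == 1).astype(float)              # unit-static S_i
S_it_row = S_i_row * (times == 1)                 # time-varying S_it

unit_fe = (units[:, None] == np.arange(4)).astype(float)
time_fe = (times[:, None] == np.arange(2)).astype(float)
FE = np.column_stack([unit_fe, time_fe])

rank_base = np.linalg.matrix_rank(np.column_stack([FE, D]))
rank_static = np.linalg.matrix_rank(np.column_stack([FE, D, (1 - D) * S_i_row]))
rank_varying = np.linalg.matrix_rank(np.column_stack([FE, D, (1 - D) * S_it_row]))

# (1 - D_it) * S_i is unit-constant, hence absorbed: rank_static == rank_base.
# (1 - D_it) * S_it adds one identified column: rank_varying == rank_base + 1.
```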
+ +**Note:** Stage-1 fits unit + time FE on Butts' STRICTER subsample `Omega_0 = {D_it = 0 AND S_it = 0}` (untreated AND unexposed) — the clean far-away control group. This differs from `TwoStageDiD`'s `Omega_0 = {D_it = 0}` (untreated; includes near-controls in post-treatment periods). The stricter Butts subsample prevents spillover-contaminated near-controls from biasing the time FE; near-controls post-treatment carry `delta_j` variation that the ring covariates pick up at stage 2. + +**Note (Omega_0 row-level identification — period strict, unit warn-and-drop, plus connectivity):** Every period must have at least one Omega_0 row (else time FE is structurally unidentified for that period, and dropping it would lose all units' cross-time identification) — hard `ValueError`. Units lacking Omega_0 rows (e.g. baseline-treated units with `D_it = 1` at every observed `t`) are warned-and-dropped: their unit FE is NaN, residualization writes NaN on their rows, and the downstream finite-mask path at stage 2 excludes them from estimation. This mirrors `TwoStageDiD`'s always-treated unit handling (`two_stage.py:294-336`) and Gardner's framework, which identifies effects from supported observations rather than requiring every unit estimable. **Connectivity:** the supported-units bipartite graph (supported units linked by shared Omega_0 periods) must form a single connected component. If the graph splits into K > 1 components, the iterative FE solver identifies (`mu_i`, `lambda_t`) only up to component-specific constants, and residualization combines `mu_i` from one component with `lambda_t` from another — silently corrupting `y_tilde` and downstream `tau_total` / `delta_j`. Balanced panel + per-unit/per-period Omega_0 coverage is NECESSARY but NOT SUFFICIENT; connectivity is the load-bearing identification condition. 
Under the current absorbing-treatment regime the disconnected case is plausibly unreachable in practice (we were unable to construct an example surviving the upstream validators), but `_check_omega_0_connectivity` is in place as defense-in-depth and future-proofs Wave B follow-ups (event-study, survey-design integration, reversible-treatment relaxation if ever added). + +**Note (Gardner identity, non-staggered):** Under non-staggered timing, the two-stage Gardner residualize-then-fit with the Omega_0-restricted stage 1 is **empirically bit-identical** to the single-stage TWFE ring regression on the full sample using the time-varying form `y_it ~ mu_i + lambda_t + tau * D_it + sum_j delta_j * (1 - D_it) * Ring_{it,j}`. This is the non-staggered ring estimator from Butts Equations 4-6. The empirical equivalence is verified by a 20-seed deterministic regression test (`TestSpilloverDiDNonStaggeredFEEquivalence`) at `atol=1e-10`. The Omega_0 restriction is therefore innocent for the non-staggered point estimate — it only changes the variance composition (which is why the stage-1 GMM correction enters at stage 2 in the staggered case). Reported `tau_total` for non-staggered timing IS the Butts Eqs. 4-6 estimator. + +**Note:** Treated units have `d_it = 0` (their own location) and fall in Ring_1 geometrically, but the `(1 - D_it)` factor zeros their ring contribution. `n_units_ever_in_ring` counts include treated units in Ring_1 by construction. Under staggered timing this field counts each unit in EVERY ring it visits across periods (not a clean partition); under non-staggered timing it IS a partition. + +**Note (anticipation shift):** The `anticipation: int` constructor kwarg (default 0) shifts BOTH the treatment and ring-membership clocks by `-anticipation` so the stage-1 Omega_0 subsample correctly excludes the `anticipation` pre-treatment periods (which would otherwise contaminate the FE estimation with anticipation effects). 
Concretely, the effective treatment indicator becomes `D'_it = 1{t >= first_treat - anticipation}` and ring membership uses the corresponding shifted "currently-treated" set when computing `d_it` for staggered timing. Stage 2 then estimates `tau_total` and `delta_j` net of the anticipation window. Mirrors `TwoStageDiD`'s `anticipation` parameter semantics — recommended use is a small integer (1-3 periods) when domain knowledge suggests treatment effects begin before formal onset (e.g., policy announcements). Setting `anticipation > 0` AND providing a `first_treat` column where `first_treat - anticipation < min(time)` for any unit will mark that unit as treated from the panel's first observation, with the same baseline-treated handling as if it were always-treated (warn-and-drop via the Omega_0 unit-level check above). + +**Note:** No published R/Stata software exists for the two-period ring estimator (Butts Eqs. 5-6). Correctness anchored on (a) synthetic-DGP Monte Carlo identification tests on the **non-staggered** DGP (50-seed default + 200-seed `@pytest.mark.slow` variant — both recover `tau_total` AND `delta_1` within 0.02 absolute tolerance on Butts-Assumption-satisfying DGPs) plus a **staggered**-DGP MC test (30-seed, no slow variant) that anchors `tau_total` only within 0.04 absolute tolerance (looser bound because each staggered DGP is larger and noisier); per-ring `delta_jk` recovery on staggered DGPs and a 200-seed slow staggered variant are queued as follow-ups alongside `event_study=True` support; (b) reduce-to-TWFE limit (non-staggered, 20-seed deterministic bit-identity at `atol=1e-10` via `TestSpilloverDiDNonStaggeredFEEquivalence`); (c) Wave A's Conley sparse-vs-dense parity inherited via `solve_ols`. 
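A hedged sketch of the clock shift (toy arrays; `first_treat` and `anticipation` mirror the parameter names above, the rest is illustrative):

```python
import numpy as np

anticipation = 2
first_treat = np.array([5.0, 8.0, np.inf])  # inf = never treated
times = np.arange(10)

# Effective indicator D'_it = 1{t >= first_treat - anticipation}.
# np.inf - 2 stays inf, so never-treated units remain all-zero.
D_eff = (times[None, :] >= (first_treat - anticipation)[:, None]).astype(int)

# Unit 0 (onset 5) becomes effectively treated at t=3, so its t=3,4 rows
# leave the Omega_0 stage-1 subsample as described above.
```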
A reduce-to-`TwoStageDiD` limit was scoped during planning but not shipped in Wave B (the Omega_0 stricter subsample requires additional fixture work to align with `TwoStageDiD`'s `{D_it = 0}` subsample); queued as a follow-up alongside the Gardner GMM correction. + +**Assumptions (Butts 2021):** + +- **Assumption 1 (Random Sampling)** — i.i.d. panel. +- **Assumption 3 (Parallel Counterfactual Trends)** — counterfactual trends in absence of all treatment AND zero exposure do not depend on own treatment status. +- **Assumption 5 (Spillovers Are Local):** (i) `min_{j: D_j=1} d(i,j) > d_bar ⇒ h_i(D-vector) = 0` (spillovers vanish past `d_bar`); (ii) at least one treated unit AND one control unit exist with `min d > d_bar` (clean far-away control group). The validator enforces (ii) via `_validate_far_away_exists` with a strict `> d_bar` check; failure raises `ValueError` citing the assumption. +- **Assumption 6 (Total Effect Parallel Trends)** — counterfactual trends do not depend on `(D_i, S_i)`. Stronger than Assumption 3. +- **Assumption 7 (Spillover Effect Parallel Trends)** — counterfactual trends do not depend on `(D_i, S_i)` for `S_i ∈ {0, 1}`. Required to identify `gamma_0` / `delta_j`. +- **Assumption 8 (Parallel Counterfactual Trends, Staggered)** — additive unit + time FE structure on untreated/unexposed potential outcomes. Stronger than Assumption 3. + +**Variance (Wave B MVP — documented limitation):** + +The stage-2 variance is the standard `solve_ols` estimator (HC1 / Conley / cluster paths, all dispatched via `vcov_type`). The **Gardner GMM sandwich first-stage uncertainty correction is NOT applied** at stage 2 in this PR. The full GMM + Conley composition is queued as a follow-up enhancement that extends `two_stage.py::_compute_gmm_variance` to accept a Conley kernel matrix in place of HC1's identity at the influence-function outer-product step (see Wave B plan Risks #2 for the IF formula). 
The reported SE is therefore **biased downward** (underestimated by a few percent in typical settings) because it omits the additional variance contribution from estimating the stage-1 FE; confidence intervals are correspondingly too narrow and p-values too small. Treat reported significance conservatively until the GMM correction lands.
+
+**Edge cases (from paper Section 3.2 / Discussion):**
+
+| # | Edge case | Handling |
+|---|---|---|
+| 1 | No nearby control units (Assumption 5(ii) fails) | Hard error via `_validate_far_away_exists` |
+| 2 | `d_bar` too small (some affected units classified as far) | User responsibility; sensitivity analysis across `d_bar` values recommended (vary `rings[-1]`) |
+| 3 | `d_bar` too large (`S=0` far-away group has fewer observations) | Same as #2; bias-variance trade-off (paper pages 13-14) |
+| 4 | Single-ring `S_i` attenuation (many unaffected units) | Use multiple rings (Equation 6 multi-ring spec); supported by passing more breakpoints |
+| 5 | Spillovers extend past largest ring | User inspects outermost `delta_K` significance; if non-zero, extend the outermost ring |
+| 6 | Additive spillovers in count of nearby treated | `ring_method="count"` deferred (paper notes the count form re-introduces functional-form dependence) |
+| 7 | Staggered + negative Goodman-Bacon weights | Two-stage Gardner methodology avoids this (paper page 22) |
+
+**Restrictions / deferred features:**
+
+- `event_study=True` raises `NotImplementedError` — per-event-time × ring decomposition (Butts Table 2) queued for follow-up.
+- `survey_design=` raises `NotImplementedError` — planned follow-up.
+- `covariates=` raises `NotImplementedError` — Gardner-style stage-1 residualization not yet wired through; planned follow-up.
+- `ring_method="count"` not exposed — only the nearest-treated-ring specification.
+- `vcov_type` ∈ {`"hc2"`, `"hc2_bm"`} raises `NotImplementedError` — current stage-2 inference uses generic residual df rather than per-coefficient Bell-McCaffrey / CR2 DOF. Use `"hc1"`, `"classical"`, `"conley"`, or pair with `cluster=` for CR1. +- **`rings[0]` must equal 0** — the partition must cover treated locations (`d_it = 0` belongs to Ring 1). Rings starting at a nonzero inner edge would leave units in `0 <= d_it < rings[0]` as exposed-but-unmodeled, silently biasing the estimator. Validator rejects such inputs. +- **Balanced panel required (Wave B MVP)** — every unit must observe every period. An unbalanced (unit, time) Ω₀ bipartite graph can produce disconnected FE components and unidentified stage-1 residuals on treated rows. Exact graph-connectivity-based identification (which would relax this to a strictly weaker condition) is queued as a follow-up extension. Validator rejects unbalanced inputs. +- **One row per `(unit, time)` cell required** — duplicate cells silently re-weight stage-1 FE estimation AND stage-2 OLS. Validator rejects duplicate cells. +- Data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight) not exposed. +- Gardner GMM first-stage correction at stage 2 not applied (HC1/Conley/cluster only; documented limitation). + +**Implementation:** `diff_diff/spillover.py`. Public class `SpilloverDiD`; result class `SpilloverDiDResults(DiDResults)` at `diff_diff/results.py`. Tests at `tests/test_spillover.py`; DGP factories `tests/_dgp_utils.py::generate_butts_nonstaggered_dgp` / `generate_butts_staggered_dgp` (satisfy Butts Assumptions 1/3/5/7 by construction). + +--- + ## ConleySpatialHAC **Primary source:** Conley, T. G. (1999). GMM Estimation with Cross-Sectional Dependence. *Journal of Econometrics* 92(1), 1-45. 
DOI: 10.1016/S0304-4076(98)00084-0
diff --git a/docs/references.rst b/docs/references.rst
index 5dc3d603..a915a3ae 100644
--- a/docs/references.rst
+++ b/docs/references.rst
@@ -182,6 +182,10 @@ Multi-Period and Staggered Adoption
 - **Butts, K., & Gardner, J. (2022).** "did2s: Two-Stage Difference-in-Differences." *The R Journal*, 14(1), 162-173. https://doi.org/10.32614/RJ-2022-048
 
+- **Butts, K. (2023).** "Difference-in-Differences with Spatial Spillovers." *arXiv:2105.03737v3* (originally posted 2021). https://arxiv.org/abs/2105.03737
+
+  Introduces the ring-indicator estimator implemented in our ``SpilloverDiD`` class. Sections 2-3 cover non-staggered timing (Equations 5/6/8); Section 5 covers staggered timing via two-stage Gardner (Table 2). Section 3.1 (page 13) recommends Conley spatial-HAC for inference with cutoff = ``d_bar``.
+
 - **de Chaisemartin, C., & D'Haultfœuille, X. (2020).** "Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects." *American Economic Review*, 110(9), 2964-2996. https://doi.org/10.1257/aer.20181169
 
 - **de Chaisemartin, C., & D'Haultfœuille, X. (2022, revised 2024).** "Difference-in-Differences Estimators of Intertemporal Treatment Effects." *NBER Working Paper* 29873. https://www.nber.org/papers/w29873
diff --git a/tests/_dgp_utils.py b/tests/_dgp_utils.py
new file mode 100644
index 00000000..1f568748
--- /dev/null
+++ b/tests/_dgp_utils.py
@@ -0,0 +1,250 @@
+"""Synthetic data-generating processes for spillover-DiD tests.
+
+These factories produce panels satisfying Butts (2021) Assumptions 1, 3, 5,
+and 7 by construction. Used by ``tests/test_spillover.py`` to anchor
+identification claims (Wave B MVP correctness anchors).
+""" + +from typing import Optional + +import numpy as np +import pandas as pd + + +def generate_butts_nonstaggered_dgp( + *, + n_units: int = 200, + n_treated: int = 20, + n_near_control: int = 80, + n_far_control: int = 100, + n_periods: int = 2, + t_treat: int = 1, + tau_total: float = -0.07, + delta_1: float = -0.04, + d_bar: float = 100.0, + near_control_d_max: Optional[float] = None, + far_control_d_min: Optional[float] = None, + parallel_trends_shock_sd: float = 0.0, + error_sd: float = 0.05, + seed: int = 42, +) -> pd.DataFrame: + """Non-staggered panel with known direct + spillover effects. + + All treated units share onset at ``t_treat``. Layout: + + - Treated units cluster near (lat=0, lon=0). + - Near-controls placed within ``near_control_d_max`` (default + ``d_bar / 2`` km) so they're inside the spillover zone. + - Far-controls placed beyond ``far_control_d_min`` (default + ``2 * d_bar`` km). + + Potential-outcomes model (satisfies Assumption 6/7 by construction): + + .. code:: + + Y_it(0, 0) = mu_i + lambda_t + e_it + Y_i1(1, 0) - Y_i0(0, 0) = (lambda_1 - lambda_0) + tau_total + for treated units + Y_i1(0, S_i=1) - Y_i0(0, S_i=1) = (lambda_1 - lambda_0) + delta_1 + for near-control units (D_i=0, S_i=1) + Y_i1(0, S_i=0) - Y_i0(0, S_i=0) = (lambda_1 - lambda_0) + for far-control units (clean) + + Returns + ------- + DataFrame with columns: ``unit``, ``time``, ``D`` (binary at row level), + ``first_treat`` (per-unit onset; ``np.inf`` for never-treated), + ``y`` (outcome), ``lat``, ``lon``. + """ + if n_units != n_treated + n_near_control + n_far_control: + n_units = n_treated + n_near_control + n_far_control + + if near_control_d_max is None: + near_control_d_max = d_bar / 2.0 + if far_control_d_min is None: + far_control_d_min = 2.0 * d_bar + + rng = np.random.default_rng(seed) + + # ~111 km per latitude degree on the equator (haversine approx). 
+ KM_PER_DEG = 111.195 + near_d_max_deg = near_control_d_max / KM_PER_DEG + far_d_min_deg = far_control_d_min / KM_PER_DEG + + units = [f"u{i:04d}" for i in range(n_units)] + mu = rng.normal(0.0, 0.5, size=n_units) + # Time effects: linear trend + lambda_t = np.array([0.05 * t for t in range(n_periods)]) + if parallel_trends_shock_sd > 0: + lambda_t = lambda_t + rng.normal(0, parallel_trends_shock_sd, size=n_periods) + + coords = np.zeros((n_units, 2)) + is_treated = np.zeros(n_units, dtype=bool) + is_near = np.zeros(n_units, dtype=bool) + + # Treated: cluster very near origin (within 1 km) + for i in range(n_treated): + coords[i, 0] = rng.normal(0.0, 0.005) + coords[i, 1] = rng.normal(0.0, 0.005) + is_treated[i] = True + + # Near-control: within near_d_max_deg of origin, but not at origin + near_start = n_treated + for i in range(n_near_control): + idx = near_start + i + # Uniformly distributed in an annulus to avoid clustering + r_deg = rng.uniform(near_d_max_deg * 0.1, near_d_max_deg * 0.95) + theta = rng.uniform(0, 2 * np.pi) + coords[idx, 0] = r_deg * np.cos(theta) + coords[idx, 1] = r_deg * np.sin(theta) + is_near[idx] = True + + # Far-control: at far_d_min_deg+ from origin + far_start = n_treated + n_near_control + for i in range(n_far_control): + idx = far_start + i + r_deg = rng.uniform(far_d_min_deg, far_d_min_deg * 1.5) + theta = rng.uniform(0, 2 * np.pi) + coords[idx, 0] = r_deg * np.cos(theta) + coords[idx, 1] = r_deg * np.sin(theta) + + rows = [] + for i, u in enumerate(units): + ft = float(t_treat) if is_treated[i] else np.inf + for t in range(n_periods): + y_clean = mu[i] + lambda_t[t] + post = t >= t_treat + if is_treated[i] and post: + y = y_clean + tau_total + elif is_near[i] and post: + y = y_clean + delta_1 + else: + y = y_clean + y += rng.normal(0, error_sd) + rows.append( + { + "unit": u, + "time": t, + "lat": coords[i, 0], + "lon": coords[i, 1], + "D": int(is_treated[i] and post), + "first_treat": ft, + "y": y, + } + ) + return 
pd.DataFrame(rows) + + +def generate_butts_staggered_dgp( + *, + n_cohorts: int = 3, + units_per_cohort: int = 30, + n_far_control: int = 90, + n_periods: int = 6, + cohort_onsets: Optional[list] = None, + tau_total: float = -0.07, + delta_1: float = -0.04, + d_bar: float = 100.0, + error_sd: float = 0.05, + seed: int = 42, +) -> pd.DataFrame: + """Staggered panel with known per-cohort effects. + + Three cohorts treat at distinct times; far-control group never treated. + Each cohort is clustered at a distinct geographic center so spillover + is well-defined per cohort (units near cohort C1 receive spillover + starting at C1's onset, etc.). + + Returns the same column schema as :func:`generate_butts_nonstaggered_dgp`. + """ + if cohort_onsets is None: + cohort_onsets = [1, 2, 3][:n_cohorts] + + rng = np.random.default_rng(seed) + KM_PER_DEG = 111.195 + + rows = [] + coords_by_unit = {} + first_treat_by_unit = {} + is_treated_unit = {} + # Spillover-trigger onset per near-control unit (the cohort it belongs to). + spillover_onset_by_unit: dict = {} + + unit_idx = 0 + + # Cohort centers spaced ~5*d_bar apart so cohorts don't overlap. + cohort_spacing_deg = (5 * d_bar) / KM_PER_DEG + + for c_idx in range(n_cohorts): + center_lat = c_idx * cohort_spacing_deg + center_lon = 0.0 + onset = cohort_onsets[c_idx] + for i in range(units_per_cohort): + u = f"c{c_idx}_{unit_idx:04d}" + unit_idx += 1 + # 1/3 of cohort members are "treated" (cluster at center); + # 2/3 are "near-controls" within d_bar/2 of center. 
+ if i < units_per_cohort // 3: + coords_by_unit[u] = ( + center_lat + rng.normal(0, 0.003), + center_lon + rng.normal(0, 0.003), + ) + first_treat_by_unit[u] = float(onset) + is_treated_unit[u] = True + else: + r_deg = rng.uniform(0.05, (d_bar / 2.0) / KM_PER_DEG * 0.9) + theta = rng.uniform(0, 2 * np.pi) + coords_by_unit[u] = ( + center_lat + r_deg * np.cos(theta), + center_lon + r_deg * np.sin(theta), + ) + first_treat_by_unit[u] = np.inf + is_treated_unit[u] = False + # Near-control receives spillover when its cohort treats. + spillover_onset_by_unit[u] = float(onset) + + # Far-control: placed far from all cohort centers + far_center_lat = n_cohorts * cohort_spacing_deg + 5 * d_bar / KM_PER_DEG + for i in range(n_far_control): + u = f"far_{i:04d}" + unit_idx += 1 + r_deg = rng.uniform(0.0, 1.0) + theta = rng.uniform(0, 2 * np.pi) + coords_by_unit[u] = ( + far_center_lat + r_deg * np.cos(theta), + r_deg * np.sin(theta), + ) + first_treat_by_unit[u] = np.inf + is_treated_unit[u] = False + # Far-controls receive no spillover (no nearby cohort). 
+ + mu = {u: rng.normal(0, 0.5) for u in coords_by_unit} + lambda_t = np.array([0.05 * t for t in range(n_periods)]) + + for u, (lat, lon) in coords_by_unit.items(): + ft = first_treat_by_unit[u] + is_treat = is_treated_unit[u] + spillover_onset = spillover_onset_by_unit.get(u, np.inf) + for t in range(n_periods): + y_clean = mu[u] + lambda_t[t] + # Direct effect for own treated unit + if is_treat and t >= ft: + y = y_clean + tau_total + # Spillover on near-control once its cohort activates + elif (not is_treat) and t >= spillover_onset: + y = y_clean + delta_1 + else: + y = y_clean + y += rng.normal(0, error_sd) + rows.append( + { + "unit": u, + "time": t, + "lat": lat, + "lon": lon, + "D": int(is_treat and t >= ft), + "first_treat": ft, + "y": y, + } + ) + return pd.DataFrame(rows) diff --git a/tests/test_spillover.py b/tests/test_spillover.py new file mode 100644 index 00000000..c3892942 --- /dev/null +++ b/tests/test_spillover.py @@ -0,0 +1,2610 @@ +"""Tests for SpilloverDiD (Butts 2021 ring-indicator spillover-aware DiD). + +Step 1 surface: ring-construction helpers and the public class scaffold. +Step 2+ surfaces are added incrementally as the implementation lands. 
+""" + +import numpy as np +import pandas as pd +import pytest + +from diff_diff.spillover import ( + SpilloverDiD, + _apply_callable_metric_pairwise, + _build_ring_indicators, + _check_omega_0_connectivity, + _compute_nearest_treated_distance_sparse, + _compute_nearest_treated_distance_staggered, + _compute_nearest_treated_distance_static, + _convert_treatment_to_first_treat, + _euclidean_pairwise, + _extract_treatment_onsets, + _haversine_km_pairwise, + _pairwise_ring_distances, + _ring_label, +) +from tests._dgp_utils import ( + generate_butts_nonstaggered_dgp, + generate_butts_staggered_dgp, +) + +# ============================================================================= +# Pairwise-distance primitives +# ============================================================================= + + +class TestHaversinePairwise: + """Tests for _haversine_km_pairwise.""" + + def test_zero_distance_when_same_point(self): + coords = np.array([[40.7128, -74.0060]]) # NYC + result = _haversine_km_pairwise(coords, coords) + assert result.shape == (1, 1) + assert abs(result[0, 0]) < 1e-9 + + def test_known_pair_nyc_to_la(self): + # NYC (40.7128 N, 74.0060 W) to LA (34.0522 N, 118.2437 W) + # Reference great-circle distance ~ 3935.7 km (within 0.5% of any source) + nyc = np.array([[40.7128, -74.0060]]) + la = np.array([[34.0522, -118.2437]]) + result = _haversine_km_pairwise(nyc, la) + assert result.shape == (1, 1) + assert abs(result[0, 0] - 3935.7) < 5.0 + + def test_pairwise_matrix_shape(self): + coords_a = np.array([[40.0, -74.0], [34.0, -118.0]]) + coords_b = np.array([[51.5, -0.1], [35.7, 139.7], [40.7, -74.0]]) + result = _haversine_km_pairwise(coords_a, coords_b) + assert result.shape == (2, 3) + # All non-negative + assert (result >= 0).all() + + +class TestEuclideanPairwise: + """Tests for _euclidean_pairwise.""" + + def test_known_3_4_5(self): + a = np.array([[0.0, 0.0]]) + b = np.array([[3.0, 4.0]]) + result = _euclidean_pairwise(a, b) + assert abs(result[0, 0] - 
5.0) < 1e-12 + + def test_zero_distance_same_point(self): + coords = np.array([[1.5, 2.5], [3.0, 4.0]]) + result = _euclidean_pairwise(coords, coords) + np.testing.assert_allclose(np.diag(result), 0.0, atol=1e-12) + + +class TestPairwiseRingDistances: + """Tests for the _pairwise_ring_distances dispatch.""" + + def test_haversine_branch(self): + a = np.array([[0.0, 0.0]]) + b = np.array([[0.0, 0.0]]) + result = _pairwise_ring_distances(a, b, "haversine") + assert result.shape == (1, 1) + assert abs(result[0, 0]) < 1e-9 + + def test_euclidean_branch(self): + a = np.array([[0.0, 0.0]]) + b = np.array([[1.0, 0.0]]) + result = _pairwise_ring_distances(a, b, "euclidean") + assert abs(result[0, 0] - 1.0) < 1e-12 + + def test_callable_branch(self): + a = np.array([[0.0, 0.0], [1.0, 1.0]]) + b = np.array([[2.0, 2.0]]) + + def custom(x, y): + return np.full((x.shape[0], y.shape[0]), 7.5) + + result = _pairwise_ring_distances(a, b, custom) + assert result.shape == (2, 1) + np.testing.assert_allclose(result, 7.5) + + def test_unknown_metric_raises(self): + a = np.array([[0.0, 0.0]]) + b = np.array([[1.0, 1.0]]) + with pytest.raises(ValueError, match="Unknown conley_metric"): + _pairwise_ring_distances(a, b, "manhattan") + + +class TestApplyCallableMetricPairwise: + """Validation of user-supplied callable distance metrics.""" + + def test_wrong_shape_raises(self): + a = np.array([[0.0, 0.0], [1.0, 1.0]]) + b = np.array([[2.0, 2.0], [3.0, 3.0]]) + + def bad(x, y): + return np.zeros((1, 1)) + + with pytest.raises(ValueError, match="shape"): + _apply_callable_metric_pairwise(bad, a, b) + + def test_non_finite_raises(self): + a = np.array([[0.0, 0.0]]) + b = np.array([[1.0, 1.0]]) + + def bad(x, y): + return np.array([[np.inf]]) + + with pytest.raises(ValueError, match="non-finite"): + _apply_callable_metric_pairwise(bad, a, b) + + def test_negative_raises(self): + a = np.array([[0.0, 0.0]]) + b = np.array([[1.0, 1.0]]) + + def bad(x, y): + return np.array([[-1.0]]) + + with 
pytest.raises(ValueError, match="negative"): + _apply_callable_metric_pairwise(bad, a, b) + + +# ============================================================================= +# Static nearest-treated distance +# ============================================================================= + + +@pytest.fixture +def small_static_panel(): + """3 treated units near origin, 3 near-controls 25-100km out, 3 far at 500km.""" + rng = np.random.default_rng(42) + treated = [(0.0 + rng.normal(0, 0.05), 0.0 + rng.normal(0, 0.05)) for _ in range(3)] + near = [(0.4 + i * 0.05, 0.0) for i in range(3)] # ~44-55 km east + far = [(5.0 + i * 0.1, 0.0) for i in range(3)] # ~556+ km east + units = [] + coords = [] + treats = [] + for i, c in enumerate(treated): + units.append(f"T{i}") + coords.append(c) + treats.append(1) + for i, c in enumerate(near): + units.append(f"N{i}") + coords.append(c) + treats.append(0) + for i, c in enumerate(far): + units.append(f"F{i}") + coords.append(c) + treats.append(0) + # Two periods so the static panel has 2 rows per unit + rows = [] + for u, c, d in zip(units, coords, treats): + for t in (0, 1): + rows.append( + { + "unit": u, + "time": t, + "lat": c[0], + "lon": c[1], + "D": d * t, # turns on at t=1 for treated + } + ) + return pd.DataFrame(rows) + + +class TestComputeNearestTreatedDistanceStatic: + """Static (non-staggered) nearest-treated distance helper.""" + + def test_treated_units_have_zero_distance(self, small_static_panel): + treated_ids = np.array(["T0", "T1", "T2"]) + d_i, unit_index = _compute_nearest_treated_distance_static( + small_static_panel, + unit="unit", + coords=("lat", "lon"), + metric="haversine", + treated_unit_ids=treated_ids, + ) + for tid in treated_ids: + pos = np.where(unit_index == tid)[0][0] + # Treated units' nearest treated is themselves OR an adjacent T*; the + # static fixture clusters them within rng.normal(0, 0.05) deg ~6 km. 
+ assert d_i[pos] < 15.0 # all treated cluster within ~15 km + + def test_near_controls_below_far_controls(self, small_static_panel): + treated_ids = np.array(["T0", "T1", "T2"]) + d_i, unit_index = _compute_nearest_treated_distance_static( + small_static_panel, + unit="unit", + coords=("lat", "lon"), + metric="haversine", + treated_unit_ids=treated_ids, + ) + near_pos = [np.where(unit_index == f"N{i}")[0][0] for i in range(3)] + far_pos = [np.where(unit_index == f"F{i}")[0][0] for i in range(3)] + assert all(d_i[p] < 100.0 for p in near_pos) + assert all(d_i[p] > 500.0 for p in far_pos) + + def test_euclidean_metric(self, small_static_panel): + treated_ids = np.array(["T0", "T1", "T2"]) + d_i_h, _ = _compute_nearest_treated_distance_static( + small_static_panel, + unit="unit", + coords=("lat", "lon"), + metric="haversine", + treated_unit_ids=treated_ids, + ) + d_i_e, _ = _compute_nearest_treated_distance_static( + small_static_panel, + unit="unit", + coords=("lat", "lon"), + metric="euclidean", + treated_unit_ids=treated_ids, + ) + # Different units, but ordering should be consistent (near < far) + # so the rank of distances matches between metrics. 
+ order_h = np.argsort(d_i_h) + order_e = np.argsort(d_i_e) + np.testing.assert_array_equal(order_h, order_e) + + def test_no_treated_units_raises(self, small_static_panel): + with pytest.raises(ValueError, match="no treated units present"): + _compute_nearest_treated_distance_static( + small_static_panel, + unit="unit", + coords=("lat", "lon"), + metric="haversine", + treated_unit_ids=np.array(["nonexistent_unit"]), + ) + + def test_unit_index_is_sorted(self, small_static_panel): + treated_ids = np.array(["T0", "T1", "T2"]) + _, unit_index = _compute_nearest_treated_distance_static( + small_static_panel, + unit="unit", + coords=("lat", "lon"), + metric="haversine", + treated_unit_ids=treated_ids, + ) + # Sorted lexicographically: F0, F1, F2, N0, N1, N2, T0, T1, T2 + expected = ["F0", "F1", "F2", "N0", "N1", "N2", "T0", "T1", "T2"] + np.testing.assert_array_equal(unit_index, expected) + + +class TestComputeNearestTreatedDistanceSparse: + """Sparse cKDTree path for nearest-treated computation.""" + + def test_sparse_matches_dense_haversine(self, small_static_panel): + # Force sparse path on the small fixture by using a tight cutoff. + treated_ids = np.array(["T0", "T1", "T2"]) + unit_coords_df = ( + small_static_panel[["unit", "lat", "lon"]] + .drop_duplicates(subset="unit") + .set_index("unit") + .sort_index() + ) + all_coords = unit_coords_df[["lat", "lon"]].values.astype(np.float64) + treated_mask = np.array( + [uid in set(treated_ids.tolist()) for uid in unit_coords_df.index], + dtype=bool, + ) + treated_coords = all_coords[treated_mask] + # Sparse path with a 1000 km cutoff should agree with the dense path on + # all in-range units; far controls (>500 km but <1000 km from any + # treated) get their true nearest-treated distance. 
+ d_sparse = _compute_nearest_treated_distance_sparse( + all_coords=all_coords, + treated_coords=treated_coords, + metric="haversine", + cutoff_km=1000.0, + ) + d_dense = _haversine_km_pairwise(all_coords, treated_coords).min(axis=1) + # Mask: only compare entries within cutoff in dense (sparse returns inf otherwise). + in_range = d_dense <= 1000.0 * (1 + 1e-6) + np.testing.assert_allclose(d_sparse[in_range], d_dense[in_range], atol=1e-8) + + def test_sparse_inf_when_no_treated_in_range(self): + # Single unit at (50, 0); treated cluster at (0, 0). With cutoff 100km, + # great-circle ~5500 km exceeds it; expect inf. + all_coords = np.array([[50.0, 0.0]]) + treated_coords = np.array([[0.0, 0.0]]) + d = _compute_nearest_treated_distance_sparse( + all_coords=all_coords, + treated_coords=treated_coords, + metric="haversine", + cutoff_km=100.0, + ) + assert np.isinf(d[0]) + + def test_sparse_euclidean(self): + all_coords = np.array([[0.0, 0.0], [5.0, 0.0], [100.0, 0.0]]) + treated_coords = np.array([[0.0, 0.0]]) + d = _compute_nearest_treated_distance_sparse( + all_coords=all_coords, + treated_coords=treated_coords, + metric="euclidean", + cutoff_km=10.0, + ) + assert abs(d[0]) < 1e-12 + assert abs(d[1] - 5.0) < 1e-12 + assert np.isinf(d[2]) + + +# ============================================================================= +# Staggered nearest-treated distance +# ============================================================================= + + +@pytest.fixture +def staggered_panel(): + """Panel with two cohorts (t_treat=1 and t_treat=2) plus never-treated.""" + rows = [] + # Cohort A: 2 units treated at t=1 (near origin) + cohort_a = {"A0": (0.0, 0.0), "A1": (0.1, 0.0)} + # Cohort B: 2 units treated at t=2 (10 deg east of origin, ~1100 km away) + cohort_b = {"B0": (0.0, 10.0), "B1": (0.0, 10.1)} + # Never-treated: 1 unit far away + never = {"N0": (50.0, 0.0)} # very far north + first_treat = { + **{u: 1 for u in cohort_a}, + **{u: 2 for u in cohort_b}, + **{u: np.inf 
for u in never}, + } + coords = {**cohort_a, **cohort_b, **never} + for t in range(4): # periods 0..3 + for u, (lat, lon) in coords.items(): + rows.append({"unit": u, "time": t, "lat": lat, "lon": lon}) + df = pd.DataFrame(rows) + return df, first_treat + + +class TestComputeNearestTreatedDistanceStaggered: + """Staggered (time-varying) nearest-treated distance helper.""" + + def test_inf_pre_any_treatment(self, staggered_panel): + df, ft = staggered_panel + d_it, row_unit, row_time = _compute_nearest_treated_distance_staggered( + df, + unit="unit", + time="time", + coords=("lat", "lon"), + metric="haversine", + first_treat_by_unit=ft, + ) + # Period 0 has no treated units yet -> d_it = inf for all rows. + mask_t0 = row_time == 0 + assert np.isinf(d_it[mask_t0]).all() + + def test_cohort_a_active_at_t1(self, staggered_panel): + df, ft = staggered_panel + d_it, row_unit, row_time = _compute_nearest_treated_distance_staggered( + df, + unit="unit", + time="time", + coords=("lat", "lon"), + metric="haversine", + first_treat_by_unit=ft, + ) + mask_t1 = row_time == 1 + # Cohort A treats at t=1; B units should be ~1100km from A; A units near zero. + for u in ("A0", "A1"): + row = mask_t1 & (row_unit == u) + assert d_it[row][0] < 15.0 # within their own cohort + for u in ("B0", "B1"): + row = mask_t1 & (row_unit == u) + # B is ~1100km east of A; min distance to {A0, A1}. + # B0 at lon=10 -> 1112 km; B1 at lon=10.1 -> 1123 km. + assert 1100.0 < d_it[row][0] < 1130.0 + + def test_running_min_across_cohorts_at_t2(self, staggered_panel): + df, ft = staggered_panel + d_it, row_unit, row_time = _compute_nearest_treated_distance_staggered( + df, + unit="unit", + time="time", + coords=("lat", "lon"), + metric="haversine", + first_treat_by_unit=ft, + ) + mask_t2 = row_time == 2 + # At t=2, B0 and B1 are also treated; the nearest-treated set for B units is {A,B} + # but B is closer to itself -> nearly zero distance now. 
+ for u in ("B0", "B1"): + row = mask_t2 & (row_unit == u) + assert d_it[row][0] < 15.0 + + +# ============================================================================= +# Ring indicator construction +# ============================================================================= + + +class TestBuildRingIndicators: + """Tests for _build_ring_indicators.""" + + def test_three_rings_three_distances(self): + rings = [0.0, 50.0, 100.0, 200.0] # K=3 rings + d_values = np.array([25.0, 75.0, 150.0]) + masks = _build_ring_indicators(d_values, rings) + assert masks.shape == (3, 3) + # Row 0: distance 25 -> Ring 1 (0 to 50) + np.testing.assert_array_equal(masks[0], [True, False, False]) + # Row 1: distance 75 -> Ring 2 (50 to 100) + np.testing.assert_array_equal(masks[1], [False, True, False]) + # Row 2: distance 150 -> Ring 3 (100 to 200) + np.testing.assert_array_equal(masks[2], [False, False, True]) + + def test_interior_boundary_belongs_to_upper_ring(self): + """Unit exactly at the boundary between two interior rings.""" + rings = [0.0, 50.0, 100.0, 200.0] + d_values = np.array([50.0]) # exactly on the 50.0 boundary + masks = _build_ring_indicators(d_values, rings) + # 50.0 should fall in Ring 2 (the upper of the boundary pair) per the + # half-open-at-top convention. + np.testing.assert_array_equal(masks[0], [False, True, False]) + + def test_outermost_boundary_belongs_to_last_ring(self): + """Unit exactly at d_bar should fall in the outermost ring, not be far.""" + rings = [0.0, 50.0, 100.0, 200.0] + d_values = np.array([200.0]) # exactly at d_bar + masks = _build_ring_indicators(d_values, rings) + # 200.0 should fall in the OUTERMOST ring (closed-at-top convention). 
+ np.testing.assert_array_equal(masks[0], [False, False, True]) + + def test_distance_at_origin_lands_in_first_ring(self): + """Treated units have d_i = 0; they fall in Ring_1.""" + rings = [0.0, 50.0, 100.0, 200.0] + d_values = np.array([0.0]) + masks = _build_ring_indicators(d_values, rings) + np.testing.assert_array_equal(masks[0], [True, False, False]) + + def test_far_away_unit_in_no_ring(self): + """Distance beyond d_bar puts unit in NO ring.""" + rings = [0.0, 50.0, 100.0, 200.0] + d_values = np.array([300.0]) + masks = _build_ring_indicators(d_values, rings) + np.testing.assert_array_equal(masks[0], [False, False, False]) + + def test_single_ring(self): + """K=1 (single ring) case (Butts Equation 5 single-S_i form).""" + rings = [0.0, 100.0] + d_values = np.array([0.0, 50.0, 99.9, 100.0, 100.1]) + masks = _build_ring_indicators(d_values, rings) + assert masks.shape == (5, 1) + # First four are <= 100, last is > 100 (far-away). + np.testing.assert_array_equal(masks[:, 0], [True, True, True, True, False]) + + def test_too_few_breakpoints_raises(self): + with pytest.raises(ValueError, match="at least 2"): + _build_ring_indicators(np.array([0.0]), [50.0]) + + def test_non_increasing_raises(self): + with pytest.raises(ValueError, match="strictly increasing"): + _build_ring_indicators(np.array([0.0]), [50.0, 50.0, 100.0]) + + def test_negative_raises(self): + with pytest.raises(ValueError, match="non-negative"): + _build_ring_indicators(np.array([0.0]), [-10.0, 100.0]) + + +class TestRingLabel: + """Tests for _ring_label.""" + + def test_interior_ring_half_open(self): + rings = [0.0, 50.0, 100.0, 200.0] + assert _ring_label(rings, 0) == "[0, 50)" + assert _ring_label(rings, 1) == "[50, 100)" + + def test_outermost_ring_closed(self): + rings = [0.0, 50.0, 100.0, 200.0] + assert _ring_label(rings, 2) == "[100, 200]" + + def test_single_ring_closed_form(self): + rings = [0.0, 100.0] + assert _ring_label(rings, 0) == "[0, 100]" + + +# 
============================================================================= +# Public class skeleton +# ============================================================================= + + +class TestSpilloverDiDInitGetParamsSetParams: + """Constructor + sklearn-like get_params / set_params surface.""" + + def test_construction_with_defaults(self): + est = SpilloverDiD(rings=[0.0, 50.0, 100.0]) + assert est.rings == [0.0, 50.0, 100.0] + assert est.d_bar is None + assert est.vcov_type == "hc1" + assert est.is_fitted_ is False + + def test_construction_with_all_kwargs(self): + est = SpilloverDiD( + rings=[0.0, 50.0, 200.0], + d_bar=200.0, + vcov_type="conley", + conley_coords=("lat", "lon"), + conley_metric="haversine", + conley_cutoff_km=200.0, + conley_lag_cutoff=0, + cluster="region", + alpha=0.10, + anticipation=1, + event_study=True, + horizon_max=5, + rank_deficient_action="error", + ) + assert est.d_bar == 200.0 + assert est.vcov_type == "conley" + assert est.cluster == "region" + assert est.alpha == 0.10 + assert est.event_study is True + + def test_get_params_returns_constructor_args(self): + est = SpilloverDiD(rings=[0.0, 100.0]) + params = est.get_params() + # Check all constructor args present + expected = { + "rings", + "d_bar", + "vcov_type", + "conley_coords", + "conley_metric", + "conley_cutoff_km", + "conley_lag_cutoff", + "cluster", + "alpha", + "anticipation", + "event_study", + "horizon_max", + "rank_deficient_action", + } + assert set(params.keys()) == expected + + def test_set_params_updates_attribute(self): + est = SpilloverDiD(rings=[0.0, 50.0]) + est.set_params(d_bar=100.0, alpha=0.10) + assert est.d_bar == 100.0 + assert est.alpha == 0.10 + + def test_set_params_returns_self(self): + est = SpilloverDiD(rings=[0.0, 50.0]) + out = est.set_params(d_bar=100.0) + assert out is est + + def test_set_params_rejects_unknown_key(self): + est = SpilloverDiD(rings=[0.0, 50.0]) + with pytest.raises(ValueError, match="Unknown parameter"): + 
est.set_params(nonexistent_kwarg=42) + + def test_fit_survey_design_not_implemented(self): + """survey_design= is a deferred enhancement (Wave B MVP).""" + est = SpilloverDiD(rings=[0.0, 50.0], conley_coords=("lat", "lon")) + df = pd.DataFrame( + { + "unit": ["A", "A"], + "time": [0, 1], + "y": [1.0, 2.0], + "D": [0, 1], + "lat": [0.0, 0.0], + "lon": [0.0, 0.0], + } + ) + with pytest.raises(NotImplementedError, match="survey_design"): + est.fit( + df, + outcome="y", + unit="unit", + time="time", + treatment="D", + survey_design=object(), + ) + + +# ============================================================================= +# Step 3: Two-stage Gardner fit() integration +# ============================================================================= + + +def _make_butts_2period_dgp( + *, + n_treated: int = 10, + n_near_control: int = 30, + n_far_control: int = 30, + tau_total: float = -0.07, + delta_1: float = -0.04, + d_bar: float = 100.0, + seed: int = 42, +) -> pd.DataFrame: + """Build a 2-period panel with known direct + spillover effects. + + Layout: + - Treated units cluster near (lat=0, lon=0). + - Near-controls distributed within d_bar km. + - Far-controls placed ~2*d_bar km away (clean control group). + Outcomes (potential outcomes model): + - Y_it(0, 0) = mu_i + lambda_t + e_it (clean trend, common across all units) + - Treated unit at t=1: Y = mu_i + lambda_1 + tau_total + e_it + - Near-control at t=1: Y = mu_i + lambda_1 + delta_1 + e_it + - Far-control at t=1: Y = mu_i + lambda_1 + e_it + All units satisfy parallel trends (Butts Assumption 6/7). 
+    """
+    rng = np.random.default_rng(seed)
+    n_units = n_treated + n_near_control + n_far_control
+    units = [f"u{i:03d}" for i in range(n_units)]
+    mu = rng.normal(0.0, 0.5, size=n_units)
+
+    coords = []
+    is_treated = []
+    is_near = []
+    for i in range(n_treated):
+        # Cluster within ~5 km of origin
+        coords.append((rng.normal(0, 0.05), rng.normal(0, 0.05)))
+        is_treated.append(True)
+        is_near.append(False)
+    for i in range(n_near_control):
+        # Within d_bar (uniform in a band 10–80 km north)
+        lat = rng.uniform(0.1, 0.7)  # ~11–78 km north
+        lon = rng.uniform(-0.3, 0.3)  # spread east-west
+        coords.append((lat, lon))
+        is_treated.append(False)
+        is_near.append(True)
+    for i in range(n_far_control):
+        # Far-aways at 2*d_bar+ km
+        lat = rng.uniform(2.0, 3.0)  # ~220–330 km north
+        lon = rng.uniform(-0.5, 0.5)
+        coords.append((lat, lon))
+        is_treated.append(False)
+        is_near.append(False)
+
+    rows = []
+    lambda_t = [0.0, 0.1]  # common time trend
+    for i, u in enumerate(units):
+        for t in (0, 1):
+            y_clean = mu[i] + lambda_t[t]
+            if t == 1 and is_treated[i]:
+                y = y_clean + tau_total
+            elif t == 1 and is_near[i]:
+                y = y_clean + delta_1
+            else:
+                y = y_clean
+            y += rng.normal(0, 0.02)  # noise
+            rows.append(
+                {
+                    "unit": u,
+                    "time": t,
+                    "lat": coords[i][0],
+                    "lon": coords[i][1],
+                    "D": int(is_treated[i] and t == 1),
+                    "y": y,
+                }
+            )
+    return pd.DataFrame(rows)
outcome="y", unit="unit", time="time", treatment="D") + # Single-seed tolerance — looser than the 200-seed MC test in Step 5. + assert abs(result.att - (-0.07)) < 0.04 + + def test_recovers_ring_coefficient(self): + df = _make_butts_2period_dgp(seed=42, tau_total=-0.07, delta_1=-0.04) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + result = est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + assert result.spillover_effects is not None + ring_coef = result.spillover_effects.iloc[0]["coef"] + assert abs(ring_coef - (-0.04)) < 0.04 + + def test_result_has_expected_fields(self): + df = _make_butts_2period_dgp(seed=42) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + result = est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + assert result.ring_breakpoints == [0.0, 100.0] + assert result.d_bar == 100.0 + assert result.is_staggered is False + assert result.n_far_away_obs > 0 + assert result.stage1_n_obs > 0 + assert "[0, 100]" in result.n_units_ever_in_ring + + def test_summary_includes_ring_block(self): + df = _make_butts_2period_dgp(seed=42) + est = SpilloverDiD(rings=[0.0, 50.0, 100.0], conley_coords=("lat", "lon")) + result = est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + summary = result.summary() + assert "Spillover Effects" in summary + # Two ring labels: [0, 50) and [50, 100] + assert "[0, 50)" in summary or "[50, 100]" in summary + + def test_to_dict_serializes_spillover_effects(self): + df = _make_butts_2period_dgp(seed=42) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + result = est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + d = result.to_dict() + assert "spillover_effects" in d + assert d["spillover_effects"] is not None + assert "ring_breakpoints" in d + assert d["d_bar"] == 100.0 + + +class TestSpilloverDiDRawDataInvariant: + """Step 3: caller's DataFrame must not be mutated by fit().""" + + def 
test_caller_data_unchanged(self): + df = _make_butts_2period_dgp(seed=42) + original_cols = list(df.columns) + original_len = len(df) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + # Caller's DataFrame should not gain or lose columns/rows from fit(). + assert list(df.columns) == original_cols + assert len(df) == original_len + + +# ============================================================================= +# Step 5: Identification MC tests via _dgp_utils.py +# ============================================================================= + + +class TestSpilloverDiDIdentification: + """50-seed Monte Carlo: SpilloverDiD recovers known DGP within MC tolerance. + + Plan's Step 5 target was 200 seeds; this is a faster default that still + rejects gross misidentification. A 200-seed version marked `@pytest.mark.slow` + runs in CI's full suite (`pytest -m slow`). + """ + + def test_nonstaggered_recovers_tau_total(self): + att_estimates = [] + n_seeds = 50 + for s in range(n_seeds): + df = generate_butts_nonstaggered_dgp(tau_total=-0.07, delta_1=-0.04, seed=s) + result = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")).fit( + df, outcome="y", unit="unit", time="time", treatment="D" + ) + att_estimates.append(result.att) + mean_att = float(np.mean(att_estimates)) + # MC tolerance: mean over 50 seeds at error_sd=0.05, ~200 units → + # SE-of-mean ~ 0.05/sqrt(50 * 200) ~ 5e-4. Tolerance 0.02 leaves + # margin for DGP design noise. 
+ assert ( + abs(mean_att - (-0.07)) < 0.02 + ), f"non-staggered tau_total: expected -0.07, got {mean_att:.4f}" + + def test_nonstaggered_recovers_delta_1(self): + delta_estimates = [] + n_seeds = 50 + for s in range(n_seeds): + df = generate_butts_nonstaggered_dgp(tau_total=-0.07, delta_1=-0.04, seed=s) + result = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")).fit( + df, outcome="y", unit="unit", time="time", treatment="D" + ) + if result.spillover_effects is not None: + delta_estimates.append(result.spillover_effects.iloc[0]["coef"]) + mean_delta = float(np.mean(delta_estimates)) + assert ( + abs(mean_delta - (-0.04)) < 0.02 + ), f"non-staggered delta_1: expected -0.04, got {mean_delta:.4f}" + + @pytest.mark.slow + def test_nonstaggered_recovers_at_200_seeds(self): + """Plan-targeted 200-seed MC. Marked slow; run via `pytest -m slow`.""" + att_estimates = [] + delta_estimates = [] + for s in range(200): + df = generate_butts_nonstaggered_dgp(tau_total=-0.07, delta_1=-0.04, seed=s) + result = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")).fit( + df, outcome="y", unit="unit", time="time", treatment="D" + ) + att_estimates.append(result.att) + if result.spillover_effects is not None: + delta_estimates.append(result.spillover_effects.iloc[0]["coef"]) + assert abs(np.mean(att_estimates) - (-0.07)) < 0.02 + assert abs(np.mean(delta_estimates) - (-0.04)) < 0.02 + + def test_staggered_recovers_tau_total(self): + """Staggered MC with 30 seeds (smaller because each DGP is larger).""" + att_estimates = [] + for s in range(30): + df = generate_butts_staggered_dgp(tau_total=-0.07, delta_1=-0.04, seed=s) + result = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")).fit( + df, outcome="y", unit="unit", time="time", first_treat="first_treat" + ) + att_estimates.append(result.att) + mean_att = float(np.mean(att_estimates)) + # Staggered MC is noisier; allow a looser tolerance. 
+        assert (
+            abs(mean_att - (-0.07)) < 0.04
+        ), f"staggered tau_total: expected -0.07, got {mean_att:.4f}"
+
+
+# =============================================================================
+# Step 7: Conley integration end-to-end
+# =============================================================================
+
+
+class TestSpilloverDiDWithConley:
+    """Step 7: vcov_type='conley' flows through stage 2 cleanly."""
+
+    def test_conley_fit_runs(self):
+        df = generate_butts_nonstaggered_dgp(seed=42)
+        est = SpilloverDiD(
+            rings=[0.0, 100.0],
+            conley_coords=("lat", "lon"),
+            vcov_type="conley",
+            conley_cutoff_km=200.0,
+            conley_lag_cutoff=0,  # cross-sectional only (2-period panel)
+        )
+        result = est.fit(df, outcome="y", unit="unit", time="time", treatment="D")
+        assert result.vcov_type == "conley"
+        assert result.conley_lag_cutoff == 0
+        assert np.isfinite(result.se)
+
+    def test_conley_se_differs_from_hc1(self):
+        """Conley vs HC1: same ATT point estimate; Conley SE computes (wiring test)."""
+        df = generate_butts_nonstaggered_dgp(
+            seed=42, n_treated=20, n_near_control=80, n_far_control=100
+        )
+        est_hc1 = SpilloverDiD(
+            rings=[0.0, 100.0],
+            conley_coords=("lat", "lon"),
+            vcov_type="hc1",
+        )
+        result_hc1 = est_hc1.fit(df, outcome="y", unit="unit", time="time", treatment="D")
+        est_conley = SpilloverDiD(
+            rings=[0.0, 100.0],
+            conley_coords=("lat", "lon"),
+            vcov_type="conley",
+            conley_cutoff_km=200.0,
+            conley_lag_cutoff=0,
+        )
+        result_conley = est_conley.fit(df, outcome="y", unit="unit", time="time", treatment="D")
+        # ATT point estimate unchanged across vcov; SE may differ.
+        assert abs(result_hc1.att - result_conley.att) < 1e-10
+        # Conley SE may be larger or smaller than HC1 depending on spatial
+        # error correlation, so no directional assert is made here.
+ # (Synthetic DGP has independent errors so they may be very close; + # use a loose tolerance — primarily a wiring test.) + assert np.isfinite(result_conley.se) + + +# ============================================================================= +# Step 3 (continued): staggered smoke test +# ============================================================================= + + +class TestSpilloverDiDStaggeredFit: + """Step 3: staggered timing produces sensible results.""" + + def test_staggered_fit_runs(self): + # 3 cohorts, 4 periods + rng = np.random.default_rng(0) + rows = [] + cohort_onsets = {1: 1, 2: 2, "C": np.inf} + # 6 units per cohort placed near distinct centers + for cohort_id, onset in cohort_onsets.items(): + center_lat = 0.0 if cohort_id == 1 else (10.0 if cohort_id == 2 else 50.0) + for i in range(6): + u = f"{cohort_id}_{i}" + lat = center_lat + rng.normal(0, 0.05) + lon = rng.normal(0, 0.05) + first_treat = float(onset) if onset != np.inf else np.inf + for t in range(4): + rows.append( + { + "unit": u, + "time": t, + "lat": lat, + "lon": lon, + "first_treat": first_treat, + "y": 1.0 + + 0.1 * t + + (0.05 * (t >= first_treat) if np.isfinite(first_treat) else 0) + + rng.normal(0, 0.05), + } + ) + df = pd.DataFrame(rows) + est = SpilloverDiD( + rings=[0.0, 50.0], # ring covers 0-50 km; far cutoff at 50 + conley_coords=("lat", "lon"), + ) + result = est.fit(df, outcome="y", unit="unit", time="time", first_treat="first_treat") + assert result.is_staggered is True + assert np.isfinite(result.att) + + +# ============================================================================= +# Step 2: Timing conversion helpers +# ============================================================================= + + +class TestExtractTreatmentOnsets: + """Tests for _extract_treatment_onsets.""" + + def test_canonical_finite_onsets(self): + df = pd.DataFrame( + { + "unit": ["A", "A", "B", "B", "C", "C"], + "first_treat": [1, 1, 2, 2, np.inf, np.inf], + "time": [0, 1, 0, 1, 
0, 1], + } + ) + onsets = _extract_treatment_onsets(df, "first_treat", "unit") + assert onsets == {"A": 1.0, "B": 2.0, "C": np.inf} + + def test_zero_treated_as_never(self): + df = pd.DataFrame({"unit": ["A", "A"], "first_treat": [0, 0], "time": [0, 1]}) + onsets = _extract_treatment_onsets(df, "first_treat", "unit") + assert onsets == {"A": np.inf} + + def test_nan_treated_as_never(self): + df = pd.DataFrame({"unit": ["A", "A"], "first_treat": [np.nan, np.nan], "time": [0, 1]}) + onsets = _extract_treatment_onsets(df, "first_treat", "unit") + assert onsets == {"A": np.inf} + + +class TestConvertTreatmentToFirstTreat: + """Tests for _convert_treatment_to_first_treat.""" + + def test_basic_conversion(self): + # 2 units, 3 periods; A treated from t=1, B never-treated. + df = pd.DataFrame( + { + "unit": ["A"] * 3 + ["B"] * 3, + "time": [0, 1, 2, 0, 1, 2], + "D": [0, 1, 1, 0, 0, 0], + } + ) + out, col = _convert_treatment_to_first_treat(df, "D", "time", "unit") + assert col == "_spillover_first_treat" + a_rows = out[out["unit"] == "A"] + b_rows = out[out["unit"] == "B"] + assert (a_rows["_spillover_first_treat"] == 1.0).all() + assert np.isinf(b_rows["_spillover_first_treat"]).all() + + def test_no_treated_units_marks_all_inf(self): + df = pd.DataFrame({"unit": ["A", "A"], "time": [0, 1], "D": [0, 0]}) + out, _ = _convert_treatment_to_first_treat(df, "D", "time", "unit") + assert np.isinf(out["_spillover_first_treat"]).all() + + def test_missing_treatment_column_raises(self): + df = pd.DataFrame({"unit": ["A"], "time": [0]}) + with pytest.raises(ValueError, match="not in data"): + _convert_treatment_to_first_treat(df, "D", "time", "unit") + + def test_non_binary_treatment_raises(self): + df = pd.DataFrame({"unit": ["A", "A"], "time": [0, 1], "D": [0, 2]}) + with pytest.raises(ValueError, match="exact 0/1"): + _convert_treatment_to_first_treat(df, "D", "time", "unit") + + def test_caller_dataframe_unchanged(self): + df = pd.DataFrame({"unit": ["A", "A"], "time": [0, 1], 
"D": [0, 1]}) + original_cols = list(df.columns) + _convert_treatment_to_first_treat(df, "D", "time", "unit") + # The defensive copy + column add does NOT leak back to caller. + assert list(df.columns) == original_cols + + +# ============================================================================= +# Step 2: SpilloverDiD validators +# ============================================================================= + + +@pytest.fixture +def simple_panel(): + """Minimal valid 2-period panel for validator tests.""" + return pd.DataFrame( + { + "unit": ["A", "A", "B", "B", "C", "C", "D", "D"], + "time": [0, 1] * 4, + "lat": [0.0, 0.0, 0.1, 0.1, 5.0, 5.0, 5.1, 5.1], + "lon": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], + "D": [0, 1, 0, 1, 0, 0, 0, 0], + "first_treat": [1, 1, 1, 1, np.inf, np.inf, np.inf, np.inf], + "y": np.arange(8.0), + } + ) + + +class TestValidateSpilloverInputs: + """Behavioral validation of front-door checks.""" + + def test_valid_minimal_input_passes(self, simple_panel): + est = SpilloverDiD(rings=[0.0, 200.0], conley_coords=("lat", "lon")) + # Should not raise. 
+ est._validate_spillover_inputs(simple_panel, "D", None, "time", "unit", "y") + assert est._effective_d_bar == 200.0 + + def test_rings_too_short_raises(self, simple_panel): + est = SpilloverDiD(rings=[100.0]) + with pytest.raises(ValueError, match="at least 2 breakpoints"): + est._validate_spillover_inputs(simple_panel, "D", None, "time", "unit", "y") + + def test_rings_non_sorted_raises(self, simple_panel): + est = SpilloverDiD(rings=[50.0, 100.0, 75.0]) + with pytest.raises(ValueError, match="strictly increasing"): + est._validate_spillover_inputs(simple_panel, "D", None, "time", "unit", "y") + + def test_rings_negative_raises(self, simple_panel): + est = SpilloverDiD(rings=[-10.0, 100.0]) + with pytest.raises(ValueError, match="non-negative"): + est._validate_spillover_inputs(simple_panel, "D", None, "time", "unit", "y") + + def test_d_bar_mismatched_with_rings_raises(self, simple_panel): + est = SpilloverDiD(rings=[0.0, 100.0, 200.0], d_bar=150.0) + with pytest.raises(ValueError, match="d_bar.*must equal"): + est._validate_spillover_inputs(simple_panel, "D", None, "time", "unit", "y") + + def test_d_bar_equal_to_max_rings_accepted(self, simple_panel): + est = SpilloverDiD(rings=[0.0, 100.0, 200.0], d_bar=200.0, conley_coords=("lat", "lon")) + est._validate_spillover_inputs(simple_panel, "D", None, "time", "unit", "y") + assert est._effective_d_bar == 200.0 + + def test_d_bar_default_uses_max_rings(self, simple_panel): + est = SpilloverDiD(rings=[0.0, 50.0, 175.0], conley_coords=("lat", "lon")) + est._validate_spillover_inputs(simple_panel, "D", None, "time", "unit", "y") + assert est._effective_d_bar == 175.0 + + def test_treatment_and_first_treat_both_raise(self, simple_panel): + est = SpilloverDiD(rings=[0.0, 200.0]) + with pytest.raises(ValueError, match="either.*or"): + est._validate_spillover_inputs(simple_panel, "D", "first_treat", "time", "unit", "y") + + def test_neither_treatment_nor_first_treat_raises(self, simple_panel): + est = 
SpilloverDiD(rings=[0.0, 200.0]) + with pytest.raises(ValueError, match="Exactly one of"): + est._validate_spillover_inputs(simple_panel, None, None, "time", "unit", "y") + + def test_missing_required_column_raises(self, simple_panel): + est = SpilloverDiD(rings=[0.0, 200.0]) + with pytest.raises(ValueError, match="Missing required columns"): + est._validate_spillover_inputs( + simple_panel, "D", None, "time", "nonexistent_unit_col", "y" + ) + + def test_conley_requires_coords(self, simple_panel): + est = SpilloverDiD( + rings=[0.0, 200.0], + vcov_type="conley", + conley_cutoff_km=200.0, + conley_lag_cutoff=0, + ) + with pytest.raises(ValueError, match="conley_coords"): + est._validate_spillover_inputs(simple_panel, "D", None, "time", "unit", "y") + + def test_conley_coords_must_be_2_tuple(self, simple_panel): + est = SpilloverDiD( + rings=[0.0, 200.0], + vcov_type="conley", + conley_coords=("lat",), # type: ignore[arg-type] # only 1 element + conley_cutoff_km=200.0, + conley_lag_cutoff=0, + ) + with pytest.raises(ValueError, match="2-tuple"): + est._validate_spillover_inputs(simple_panel, "D", None, "time", "unit", "y") + + def test_conley_coord_column_missing_raises(self, simple_panel): + est = SpilloverDiD( + rings=[0.0, 200.0], + vcov_type="conley", + conley_coords=("lat", "missing_lon"), + conley_cutoff_km=200.0, + conley_lag_cutoff=0, + ) + with pytest.raises(ValueError, match="'missing_lon' not in data"): + est._validate_spillover_inputs(simple_panel, "D", None, "time", "unit", "y") + + def test_conley_requires_positive_cutoff(self, simple_panel): + est = SpilloverDiD( + rings=[0.0, 200.0], + vcov_type="conley", + conley_coords=("lat", "lon"), + conley_cutoff_km=-1.0, + conley_lag_cutoff=0, + ) + with pytest.raises(ValueError, match="conley_cutoff_km"): + est._validate_spillover_inputs(simple_panel, "D", None, "time", "unit", "y") + + def test_conley_requires_lag_cutoff(self, simple_panel): + est = SpilloverDiD( + rings=[0.0, 200.0], + vcov_type="conley", + 
conley_coords=("lat", "lon"), + conley_cutoff_km=200.0, + conley_lag_cutoff=None, + ) + with pytest.raises(ValueError, match="conley_lag_cutoff"): + est._validate_spillover_inputs(simple_panel, "D", None, "time", "unit", "y") + + def test_nan_coords_raise(self, simple_panel): + df = simple_panel.copy() + df.loc[0, "lat"] = np.nan + est = SpilloverDiD( + rings=[0.0, 200.0], + vcov_type="conley", + conley_coords=("lat", "lon"), + conley_cutoff_km=200.0, + conley_lag_cutoff=0, + ) + with pytest.raises(ValueError, match="non-finite"): + est._validate_spillover_inputs(df, "D", None, "time", "unit", "y") + + def test_no_treated_observations_raises(self, simple_panel): + df = simple_panel.copy() + df["D"] = 0 + est = SpilloverDiD(rings=[0.0, 200.0], conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match="No treated observations"): + est._validate_spillover_inputs(df, "D", None, "time", "unit", "y") + + def test_cluster_column_missing_raises(self, simple_panel): + est = SpilloverDiD( + rings=[0.0, 200.0], + vcov_type="conley", + conley_coords=("lat", "lon"), + conley_cutoff_km=200.0, + conley_lag_cutoff=0, + cluster="not_a_real_column", + ) + with pytest.raises(ValueError, match="cluster column"): + est._validate_spillover_inputs(simple_panel, "D", None, "time", "unit", "y") + + +class TestValidateFarAwayExists: + """Tests for SpilloverDiD._validate_far_away_exists.""" + + def test_returns_count_when_satisfied(self): + est = SpilloverDiD(rings=[0.0, 100.0]) + est._effective_d_bar = 100.0 + d = np.array([10.0, 50.0, 500.0, 1000.0]) + is_control = np.array([False, True, True, True]) + n = est._validate_far_away_exists(d, is_control) + assert n == 2 + + def test_raises_when_no_far_controls(self): + est = SpilloverDiD(rings=[0.0, 100.0]) + est._effective_d_bar = 100.0 + d = np.array([10.0, 50.0, 99.9, 100.0]) + is_control = np.array([False, True, True, True]) + with pytest.raises(ValueError, match="Assumption 5"): + est._validate_far_away_exists(d, is_control) + 
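The far-away check exercised above uses a strict inequality: a control sitting at exactly `d_bar` is still inside the spillover region and does not count. A standalone sketch of that logic (`count_far_controls` is a hypothetical mirror written for illustration, not the library's `_validate_far_away_exists` itself):

```python
import numpy as np

# Hypothetical mirror of the far-away check (Butts Assumption 5):
# a valid design needs at least one CONTROL unit strictly beyond d_bar.
def count_far_controls(distances, is_control, d_bar):
    far_controls = (distances > d_bar) & is_control  # strict: d == d_bar is "near"
    n = int(far_controls.sum())
    if n == 0:
        raise ValueError("Assumption 5: no control units strictly beyond d_bar")
    return n

d = np.array([10.0, 50.0, 500.0, 1000.0])
is_control = np.array([False, True, True, True])
print(count_far_controls(d, is_control, 100.0))  # -> 2

# A control at exactly d_bar is inside the spillover region, so this raises:
try:
    count_far_controls(np.array([10.0, 50.0, 99.9, 100.0]), is_control, 100.0)
except ValueError as exc:
    print("raised:", exc)
```

This matches the fixtures in the tests above: four distances with three controls, two of which lie strictly past `d_bar = 100.0`.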
+ def test_raises_when_far_units_all_treated(self): + """Only treated units beyond d_bar (impossible in non-staggered, but the + validator's job is to check the population that identifies the + counterfactual: controls strictly past d_bar).""" + est = SpilloverDiD(rings=[0.0, 100.0]) + est._effective_d_bar = 100.0 + d = np.array([200.0, 300.0, 50.0]) + is_control = np.array([False, False, True]) # only the close unit is control + with pytest.raises(ValueError, match="Assumption 5"): + est._validate_far_away_exists(d, is_control) + + +# ============================================================================= +# Codex-review regression tests (post-Wave-B-MVP first review) +# ============================================================================= + + +class TestSpilloverDiDCovariatesRejected: + """covariates= must raise NotImplementedError in Wave B MVP. + + Stage-1 covariate residualization (Gardner-style) is not yet wired + through; appending raw covariates only at stage 2 silently biases + tau_total / delta_j on panels with time-varying covariates. + """ + + def test_covariates_raises_not_implemented(self): + df = _make_butts_2period_dgp(seed=42) + df["x"] = np.random.default_rng(0).normal(size=len(df)) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + with pytest.raises(NotImplementedError, match="covariates"): + est.fit( + df, + outcome="y", + unit="unit", + time="time", + treatment="D", + covariates=["x"], + ) + + def test_empty_covariates_accepted(self): + """An empty covariates list is the same as no covariates — should NOT raise.""" + df = _make_butts_2period_dgp(seed=42) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + # Should not raise. 
+ est.fit(df, outcome="y", unit="unit", time="time", treatment="D", covariates=[]) + + +class TestSpilloverDiDAbsorbingTreatmentValidation: + """Reversible / non-absorbing treatment patterns must raise.""" + + def test_reversible_treatment_path_raises(self): + # A unit's treatment goes [0, 1, 0] across 3 periods. + rows = [] + rng = np.random.default_rng(0) + for u in ("treated_reversing", "ctrl_far"): + for t in range(3): + if u == "treated_reversing": + d_val = 1 if t == 1 else 0 + lat, lon = 0.0, 0.0 + else: + d_val = 0 + lat, lon = 5.0, 0.0 + rows.append( + { + "unit": u, + "time": t, + "lat": lat, + "lon": lon, + "D": d_val, + "y": rng.normal(), + } + ) + df = pd.DataFrame(rows) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match="non-absorbing"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + def test_non_constant_first_treat_raises(self): + # Same unit has two different first_treat values across rows. + rows = [] + for t in range(3): + rows.append( + { + "unit": "u1", + "time": t, + "lat": 0.0, + "lon": 0.0, + "first_treat": 1.0 if t < 2 else 2.0, # CHANGES at t=2 + "y": float(t), + } + ) + rows.append( + { + "unit": "u2_far", + "time": t, + "lat": 5.0, + "lon": 0.0, + "first_treat": np.inf, + "y": float(t), + } + ) + df = pd.DataFrame(rows) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match="non-constant"): + est.fit(df, outcome="y", unit="unit", time="time", first_treat="first_treat") + + +class TestSpilloverDiDAllEventuallyTreated: + """All-eventually-treated staggered designs (no never-treated units) + should work as long as some not-yet-treated rows are far-away controls. + The Codex review flagged the prior is_ever_control logic as too strict. + """ + + def test_all_eventually_treated_staggered(self): + # Two cohorts, no never-treated units. 
Far cohort treats at t=10 + # (well past the panel's t=3 horizon), so its early-period rows + # serve as far-away controls for the early cohort's treatment. + rng = np.random.default_rng(0) + # Per-unit coords sampled ONCE (within-unit constancy required). + early_coords = [(rng.normal(0, 0.005), rng.normal(0, 0.005)) for _ in range(8)] + late_coords = [(5.0 + rng.normal(0, 0.005), rng.normal(0, 0.005)) for _ in range(8)] + rows = [] + # Early cohort: treats at t=2, clustered at origin + for i, (lat, lon) in enumerate(early_coords): + u = f"early_{i}" + for t in range(4): + rows.append( + { + "unit": u, + "time": t, + "lat": lat, + "lon": lon, + "first_treat": 2.0, + "y": rng.normal() + 0.1 * t + (-0.07 if t >= 2 else 0.0), + } + ) + # Late cohort: treats at t=10 (far outside panel), placed FAR from early + for i, (lat, lon) in enumerate(late_coords): + u = f"late_{i}" + for t in range(4): + rows.append( + { + "unit": u, + "time": t, + "lat": lat, + "lon": lon, + "first_treat": 10.0, # never treats in panel + "y": rng.normal() + 0.1 * t, + } + ) + df = pd.DataFrame(rows) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + result = est.fit(df, outcome="y", unit="unit", time="time", first_treat="first_treat") + assert np.isfinite(result.att) + + +class TestSpilloverDiDConleyCoordsAlwaysRequired: + """conley_coords must be validated on every fit path, not just vcov_type=conley. + The Codex review noted the default-hc1 path was failing with AssertionError. 
+ """ + + def test_missing_conley_coords_on_hc1_path_raises_value_error(self): + df = _make_butts_2period_dgp(seed=42) + est = SpilloverDiD(rings=[0.0, 100.0]) # no conley_coords, default hc1 + with pytest.raises(ValueError, match="conley_coords"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + +class TestSpilloverDiDResultNObsMatchesEstimationSample: + """result.n_obs must equal the stage-2 estimation sample (after dropping + rows with non-finite y_tilde from rank-deficient stage-1 FE). + """ + + def test_n_obs_equals_finite_mask_count(self): + df = _make_butts_2period_dgp(seed=42) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + result = est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + # In a well-conditioned DGP no rows are dropped, so n_obs = len(df). + assert result.n_obs == len(df) + # n_treated + n_control == n_obs (no overlap, no leakage). + assert result.n_treated + result.n_control == result.n_obs + + +# ============================================================================= +# Codex review round-2 regression tests +# ============================================================================= + + +class TestSpilloverDiDOmega0IdentificationCheck: + """Stage-1 FE support: every unit and every period in the panel must have + at least one Omega_0 = {D_it=0 AND S_it=0} row, otherwise FE is + unidentified for that unit/period and stage-2 estimates would be + silently dropped. Round-2 codex review wanted up-front rejection. + """ + + def test_unsupported_period_raises(self): + # Panel where t=1 has zero Omega_0 support: every t=1 unit is + # either treated (D=1) or near (S=1 since d_i <= d_bar=200). + # The far-aways only contribute at t=0 (no t=1 row). + rng = np.random.default_rng(0) + # Per-unit coords sampled ONCE (within-unit-constancy required). 
+ treated_coords = [(rng.normal(0, 0.005), rng.normal(0, 0.005)) for _ in range(4)] + near_coords = [(rng.uniform(0.1, 0.5), rng.uniform(-0.3, 0.3)) for _ in range(4)] + far_coords = [(5.0 + rng.normal(0, 0.005), rng.normal(0, 0.005)) for _ in range(4)] + rows = [] + for i, (lat, lon) in enumerate(treated_coords): + for t in range(2): + rows.append( + { + "unit": f"T{i}", + "time": t, + "lat": lat, + "lon": lon, + "D": int(t == 1), + "y": rng.normal(), + } + ) + for i, (lat, lon) in enumerate(near_coords): + for t in range(2): + rows.append( + { + "unit": f"N{i}", + "time": t, + "lat": lat, + "lon": lon, + "D": 0, + "y": rng.normal(), + } + ) + # Far-aways at PRE only: no t=1 row → Omega_0 ∩ {t=1} is empty. + for i, (lat, lon) in enumerate(far_coords): + rows.append( + { + "unit": f"F{i}", + "time": 0, + "lat": lat, + "lon": lon, + "D": 0, + "y": rng.normal(), + } + ) + df = pd.DataFrame(rows) + est = SpilloverDiD(rings=[0.0, 200.0], d_bar=200.0, conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match="unidentified"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + +class TestSpilloverDiDConleyCoordsConstantPerUnit: + """conley_coords must be constant within each unit across rows. + Round-2 codex review noted ring construction collapses coords via + `drop_duplicates(subset=[unit])` — non-constant coords silently use + only the first row's location. + """ + + def test_time_varying_coords_raises(self): + rows = [] + rng = np.random.default_rng(0) + # u1 has different lat at t=0 vs t=1 + rows.append({"unit": "u1", "time": 0, "lat": 0.0, "lon": 0.0, "D": 0, "y": rng.normal()}) + rows.append( + { + "unit": "u1", + "time": 1, + "lat": 1.5, # changed! 
+ "lon": 0.0, + "D": 1, + "y": rng.normal(), + } + ) + # u2 is well-behaved (constant coords) + for t in (0, 1): + rows.append( + { + "unit": "u2", + "time": t, + "lat": 5.0, + "lon": 0.0, + "D": 0, + "y": rng.normal(), + } + ) + df = pd.DataFrame(rows) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match="non-constant"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + +# ============================================================================= +# Codex review round-3 regression tests +# ============================================================================= + + +class TestSpilloverDiDFractionalTreatmentRejected: + """Treatment column with fractional values (0.9, 1.1, etc.) must raise. + Round-3 codex review caught that `int(v) in (0, 1)` was rounding down + and silently misclassifying fractional rows. + """ + + @pytest.mark.parametrize("bad_value", [0.5, 0.9, 1.1, -0.1, 2.5]) + def test_fractional_treatment_raises(self, bad_value): + df = _make_butts_2period_dgp(seed=42).copy() + # Cast D to float64 before assigning the fractional value. Modern + # pandas (3.x) raises TypeError on int64-column fractional setitem + # BEFORE SpilloverDiD.fit() ever sees the input, so we promote the + # column dtype first to ensure the fractional value actually + # reaches the validator we're testing. + df["D"] = df["D"].astype(float) + first_treated_idx = df.index[df["D"] == 1][0] + df.loc[first_treated_idx, "D"] = bad_value + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match="exact 0/1"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + +class TestSpilloverDiDNoTreatedRowRaises: + """If all first_treat > max(time), D_it is all zeros after the + anticipation shift. Must raise a clear identification error rather + than crashing in solve_ols. 
+ """ + + def test_future_only_first_treat_raises(self): + rng = np.random.default_rng(0) + # All units treat at t=10, but panel only spans t=0..2. + rows = [] + for i in range(4): + lat, lon = rng.normal(0, 0.005), rng.normal(0, 0.005) + for t in range(3): + rows.append( + { + "unit": f"T{i}", + "time": t, + "lat": lat, + "lon": lon, + "first_treat": 10.0, # > max(time) = 2 + "y": rng.normal(), + } + ) + for i in range(4): + lat, lon = 5.0 + rng.normal(0, 0.005), rng.normal(0, 0.005) + for t in range(3): + rows.append( + { + "unit": f"F{i}", + "time": t, + "lat": lat, + "lon": lon, + "first_treat": np.inf, + "y": rng.normal(), + } + ) + df = pd.DataFrame(rows) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match="No observation is treated"): + est.fit(df, outcome="y", unit="unit", time="time", first_treat="first_treat") + + +class TestSpilloverDiDHaversineDomain: + """Haversine lat/lon range validation applies on EVERY vcov path + (not just vcov_type='conley'), because ring construction always + uses the configured metric. Round-3 codex review noted out-of-range + coords silently produced wrong ring assignment on hc1/cluster paths. + """ + + def test_out_of_range_lat_on_hc1_raises(self): + df = _make_butts_2period_dgp(seed=42).copy() + # Corrupt one row's lat to be > 90 (impossible geographic value). + df.loc[0, "lat"] = 95.0 + # Force the constancy check to ignore this corruption: also corrupt + # the unit's other row to the same value (constant per unit). 
+ unit_of_first = df.loc[0, "unit"] + df.loc[df["unit"] == unit_of_first, "lat"] = 95.0 + est = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + vcov_type="hc1", # NOT conley + ) + with pytest.raises(ValueError, match=r"latitude.*\[-90, 90\]"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + def test_out_of_range_lon_on_hc1_raises(self): + df = _make_butts_2period_dgp(seed=42).copy() + unit_of_first = df.loc[0, "unit"] + df.loc[df["unit"] == unit_of_first, "lon"] = 200.0 # > 180 + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match=r"longitude.*\[-180, 180\]"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + def test_euclidean_metric_skips_range_check(self): + """conley_metric='euclidean' must NOT enforce haversine ranges.""" + df = _make_butts_2period_dgp(seed=42).copy() + # Coordinates 95.0 / 200.0 are valid Euclidean but invalid haversine. + df.loc[df["unit"] == df.loc[0, "unit"], "lat"] = 95.0 + df.loc[df["unit"] == df.loc[0, "unit"], "lon"] = 200.0 + est = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + conley_metric="euclidean", + ) + # Should not raise. (May still fail downstream for other reasons — + # we just need to confirm the haversine range gate is metric-aware.) + try: + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + except ValueError as exc: + # Acceptable: a different (non-domain) error from downstream. + assert "[-90, 90]" not in str(exc) and "[-180, 180]" not in str(exc) + + +# ============================================================================= +# Codex review round-4 regression tests +# ============================================================================= + + +class TestSpilloverDiDZeroIndexedBaselineTreated: + """Auto-generated `_spillover_first_treat` (from binary `D`) must NOT + collapse `0` onto the never-treated sentinel. 
Round-4 codex review + caught that `_extract_treatment_onsets` was collapsing 0 → inf for + EVERY first_treat input, silently reclassifying baseline-treated + units (D=1 at t=0) as never-treated. + """ + + def test_baseline_treated_unit_at_t0_recognized(self): + """A baseline-treated unit (D=1 at t=0) used to silently become + never-treated because `_convert_treatment_to_first_treat` wrote + first_treat=0 and `_extract_treatment_onsets` collapsed 0 -> inf. + After the fix, u1 is correctly recognized as treated. Because u1 + has no Omega_0 rows (D=1 at all t), it triggers the round-16 + warn-and-drop path: the warning naming `u1_baseline` PROVES it was + recognized as treated (the OLD bug silently reclassified u1 to + "never treated", which would have passed the Omega_0 check + without warning and produced garbage estimates). + """ + rng = np.random.default_rng(0) + rows = [] + # u1: baseline-treated (D=1 at all t). No Omega_0 rows → warned- + # and-dropped from stage 2. + # u2: treated from t=1 (D=0 at t=0, D=1 at t=1). Far from u1. + # u3: untreated far-control. Provides Omega_0 support. + for t in (0, 1): + rows.append( + { + "unit": "u1_baseline", + "time": t, + "lat": 0.0, + "lon": 0.0, + "D": 1, + "y": rng.normal(), + } + ) + for t in (0, 1): + rows.append( + { + "unit": "u2_treated_t1", + "time": t, + "lat": 10.0, + "lon": 0.0, + "D": int(t == 1), + "y": rng.normal(), + } + ) + for t in (0, 1): + rows.append( + { + "unit": "u3_far_control", + "time": t, + "lat": 20.0, + "lon": 0.0, + "D": 0, + "y": rng.normal(), + } + ) + df = pd.DataFrame(rows) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + # PROOF u1 is recognized as treated: the warning names u1_baseline. + with pytest.warns(UserWarning, match="u1_baseline"): + result = est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + # u1's 2 rows excluded from stage 2; u2 (1 treated row at t=1) and + # u3 (2 control rows) remain. 
Verify n_treated reflects only the + # supported sample. + assert result.n_treated == 1, ( + f"After warn-drop of u1_baseline, expected 1 treated row " + f"(u2 at t=1); got n_treated={result.n_treated}" + ) + + def test_partial_unsupported_units_warn_and_drop(self): + """Round-16 codex review: units with no Omega_0 row should be + warned-and-dropped (matching TwoStageDiD's always-treated convention), + not block the full fit. The remaining supported sample fits normally. + """ + rng = np.random.default_rng(1) + rows = [] + # 4 baseline-treated units (no Omega_0 rows → all 4 warned-dropped). + for k in range(4): + for t in (0, 1): + rows.append( + { + "unit": f"baseline_{k}", + "time": t, + "lat": 0.0 + k * 0.001, + "lon": 0.0, + "D": 1, + "y": rng.normal(), + } + ) + # 3 validly-treated units (treated from t=1; far from baselines). + for k in range(3): + for t in (0, 1): + rows.append( + { + "unit": f"treated_t1_{k}", + "time": t, + "lat": 10.0 + k * 0.01, + "lon": 0.0, + "D": int(t == 1), + "y": rng.normal(), + } + ) + # 5 far-controls (full Omega_0 support). + for k in range(5): + for t in (0, 1): + rows.append( + { + "unit": f"far_control_{k}", + "time": t, + "lat": 20.0 + k * 0.01, + "lon": 0.0, + "D": 0, + "y": rng.normal(), + } + ) + df = pd.DataFrame(rows) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + with pytest.warns(UserWarning, match="4 unit\\(s\\) have NO"): + result = est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + # 4 baselines × 2 periods = 8 rows excluded. Remaining: 3 treated + + # 5 controls = 8 units × 2 periods = 16 rows. n_treated = 3 (one + # per treated unit at t=1). + assert result.n_treated == 3 + assert result.n_obs == 16 + + def test_unsupported_period_still_raises(self): + """Period-level Omega_0 unsupport remains a hard error (round-16 + codex split): dropping a period would remove all units' rows at + that t, losing the cross-time identification entirely. 
+ """ + rng = np.random.default_rng(2) + # Balanced panel where t=1 has NO Omega_0 rows: every unit at t=1 + # is either treated or near a treated unit. + rows = [] + # 2 treated units at t=1; 2 near-controls (within d_bar of treated) + # at both t. No far-controls → no Omega_0 row at t=1. + for k in range(2): + for t in (0, 1): + rows.append( + { + "unit": f"T{k}", + "time": t, + "lat": 0.0 + k * 0.001, + "lon": 0.0, + "D": int(t == 1), + "y": rng.normal(), + } + ) + # Near-controls at both periods. Pre: untreated and (no current + # treatment) → S=0 → Omega_0. Post: untreated but treated nearby → + # S=1 → NOT Omega_0. So t=1 has no Omega_0 row. + for k in range(2): + for t in (0, 1): + rows.append( + { + "unit": f"N{k}", + "time": t, + "lat": 0.1 + k * 0.001, + "lon": 0.0, + "D": 0, + "y": rng.normal(), + } + ) + df = pd.DataFrame(rows) + est = SpilloverDiD(rings=[0.0, 100.0], d_bar=100.0, conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match="period.*unidentified|unidentified.*period"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + def test_baseline_treated_with_far_control_pretreatment_works(self): + """A control unit at t<0 (pre-treatment for the baseline-treated) + provides the missing Omega_0 support, but baseline-treated units + still have no untreated rows. Confirm the system can FIT when + there's a clean control population, while still recognizing + baseline-treated units as treated.""" + rng = np.random.default_rng(0) + rows = [] + # u1 is treated from t=1 (NOT baseline) — gives Omega_0 support. 
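Editorial aside: the stricter Omega_0 rule these support tests revolve around (untreated AND unexposed, per the PR description) reduces to a two-mask intersection. `D` and `S` here are toy arrays for illustration, not the estimator's internals.

```python
import numpy as np

# Omega_0 keeps only rows that are untreated (D_it = 0) AND unexposed
# (S_it = 0). A near-control in the post period has D = 0 but S = 1,
# so it is excluded from stage 1 despite being untreated.
D = np.array([0, 0, 1, 0, 0])  # toy treatment indicator per row
S = np.array([0, 1, 1, 0, 0])  # toy exposure flag (treated unit within d_bar)
omega_0 = (D == 0) & (S == 0)
assert omega_0.tolist() == [True, False, False, True, True]
```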
+ for t in (0, 1): + rows.append( + { + "unit": "u1_t1_treated", + "time": t, + "lat": 0.0, + "lon": 0.0, + "D": int(t == 1), + "y": rng.normal(), + } + ) + for t in (0, 1): + rows.append( + { + "unit": "u2_far", + "time": t, + "lat": 5.0, + "lon": 0.0, + "D": 0, + "y": rng.normal(), + } + ) + df = pd.DataFrame(rows) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + result = est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + # n_treated should be 1 (u1 at t=1 only); n_control should be 3 + # (u1's t=0 row + u2's both rows). Critical: u1 is NOT silently + # reclassified. + assert result.n_treated == 1 + assert result.n_control == 3 + + +class TestSpilloverDiDCallableSelfDistance: + """Callable metrics on the ring-construction path must satisfy the + same self-distance / symmetry contract as conley's vcov path. + Round-4 codex review noted positive self-distance silently corrupted + ring assignment on hc1/cluster fits. + """ + + def test_positive_self_distance_callable_raises(self): + df = _make_butts_2period_dgp(seed=42) + + def bad_metric(a, b): + # Returns CONSTANT 7.5 — fails the zero-diagonal check on the + # (n, n) self-call validation. + return np.full((a.shape[0], b.shape[0]), 7.5) + + est = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + conley_metric=bad_metric, + ) + with pytest.raises(ValueError, match="diagonal"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + +# ============================================================================= +# Codex review round-5 regression tests +# ============================================================================= + + +class TestSpilloverDiDAnticipationPropagation: + """anticipation is in `__init__` / `get_params` — round-5 codex review + flagged that it wasn't surfaced on the SpilloverDiDResults / to_dict() + so downstream consumers couldn't reconstruct the fitted estimand. 
+ """ + + def test_anticipation_round_trips_to_result_and_dict(self): + df = _make_butts_2period_dgp(seed=42) + est = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + anticipation=0, + ) + result = est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + # anticipation is on the result object + assert result.anticipation == 0 + # anticipation is in to_dict() + d = result.to_dict() + assert "anticipation" in d + assert d["anticipation"] == 0 + + +class TestSpilloverDiDEffectiveRankDoF: + """Stage-2 residual df should use effective rank (after solve_ols drops + rank-deficient columns), not raw column count. Round-5 codex review + noted that using raw `X_2_fit.shape[1]` understates df_resid on + rank-deficient stage-2 fits and silently inflates p-values / CI widths. + """ + + def test_rank_deficient_design_uses_effective_rank(self): + # Construct a panel where the INNER ring [0, 50) has no controls + # so its stage-2 column is identically zero and solve_ols drops + # it. After the fix, df_resid uses the effective rank (k=2: + # treatment + outer ring [50, 200]), not the raw 3-column count. + rng = np.random.default_rng(42) + rows = [] + # 8 treated near origin + for i in range(8): + lat, lon = rng.normal(0, 0.005), rng.normal(0, 0.005) + for t in (0, 1): + rows.append( + { + "unit": f"T{i}", + "time": t, + "lat": lat, + "lon": lon, + "D": int(t == 1), + "y": rng.normal(), + } + ) + # 20 NEAR-controls in the OUTER ring [50, 200) only — at ~1.2° + # ≈ 133 km from origin (inside outer ring, outside inner ring). + for i in range(20): + lat, lon = 1.2 + rng.normal(0, 0.005), rng.normal(0, 0.005) + for t in (0, 1): + rows.append( + { + "unit": f"N{i}", + "time": t, + "lat": lat, + "lon": lon, + "D": 0, + "y": rng.normal(), + } + ) + # 20 far-controls beyond d_bar=200 → identify the counterfactual. 
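Editorial aside: the degrees-of-freedom arithmetic behind this class, shown on a toy design matrix rather than the stage-2 internals:

```python
import numpy as np

# A design with an identically-zero column (the empty inner ring) has
# rank k_raw - 1. Using the raw column count in n - k understates
# df_resid by one; the effective rank restores it.
X = np.column_stack([np.ones(12), np.zeros(12), np.arange(12.0)])
n, k_raw = X.shape                 # 12 rows, 3 raw columns
k_eff = np.linalg.matrix_rank(X)   # 2: the zero column contributes nothing
assert k_raw - k_eff == 1
assert (n - k_eff) == (n - k_raw) + 1  # effective df_resid is one larger
```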
+ for i in range(20): + lat, lon = 5.0 + rng.normal(0, 0.005), rng.normal(0, 0.005) + for t in (0, 1): + rows.append( + { + "unit": f"F{i}", + "time": t, + "lat": lat, + "lon": lon, + "D": 0, + "y": rng.normal(), + } + ) + df = pd.DataFrame(rows) + est = SpilloverDiD(rings=[0.0, 50.0, 200.0], conley_coords=("lat", "lon")) + result = est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + # Sanity: outer-ring coef + ATT both finite. + assert np.isfinite(result.att) + assert np.isfinite(result.spillover_effects.loc["[50, 200]", "coef"]) + # The empty inner ring's coef should be NaN (dropped by solve_ols). + assert np.isnan(result.spillover_effects.loc["[0, 50)", "coef"]) + + +# ============================================================================= +# Codex review round-6 regression tests +# ============================================================================= + + +class TestSpilloverDiDStringCodedTimeOnTreatmentPath: + """The `treatment=` path must coerce `time` to numeric BEFORE running + `_convert_treatment_to_first_treat`. Round-6 codex review noted that + string-coded numeric periods like ['0', '2', '10'] would sort + lexicographically ('10' < '2') and produce the wrong onset. + """ + + def test_string_coded_time_treatment_path(self): + # Unit u1: treated starting at time "2" (the SECOND period when + # sorted numerically). Lexicographic sort would mis-order "10" < "2" + # and assign first_treat = "10" (the alphabetic min of treated rows). 
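Editorial aside: the mis-ordering described in the comment above, demonstrated in isolation:

```python
periods = ["0", "2", "10"]
# Lexicographic order puts "10" before "2" because '1' < '2' as characters.
assert sorted(periods) == ["0", "10", "2"]
assert sorted(periods, key=float) == ["0", "2", "10"]

# Onset detection via min() over treated periods inherits the bug:
treated_periods = ["2", "10"]
assert min(treated_periods) == "10"            # lexicographic: wrong onset
assert min(treated_periods, key=float) == "2"  # numeric: correct onset
```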
+ rng = np.random.default_rng(42) + rows = [] + # u1: time periods "0", "2", "10" — treated at "2" and "10" + for t_str, t_num in [("0", 0), ("2", 2), ("10", 10)]: + rows.append( + { + "unit": "u1", + "time": t_str, + "lat": 0.0, + "lon": 0.0, + "D": 1 if t_num >= 2 else 0, + "y": rng.normal(), + } + ) + # u2: far-away never-treated + for t_str in ("0", "2", "10"): + rows.append( + { + "unit": "u2", + "time": t_str, + "lat": 5.0, + "lon": 0.0, + "D": 0, + "y": rng.normal(), + } + ) + df = pd.DataFrame(rows) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + # The bug previously was: lex sort would treat "10" as the smallest + # treated time, assigning u1 first_treat = "10" / 10.0 — outside + # the relevant comparison range. With the fix, first_treat = 2 + # (numeric), so D_it = 1 at the rows with time = "2" AND "10" → 2 + # treated rows. + result = est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + # PROOF that time was coerced to numeric BEFORE onset detection: + # u1 has D=1 at numeric time=2 and time=10 (sorted numerically); + # n_treated must therefore be 2. Under the lex-sort bug, u1's onset + # would be "10" (since "10" < "2" lexicographically) and only the + # t="10" row would be flagged → n_treated = 1. + assert result.n_treated == 2, ( + f"string-coded numeric time not coerced before onset detection: " + f"expected n_treated=2 (numeric sort, t in {{2, 10}}), got " + f"{result.n_treated}" + ) + + +class TestSpilloverDiDOutcomeColumnRequired: + """`outcome` should fail front-door with a ValueError, not late KeyError. + Round-6 codex review noted the validator skipped outcome. 
+ """ + + def test_missing_outcome_column_raises_value_error(self): + df = _make_butts_2period_dgp(seed=42).drop(columns=["y"]) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match="Missing required columns"): + est.fit( + df, + outcome="missing_outcome", + unit="unit", + time="time", + treatment="D", + ) + + +# ============================================================================= +# Codex review round-7 regression tests +# ============================================================================= + + +class TestSpilloverDiDPanelStructure: + """Reject duplicate (unit, time) cells AND unbalanced panels up front. + Round-7 codex review reproduced an identification failure on an + unbalanced panel where moving a far-away outcome silently shifted + ATT by ~100x. + """ + + def test_duplicate_unit_time_cell_raises(self): + # u1 has two rows at the same period — duplicate (unit, time) cell. + df = pd.DataFrame( + { + "unit": ["u1", "u1", "u1", "u2", "u2", "u2"], + "time": [0, 0, 1, 0, 1, 1], # duplicates at (u1, 0) and (u2, 1) + "lat": [0.0, 0.0, 0.0, 5.0, 5.0, 5.0], + "lon": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0], + "D": [0, 0, 1, 0, 0, 0], + "y": [1.0, 1.0, 2.0, 0.5, 0.6, 0.6], + } + ) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match="duplicate.*unit, time"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + def test_unbalanced_panel_raises(self): + # u1 has 2 periods but u2 has only 1 → unbalanced panel. 
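Editorial aside: the two structural gates this class exercises can be sketched with plain pandas. These are illustrative checks only; the estimator's own validator and its error wording may differ.

```python
import pandas as pd

# Gate 1: duplicate (unit, time) cells.
dup = pd.DataFrame({"unit": ["u1", "u1", "u2"], "time": [0, 0, 1]})
assert dup.duplicated(subset=["unit", "time"]).any()  # (u1, 0) appears twice

# Gate 2: unbalanced panel (units observed in different numbers of periods).
unbal = pd.DataFrame({"unit": ["u1", "u1", "u2"], "time": [0, 1, 0]})
periods_per_unit = unbal.groupby("unit")["time"].nunique()
assert periods_per_unit.nunique() > 1  # u1 spans 2 periods, u2 only 1
```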
+ df = pd.DataFrame( + { + "unit": ["u1", "u1", "u2"], + "time": [0, 1, 0], + "lat": [0.0, 0.0, 5.0], + "lon": [0.0, 0.0, 0.0], + "D": [0, 1, 0], + "y": [1.0, 2.0, 0.5], + } + ) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match="Unbalanced panel"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + +# ============================================================================= +# Codex review round-8 regression tests +# ============================================================================= + + +class TestSpilloverDiDRingsStartAtZero: + """rings[0] must equal 0; otherwise units in 0 <= d_it < rings[0] are + flagged exposed but get zero spillover regressors → silent bias. + """ + + def test_rings_starting_above_zero_raises(self): + est = SpilloverDiD(rings=[10.0, 50.0, 100.0], conley_coords=("lat", "lon")) + df = _make_butts_2period_dgp(seed=42) + with pytest.raises(ValueError, match="rings\\[0\\] must equal 0"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + +class TestSpilloverDiDHC2NotSupported: + """vcov_type='hc2' and 'hc2_bm' require per-coefficient BM/CR2 DOF + that the inline stage-2 inference doesn't provide. Round-8 codex + review caught that we'd silently return wrong p-values/CIs. + """ + + @pytest.mark.parametrize("vcov_type", ["hc2", "hc2_bm"]) + def test_hc2_paths_raise_not_implemented(self, vcov_type): + df = _make_butts_2period_dgp(seed=42) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon"), vcov_type=vcov_type) + with pytest.raises(NotImplementedError, match="hc2"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + +class TestSpilloverDiDRankDeficientActionValidation: + """rank_deficient_action must be one of {warn, error, silent}. + Mirrors the sibling constructor guards at two_stage.py:149 and + stacked_did.py. 
+ """ + + @pytest.mark.parametrize("bad_value", ["raise", "ignore", "", "WARN", "Error", None]) + def test_invalid_rank_deficient_action_raises_at_init(self, bad_value): + with pytest.raises(ValueError, match="rank_deficient_action must be"): + SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + rank_deficient_action=bad_value, + ) + + @pytest.mark.parametrize("good_value", ["warn", "error", "silent"]) + def test_valid_rank_deficient_action_accepted(self, good_value): + est = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + rank_deficient_action=good_value, + ) + assert est.rank_deficient_action == good_value + + +class TestSpilloverDiDOmega0Connectivity: + """Round-5 CI review P1: balanced panel + per-unit/per-period Omega_0 + coverage is NECESSARY but not SUFFICIENT for stage-1 FE + identification — the Omega_0 bipartite graph must also be CONNECTED. + If it splits into K > 1 components, the iterative FE solver returns + FE only up to component-specific constants, and residualization + combines mu_i from one component with lambda_t from another, + silently corrupting tau_total / delta_j. + + The check fires on the SUPPORTED-units subgraph (after unit-level + warn-and-drop). Under the current absorbing-treatment + period-strict + + unit-warn-drop regime the disconnected case may be unreachable in + practice via a real DGP — these tests unit-test the helper directly + with synthetic (unit_codes, time_codes, omega_0_mask) arrays so the + check is exercised even if no DGP can reach it through the public + `.fit()` path. + """ + + def test_disconnected_two_components_raises(self): + """Two units in periods {0, 1}, two more in periods {2, 3}; no + unit appears in both halves. Connectivity check must fail. + """ + # 4 supported units (codes 0..3), 4 periods (codes 0..3). + # Omega_0 rows: (u0, t0), (u0, t1), (u1, t0), (u1, t1), + # (u2, t2), (u2, t3), (u3, t2), (u3, t3). 
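Editorial aside: a standalone sketch of the component-counting idea behind the connectivity check. The `n_components` helper is hypothetical, not the library's `_check_omega_0_connectivity`; it treats units and periods as nodes of a bipartite graph with one edge per Omega_0 (unit, time) row.

```python
def n_components(edges, n_units, n_times):
    # Union-find over n_units unit nodes followed by n_times period nodes.
    parent = list(range(n_units + n_times))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, t in edges:
        parent[find(u)] = find(n_units + t)
    return len({find(x) for x in range(n_units + n_times)})


# Units {0, 1} share periods {0, 1}; units {2, 3} share {2, 3}: 2 components.
split = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 2), (2, 3), (3, 2), (3, 3)]
assert n_components(split, 4, 4) == 2

# A bridge unit 4 with Omega_0 rows in every period reconnects the graph.
bridged = split + [(4, 0), (4, 1), (4, 2), (4, 3)]
assert n_components(bridged, 5, 4) == 1
```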
+ unit_codes_arr = np.array([0, 0, 1, 1, 2, 2, 3, 3, 0, 1, 2, 3, 0, 1, 2, 3]) + time_codes_arr = np.array([0, 1, 0, 1, 2, 3, 2, 3, 2, 2, 0, 0, 3, 3, 1, 1]) + omega_0_mask = np.array( + [True, True, True, True, True, True, True, True] + + [False, False, False, False, False, False, False, False] + ) + with pytest.raises( + ValueError, match="disconnected components|Stage-1 fixed effects unidentified" + ): + _check_omega_0_connectivity( + omega_0_mask=omega_0_mask, + unit_codes_arr=unit_codes_arr, + time_codes_arr=time_codes_arr, + units_in_omega_0={0, 1, 2, 3}, + n_times=4, + unit_uniques=["u0", "u1", "u2", "u3"], + ) + + def test_connected_via_bridge_unit_succeeds(self): + """Add a single bridge unit that has Omega_0 rows in all periods — + the graph becomes connected and the check must pass. + """ + unit_codes_arr = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 4, 4]) + time_codes_arr = np.array([0, 1, 0, 1, 2, 3, 2, 3, 0, 1, 2, 3]) + omega_0_mask = np.array([True] * 12) + # u4 spans all 4 periods, connecting the two halves through itself. + _check_omega_0_connectivity( + omega_0_mask=omega_0_mask, + unit_codes_arr=unit_codes_arr, + time_codes_arr=time_codes_arr, + units_in_omega_0={0, 1, 2, 3, 4}, + n_times=4, + unit_uniques=["u0", "u1", "u2", "u3", "u4"], + ) # must not raise + + def test_single_supported_unit_short_circuits(self): + """n_supp <= 1 short-circuits — no multi-component case possible.""" + unit_codes_arr = np.array([0, 0]) + time_codes_arr = np.array([0, 1]) + omega_0_mask = np.array([True, True]) + _check_omega_0_connectivity( + omega_0_mask=omega_0_mask, + unit_codes_arr=unit_codes_arr, + time_codes_arr=time_codes_arr, + units_in_omega_0={0}, + n_times=2, + unit_uniques=["u0"], + ) # must not raise + + def test_three_components_error_message_names_units(self): + """Error message should name first few units per component for + actionable debugging. + """ + # 3 units, 3 periods, 3-way disconnection: (u0, t0), (u1, t1), (u2, t2). 
+ unit_codes_arr = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2]) + time_codes_arr = np.array([0, 1, 2, 1, 2, 0, 2, 0, 1]) + omega_0_mask = np.array([True, True, True] + [False] * 6) + with pytest.raises(ValueError) as exc_info: + _check_omega_0_connectivity( + omega_0_mask=omega_0_mask, + unit_codes_arr=unit_codes_arr, + time_codes_arr=time_codes_arr, + units_in_omega_0={0, 1, 2}, + n_times=3, + unit_uniques=["unit_A", "unit_B", "unit_C"], + ) + msg = str(exc_info.value) + assert "3 disconnected components" in msg + assert "unit_A" in msg or "unit_B" in msg or "unit_C" in msg + + def test_normal_butts_dgp_does_not_trigger(self): + """Positive case: a standard non-staggered Butts DGP must NOT + trigger the connectivity check. + """ + from tests._dgp_utils import generate_butts_nonstaggered_dgp + + df = generate_butts_nonstaggered_dgp(seed=0) + # Just verify .fit() succeeds — if connectivity check were + # over-eager, this would fail. + result = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")).fit( + df, outcome="y", unit="unit", time="time", treatment="D" + ) + assert result.att is not None + + +# ============================================================================= +# Codex review round-9 regression tests +# ============================================================================= + + +class TestSpilloverDiDAnticipationValidation: + """anticipation must be a non-negative integer. Round-9 codex review + caught that fractional / negative values silently shifted timing. 
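Editorial aside: a compact toy-number illustration of why non-integer anticipation must be rejected rather than applied:

```python
# Onset logic of the form `t >= first_treat - anticipation` stays on the
# integer period grid only for integer anticipation.
first_treat = 2
times = [0, 1, 2]
assert [t >= first_treat - 1 for t in times] == [False, True, True]

# anticipation = 0.5 puts the threshold at 1.5: period 1 silently stops
# being flagged, with no error and no warning. Hence the hard ValueError.
assert [t >= first_treat - 0.5 for t in times] == [False, False, True]
```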
+ """ + + @pytest.mark.parametrize("bad_value", [-1, 0.5, 1.5, -0.1]) + def test_invalid_anticipation_raises_treatment_path(self, bad_value): + df = _make_butts_2period_dgp(seed=42) + est = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + anticipation=bad_value, + ) + with pytest.raises(ValueError, match="anticipation"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + +class TestSpilloverDiDNonFiniteOutcomeRejected: + """outcome must be finite per-row; NaN/Inf raise a targeted ValueError.""" + + def test_nan_outcome_raises(self): + df = _make_butts_2period_dgp(seed=42).copy() + df.loc[0, "y"] = np.nan + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match="non-finite"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + def test_inf_outcome_raises(self): + df = _make_butts_2period_dgp(seed=42).copy() + df.loc[0, "y"] = np.inf + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match="non-finite"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + +# ============================================================================= +# Codex review round-10 regression tests +# ============================================================================= + + +class TestSpilloverDiDNaNTreatmentRejected: + """NaN in the binary treatment column must raise. + Round-10 codex review caught that `_convert_treatment_to_first_treat` + silently dropped NaN rows via `dropna()` before validation, then + rebuilt D_it from the inferred onset — coercing missing rows to + treated or control without warning. 
+ """ + + @pytest.mark.parametrize( + "d_pattern", + [ + [0, 1, float("nan")], + [float("nan"), 1, 1], + [0, float("nan"), 0], + ], + ) + def test_nan_in_treatment_helper_raises(self, d_pattern): + df = pd.DataFrame( + { + "unit": ["u1"] * 3, + "time": [0, 1, 2], + "D": d_pattern, + } + ) + with pytest.raises(ValueError, match="NaN"): + _convert_treatment_to_first_treat(df, "D", "time", "unit") + + def test_nan_in_treatment_end_to_end_raises(self): + df = _make_butts_2period_dgp(seed=42).copy() + first_treated_idx = df.index[df["D"] == 1][0] + df.loc[first_treated_idx, "D"] = np.nan + # Convert column dtype to float so NaN is preserved. + df["D"] = df["D"].astype(float) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match="NaN"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + +# ============================================================================= +# Codex review round-11 regression tests +# ============================================================================= + + +class TestSpilloverDiDClusterNaNRejected: + """NaN in cluster column must raise. Round-11 codex review caught + that missing cluster ids silently changed SEs and overstated + n_clusters because np.unique counts NaN as its own cluster but + pandas groupby drops it from the cluster meat. 
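Editorial aside: the disagreement the docstring describes is easy to reproduce directly; `np.unique` keeps NaN as a distinct value, while `groupby` (with its default `dropna=True`) drops the NaN group.

```python
import numpy as np
import pandas as pd

clusters = np.array([1.0, 1.0, 2.0, np.nan])
n_from_unique = len(np.unique(clusters))  # NaN counted as its own cluster

df = pd.DataFrame({"g": clusters, "x": [1.0, 2.0, 3.0, 4.0]})
n_from_groupby = df.groupby("g")["x"].sum().shape[0]  # NaN group dropped

assert n_from_unique == 3
assert n_from_groupby == 2  # the two bookkeeping paths disagree
```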
+ """ + + def test_numeric_nan_cluster_raises(self): + df = _make_butts_2period_dgp(seed=42).copy() + df["region"] = 1.0 + df.loc[df.index[0], "region"] = np.nan + est = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + cluster="region", + ) + with pytest.raises(ValueError, match="cluster.*missing"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + def test_object_nan_cluster_raises(self): + df = _make_butts_2period_dgp(seed=42).copy() + df["region"] = "A" + df.loc[df.index[0], "region"] = None # object-typed NaN + est = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + cluster="region", + ) + with pytest.raises(ValueError, match="cluster.*missing"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + +class TestSpilloverDiDIdentifierNaNRejected: + """unit / time / first_treat columns must not contain NaN — round-11 + codex review noted these fell through to opaque numpy / pandas + errors instead of targeted ValueErrors. 
+ """ + + def test_nan_unit_raises(self): + df = _make_butts_2period_dgp(seed=42).copy() + df.loc[df.index[0], "unit"] = None + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match="identifier column 'unit'"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + def test_nan_time_raises(self): + df = _make_butts_2period_dgp(seed=42).copy() + df["time"] = df["time"].astype(float) + df.loc[df.index[0], "time"] = np.nan + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match="identifier column 'time'"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + def test_nan_first_treat_raises(self): + df = _make_butts_2period_dgp(seed=42).copy() + df.loc[df.index[0], "first_treat"] = np.nan + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match="first_treat.*missing"): + est.fit(df, outcome="y", unit="unit", time="time", first_treat="first_treat") + + +# ============================================================================= +# Codex review round-12 regression tests +# ============================================================================= + + +class TestSpilloverDiDMixedRawTimeEncoding: + """Mixed encodings that collapse under pd.to_numeric must be caught + by the validator AFTER coercion. Round-12 codex review caught that + raw labels ['0', 0] (str + int) would pass duplicate-cell validation + on the raw labels then collapse to (0, 0) after pd.to_numeric, with + no warning. + """ + + def test_mixed_str_and_int_time_collapse_caught_as_duplicate(self): + # u1 has time entries '0' (str) and 0 (int). They collapse to 0 + # under pd.to_numeric → duplicate (u1, 0) cell. 
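Editorial aside: the collapse itself, in isolation. `'0'` (str) and `0` (int) are distinct raw labels but identical after coercion, which is why duplicate-cell validation must run on the coerced values.

```python
import pandas as pd

raw = pd.Series(["0", 0, 1], dtype=object)
coerced = pd.to_numeric(raw)
assert raw.nunique() == 3      # raw labels look fine: '0', 0, 1
assert coerced.nunique() == 2  # '0' and 0 collapse to the same period
```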
+ df = pd.DataFrame( + { + "unit": ["u1", "u1", "u1", "u2", "u2", "u2"], + "time": ["0", 0, 1, 0, 0, 1], + "lat": [0.0, 0.0, 0.0, 5.0, 5.0, 5.0], + "lon": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0], + "D": [0, 0, 1, 0, 0, 0], + "y": [1.0, 1.0, 2.0, 0.5, 0.6, 0.6], + } + ) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match="duplicate.*unit, time"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + def test_leading_zero_string_time_collapse_caught(self): + # '01' and 1 collapse under pd.to_numeric to (1, 1) → duplicate. + df = pd.DataFrame( + { + "unit": ["u1", "u1", "u1", "u2", "u2", "u2"], + "time": [0, "01", 1, 0, "01", 1], + "lat": [0.0, 0.0, 0.0, 5.0, 5.0, 5.0], + "lon": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0], + "D": [0, 0, 1, 0, 0, 0], + "y": [1.0, 1.0, 2.0, 0.5, 0.6, 0.6], + } + ) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match="duplicate.*unit, time"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + +class TestSpilloverDiDNonStaggeredFEEquivalence: + """Round-14 codex review (P2): pin the Gardner identity empirically. + + Under non-staggered timing, two-stage Gardner with the Omega_0-restricted + stage 1 should produce the same `tau_total` as a single-stage TWFE ring + regression on the full sample with the time-varying ring covariate. This + is Butts Eqs. 4-6 / Proposition 2.3 (non-staggered identification). + + The Omega_0 restriction would BREAK Gardner identity in general (stage 1 + estimates FE on a subset, predicts onto observations outside the training + set), but on Butts-Assumption-satisfying DGPs it is empirically innocent + at floating-point precision. This test PINS that equivalence so any + methodology drift surfaces in CI rather than as a silent estimand shift. 
+ + Codex round 14 reported wildly divergent values (seed 3: +0.0238 from + SpilloverDiD vs -0.0735 from FE) but those numbers were unreproducible — + our 20-seed sweep confirms bit-identity at atol=1e-10. + """ + + @staticmethod + def _fit_butts_single_stage_fe_ring( + df, *, outcome, unit, time, treatment, rings, lat="lat", lon="lon" + ): + """Reference: single-stage TWFE ring regression on full sample. + + Y_it = mu_i + lambda_t + tau * D_it + sum_j delta_j * (1 - D_it) * Ring_{it,j} + + For non-staggered with shared onset t_treat, Ring_{it,j} = 1{d_i in + [rings_j, rings_{j+1})} * 1{t >= t_treat}. Uses library's solve_ols + for rank-deficient-safe pseudo-inverse. + """ + import math + import warnings + + from diff_diff.linalg import solve_ols + + rings = sorted(rings) + df = df.copy() + units = sorted(df[unit].unique()) + times = sorted(df[time].unique()) + unit_idx = {u: i for i, u in enumerate(units)} + time_idx = {t: i for i, t in enumerate(times)} + n = len(df) + + treated_set = set(df.loc[df[treatment] == 1, unit].unique()) + lat_map = df.groupby(unit)[lat].first().to_dict() + lon_map = df.groupby(unit)[lon].first().to_dict() + + def hav(u1, u2): + lat1, lon1 = math.radians(lat_map[u1]), math.radians(lon_map[u1]) + lat2, lon2 = math.radians(lat_map[u2]), math.radians(lon_map[u2]) + dlat = lat2 - lat1 + dlon = lon2 - lon1 + a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2 + return 2 * 6371.0 * math.asin(math.sqrt(a)) + + d_i = {} + for u in units: + if u in treated_set: + d_i[u] = 0.0 + else: + d_i[u] = min(hav(u, tt) for tt in treated_set) + + K = len(rings) - 1 + ring_of_unit = {} + for u in units: + d = d_i[u] + ring_of_unit[u] = -1 + for j in range(K): + if rings[j] <= d < rings[j + 1]: + ring_of_unit[u] = j + break + + t_treat = df.loc[df[treatment] == 1, time].min() + + n_u = len(units) + n_t = len(times) + n_reg = 1 + K + # Intercept + (n_u - 1) unit FE dummies + (n_t - 1) time FE dummies + n_reg regressors + X 
= np.zeros((n, 1 + (n_u - 1) + (n_t - 1) + n_reg)) + X[:, 0] = 1.0 + y = df[outcome].values.astype(float) + + for i, row in enumerate(df.itertuples(index=False)): + u = getattr(row, unit) + t = getattr(row, time) + D = getattr(row, treatment) + if unit_idx[u] > 0: + X[i, 1 + unit_idx[u] - 1] = 1.0 + if time_idx[t] > 0: + X[i, 1 + (n_u - 1) + time_idx[t] - 1] = 1.0 + X[i, 1 + (n_u - 1) + (n_t - 1) + 0] = D + ridx = ring_of_unit[u] + if ridx >= 0 and t >= t_treat: + X[i, 1 + (n_u - 1) + (n_t - 1) + 1 + ridx] = 1 - D + + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + beta, _, _ = solve_ols(X, y, vcov_type="hc1") + tau = beta[1 + (n_u - 1) + (n_t - 1) + 0] + return tau + + def test_nonstaggered_one_ring_matches_single_stage_fe_20_seeds(self): + """20-seed bit-identity sweep, 1 ring (rings=[0, 200]).""" + from tests._dgp_utils import generate_butts_nonstaggered_dgp + + diffs = [] + for seed in range(20): + df = generate_butts_nonstaggered_dgp(seed=seed, tau_total=-0.07, delta_1=-0.04) + est = SpilloverDiD(rings=[0, 200], d_bar=200.0, conley_coords=("lat", "lon")) + spill = est.fit(df, outcome="y", unit="unit", time="time", treatment="D").att + fe_tau = self._fit_butts_single_stage_fe_ring( + df, + outcome="y", + unit="unit", + time="time", + treatment="D", + rings=[0, 200], + ) + diffs.append(abs(spill - fe_tau)) + max_abs_diff = max(diffs) + assert max_abs_diff < 1e-10, ( + f"Gardner identity broken: max |SpilloverDiD - single-stage FE| " + f"= {max_abs_diff:.6e} across 20 seeds (expected < 1e-10)" + ) + + def test_nonstaggered_multi_ring_matches_single_stage_fe_10_seeds(self): + """10-seed bit-identity sweep with multi-ring spec.""" + from tests._dgp_utils import generate_butts_nonstaggered_dgp + + # DGP places near-controls in d ≤ d_bar/2 = 100km, so rings beyond 100 + # may be empty. Use rings=[0, 50, 200] which has near-controls in + # [0, 50) and possibly [50, 200). 
+ diffs = [] + for seed in range(10): + df = generate_butts_nonstaggered_dgp(seed=seed, tau_total=-0.07, delta_1=-0.04) + est = SpilloverDiD(rings=[0, 50, 200], d_bar=200.0, conley_coords=("lat", "lon")) + spill = est.fit(df, outcome="y", unit="unit", time="time", treatment="D").att + fe_tau = self._fit_butts_single_stage_fe_ring( + df, + outcome="y", + unit="unit", + time="time", + treatment="D", + rings=[0, 50, 200], + ) + diffs.append(abs(spill - fe_tau)) + max_abs_diff = max(diffs) + assert max_abs_diff < 1e-10, ( + f"Multi-ring Gardner identity broken: max diff " + f"= {max_abs_diff:.6e} across 10 seeds (expected < 1e-10)" + ) + + +class TestSpilloverDiDCoefficientsAlignToVcov: + """Round-15 codex review (P2): `coefficients` must expose ALL stage-2 + coefficients (treatment + K ring slots), not just `ATT`, so consumers + can align names to the `(1+K)×(1+K)` `vcov` rows/cols. The vcov + columns are `["treatment", "_spillover_", ...]`; the + coefficients dict mirrors those keys plus an `"ATT"` alias for the + treatment slot (sibling-estimator convention). 
+ """ + + def _fit_one(self, rings): + from tests._dgp_utils import generate_butts_nonstaggered_dgp + + df = generate_butts_nonstaggered_dgp(seed=42, tau_total=-0.07, delta_1=-0.04) + est = SpilloverDiD(rings=rings, d_bar=float(rings[-1]), conley_coords=("lat", "lon")) + return est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + def test_coefficients_has_treatment_and_ring_keys(self): + result = self._fit_one([0, 50, 200]) + assert "ATT" in result.coefficients + assert "treatment" in result.coefficients + ring_keys = [k for k in result.coefficients if k.startswith("_spillover_")] + assert ( + len(ring_keys) == 2 + ), f"Expected 2 ring coefficients, got {len(ring_keys)}: {ring_keys}" + + def test_att_alias_equals_treatment_slot(self): + result = self._fit_one([0, 100]) + assert result.coefficients["ATT"] == result.coefficients["treatment"] + assert result.coefficients["ATT"] == result.att + + def test_coefficients_length_matches_vcov_dimension(self): + result = self._fit_one([0, 50, 200]) + assert result.vcov.shape == (3, 3) + stage2_keys = [k for k in result.coefficients if k != "ATT"] + assert len(stage2_keys) == result.vcov.shape[0] + + def test_ring_coefficients_match_spillover_effects_dataframe(self): + result = self._fit_one([0, 50, 200]) + for ring_label, row in result.spillover_effects.iterrows(): + key = f"_spillover_{ring_label}" + assert key in result.coefficients, f"Missing key {key} in coefficients" + assert result.coefficients[key] == row["coef"], ( + f"Drift on {ring_label}: coefficients[{key}]=" + f"{result.coefficients[key]} vs spillover_effects.coef={row['coef']}" + ) From 1a71eaf59506e2551d4ae19b7dc4ac557a1eb942 Mon Sep 17 00:00:00 2001 From: igerber Date: Sat, 16 May 2026 06:51:31 -0400 Subject: [PATCH 2/3] SpilloverDiD: add anticipation behavior tests + staggered delta_1 recovery Two test-coverage gaps surfaced on the rebased SHA: 1. 
`anticipation` was previously only round-tripped on results and rejected for invalid values, with no behavior-level test asserting that it actually shifts D_it / Omega_0 membership / the ring-exposure clock. A regression there could silently move rows in/out of stage 1 and change tau_total / delta_j without failing CI. 2. Staggered MC anchored only `tau_total`, not `delta_1`. A staggered-only ring-assignment or spillover-coefficient regression with roughly-unchanged tau_total wouldn't be caught. Fixes: - tests/test_spillover.py: new `TestSpilloverDiDAnticipationBehavior` class (3 tests) with a hand-built 4-period panel (1 treated @ t=2, 1 near-control, 2 far-controls). Asserts on both `treatment=` and `first_treat=` paths that anticipation=1 (vs 0) yields n_treated+1, smaller stage1_n_obs, and a different att. Plus a parity check that the two fit paths produce identical results under the same anticipation setting (entry points are internally unified). - tests/test_spillover.py: `TestSpilloverDiDIdentification::test_staggered_recovers_tau_total` renamed to `..._tau_total_and_delta_1` and extended to also assert staggered delta_1 recovery within 0.03 absolute tolerance (mean over 30 seeds: -0.0398, std 0.0083). Per-event-time `delta_jk` decomposition on staggered DGPs is still queued alongside event_study=True support. - docs/methodology/REGISTRY.md, CHANGELOG.md: narrow the staggered MC anchor language to match (tau_total + delta_1, not tau_total only); bump test count 153 -> 156. All 156 tests pass; black + ruff clean.
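The anticipation semantics the new tests pin down can be sketched with a minimal helper (illustrative only — `effective_treatment` is a hypothetical name, not the estimator's internals): the effective indicator is `D'_it = 1{t >= first_treat - anticipation}`, so anticipation periods leave the stage-1 Omega_0 subsample and join the treated window.

```python
import numpy as np

def effective_treatment(time, first_treat, anticipation=0):
    """Shifted indicator D'_it = 1{t >= first_treat - anticipation}.

    With anticipation > 0, the anticipation pre-periods are promoted into
    the treated window, so they leave the stage-1 Omega_0 subsample.
    """
    time = np.asarray(time)
    return (time >= first_treat - anticipation).astype(int)

# 4-period unit treated at t=2, mirroring the hand-built test panel.
t = np.arange(4)
d0 = effective_treatment(t, first_treat=2, anticipation=0)
d1 = effective_treatment(t, first_treat=2, anticipation=1)
print(d0.tolist())  # [0, 0, 1, 1]
print(d1.tolist())  # [0, 1, 1, 1] -- t=1 promoted: one more treated row
```

This is exactly the +1-treated-row / smaller-Omega_0 behavior the three new tests assert on both fit paths.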
Co-Authored-By: Claude Opus 4.7 (1M context) --- CHANGELOG.md | 2 +- docs/methodology/REGISTRY.md | 2 +- tests/test_spillover.py | 152 ++++++++++++++++++++++++++++++++++- 3 files changed, 151 insertions(+), 5 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index b45f1ae1..07f8eab1 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,7 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] ### Added -- **`SpilloverDiD` — ring-indicator spillover-aware DiD (Butts 2021).** New standalone estimator at `diff_diff/spillover.py` implementing two-stage Gardner methodology with ring-indicator covariates that identify direct effect on treated (`tau_total`) alongside per-ring spillover effects on near-control units (`delta_j`). Documented synthesis of ingredients (no single published software covers the exact recipe — `did2s` implements Gardner two-stage without rings; the Butts ring estimator has no R/Stata package): Butts (2021) Section 5 / Table 2 identification, Gardner (2022) two-stage residualize-then-fit, and the Conley spatial-HAC vcov shipped in 3.3.3. Handles both panel non-staggered (Equations 5/6/8) and Section 5 staggered timing in one estimator — non-staggered is the special case where all treated units share an onset time. **API:** `SpilloverDiD(rings=[0, 50, 100, 200], conley_coords=("lat","lon"), ...).fit(data, outcome="y", unit="unit", time="t", treatment="D")` (binary D auto-converted to `first_treat`) or `.fit(..., first_treat="first_treat")` (Gardner convention). Result: `SpilloverDiDResults(DiDResults)` with `.att` = `tau_total`, `.spillover_effects` (per-ring `pd.DataFrame` with `coef`/`se`/`t_stat`/`p_value`/`ci_low`/`ci_high`), `.ring_breakpoints`, `.d_bar`, `.n_units_ever_in_ring`, `.n_far_away_obs`, `.is_staggered`. `.coefficients` exposes all `(1+K)` stage-2 entries (`"treatment"` + `"_spillover_"`) plus an `"ATT"` alias keyed to vcov columns. 
**Methodology spec (committed):** stage-2 regressor is the time-varying `(1 - D_it) * Ring_{it,j}` form (paper page 12's `S_it = S_i * 1{t >= t_treat}` notation; Section 5 Table 2's `S^k_{it}` / `Ring^k_{it,j}`). Reading the literal unit-static `(1 - D_it) * S_i` from Equation 5 is algebraically rank-deficient under TWFE (`(1-D_it) * S_i = S_i - D_it`, with `S_i` absorbed by `mu_i`, leaving `-D_it`); only the time-varying form supports the paper's identification (Proposition 2.3). Stage-1 subsample uses Butts' STRICTER `Omega_0 = {D_it = 0 AND S_it = 0}` (untreated AND unexposed), not TwoStageDiD's `{D_it = 0}` alone — this prevents spillover-contaminated near-controls in pre/post periods from biasing the time FE. **Gardner identity (non-staggered):** a 20-seed deterministic regression test pins `SpilloverDiD.att` against a direct single-stage TWFE ring regression on the full sample (`y ~ mu_i + lambda_t + tau * D_it + sum_j delta_j * (1 - D_it) * Ring_{it,j}`) at `atol=1e-10` — empirically bit-identical, so the reported non-staggered `tau_total` IS the Butts Eqs. 4-6 estimator. **Identification-check policy (period strict, unit warn-and-drop, plus connectivity):** every period must have at least one Omega_0 row (hard `ValueError` — dropping a period removes all units' cross-time identification). Units lacking Omega_0 rows (e.g. baseline-treated units with `D_it = 1` at every observed `t`) are warned-and-dropped: their unit FE is NaN, residualization writes NaN on their rows, and the downstream finite-mask path excludes them from stage 2 — mirrors `TwoStageDiD`'s always-treated convention. 
Additionally, the supported-units bipartite graph (units linked by shared Omega_0 periods) must form a single connected component; `K > 1` components raise `ValueError` because the FE solver would return only component-specific constants and residualization would silently mix them across components (defense-in-depth — under absorbing treatment the disconnected case may be unreachable through the upstream validators, but the check future-proofs Wave B follow-ups). **Public API restrictions (Wave B MVP):** `covariates=` raises `NotImplementedError` because Gardner-style two-stage requires covariate effects estimated on the untreated-and-unexposed subsample at stage 1 (appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates); non-absorbing / reversible treatment patterns (e.g. `[0, 1, 0]`) raise `ValueError` rather than being silently coerced into "treated from first 1 onward"; non-constant `first_treat` values across rows of the same unit raise `ValueError`; `conley_coords` is required on every fit path (not just `vcov_type="conley"`) because ring construction always uses it. **Far-away control identification:** uses CURRENT-period untreated status (`D_it = 0`) rather than never-treated-only, so all-eventually-treated staggered designs (no never-treated units) can identify the counterfactual via not-yet-treated far-away rows. **Variance (Wave B MVP):** stage-2 OLS variance via `solve_ols` (HC1 / Conley / cluster paths all flow through). The Gardner GMM first-stage uncertainty correction is NOT applied at stage 2 in this PR (documented limitation; planned follow-up extends `two_stage.py::_compute_gmm_variance` to accept a Conley kernel matrix in place of HC1's identity at the influence-function outer-product step). 
**Deferred features (planned follow-ups):** `event_study=True` per-event-time × ring coefficients (Butts Table 2), `survey_design=` integration, `ring_method="count"` (count-of-treated-in-ring), data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight), Gardner GMM first-stage correction at stage 2, sparse staggered ring-distance path. **Tests:** `tests/test_spillover.py` (153 tests across ring-construction primitives, validators, fit integration, raw-data invariant, identification MC — non-staggered DGP at 50 seeds + 200-seed `@pytest.mark.slow` variant recovers both `tau_total` and `delta_1`; staggered DGP at 30 seeds anchors `tau_total` only — Conley wiring, Gardner identity bit-identity, coefficients-vs-vcov alignment, warn-and-drop, rank_deficient_action validation, Omega_0 bipartite-graph connectivity). DGP factories `tests/_dgp_utils.py::generate_butts_nonstaggered_dgp` / `generate_butts_staggered_dgp` satisfy Butts Assumptions 1/3/5/7 by construction. +- **`SpilloverDiD` — ring-indicator spillover-aware DiD (Butts 2021).** New standalone estimator at `diff_diff/spillover.py` implementing two-stage Gardner methodology with ring-indicator covariates that identify direct effect on treated (`tau_total`) alongside per-ring spillover effects on near-control units (`delta_j`). Documented synthesis of ingredients (no single published software covers the exact recipe — `did2s` implements Gardner two-stage without rings; the Butts ring estimator has no R/Stata package): Butts (2021) Section 5 / Table 2 identification, Gardner (2022) two-stage residualize-then-fit, and the Conley spatial-HAC vcov shipped in 3.3.3. Handles both panel non-staggered (Equations 5/6/8) and Section 5 staggered timing in one estimator — non-staggered is the special case where all treated units share an onset time. 
**API:** `SpilloverDiD(rings=[0, 50, 100, 200], conley_coords=("lat","lon"), ...).fit(data, outcome="y", unit="unit", time="t", treatment="D")` (binary D auto-converted to `first_treat`) or `.fit(..., first_treat="first_treat")` (Gardner convention). Result: `SpilloverDiDResults(DiDResults)` with `.att` = `tau_total`, `.spillover_effects` (per-ring `pd.DataFrame` with `coef`/`se`/`t_stat`/`p_value`/`ci_low`/`ci_high`), `.ring_breakpoints`, `.d_bar`, `.n_units_ever_in_ring`, `.n_far_away_obs`, `.is_staggered`. `.coefficients` exposes all `(1+K)` stage-2 entries (`"treatment"` + `"_spillover_"`) plus an `"ATT"` alias keyed to vcov columns. **Methodology spec (committed):** stage-2 regressor is the time-varying `(1 - D_it) * Ring_{it,j}` form (paper page 12's `S_it = S_i * 1{t >= t_treat}` notation; Section 5 Table 2's `S^k_{it}` / `Ring^k_{it,j}`). Reading the literal unit-static `(1 - D_it) * S_i` from Equation 5 is algebraically rank-deficient under TWFE (`(1-D_it) * S_i = S_i - D_it`, with `S_i` absorbed by `mu_i`, leaving `-D_it`); only the time-varying form supports the paper's identification (Proposition 2.3). Stage-1 subsample uses Butts' STRICTER `Omega_0 = {D_it = 0 AND S_it = 0}` (untreated AND unexposed), not TwoStageDiD's `{D_it = 0}` alone — this prevents spillover-contaminated near-controls in pre/post periods from biasing the time FE. **Gardner identity (non-staggered):** a 20-seed deterministic regression test pins `SpilloverDiD.att` against a direct single-stage TWFE ring regression on the full sample (`y ~ mu_i + lambda_t + tau * D_it + sum_j delta_j * (1 - D_it) * Ring_{it,j}`) at `atol=1e-10` — empirically bit-identical, so the reported non-staggered `tau_total` IS the Butts Eqs. 4-6 estimator. **Identification-check policy (period strict, unit warn-and-drop, plus connectivity):** every period must have at least one Omega_0 row (hard `ValueError` — dropping a period removes all units' cross-time identification). Units lacking Omega_0 rows (e.g. 
baseline-treated units with `D_it = 1` at every observed `t`) are warned-and-dropped: their unit FE is NaN, residualization writes NaN on their rows, and the downstream finite-mask path excludes them from stage 2 — mirrors `TwoStageDiD`'s always-treated convention. Additionally, the supported-units bipartite graph (units linked by shared Omega_0 periods) must form a single connected component; `K > 1` components raise `ValueError` because the FE solver would return only component-specific constants and residualization would silently mix them across components (defense-in-depth — under absorbing treatment the disconnected case may be unreachable through the upstream validators, but the check future-proofs Wave B follow-ups). **Public API restrictions (Wave B MVP):** `covariates=` raises `NotImplementedError` because Gardner-style two-stage requires covariate effects estimated on the untreated-and-unexposed subsample at stage 1 (appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates); non-absorbing / reversible treatment patterns (e.g. `[0, 1, 0]`) raise `ValueError` rather than being silently coerced into "treated from first 1 onward"; non-constant `first_treat` values across rows of the same unit raise `ValueError`; `conley_coords` is required on every fit path (not just `vcov_type="conley"`) because ring construction always uses it. **Far-away control identification:** uses CURRENT-period untreated status (`D_it = 0`) rather than never-treated-only, so all-eventually-treated staggered designs (no never-treated units) can identify the counterfactual via not-yet-treated far-away rows. **Variance (Wave B MVP):** stage-2 OLS variance via `solve_ols` (HC1 / Conley / cluster paths all flow through). 
The Gardner GMM first-stage uncertainty correction is NOT applied at stage 2 in this PR (documented limitation; planned follow-up extends `two_stage.py::_compute_gmm_variance` to accept a Conley kernel matrix in place of HC1's identity at the influence-function outer-product step). **Deferred features (planned follow-ups):** `event_study=True` per-event-time × ring coefficients (Butts Table 2), `survey_design=` integration, `ring_method="count"` (count-of-treated-in-ring), data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight), Gardner GMM first-stage correction at stage 2, sparse staggered ring-distance path. **Tests:** `tests/test_spillover.py` (156 tests across ring-construction primitives, validators, fit integration, raw-data invariant, identification MC — non-staggered DGP at 50 seeds + 200-seed `@pytest.mark.slow` variant recovers both `tau_total` and `delta_1`; staggered DGP at 30 seeds anchors both `tau_total` and `delta_1` — Conley wiring, Gardner identity bit-identity, coefficients-vs-vcov alignment, warn-and-drop, rank_deficient_action validation, Omega_0 bipartite-graph connectivity, anticipation behavior on both fit paths). DGP factories `tests/_dgp_utils.py::generate_butts_nonstaggered_dgp` / `generate_butts_staggered_dgp` satisfy Butts Assumptions 1/3/5/7 by construction. - **`ChaisemartinDHaultfoeuille.predict_het` × `placebo`: R-parity on both global and per-path surfaces.** R-verified — `did_multiplegt_dyn(predict_het, placebo)` emits heterogeneity OLS results on backward (placebo) horizons via R's `DIDmultiplegtDYN:::did_multiplegt_main` placebo block (`effect = matrix(-i, ...)` rbind site); the same block runs per-by_level under `did_multiplegt_dyn(by_path, predict_het, placebo)`, so both global `res$results$predict_het` and per-by_level `res$by_level_i$results$predict_het` slots emit backward rows. 
R's predict_het syntax with `placebo > 0` requires the `c(-1)` sentinel in the horizon vector to trigger "compute heterogeneity for ALL forward (1..effects) AND ALL placebo (1..placebo) positions" — passing positive-only horizons errors with "specified numbers in predict_het that exceed the number of placebos". Python mirrors via `_compute_heterogeneity_test(..., placebo=L_max)` (set automatically from `self.placebo` at both global and per-path call sites in `fit()`) — the function iterates forward (1..L_max) and backward (-1..-L_max) horizons in a single loop with an explicit `out_idx < 0` eligibility guard for backward horizons whose `F_g` is too small (would otherwise silently misread `N_mat` via numpy negative indexing). `results.heterogeneity_effects` uses negative-int keys for backward horizons; `path_heterogeneity_effects` does the same per path. Placebo rows in `to_dataframe(level="by_path")` have non-NaN `het_*` columns when `placebo=True` and `heterogeneity=` are both set. **Survey gate (warn + skip):** `survey_design + placebo + heterogeneity` emits a `UserWarning` at fit-time and falls back to forward-horizon-only heterogeneity on both surfaces — the Binder TSL cell-period allocator's REGISTRY justification is tied to **post-period** attribution; backward-horizon attribution puts ψ_g mass on a pre-period cell, a separate library-extension claim that needs its own derivation. Forward-horizon `predict_het + survey_design` continues to work unchanged on both global and per-path surfaces. The function-level `_compute_heterogeneity_test` keeps a per-iteration `NotImplementedError` backstop for direct callers that bypass fit(). Pre-period allocator derivation deferred to a follow-up methodology PR (tracked in TODO.md). 
R parity confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityHeterogeneityWithPlacebo` (scenario 23, `multi_path_reversible_predict_het_with_placebo_global`, `placebo=2, effects=3, no by_path`) and `::TestDCDHDynRParityByPathHeterogeneityWithPlacebo` (scenario 22, same DGP plus `by_path=3`); pinned at `BETA_RTOL=1e-6` / `SE_RTOL=1e-5` for `beta` / `se` / `t_stat` / `n_obs` and `INFERENCE_RTOL=1e-4` for `p_value` / `conf_int` across 3 paths × (3 forward + 2 placebo) = 15 horizons + 1 global × 5 horizons. Cross-surface invariants regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPredictHetPlacebo` (placebo het column population, survey-gate warn+skip behavior, forward+survey anti-regression, `out_idx<0` eligibility guard, single-path telescope `path_heterogeneity_effects[(only_path,)] == heterogeneity_effects` bit-exactly, summary rendering, direct-call `NotImplementedError` backstop). Closes TODO #422. ### Changed diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index 13995e77..a177962b 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -2971,7 +2971,7 @@ where `D_it` is the treatment indicator (1 if unit `i` is treated by time `t`) a **Note (anticipation shift):** The `anticipation: int` constructor kwarg (default 0) shifts BOTH the treatment and ring-membership clocks by `-anticipation` so the stage-1 Omega_0 subsample correctly excludes the `anticipation` pre-treatment periods (which would otherwise contaminate the FE estimation with anticipation effects). Concretely, the effective treatment indicator becomes `D'_it = 1{t >= first_treat - anticipation}` and ring membership uses the corresponding shifted "currently-treated" set when computing `d_it` for staggered timing. Stage 2 then estimates `tau_total` and `delta_j` net of the anticipation window. 
Mirrors `TwoStageDiD`'s `anticipation` parameter semantics — recommended use is a small integer (1-3 periods) when domain knowledge suggests treatment effects begin before formal onset (e.g., policy announcements). Setting `anticipation > 0` AND providing a `first_treat` column where `first_treat - anticipation < min(time)` for any unit will mark that unit as treated from the panel's first observation, with the same baseline-treated handling as if it were always-treated (warn-and-drop via the Omega_0 unit-level check above). -**Note:** No published R/Stata software exists for the two-period ring estimator (Butts Eqs. 5-6). Correctness anchored on (a) synthetic-DGP Monte Carlo identification tests on the **non-staggered** DGP (50-seed default + 200-seed `@pytest.mark.slow` variant — both recover `tau_total` AND `delta_1` within 0.02 absolute tolerance on Butts-Assumption-satisfying DGPs) plus a **staggered**-DGP MC test (30-seed, no slow variant) that anchors `tau_total` only within 0.04 absolute tolerance (looser bound because each staggered DGP is larger and noisier); per-ring `delta_jk` recovery on staggered DGPs and a 200-seed slow staggered variant are queued as follow-ups alongside `event_study=True` support; (b) reduce-to-TWFE limit (non-staggered, 20-seed deterministic bit-identity at `atol=1e-10` via `TestSpilloverDiDNonStaggeredFEEquivalence`); (c) Wave A's Conley sparse-vs-dense parity inherited via `solve_ols`. A reduce-to-`TwoStageDiD` limit was scoped during planning but not shipped in Wave B (the Omega_0 stricter subsample requires additional fixture work to align with `TwoStageDiD`'s `{D_it = 0}` subsample); queued as a follow-up alongside the Gardner GMM correction. +**Note:** No published R/Stata software exists for the two-period ring estimator (Butts Eqs. 5-6). 
Correctness anchored on (a) synthetic-DGP Monte Carlo identification tests on the **non-staggered** DGP (50-seed default + 200-seed `@pytest.mark.slow` variant — both recover `tau_total` AND `delta_1` within 0.02 absolute tolerance on Butts-Assumption-satisfying DGPs) plus a **staggered**-DGP MC test (30-seed, no slow variant) that anchors both `tau_total` within 0.04 and `delta_1` within 0.03 absolute tolerance (looser bounds because each staggered DGP is larger and noisier); per-event-time `delta_jk` decomposition on staggered DGPs and a 200-seed slow staggered variant are queued as follow-ups alongside `event_study=True` support; (b) reduce-to-TWFE limit (non-staggered, 20-seed deterministic bit-identity at `atol=1e-10` via `TestSpilloverDiDNonStaggeredFEEquivalence`); (c) Wave A's Conley sparse-vs-dense parity inherited via `solve_ols`. A reduce-to-`TwoStageDiD` limit was scoped during planning but not shipped in Wave B (the Omega_0 stricter subsample requires additional fixture work to align with `TwoStageDiD`'s `{D_it = 0}` subsample); queued as a follow-up alongside the Gardner GMM correction. **Assumptions (Butts 2021):** diff --git a/tests/test_spillover.py b/tests/test_spillover.py index c3892942..42ed7f5f 100644 --- a/tests/test_spillover.py +++ b/tests/test_spillover.py @@ -810,20 +810,33 @@ def test_nonstaggered_recovers_at_200_seeds(self): assert abs(np.mean(att_estimates) - (-0.07)) < 0.02 assert abs(np.mean(delta_estimates) - (-0.04)) < 0.02 - def test_staggered_recovers_tau_total(self): - """Staggered MC with 30 seeds (smaller because each DGP is larger).""" + def test_staggered_recovers_tau_total_and_delta_1(self): + """Staggered MC with 30 seeds (smaller because each DGP is larger). + + Anchors BOTH `tau_total` and `delta_1` recovery on the staggered + DGP. Per-ring `delta_jk` (event-time decomposition) is deferred + alongside `event_study=True` support. 
+ """ att_estimates = [] + delta_estimates = [] for s in range(30): df = generate_butts_staggered_dgp(tau_total=-0.07, delta_1=-0.04, seed=s) result = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")).fit( df, outcome="y", unit="unit", time="time", first_treat="first_treat" ) att_estimates.append(result.att) + if result.spillover_effects is not None and len(result.spillover_effects) > 0: + delta_estimates.append(result.spillover_effects.iloc[0]["coef"]) mean_att = float(np.mean(att_estimates)) - # Staggered MC is noisier; allow a looser tolerance. + mean_delta = float(np.mean(delta_estimates)) + # Staggered MC is noisier than non-staggered; allow a looser + # tolerance (0.04 on tau_total, 0.03 on delta_1). assert ( abs(mean_att - (-0.07)) < 0.04 ), f"staggered tau_total: expected -0.07, got {mean_att:.4f}" + assert ( + abs(mean_delta - (-0.04)) < 0.03 + ), f"staggered delta_1: expected -0.04, got {mean_delta:.4f}" # ============================================================================= @@ -1855,6 +1868,139 @@ def test_anticipation_round_trips_to_result_and_dict(self): assert d["anticipation"] == 0 +class TestSpilloverDiDAnticipationBehavior: + """Round-7 CI review P1: anticipation must change the fitted estimand, + not just round-trip through the result object. It shifts BOTH the + treatment indicator and the ring-exposure clock by `-anticipation`, + moving rows in and out of Omega_0 and changing `tau_total` / `delta_j`. + Hand-built 4-period panel with one treated unit at t=2: anticipation=1 + promotes t=1 into the "treated" window, dropping that row from + Omega_0 and increasing n_treated by one period's worth of obs. + Verified on both fit entry paths (`treatment=` and `first_treat=`). + """ + + @staticmethod + def _make_4period_panel(): + rng = np.random.default_rng(42) + # 1 treated @ t=2, 1 near-control, 2 far-controls. 4 periods. 
+ specs = [ + ("treated", 0.0, [0, 0, 1, 1]), + ("near", 0.5, [0, 0, 0, 0]), + ("far1", 5.0, [0, 0, 0, 0]), + ("far2", 5.1, [0, 0, 0, 0]), + ] + rows = [] + for unit, lat, d_pattern in specs: + for t in range(4): + rows.append( + { + "unit": unit, + "time": t, + "lat": lat, + "lon": 0.0, + "D": d_pattern[t], + "y": rng.normal(), + } + ) + return pd.DataFrame(rows) + + def test_anticipation_shifts_omega_0_on_treatment_path(self): + """anticipation=1 on the binary `treatment=` path: the effective + treatment indicator slides one period earlier, so t=1 (formerly + Omega_0 for treated + near units) is dropped from stage 1 and + promoted into the "currently-treated / currently-exposed" zone. + Stage 1 sample shrinks; n_treated grows; att changes.""" + df = self._make_4period_panel() + r0 = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + anticipation=0, + ).fit(df, outcome="y", unit="unit", time="time", treatment="D") + r1 = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + anticipation=1, + ).fit(df, outcome="y", unit="unit", time="time", treatment="D") + # n_treated grows by 1 row (treated unit's t=1 row promoted under + # the shifted indicator). + assert r1.n_treated == r0.n_treated + 1, ( + f"anticipation=1 should add one treated period; " + f"got n_treated {r0.n_treated} -> {r1.n_treated}" + ) + # Stage 1 Omega_0 sample shrinks (rows promoted to treated/exposed + # leave Omega_0). + assert r1.stage1_n_obs < r0.stage1_n_obs, ( + f"anticipation=1 should shrink Omega_0; got stage1_n_obs " + f"{r0.stage1_n_obs} -> {r1.stage1_n_obs}" + ) + # And the estimand changes (different sample → different att). + assert r0.att != r1.att, f"anticipation=1 should change att; got {r0.att} == {r1.att}" + + def test_anticipation_shifts_omega_0_on_first_treat_path(self): + """anticipation=1 on the Gardner `first_treat=` path: same shift + applies. 
The first_treat column carries treatment onsets directly, + and anticipation subtracts from each onset for both the D_it + construction AND the ring-exposure (S_it) clock.""" + df = self._make_4period_panel() + # Convert binary D to first_treat column. + first_treat_map = {"treated": 2.0, "near": np.inf, "far1": np.inf, "far2": np.inf} + df_ft = df.copy() + df_ft["first_treat"] = df_ft["unit"].map(first_treat_map) + df_ft = df_ft.drop(columns=["D"]) + + r0 = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + anticipation=0, + ).fit(df_ft, outcome="y", unit="unit", time="time", first_treat="first_treat") + r1 = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + anticipation=1, + ).fit(df_ft, outcome="y", unit="unit", time="time", first_treat="first_treat") + assert r1.n_treated == r0.n_treated + 1, ( + f"first_treat path anticipation=1: expected +1 treated, " + f"got {r0.n_treated} -> {r1.n_treated}" + ) + assert r1.stage1_n_obs < r0.stage1_n_obs, ( + f"first_treat path anticipation=1: Omega_0 should shrink, " + f"got {r0.stage1_n_obs} -> {r1.stage1_n_obs}" + ) + assert r0.att != r1.att, ( + f"first_treat path anticipation=1: expected att to change, " f"got {r0.att} == {r1.att}" + ) + + def test_anticipation_shift_matches_across_fit_paths(self): + """Sanity-check that the `treatment=` and `first_treat=` paths + produce identical results under the same anticipation setting — + the two entry points are internally unified, so anticipation must + compose consistently with both.""" + df = self._make_4period_panel() + df_ft = df.copy() + df_ft["first_treat"] = df_ft["unit"].map( + {"treated": 2.0, "near": np.inf, "far1": np.inf, "far2": np.inf} + ) + df_ft = df_ft.drop(columns=["D"]) + + for ant in (0, 1): + r_d = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + anticipation=ant, + ).fit(df, outcome="y", unit="unit", time="time", treatment="D") + r_ft = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", 
"lon"), + anticipation=ant, + ).fit(df_ft, outcome="y", unit="unit", time="time", first_treat="first_treat") + assert r_d.att == r_ft.att, ( + f"anticipation={ant}: att mismatch between paths " f"({r_d.att} vs {r_ft.att})" + ) + assert ( + r_d.stage1_n_obs == r_ft.stage1_n_obs + ), f"anticipation={ant}: stage1_n_obs mismatch between paths" + + class TestSpilloverDiDEffectiveRankDoF: """Stage-2 residual df should use effective rank (after solve_ols drops rank-deficient columns), not raw column count. Round-5 codex review From 2b86d8c051ecec73c43ae601e601fc213bec3b93 Mon Sep 17 00:00:00 2001 From: igerber Date: Sat, 16 May 2026 07:05:53 -0400 Subject: [PATCH 3/3] SpilloverDiD: replace Conley smoke test with plumbing-verification MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Round-8 CI review flagged `test_conley_se_differs_from_hc1` as a test- coverage gap: the name promises a Conley-specific assertion but the body only checked finite SE + ATT invariance. A silent fallback to HC1 (e.g., if SpilloverDiD ever stopped threading the Conley kwargs through to `solve_ols`) would have passed. Replace with: - `test_conley_kwargs_threaded_to_solve_ols`: patches `solve_ols` at the import site, captures the kwargs of the stage-2 invocation, and asserts they include `vcov_type="conley"`, `conley_cutoff_km=200.0`, `conley_metric="haversine"`, `conley_lag_cutoff=0`, plus fit-time-derived `conley_coords` / `conley_time` / `conley_unit` arrays of the right shape. Any silent fallback to HC1 fails this. - `test_conley_att_invariant_vs_hc1`: extracted from the old test — vcov choice does not change ATT (residualization + OLS fit are independent of variance). Now stands as a clean invariant rather than pretending to verify Conley-specific output. Also bumps CHANGELOG test count 156 -> 157 and updates the Conley description to "plumbing (verifies solve_ols is called with vcov_type= 'conley' + Conley kwargs, no silent HC1 fallback)". 
All 157 tests pass; black + ruff clean. Co-Authored-By: Claude Opus 4.7 (1M context) --- CHANGELOG.md | 2 +- tests/test_spillover.py | 80 +++++++++++++++++++++++++++++++++-------- 2 files changed, 67 insertions(+), 15 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 07f8eab1..ad5135c2 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,7 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] ### Added -- **`SpilloverDiD` — ring-indicator spillover-aware DiD (Butts 2021).** New standalone estimator at `diff_diff/spillover.py` implementing two-stage Gardner methodology with ring-indicator covariates that identify direct effect on treated (`tau_total`) alongside per-ring spillover effects on near-control units (`delta_j`). Documented synthesis of ingredients (no single published software covers the exact recipe — `did2s` implements Gardner two-stage without rings; the Butts ring estimator has no R/Stata package): Butts (2021) Section 5 / Table 2 identification, Gardner (2022) two-stage residualize-then-fit, and the Conley spatial-HAC vcov shipped in 3.3.3. Handles both panel non-staggered (Equations 5/6/8) and Section 5 staggered timing in one estimator — non-staggered is the special case where all treated units share an onset time. **API:** `SpilloverDiD(rings=[0, 50, 100, 200], conley_coords=("lat","lon"), ...).fit(data, outcome="y", unit="unit", time="t", treatment="D")` (binary D auto-converted to `first_treat`) or `.fit(..., first_treat="first_treat")` (Gardner convention). Result: `SpilloverDiDResults(DiDResults)` with `.att` = `tau_total`, `.spillover_effects` (per-ring `pd.DataFrame` with `coef`/`se`/`t_stat`/`p_value`/`ci_low`/`ci_high`), `.ring_breakpoints`, `.d_bar`, `.n_units_ever_in_ring`, `.n_far_away_obs`, `.is_staggered`. `.coefficients` exposes all `(1+K)` stage-2 entries (`"treatment"` + `"_spillover_"`) plus an `"ATT"` alias keyed to vcov columns. 
**Methodology spec (committed):** stage-2 regressor is the time-varying `(1 - D_it) * Ring_{it,j}` form (paper page 12's `S_it = S_i * 1{t >= t_treat}` notation; Section 5 Table 2's `S^k_{it}` / `Ring^k_{it,j}`). Reading the literal unit-static `(1 - D_it) * S_i` from Equation 5 is algebraically rank-deficient under TWFE (`(1-D_it) * S_i = S_i - D_it`, with `S_i` absorbed by `mu_i`, leaving `-D_it`); only the time-varying form supports the paper's identification (Proposition 2.3). Stage-1 subsample uses Butts' STRICTER `Omega_0 = {D_it = 0 AND S_it = 0}` (untreated AND unexposed), not TwoStageDiD's `{D_it = 0}` alone — this prevents spillover-contaminated near-controls in pre/post periods from biasing the time FE. **Gardner identity (non-staggered):** a 20-seed deterministic regression test pins `SpilloverDiD.att` against a direct single-stage TWFE ring regression on the full sample (`y ~ mu_i + lambda_t + tau * D_it + sum_j delta_j * (1 - D_it) * Ring_{it,j}`) at `atol=1e-10` — empirically bit-identical, so the reported non-staggered `tau_total` IS the Butts Eqs. 4-6 estimator. **Identification-check policy (period strict, unit warn-and-drop, plus connectivity):** every period must have at least one Omega_0 row (hard `ValueError` — dropping a period removes all units' cross-time identification). Units lacking Omega_0 rows (e.g. baseline-treated units with `D_it = 1` at every observed `t`) are warned-and-dropped: their unit FE is NaN, residualization writes NaN on their rows, and the downstream finite-mask path excludes them from stage 2 — mirrors `TwoStageDiD`'s always-treated convention. 
Additionally, the supported-units bipartite graph (units linked by shared Omega_0 periods) must form a single connected component; `K > 1` components raise `ValueError` because the FE solver would return only component-specific constants and residualization would silently mix them across components (defense-in-depth — under absorbing treatment the disconnected case may be unreachable through the upstream validators, but the check future-proofs Wave B follow-ups). **Public API restrictions (Wave B MVP):** `covariates=` raises `NotImplementedError` because Gardner-style two-stage requires covariate effects estimated on the untreated-and-unexposed subsample at stage 1 (appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates); non-absorbing / reversible treatment patterns (e.g. `[0, 1, 0]`) raise `ValueError` rather than being silently coerced into "treated from first 1 onward"; non-constant `first_treat` values across rows of the same unit raise `ValueError`; `conley_coords` is required on every fit path (not just `vcov_type="conley"`) because ring construction always uses it. **Far-away control identification:** uses CURRENT-period untreated status (`D_it = 0`) rather than never-treated-only, so all-eventually-treated staggered designs (no never-treated units) can identify the counterfactual via not-yet-treated far-away rows. **Variance (Wave B MVP):** stage-2 OLS variance via `solve_ols` (HC1 / Conley / cluster paths all flow through). The Gardner GMM first-stage uncertainty correction is NOT applied at stage 2 in this PR (documented limitation; planned follow-up extends `two_stage.py::_compute_gmm_variance` to accept a Conley kernel matrix in place of HC1's identity at the influence-function outer-product step). 
**Deferred features (planned follow-ups):** `event_study=True` per-event-time × ring coefficients (Butts Table 2), `survey_design=` integration, `ring_method="count"` (count-of-treated-in-ring), data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight), Gardner GMM first-stage correction at stage 2, sparse staggered ring-distance path. **Tests:** `tests/test_spillover.py` (156 tests across ring-construction primitives, validators, fit integration, raw-data invariant, identification MC — non-staggered DGP at 50 seeds + 200-seed `@pytest.mark.slow` variant recovers both `tau_total` and `delta_1`; staggered DGP at 30 seeds anchors both `tau_total` and `delta_1` — Conley wiring, Gardner identity bit-identity, coefficients-vs-vcov alignment, warn-and-drop, rank_deficient_action validation, Omega_0 bipartite-graph connectivity, anticipation behavior on both fit paths). DGP factories `tests/_dgp_utils.py::generate_butts_nonstaggered_dgp` / `generate_butts_staggered_dgp` satisfy Butts Assumptions 1/3/5/7 by construction. +- **`SpilloverDiD` — ring-indicator spillover-aware DiD (Butts 2021).** New standalone estimator at `diff_diff/spillover.py` implementing two-stage Gardner methodology with ring-indicator covariates that identify direct effect on treated (`tau_total`) alongside per-ring spillover effects on near-control units (`delta_j`). Documented synthesis of ingredients (no single published software covers the exact recipe — `did2s` implements Gardner two-stage without rings; the Butts ring estimator has no R/Stata package): Butts (2021) Section 5 / Table 2 identification, Gardner (2022) two-stage residualize-then-fit, and the Conley spatial-HAC vcov shipped in 3.3.3. Handles both panel non-staggered (Equations 5/6/8) and Section 5 staggered timing in one estimator — non-staggered is the special case where all treated units share an onset time. 
**API:** `SpilloverDiD(rings=[0, 50, 100, 200], conley_coords=("lat","lon"), ...).fit(data, outcome="y", unit="unit", time="t", treatment="D")` (binary D auto-converted to `first_treat`) or `.fit(..., first_treat="first_treat")` (Gardner convention). Result: `SpilloverDiDResults(DiDResults)` with `.att` = `tau_total`, `.spillover_effects` (per-ring `pd.DataFrame` with `coef`/`se`/`t_stat`/`p_value`/`ci_low`/`ci_high`), `.ring_breakpoints`, `.d_bar`, `.n_units_ever_in_ring`, `.n_far_away_obs`, `.is_staggered`. `.coefficients` exposes all `(1+K)` stage-2 entries (`"treatment"` + `"_spillover_"`) plus an `"ATT"` alias keyed to vcov columns. **Methodology spec (committed):** stage-2 regressor is the time-varying `(1 - D_it) * Ring_{it,j}` form (paper page 12's `S_it = S_i * 1{t >= t_treat}` notation; Section 5 Table 2's `S^k_{it}` / `Ring^k_{it,j}`). The literal unit-static `(1 - D_it) * S_i` reading of Equation 5 is algebraically rank-deficient under TWFE (`(1-D_it) * S_i = S_i - D_it` since `D_it = 1` implies `S_i = 1`; `S_i` is absorbed by `mu_i`, leaving `-D_it`); only the time-varying form supports the paper's identification (Proposition 2.3). Stage-1 subsample uses Butts' STRICTER `Omega_0 = {D_it = 0 AND S_it = 0}` (untreated AND unexposed), not TwoStageDiD's `{D_it = 0}` alone — this prevents spillover-contaminated near-controls in pre/post periods from biasing the time FE. **Gardner identity (non-staggered):** a 20-seed deterministic regression test pins `SpilloverDiD.att` against a direct single-stage TWFE ring regression on the full sample (`y ~ mu_i + lambda_t + tau * D_it + sum_j delta_j * (1 - D_it) * Ring_{it,j}`) at `atol=1e-10` — empirically bit-identical, so the reported non-staggered `tau_total` IS the Butts Eqs. 4-6 estimator. **Identification-check policy (period strict, unit warn-and-drop, plus connectivity):** every period must have at least one Omega_0 row (hard `ValueError` — dropping a period removes all units' cross-time identification). Units lacking Omega_0 rows (e.g.
baseline-treated units with `D_it = 1` at every observed `t`) are warned-and-dropped: their unit FE is NaN, residualization writes NaN on their rows, and the downstream finite-mask path excludes them from stage 2 — mirrors `TwoStageDiD`'s always-treated convention. Additionally, the supported-units bipartite graph (units linked by shared Omega_0 periods) must form a single connected component; `K > 1` components raise `ValueError` because the FE solver would return only component-specific constants and residualization would silently mix them across components (defense-in-depth — under absorbing treatment the disconnected case may be unreachable through the upstream validators, but the check future-proofs Wave B follow-ups). **Public API restrictions (Wave B MVP):** `covariates=` raises `NotImplementedError` because Gardner-style two-stage requires covariate effects estimated on the untreated-and-unexposed subsample at stage 1 (appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates); non-absorbing / reversible treatment patterns (e.g. `[0, 1, 0]`) raise `ValueError` rather than being silently coerced into "treated from first 1 onward"; non-constant `first_treat` values across rows of the same unit raise `ValueError`; `conley_coords` is required on every fit path (not just `vcov_type="conley"`) because ring construction always uses it. **Far-away control identification:** uses CURRENT-period untreated status (`D_it = 0`) rather than never-treated-only, so all-eventually-treated staggered designs (no never-treated units) can identify the counterfactual via not-yet-treated far-away rows. **Variance (Wave B MVP):** stage-2 OLS variance via `solve_ols` (HC1 / Conley / cluster paths all flow through). 
The Gardner GMM first-stage uncertainty correction is NOT applied at stage 2 in this PR (documented limitation; planned follow-up extends `two_stage.py::_compute_gmm_variance` to accept a Conley kernel matrix in place of HC1's identity at the influence-function outer-product step). **Deferred features (planned follow-ups):** `event_study=True` per-event-time × ring coefficients (Butts Table 2), `survey_design=` integration, `ring_method="count"` (count-of-treated-in-ring), data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight), Gardner GMM first-stage correction at stage 2, sparse staggered ring-distance path. **Tests:** `tests/test_spillover.py` (157 tests across ring-construction primitives, validators, fit integration, raw-data invariant, identification MC — non-staggered DGP at 50 seeds + 200-seed `@pytest.mark.slow` variant recovers both `tau_total` and `delta_1`; staggered DGP at 30 seeds anchors both `tau_total` and `delta_1` — Conley plumbing (verifies `solve_ols` is called with `vcov_type="conley"` + Conley kwargs, no silent HC1 fallback), Gardner identity bit-identity, coefficients-vs-vcov alignment, warn-and-drop, rank_deficient_action validation, Omega_0 bipartite-graph connectivity, anticipation behavior on both fit paths). DGP factories `tests/_dgp_utils.py::generate_butts_nonstaggered_dgp` / `generate_butts_staggered_dgp` satisfy Butts Assumptions 1/3/5/7 by construction. - **`ChaisemartinDHaultfoeuille.predict_het` × `placebo`: R-parity on both global and per-path surfaces.** R-verified — `did_multiplegt_dyn(predict_het, placebo)` emits heterogeneity OLS results on backward (placebo) horizons via R's `DIDmultiplegtDYN:::did_multiplegt_main` placebo block (`effect = matrix(-i, ...)` rbind site); the same block runs per-by_level under `did_multiplegt_dyn(by_path, predict_het, placebo)`, so both global `res$results$predict_het` and per-by_level `res$by_level_i$results$predict_het` slots emit backward rows. 
R's predict_het syntax with `placebo > 0` requires the `c(-1)` sentinel in the horizon vector to trigger "compute heterogeneity for ALL forward (1..effects) AND ALL placebo (1..placebo) positions" — passing positive-only horizons errors with "specified numbers in predict_het that exceed the number of placebos". Python mirrors via `_compute_heterogeneity_test(..., placebo=L_max)` (set automatically from `self.placebo` at both global and per-path call sites in `fit()`) — the function iterates forward (1..L_max) and backward (-1..-L_max) horizons in a single loop with an explicit `out_idx < 0` eligibility guard for backward horizons whose `F_g` is too small (would otherwise silently misread `N_mat` via numpy negative indexing). `results.heterogeneity_effects` uses negative-int keys for backward horizons; `path_heterogeneity_effects` does the same per path. Placebo rows in `to_dataframe(level="by_path")` have non-NaN `het_*` columns when `placebo=True` and `heterogeneity=` are both set. **Survey gate (warn + skip):** `survey_design + placebo + heterogeneity` emits a `UserWarning` at fit-time and falls back to forward-horizon-only heterogeneity on both surfaces — the Binder TSL cell-period allocator's REGISTRY justification is tied to **post-period** attribution; backward-horizon attribution puts ψ_g mass on a pre-period cell, a separate library-extension claim that needs its own derivation. Forward-horizon `predict_het + survey_design` continues to work unchanged on both global and per-path surfaces. The function-level `_compute_heterogeneity_test` keeps a per-iteration `NotImplementedError` backstop for direct callers that bypass fit(). Pre-period allocator derivation deferred to a follow-up methodology PR (tracked in TODO.md). 
R parity confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityHeterogeneityWithPlacebo` (scenario 23, `multi_path_reversible_predict_het_with_placebo_global`, `placebo=2, effects=3, no by_path`) and `::TestDCDHDynRParityByPathHeterogeneityWithPlacebo` (scenario 22, same DGP plus `by_path=3`); pinned at `BETA_RTOL=1e-6` / `SE_RTOL=1e-5` for `beta` / `se` / `t_stat` / `n_obs` and `INFERENCE_RTOL=1e-4` for `p_value` / `conf_int` across 3 paths × (3 forward + 2 placebo) = 15 horizons + 1 global × 5 horizons. Cross-surface invariants regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPredictHetPlacebo` (placebo het column population, survey-gate warn+skip behavior, forward+survey anti-regression, `out_idx<0` eligibility guard, single-path telescope `path_heterogeneity_effects[(only_path,)] == heterogeneity_effects` bit-exactly, summary rendering, direct-call `NotImplementedError` backstop). Closes TODO #422. ### Changed diff --git a/tests/test_spillover.py b/tests/test_spillover.py index 42ed7f5f..17cd5238 100644 --- a/tests/test_spillover.py +++ b/tests/test_spillover.py @@ -866,32 +866,84 @@ def test_conley_fit_runs(self): assert result.conley_lag_cutoff == 0 assert np.isfinite(result.se) - def test_conley_se_differs_from_hc1(self): - """Conley SE differs from HC1 baseline (spatial correlation in errors).""" + def test_conley_kwargs_threaded_to_solve_ols(self): + """Round-8 CI review P3 (test coverage gap): the previous test was a + smoke test that only asserted finite SE + ATT invariance — a silent + fallback to HC1 would have passed. This test plumbing-verifies that + `solve_ols` is actually called with `vcov_type="conley"` AND the + Conley-specific kwargs (`conley_coords`, `conley_cutoff_km`, + `conley_metric`, `conley_time`, `conley_unit`, `conley_lag_cutoff`). 
+ """ + from unittest.mock import patch + df = generate_butts_nonstaggered_dgp( seed=42, n_treated=20, n_near_control=80, n_far_control=100 ) - est_hc1 = SpilloverDiD( + est = SpilloverDiD( rings=[0.0, 100.0], conley_coords=("lat", "lon"), - vcov_type="hc1", + vcov_type="conley", + conley_cutoff_km=200.0, + conley_metric="haversine", + conley_lag_cutoff=0, ) - result_hc1 = est_hc1.fit(df, outcome="y", unit="unit", time="time", treatment="D") - est_conley = SpilloverDiD( + # Patch solve_ols at the import site in spillover.py so we can + # observe the kwargs SpilloverDiD passes through at stage 2. + import diff_diff.spillover as spillover_mod + + captured: dict = {} + + original_solve_ols = spillover_mod.solve_ols + + def spy_solve_ols(*args, **kwargs): + # Capture the LAST call's kwargs (stage 2 is the last solve_ols + # invocation in fit()). + captured.clear() + captured.update(kwargs) + return original_solve_ols(*args, **kwargs) + + with patch.object(spillover_mod, "solve_ols", side_effect=spy_solve_ols): + result = est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + # Conley kwargs reached solve_ols (no silent HC1 fallback). + assert ( + captured.get("vcov_type") == "conley" + ), f"expected solve_ols vcov_type='conley', got {captured.get('vcov_type')!r}" + assert captured.get("conley_cutoff_km") == 200.0 + assert captured.get("conley_metric") == "haversine" + assert captured.get("conley_lag_cutoff") == 0 + # The fit-time-derived spatial / temporal arrays must be present and + # have the right shape. + coords = captured.get("conley_coords") + assert coords is not None and coords.shape == (result.n_obs, 2) + conley_time = captured.get("conley_time") + conley_unit = captured.get("conley_unit") + assert conley_time is not None and len(conley_time) == result.n_obs + assert conley_unit is not None and len(conley_unit) == result.n_obs + # And the reported SE is finite (the actual Conley computation + # completed end-to-end). 
+ assert np.isfinite(result.se) + + def test_conley_att_invariant_vs_hc1(self): + """Point-estimate invariance: vcov choice does not change ATT + (the residualization + OLS fit are independent of variance). + """ + df = generate_butts_nonstaggered_dgp( + seed=42, n_treated=20, n_near_control=80, n_far_control=100 + ) + result_hc1 = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + vcov_type="hc1", + ).fit(df, outcome="y", unit="unit", time="time", treatment="D") + result_conley = SpilloverDiD( rings=[0.0, 100.0], conley_coords=("lat", "lon"), vcov_type="conley", conley_cutoff_km=200.0, conley_lag_cutoff=0, - ) - result_conley = est_conley.fit(df, outcome="y", unit="unit", time="time", treatment="D") - # ATT point estimate unchanged across vcov; SE may differ. + ).fit(df, outcome="y", unit="unit", time="time", treatment="D") assert abs(result_hc1.att - result_conley.att) < 1e-10 - # Conley SE may be larger or smaller than HC1 depending on spatial - # error correlation; just assert it's not identical. - # (Synthetic DGP has independent errors so they may be very close; - # use a loose tolerance — primarily a wiring test.) - assert np.isfinite(result_conley.se) # =============================================================================
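Editor's addendum on the `rings=` parameter described in the CHANGELOG entry above: breakpoints like `[0, 50, 100, 200]` partition control units into half-open distance bands around the nearest treated unit, with units beyond the last breakpoint serving as far-away controls. A minimal sketch of that geometry under stated assumptions — `haversine_km` and `assign_ring` are hypothetical illustrations, not `diff_diff` API:

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

def assign_ring(dist_km, breakpoints):
    """Ring index j covers (breakpoints[j], breakpoints[j+1]];
    -1 marks units beyond the last breakpoint (far-away controls)."""
    for j in range(len(breakpoints) - 1):
        if breakpoints[j] < dist_km <= breakpoints[j + 1]:
            return j
    return -1

rings = [0.0, 50.0, 100.0, 200.0]
treated_coords = np.array([[40.0, -75.0]])                    # one treated unit
control_coords = [(40.3, -75.0), (41.5, -75.0), (48.0, -75.0)]
for lat, lon in control_coords:
    # Distance to the NEAREST treated unit drives ring membership.
    d = haversine_km(lat, lon, treated_coords[:, 0], treated_coords[:, 1]).min()
    print(assign_ring(d, rings))  # ~33 km -> 0, ~167 km -> 2, ~890 km -> -1
```

In the estimator itself the breakpoints come from `rings=` and the coordinates from `conley_coords`; the sketch only shows the distance-band bookkeeping.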
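The two-stage recipe the CHANGELOG entry commits to — stage 1 fits unit and time fixed effects on Butts' `Omega_0 = {D_it = 0 AND S_it = 0}` only, stage 2 regresses the residualized outcome on `D_it` and the ring term — is compact enough to sketch end-to-end on a toy non-staggered panel. This is a simplified illustration, not the library's implementation; the `two_stage_spillover` helper and its column names are assumptions:

```python
import numpy as np
import pandas as pd

def two_stage_spillover(df):
    """Stage 1: mu_i + lambda_t fit on Omega_0 = {D == 0 and S == 0};
    stage 2: OLS of the residualized y on D_it and the ring regressor."""
    FE = np.hstack([pd.get_dummies(df["unit"]).to_numpy(float),
                    pd.get_dummies(df["time"]).to_numpy(float)])
    y = df["y"].to_numpy()
    omega0 = ((df["D"] == 0) & (df["S"] == 0)).to_numpy()
    beta, *_ = np.linalg.lstsq(FE[omega0], y[omega0], rcond=None)
    resid = y - FE @ beta  # residualize ALL rows with the Omega_0-fitted FE
    X2 = np.column_stack([df["D"], df["ring_post"]]).astype(float)
    coef, *_ = np.linalg.lstsq(X2, resid, rcond=None)
    return {"tau_total": coef[0], "delta_1": coef[1]}

# Toy DGP: 20 treated, 20 ring controls, 20 far controls; onset at t = 3;
# true tau_total = 2.0, true delta_1 = 0.8.
rng = np.random.default_rng(0)
rows = []
for i in range(60):
    group = "treated" if i < 20 else ("ring" if i < 40 else "far")
    for t in range(6):
        post = int(t >= 3)
        D = post if group == "treated" else 0
        S = post if group in ("treated", "ring") else 0  # exposed post rows
        ring_post = post if group == "ring" else 0       # (1 - D_it) * Ring_it
        y = 0.5 * i + 0.3 * t + 2.0 * D + 0.8 * ring_post + rng.normal(scale=0.1)
        rows.append((i, t, y, D, S, ring_post))
df = pd.DataFrame(rows, columns=["unit", "time", "y", "D", "S", "ring_post"])
est = two_stage_spillover(df)  # recovers tau_total near 2.0, delta_1 near 0.8
```

Excluding the exposed ring-control post rows from stage 1 is exactly what keeps their spillover response out of the estimated `lambda_t`, which is the point of the stricter `Omega_0`.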
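Finally, the rank-deficiency claim behind the committed stage-2 regressor choice — that the unit-static `(1 - D_it) * S_i` collapses into the span of the fixed effects plus `D_it`, while the time-varying `(1 - D_it) * S_it` does not — can be checked numerically. A toy verification under the assumption (as in Butts' setup) that treated units themselves carry `S_i = 1`:

```python
import numpy as np

treated = np.array([1, 1, 0, 0, 0, 0])   # 2 treated, 2 ring, 2 far units
S_i = np.array([1, 1, 1, 1, 0, 0])       # within d_bar of treatment (treated included)
n_units, n_periods = 6, 4
post = (np.arange(n_periods) >= 2).astype(float)

unit_fe = np.kron(np.eye(n_units), np.ones((n_periods, 1)))  # mu_i dummies, (24, 6)
time_fe = np.kron(np.ones((n_units, 1)), np.eye(n_periods))  # lambda_t dummies, (24, 4)
D = (treated[:, None] * post[None, :]).ravel()               # D_it
S_static = np.repeat(S_i, n_periods).astype(float)           # unit-static S_i
S_tv = (S_i[:, None] * post[None, :]).ravel()                # S_it = S_i * 1{t >= t_treat}

base = np.linalg.matrix_rank(np.hstack([unit_fe, time_fe]))  # 9 = 6 + 4 - shared constant

# Static interaction adds NOTHING beyond D: (1 - D) * S_i = S_i - D, and
# S_i lies in the span of the unit dummies.
X_static = np.column_stack([unit_fe, time_fe, D, (1 - D) * S_static])
print(np.linalg.matrix_rank(X_static) - base)  # 1 -> the spillover column is redundant

# Time-varying interaction contributes a genuinely new direction.
X_tv = np.column_stack([unit_fe, time_fe, D, (1 - D) * S_tv])
print(np.linalg.matrix_rank(X_tv) - base)      # 2 -> D and the ring term both identified
```

The `matrix_rank` gap (1 vs 2 above baseline) is the whole argument: only the time-varying form leaves a spillover coefficient for TWFE to estimate.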