From ced54b3d8dae0c9a5c0945c66ba8c9efce93a727 Mon Sep 17 00:00:00 2001 From: igerber Date: Sat, 16 May 2026 06:29:00 -0400 Subject: [PATCH 1/3] SpilloverDiD: ring-indicator spillover-aware DiD (Butts 2021) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit New standalone estimator at `diff_diff/spillover.py` implementing two-stage Gardner (2022) DiD with ring-indicator covariates that identify, alongside the direct effect on treated (`tau_total`), per-ring spillover effects on near-control units (`delta_j`). Reference: Butts, K. (2023, originally 2021) "Difference-in-Differences with Spatial Spillovers" arXiv:2105.03737v3; Gardner, J. (2022) "Two-stage differences in differences" arXiv:2207.05943. Handles panel non-staggered (paper Eqs 5/6/8) and Section 5 staggered timing in one estimator. Stage-2 regressor uses the time-varying `(1 - D_it) * Ring_{it,j}` form. Stage-1 subsample is Butts' STRICTER `Omega_0 = {D_it = 0 AND S_it = 0}` (untreated AND unexposed). Identification-check policy: period strict, unit warn-and-drop, plus connected-component check on the Omega_0 bipartite graph. SE clamps negative vcov diagonals before sqrt (sibling-estimator convention). `coefficients` exposes all (1+K) stage-2 entries keyed to vcov columns plus an "ATT" alias. `rank_deficient_action` validated at `__init__`. Variance: stage-2 OLS via `solve_ols` (HC1 / Conley / cluster). Gardner GMM first-stage uncertainty correction NOT applied in this PR (documented limitation; tracked in TODO). Deferred features (planned follow-ups, all in TODO): `event_study=True` per-event-time × ring coefficients, `survey_design=` integration, `ring_method="count"`, data-driven `d_bar`, GMM correction at stage 2, sparse staggered ring-distance path, TwoStageDiD-Conley first-class. Tests: `tests/test_spillover.py` (153 tests) + DGP factories at `tests/_dgp_utils.py`. 
Includes 20-seed Gardner identity bit-identity test (`SpilloverDiD.att` matches single-stage TWFE ring regression at `atol=1e-10` on non-staggered DGPs — the reported non-staggered `tau_total` IS the Butts Eqs. 4-6 estimator). Non-staggered MC at 50 seeds + 200-seed slow variant recovers both `tau_total` and `delta_1`; staggered MC at 30 seeds anchors `tau_total` only. Docs: REGISTRY section, API rst, `llms.txt` + `llms-full.txt`, README catalog entry, references, `doc-deps.yaml`, TODO follow-up rows. Co-Authored-By: Claude Opus 4.7 (1M context) --- CHANGELOG.md | 1 + README.md | 1 + TODO.md | 8 + diff_diff/__init__.py | 6 + diff_diff/guides/llms-full.txt | 63 + diff_diff/guides/llms.txt | 1 + diff_diff/results.py | 111 + diff_diff/spillover.py | 1957 ++++++++++++ .../_autosummary/diff_diff.SpilloverDiD.rst | 21 + .../diff_diff.SpilloverDiDResults.rst | 61 + docs/api/index.rst | 3 + docs/api/spillover.rst | 234 ++ docs/doc-deps.yaml | 22 + docs/index.rst | 2 + docs/methodology/REGISTRY.md | 79 + docs/references.rst | 4 + tests/_dgp_utils.py | 250 ++ tests/test_spillover.py | 2610 +++++++++++++++++ 18 files changed, 5434 insertions(+) create mode 100644 diff_diff/spillover.py create mode 100644 docs/api/_autosummary/diff_diff.SpilloverDiD.rst create mode 100644 docs/api/_autosummary/diff_diff.SpilloverDiDResults.rst create mode 100644 docs/api/spillover.rst create mode 100644 tests/_dgp_utils.py create mode 100644 tests/test_spillover.py diff --git a/CHANGELOG.md b/CHANGELOG.md index 4e17a329..b45f1ae1 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] ### Added +- **`SpilloverDiD` — ring-indicator spillover-aware DiD (Butts 2021).** New standalone estimator at `diff_diff/spillover.py` implementing two-stage Gardner methodology with ring-indicator covariates that identify direct effect on treated (`tau_total`) alongside per-ring spillover effects on 
near-control units (`delta_j`). Documented synthesis of ingredients (no single published software covers the exact recipe — `did2s` implements Gardner two-stage without rings; the Butts ring estimator has no R/Stata package): Butts (2021) Section 5 / Table 2 identification, Gardner (2022) two-stage residualize-then-fit, and the Conley spatial-HAC vcov shipped in 3.3.3. Handles both panel non-staggered (Equations 5/6/8) and Section 5 staggered timing in one estimator — non-staggered is the special case where all treated units share an onset time. **API:** `SpilloverDiD(rings=[0, 50, 100, 200], conley_coords=("lat","lon"), ...).fit(data, outcome="y", unit="unit", time="t", treatment="D")` (binary D auto-converted to `first_treat`) or `.fit(..., first_treat="first_treat")` (Gardner convention). Result: `SpilloverDiDResults(DiDResults)` with `.att` = `tau_total`, `.spillover_effects` (per-ring `pd.DataFrame` with `coef`/`se`/`t_stat`/`p_value`/`ci_low`/`ci_high`), `.ring_breakpoints`, `.d_bar`, `.n_units_ever_in_ring`, `.n_far_away_obs`, `.is_staggered`. `.coefficients` exposes all `(1+K)` stage-2 entries (`"treatment"` + `"_spillover_"`) plus an `"ATT"` alias keyed to vcov columns. **Methodology spec (committed):** stage-2 regressor is the time-varying `(1 - D_it) * Ring_{it,j}` form (paper page 12's `S_it = S_i * 1{t >= t_treat}` notation; Section 5 Table 2's `S^k_{it}` / `Ring^k_{it,j}`). Reading the literal unit-static `(1 - D_it) * S_i` from Equation 5 is algebraically rank-deficient under TWFE (`(1-D_it) * S_i = S_i - D_it`, with `S_i` absorbed by `mu_i`, leaving `-D_it`); only the time-varying form supports the paper's identification (Proposition 2.3). Stage-1 subsample uses Butts' STRICTER `Omega_0 = {D_it = 0 AND S_it = 0}` (untreated AND unexposed), not TwoStageDiD's `{D_it = 0}` alone — this prevents spillover-contaminated near-controls in pre/post periods from biasing the time FE. 
**Gardner identity (non-staggered):** a 20-seed deterministic regression test pins `SpilloverDiD.att` against a direct single-stage TWFE ring regression on the full sample (`y ~ mu_i + lambda_t + tau * D_it + sum_j delta_j * (1 - D_it) * Ring_{it,j}`) at `atol=1e-10` — empirically bit-identical, so the reported non-staggered `tau_total` IS the Butts Eqs. 4-6 estimator. **Identification-check policy (period strict, unit warn-and-drop, plus connectivity):** every period must have at least one Omega_0 row (hard `ValueError` — dropping a period removes all units' cross-time identification). Units lacking Omega_0 rows (e.g. baseline-treated units with `D_it = 1` at every observed `t`) are warned-and-dropped: their unit FE is NaN, residualization writes NaN on their rows, and the downstream finite-mask path excludes them from stage 2 — mirrors `TwoStageDiD`'s always-treated convention. Additionally, the supported-units bipartite graph (units linked by shared Omega_0 periods) must form a single connected component; `K > 1` components raise `ValueError` because the FE solver would return only component-specific constants and residualization would silently mix them across components (defense-in-depth — under absorbing treatment the disconnected case may be unreachable through the upstream validators, but the check future-proofs Wave B follow-ups). **Public API restrictions (Wave B MVP):** `covariates=` raises `NotImplementedError` because Gardner-style two-stage requires covariate effects estimated on the untreated-and-unexposed subsample at stage 1 (appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates); non-absorbing / reversible treatment patterns (e.g. 
`[0, 1, 0]`) raise `ValueError` rather than being silently coerced into "treated from first 1 onward"; non-constant `first_treat` values across rows of the same unit raise `ValueError`; `conley_coords` is required on every fit path (not just `vcov_type="conley"`) because ring construction always uses it. **Far-away control identification:** uses CURRENT-period untreated status (`D_it = 0`) rather than never-treated-only, so all-eventually-treated staggered designs (no never-treated units) can identify the counterfactual via not-yet-treated far-away rows. **Variance (Wave B MVP):** stage-2 OLS variance via `solve_ols` (HC1 / Conley / cluster paths all flow through). The Gardner GMM first-stage uncertainty correction is NOT applied at stage 2 in this PR (documented limitation; planned follow-up extends `two_stage.py::_compute_gmm_variance` to accept a Conley kernel matrix in place of HC1's identity at the influence-function outer-product step). **Deferred features (planned follow-ups):** `event_study=True` per-event-time × ring coefficients (Butts Table 2), `survey_design=` integration, `ring_method="count"` (count-of-treated-in-ring), data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight), Gardner GMM first-stage correction at stage 2, sparse staggered ring-distance path. **Tests:** `tests/test_spillover.py` (153 tests across ring-construction primitives, validators, fit integration, raw-data invariant, identification MC — non-staggered DGP at 50 seeds + 200-seed `@pytest.mark.slow` variant recovers both `tau_total` and `delta_1`; staggered DGP at 30 seeds anchors `tau_total` only — Conley wiring, Gardner identity bit-identity, coefficients-vs-vcov alignment, warn-and-drop, rank_deficient_action validation, Omega_0 bipartite-graph connectivity). DGP factories `tests/_dgp_utils.py::generate_butts_nonstaggered_dgp` / `generate_butts_staggered_dgp` satisfy Butts Assumptions 1/3/5/7 by construction. 
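  The rank-deficiency algebra behind the committed stage-2 spec can be checked numerically. A minimal numpy sketch on a hypothetical 4-unit × 3-period toy panel (illustrative values only, not the estimator's code path): the literal unit-static regressor adds no rank to a TWFE design that already contains `D_it`, while the time-varying form does.

  ```python
  import numpy as np

  # Hypothetical toy panel (illustrative only): 4 units x 3 periods.
  # Unit 0 is treated from t = 2; units 0 and 1 fall inside the ring (S_i = 1).
  units = np.repeat(np.arange(4), 3)
  times = np.tile(np.arange(3), 4)
  D = ((units == 0) & (times >= 2)).astype(float)       # D_it
  S_i = np.isin(units, [0, 1]).astype(float)            # unit-static ring exposure

  U = (units[:, None] == np.arange(4)).astype(float)    # unit dummies (mu_i)
  T = (times[:, None] == np.arange(3)).astype(float)    # time dummies (lambda_t)
  base = np.column_stack([U, T, D])

  # Literal Eq. 5 reading: (1 - D_it) * S_i = S_i - D_it, a combination of
  # the unit dummies and D_it already in the design -> no rank gained.
  x_static = (1.0 - D) * S_i
  assert np.linalg.matrix_rank(np.column_stack([base, x_static])) == np.linalg.matrix_rank(base)

  # Time-varying form S_it = S_i * 1{t >= t_treat}: a genuinely new column.
  x_tv = (1.0 - D) * S_i * (times >= 2)
  assert np.linalg.matrix_rank(np.column_stack([base, x_tv])) == np.linalg.matrix_rank(base) + 1
  ```

  Only the time-varying column survives the two-way fixed effects, which is the algebraic point the Proposition 2.3 note above makes.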
- **`ChaisemartinDHaultfoeuille.predict_het` × `placebo`: R-parity on both global and per-path surfaces.** R-verified — `did_multiplegt_dyn(predict_het, placebo)` emits heterogeneity OLS results on backward (placebo) horizons via R's `DIDmultiplegtDYN:::did_multiplegt_main` placebo block (`effect = matrix(-i, ...)` rbind site); the same block runs per-by_level under `did_multiplegt_dyn(by_path, predict_het, placebo)`, so both global `res$results$predict_het` and per-by_level `res$by_level_i$results$predict_het` slots emit backward rows. R's predict_het syntax with `placebo > 0` requires the `c(-1)` sentinel in the horizon vector to trigger "compute heterogeneity for ALL forward (1..effects) AND ALL placebo (1..placebo) positions" — passing positive-only horizons errors with "specified numbers in predict_het that exceed the number of placebos". Python mirrors via `_compute_heterogeneity_test(..., placebo=L_max)` (set automatically from `self.placebo` at both global and per-path call sites in `fit()`) — the function iterates forward (1..L_max) and backward (-1..-L_max) horizons in a single loop with an explicit `out_idx < 0` eligibility guard for backward horizons whose `F_g` is too small (would otherwise silently misread `N_mat` via numpy negative indexing). `results.heterogeneity_effects` uses negative-int keys for backward horizons; `path_heterogeneity_effects` does the same per path. Placebo rows in `to_dataframe(level="by_path")` have non-NaN `het_*` columns when `placebo=True` and `heterogeneity=` are both set. **Survey gate (warn + skip):** `survey_design + placebo + heterogeneity` emits a `UserWarning` at fit-time and falls back to forward-horizon-only heterogeneity on both surfaces — the Binder TSL cell-period allocator's REGISTRY justification is tied to **post-period** attribution; backward-horizon attribution puts ψ_g mass on a pre-period cell, a separate library-extension claim that needs its own derivation. 
Forward-horizon `predict_het + survey_design` continues to work unchanged on both global and per-path surfaces. The function-level `_compute_heterogeneity_test` keeps a per-iteration `NotImplementedError` backstop for direct callers that bypass fit(). Pre-period allocator derivation deferred to a follow-up methodology PR (tracked in TODO.md). R parity confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityHeterogeneityWithPlacebo` (scenario 23, `multi_path_reversible_predict_het_with_placebo_global`, `placebo=2, effects=3, no by_path`) and `::TestDCDHDynRParityByPathHeterogeneityWithPlacebo` (scenario 22, same DGP plus `by_path=3`); pinned at `BETA_RTOL=1e-6` / `SE_RTOL=1e-5` for `beta` / `se` / `t_stat` / `n_obs` and `INFERENCE_RTOL=1e-4` for `p_value` / `conf_int` across 3 paths × (3 forward + 2 placebo) = 15 horizons + 1 global × 5 horizons. Cross-surface invariants regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPredictHetPlacebo` (placebo het column population, survey-gate warn+skip behavior, forward+survey anti-regression, `out_idx<0` eligibility guard, single-path telescope `path_heterogeneity_effects[(only_path,)] == heterogeneity_effects` bit-exactly, summary rendering, direct-call `NotImplementedError` backstop). Closes TODO #422. ### Changed diff --git a/README.md b/README.md index b54d8aa0..ba9cdf8b 100644 --- a/README.md +++ b/README.md @@ -106,6 +106,7 @@ Full guide: `diff_diff.get_llm_guide("practitioner")`. 
- [SunAbraham](https://diff-diff.readthedocs.io/en/stable/api/staggered.html) - Sun & Abraham (2021) interaction-weighted estimator for heterogeneity-robust event studies - [ImputationDiD](https://diff-diff.readthedocs.io/en/stable/api/imputation.html) - Borusyak, Jaravel & Spiess (2024) imputation estimator, most efficient under homogeneous effects - [TwoStageDiD](https://diff-diff.readthedocs.io/en/stable/api/two_stage.html) - Gardner (2022) two-stage estimator with GMM sandwich variance +- [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html) - Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover on near-control units; handles non-staggered and staggered timing - [SyntheticDiD](https://diff-diff.readthedocs.io/en/stable/api/estimators.html) - Synthetic DiD combining standard DiD and synthetic control for few treated units - [TripleDifference](https://diff-diff.readthedocs.io/en/stable/api/triple_diff.html) - triple difference (DDD) estimator for designs requiring two criteria for treatment eligibility - [ContinuousDiD](https://diff-diff.readthedocs.io/en/stable/api/continuous_did.html) - Callaway, Goodman-Bacon & Sant'Anna (2024) continuous treatment DiD with dose-response curves diff --git a/TODO.md b/TODO.md index 42ddd3a5..43f0d6d2 100644 --- a/TODO.md +++ b/TODO.md @@ -132,6 +132,14 @@ Deferred items from PR reviews that were not addressed before merge. | SyntheticDiD: bootstrap cross-language parity anchor against R's default `synthdid::vcov(method="bootstrap")` (refit; rebinds `opts` per draw) or Julia `Synthdid.jl::src/vcov.jl::bootstrap_se` (refit by construction). Same-library validation (placebo-SE tracking, AER §6.3 MC truth) is in place; a cross-language anchor is desirable to bolster the methodology contract. Julia is the cleanest target — minimal wrapping work and refit-native vcov. Tolerance target: 1e-6 on Monte Carlo samples (different BLAS + RNG paths preclude 1e-10). 
The R-parity fixture from the previous release was deleted because it pinned the now-removed fixed-weight path. | `benchmarks/R/`, `benchmarks/julia/`, `tests/` | follow-up | Low | | Conley + survey weights / `survey_design`. Score-reweighted meat `s_i = w_i · X_i · ε_i` is mechanical, but PSU clustering interaction with the spatial kernel and replicate-weights variance under spatial correlation are non-trivial (Bertanha-Imbens 2014 covers cluster-sample but not the explicit Conley case). Phase 5 of the spillover-conley initiative; paper review prerequisite. Currently raises `NotImplementedError` at the linalg validator. | `linalg.py::_validate_vcov_args` | Phase 5 (spillover-conley) | Medium | | `SyntheticDiD(vcov_type="conley")` support. Currently raises `TypeError` at `__init__` because SyntheticDiD uses `variance_method ∈ {bootstrap, jackknife, placebo}` rather than the analytical sandwich that Conley plugs into. Wiring would require either reimplementing an analytical sandwich path for SyntheticDiD or designing a spatial-block bootstrap (new methodology, Politis-Romano 1994 territory). | `synthetic_did.py::SyntheticDiD` | follow-up (spillover-conley) | Low | +| `SpilloverDiD` Gardner GMM first-stage uncertainty correction at stage 2. Wave B MVP uses standard `solve_ols` variance (HC1 / Conley / cluster) without the influence-function adjustment for stage-1 FE estimation. Extending `two_stage.py::_compute_gmm_variance` to accept a Conley kernel matrix in place of HC1's identity at the IF outer-product step gives the full Butts (2021) Section 3.1 + Gardner (2022) Section 4 composition. See plan Risks #2 for the IF formula. | `spillover.py::SpilloverDiD.fit`, `two_stage.py::_compute_gmm_variance` | follow-up (Wave B) | Medium | +| `SpilloverDiD(event_study=True)` per-event-time × ring decomposition (Butts Section 5 / Table 2 `S^k_{it}` / `Ring^k_{it,j}`). Currently raises `NotImplementedError`. 
The implementation adds event-time dummies × ring covariates to the stage-2 design and emits a MultiIndex on `spillover_effects`. | `spillover.py::SpilloverDiD.fit` | follow-up (Wave B) | Medium | +| `SpilloverDiD(survey_design=...)` integration. Currently raises `NotImplementedError`. Requires threading survey weights through the inline stage 1 + stage 2 and lifting `two_stage.py`'s survey path patterns. | `spillover.py::SpilloverDiD.fit` | follow-up (Wave B) | Low | +| `SpilloverDiD(ring_method="count")` extension. Currently only the nearest-treated-ring specification is exposed. Count-of-treated-in-ring (paper Section 3.2 end) is methodologically supported by Butts but re-introduces functional-form dependence; expose with an explicit kwarg gate and documentation warning. | `spillover.py::SpilloverDiD.fit` | follow-up | Low | +| `SpilloverDiD` data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight cross-validation). | `spillover.py::SpilloverDiD` | follow-up | Low | +| `SpilloverDiD` T22 TVA tutorial (`docs/tutorials/22_spillover_did.ipynb`): synthetic TVA-style DGP reproducing Butts (2021) Section 4 Table 1 Panel A bias-correction direction (~40% understatement). Split from the methodology PR per user-confirmed scope split (2026-05-15). | `docs/tutorials/`, `tests/test_t22_*_drift.py` | follow-up (Wave B) | Medium | +| Extend `TwoStageDiD` with Conley vcov as a first-class feature (mirrors Wave A's TWFE/MPD/DiD extension). Currently `TwoStageDiD.__init__` lacks `vcov_type` / `conley_*` kwargs; `SpilloverDiD` works around this by threading Conley directly via `solve_ols` at stage 2. Promoting Conley to TwoStageDiD's API removes the workaround and lets non-spillover users access Conley + Gardner two-stage. | `diff_diff/two_stage.py` | follow-up | Medium | +| `SpilloverDiD` sparse cKDTree path for the staggered nearest-treated-distance helper (mirrors the static helper's sparse branch). 
Currently `_compute_nearest_treated_distance_staggered` always builds dense `(n_units, n_treated_by_onset)` pairwise distance matrices per cohort; on large staggered panels with many cohorts this is avoidable memory/runtime. Add a sparse k-d-tree branch analogous to `_compute_nearest_treated_distance_sparse`, gated on `n > _CONLEY_SPARSE_N_THRESHOLD`. | `spillover.py::_compute_nearest_treated_distance_staggered` | follow-up (Wave B) | Low | #### Performance diff --git a/diff_diff/__init__.py b/diff_diff/__init__.py index aa48ff77..cee94043 100644 --- a/diff_diff/__init__.py +++ b/diff_diff/__init__.py @@ -171,6 +171,10 @@ TwoStageDiDResults, two_stage_did, ) +from diff_diff.spillover import ( + SpilloverDiD, +) +from diff_diff.results import SpilloverDiDResults # re-export from diff_diff.stacked_did import ( StackedDiD, StackedDiDResults, @@ -300,6 +304,7 @@ "SunAbraham", "ImputationDiD", "TwoStageDiD", + "SpilloverDiD", "TripleDifference", "TROP", "StackedDiD", @@ -341,6 +346,7 @@ "TwoStageDiDResults", "TwoStageBootstrapResults", "two_stage_did", + "SpilloverDiDResults", "TripleDifferenceResults", "triple_difference", "StaggeredTripleDifference", diff --git a/diff_diff/guides/llms-full.txt b/diff_diff/guides/llms-full.txt index 5bbe74c2..14a66e76 100644 --- a/diff_diff/guides/llms-full.txt +++ b/diff_diff/guides/llms-full.txt @@ -461,6 +461,69 @@ results = est.fit(data, outcome='outcome', unit='unit', results.print_summary() ``` +### SpilloverDiD + +Butts (2021) ring-indicator spillover-aware DiD. Augments two-stage Gardner with ring-indicator covariates that identify direct effect on treated (`tau_total`) and per-ring spillover effects on near-control units (`delta_j`). Handles non-staggered and staggered timing in one estimator. Recommends `vcov_type="conley"` with cutoff = `d_bar` (paper Section 3.1). 
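+
+The `rings` breakpoints bin each unit's nearest-treated distance into half-open intervals. A minimal sketch with toy distances (hypothetical values, plain numpy rather than the estimator's internals):
+
+```python
+import numpy as np
+
+# Toy sketch (hypothetical distances; not the library's internal code path).
+rings = [0, 50, 100, 200]                       # K+1 breakpoints -> K = 3 rings
+dist = np.array([10.0, 75.0, 150.0, 400.0])     # nearest-treated distance per unit
+
+# np.digitize assigns each distance to its half-open ring [rings[j], rings[j+1]);
+# index K (here 3) means the unit lies beyond max(rings) = the far-away control group.
+ring_idx = np.digitize(dist, rings) - 1
+labels = [f"[{lo}, {hi})" for lo, hi in zip(rings[:-1], rings[1:])]
+for d, j in zip(dist, ring_idx):
+    print(d, labels[j] if j < len(labels) else "far away (beyond d_bar)")
+```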
+ +```python +SpilloverDiD( + rings: list[float], # K+1 sorted breakpoints; K rings + d_bar: float | None = None, # Far-away cutoff (defaults to max(rings)) + vcov_type: str = "hc1", # "hc1", "conley", or default cluster + conley_coords: tuple[str, str] | None = None, # (lat_col, lon_col), required + conley_metric: str = "haversine", # or "euclidean" / callable + conley_cutoff_km: float | None = None, + conley_lag_cutoff: int | None = None, + cluster: str | None = None, + alpha: float = 0.05, + anticipation: int = 0, + event_study: bool = False, # Deferred: raises NotImplementedError if True + horizon_max: int | None = None, # Deferred (event-study mode) + rank_deficient_action: str = "warn", +) +``` + +**fit() parameters:** + +```python +sp.fit( + data: pd.DataFrame, + outcome: str, + unit: str, + time: str, + treatment: str | None = None, # binary D_it; auto-converted to first_treat + first_treat: str | None = None, # OR onset time per unit (Gardner) + covariates: list[str] | None = None, # Deferred: NotImplementedError if non-None + survey_design: object = None, # Deferred: NotImplementedError if non-None +) -> SpilloverDiDResults +``` + +**Restrictions (Wave B MVP — planned follow-ups):** + +- `covariates=` raises `NotImplementedError`. Gardner two-stage requires covariate effects estimated on the untreated-and-unexposed Omega_0 subsample at stage 1; appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates. Planned follow-up. 
+- `survey_design=` raises `NotImplementedError` (planned: SurveyDesign integration) +- `event_study=True` raises `NotImplementedError` (planned: per-event-time × ring decomposition per Butts Table 2) +- `horizon_max=` raises `NotImplementedError` (used only with event_study) +- Stage-2 variance is `solve_ols` HC1 / Conley / cluster — Gardner GMM first-stage uncertainty correction NOT applied (planned follow-up; SE is biased downward / too small, CIs too narrow, p-values too small — treat reported significance conservatively until the GMM correction lands) +- Only nearest-treated rings supported; `ring_method="count"` (count of treated neighbors in ring) not yet exposed + +**Usage:** + +```python +from diff_diff import SpilloverDiD + +est = SpilloverDiD( + rings=[0, 50, 100, 200], + conley_coords=("lat", "lon"), + vcov_type="conley", + conley_cutoff_km=200.0, + conley_lag_cutoff=0, +) +results = est.fit(data, outcome="y", unit="unit", time="time", treatment="D") +print(f"tau_total = {results.att:.4f}") +print(results.spillover_effects) # per-ring DataFrame +``` + ### SyntheticDiD Synthetic Difference-in-Differences (Arkhangelsky et al. 2021). Combines DiD with synthetic control by re-weighting control units. 
diff --git a/diff_diff/guides/llms.txt b/diff_diff/guides/llms.txt index df0a880a..a310d621 100644 --- a/diff_diff/guides/llms.txt +++ b/diff_diff/guides/llms.txt @@ -58,6 +58,7 @@ Full practitioner guide: call `diff_diff.get_llm_guide("practitioner")` - [SunAbraham](https://diff-diff.readthedocs.io/en/stable/api/staggered.html): Sun & Abraham (2021) interaction-weighted estimator for heterogeneity-robust event studies - [ImputationDiD](https://diff-diff.readthedocs.io/en/stable/api/imputation.html): Borusyak, Jaravel & Spiess (2024) imputation estimator — most efficient under homogeneous effects - [TwoStageDiD](https://diff-diff.readthedocs.io/en/stable/api/two_stage.html): Gardner (2022) two-stage estimator with GMM sandwich variance +- [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html): Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover-on-control; reuses `conley_coords` for ring construction; handles non-staggered and staggered timing - [SyntheticDiD](https://diff-diff.readthedocs.io/en/stable/api/estimators.html): Synthetic DiD combining standard DiD and synthetic control methods for few treated units - [TripleDifference](https://diff-diff.readthedocs.io/en/stable/api/triple_diff.html): Triple difference (DDD) estimator for designs requiring two criteria for treatment eligibility - [ContinuousDiD](https://diff-diff.readthedocs.io/en/stable/api/continuous_did.html): Callaway, Goodman-Bacon & Sant'Anna (2024) continuous treatment DiD with dose-response curves diff --git a/diff_diff/results.py b/diff_diff/results.py index d7f4a3d8..02b7e5a8 100644 --- a/diff_diff/results.py +++ b/diff_diff/results.py @@ -354,6 +354,117 @@ def _get_significance_stars(p_value: float) -> str: return "" +@dataclass +class SpilloverDiDResults(DiDResults): + """Results from a ring-indicator spillover-aware DiD estimation (Butts 2021). 
+ + Extends :class:`DiDResults` with per-ring spillover effect estimates and + diagnostic counts for the identifying control population. + + Attributes + ---------- + att : float + Total effect on the treated, ``tau_total`` (inherited from DiDResults). + spillover_effects : pd.DataFrame, optional + Per-ring spillover-on-control estimates. Columns: ``["coef", "se", + "t_stat", "p_value", "ci_low", "ci_high"]``. Index: ring label + like ``"[0, 50)"``; for event-study output the index is a + ``MultiIndex`` over ``(ring_label, event_time)``. + ring_breakpoints : list of float + Sorted distance breakpoints used to construct the rings. + d_bar : float + Far-away cutoff (Butts Assumption 5). Equals ``max(ring_breakpoints)`` + unless explicitly overridden. + n_units_ever_in_ring : dict[str, int] + Counts of UNIQUE units that appear in each ring AT LEAST ONCE + across observed periods. Under staggered timing the ring + membership is time-varying, so a unit can be counted in + multiple rings (once in each ring it visits). Under non-staggered + timing the rings are unit-static, so this is a clean partition + (each unit appears in exactly one ring or the far-away group). + Includes treated units in Ring_1 by geometric construction + even though the ``(1 - D_it)`` factor zeros their stage-2 + contribution. + n_far_away_obs : int + Number of observations with ``D_it = 0`` AND ``d_it > d_bar``; + these observations identify the counterfactual trend (Butts + Assumption 5(ii)). + is_staggered : bool + True if multiple distinct treatment-onset times were detected. + event_study : bool + True if per-event-time ring coefficients were emitted (i.e., + ``event_study=True`` was set on the estimator). + stage1_n_obs : int + Number of observations in the stage-1 untreated-and-unexposed + subsample (``Omega_0_butts``). 
+ """ + + spillover_effects: Optional[pd.DataFrame] = field(default=None) + ring_breakpoints: Optional[List[float]] = field(default=None) + d_bar: Optional[float] = field(default=None) + n_units_ever_in_ring: Optional[Dict[str, int]] = field(default=None) + n_far_away_obs: Optional[int] = field(default=None) + is_staggered: Optional[bool] = field(default=None) + event_study: Optional[bool] = field(default=None) + stage1_n_obs: Optional[int] = field(default=None) + anticipation: Optional[int] = field(default=None) + + def summary(self, alpha: Optional[float] = None) -> str: + """Extended summary with ATT row plus per-ring rows.""" + base = super().summary(alpha=alpha) + if self.spillover_effects is None or self.spillover_effects.empty: + return base + lines = base.split("\n") + # Find the closing separator line and inject ring rows before it. + ring_rows = ["", "Spillover Effects (ring-indicator, Butts 2021)".center(70), "-" * 70] + header = ( + f"{'Ring':<15} {'Estimate':>12} {'Std. Err.':>12} " + f"{'t-stat':>10} {'P>|t|':>10} {'':>5}" + ) + ring_rows.append(header) + ring_rows.append("-" * 70) + for label, row in self.spillover_effects.iterrows(): + coef = row.get("coef", np.nan) + se = row.get("se", np.nan) + t_stat = row.get("t_stat", np.nan) + p_value = row.get("p_value", np.nan) + stars = _get_significance_stars(p_value) + label_str = str(label) if not isinstance(label, tuple) else f"{label[0]} k={label[1]}" + ring_rows.append( + f"{label_str[:15]:<15} {coef:>12.4f} {se:>12.4f} " + f"{t_stat:>10.3f} {p_value:>10.4f} {stars:>5}" + ) + ring_rows.append("-" * 70) + # Insert ring block before the final "==..." line (last row of base). 
+ for idx in range(len(lines) - 1, -1, -1): + if lines[idx].startswith("="): + lines = lines[:idx] + ring_rows + lines[idx:] + break + return "\n".join(lines) + + def to_dict(self) -> Dict[str, Any]: + """Override to serialize ``spillover_effects`` DataFrame as list-of-dicts.""" + base = super().to_dict() + base.update( + { + "spillover_effects": ( + self.spillover_effects.reset_index().to_dict(orient="records") + if self.spillover_effects is not None + else None + ), + "ring_breakpoints": self.ring_breakpoints, + "d_bar": self.d_bar, + "n_units_ever_in_ring": self.n_units_ever_in_ring, + "n_far_away_obs": self.n_far_away_obs, + "is_staggered": self.is_staggered, + "event_study": self.event_study, + "stage1_n_obs": self.stage1_n_obs, + "anticipation": self.anticipation, + } + ) + return base + + @dataclass class PeriodEffect: """ diff --git a/diff_diff/spillover.py b/diff_diff/spillover.py new file mode 100644 index 00000000..b8895e9d --- /dev/null +++ b/diff_diff/spillover.py @@ -0,0 +1,1957 @@ +""" +SpilloverDiD — Butts (2021) ring-indicator spillover-aware DiD. + +Augments a two-stage Gardner (2022) DiD with ring-indicator covariates that +identify the spillover effect on near-control units alongside the direct +effect on treated units. Handles both panel non-staggered and Section 5 +staggered timing in a single estimator. + +References +---------- +Butts, K. (2023). Difference-in-Differences with Spatial Spillovers. + arXiv:2105.03737v3 (originally posted 2021). +Gardner, J. (2022). Two-stage differences in differences. arXiv:2207.05943. + +Notes +----- +The paper's notation in Equation 5/6 is ``(1 - D_it) * Ring_{ij}`` with +``S_i`` unit-static. Reading that literally under a two-way fixed effects +specification yields a rank-deficient design (``(1 - D_it) * S_i = S_i - +D_it``; ``S_i`` is absorbed by ``mu_i``, leaving ``-D_it``). 
The paper +defines ``S_it = S_i * 1{t >= t_treat}`` (page 12, just above Equation 5) +and Section 5's Table 2 makes the time-varying form explicit +(``S^k_{it}``, ``Ring^k_{it,j}``). This implementation uses the +time-varying form, which is the spec that supports the paper's +identification argument (Proposition 2.3 + Section 3.1 subsample logic). +""" + +import warnings +from typing import Any, Callable, Dict, List, Literal, Optional, Tuple, Union + +import numpy as np +import pandas as pd + +from diff_diff.conley import ( + _CONLEY_EARTH_RADIUS_KM, + _CONLEY_SPARSE_N_THRESHOLD, + _haversine_km, + _validate_callable_metric_result, +) +from diff_diff.linalg import solve_ols +from diff_diff.results import SpilloverDiDResults +from diff_diff.utils import safe_inference + +# Type alias mirroring diff_diff.conley.ConleyMetric so callers can supply +# any of the built-in identifiers or a user callable returning a pairwise +# distance matrix. +SpilloverMetric = Union[ + Literal["haversine", "euclidean"], + Callable[[np.ndarray, np.ndarray], np.ndarray], +] + + +# ============================================================================= +# Ring construction helpers (Step 1) +# ============================================================================= + + +def _haversine_km_pairwise( + coords_a: np.ndarray, + coords_b: np.ndarray, +) -> np.ndarray: + """Vectorized pairwise great-circle distance (km) between two coord sets. + + Parameters + ---------- + coords_a : ndarray of shape (n_a, 2) + ``(lat, lon)`` in DEGREES for the first set of points. + coords_b : ndarray of shape (n_b, 2) + ``(lat, lon)`` in DEGREES for the second set of points. + + Returns + ------- + ndarray of shape (n_a, n_b) + Great-circle distances in km. Matches the ``_haversine_km`` Earth + radius convention (6371.01 km, mirroring R ``conleyreg``). 
+ """ + lat_a = coords_a[:, 0][:, None] + lon_a = coords_a[:, 1][:, None] + lat_b = coords_b[:, 0][None, :] + lon_b = coords_b[:, 1][None, :] + return _haversine_km(lat_a, lon_a, lat_b, lon_b) + + +def _euclidean_pairwise( + coords_a: np.ndarray, + coords_b: np.ndarray, +) -> np.ndarray: + """Vectorized pairwise Euclidean distance between two coord sets. + + Coordinates are treated as planar; no unit conversion. Matches the + ``_pairwise_distance_matrix`` Euclidean branch of ``conley.py``. + """ + diffs = coords_a[:, None, :] - coords_b[None, :, :] + return np.sqrt(np.einsum("ijk,ijk->ij", diffs, diffs)) + + +def _apply_callable_metric_pairwise( + metric: Callable[[np.ndarray, np.ndarray], np.ndarray], + coords_a: np.ndarray, + coords_b: np.ndarray, +) -> np.ndarray: + """Apply a user-supplied callable metric to two coord sets. + + Unlike :func:`_validate_callable_metric_result` which checks square + ``(n, n)`` symmetry on a single coord set, this helper accepts a + rectangular ``(n_a, n_b)`` result. The validator is therefore relaxed: + we only require finiteness, non-negativity, and correct shape. The + zero-diagonal / symmetry checks apply only when the same coord set is + passed on both sides; ring-construction usage passes a treated-only + subset on side B, so the diagonal of the rectangular result is not + meaningful. + """ + result = metric(coords_a, coords_b) + arr = np.asarray(result, dtype=np.float64) + expected_shape = (coords_a.shape[0], coords_b.shape[0]) + if arr.shape != expected_shape: + raise ValueError( + "conley_metric callable returned shape " + f"{arr.shape} for pairwise ring distance, expected {expected_shape}." + ) + if not np.isfinite(arr).all(): + raise ValueError( + "conley_metric callable returned non-finite entries for pairwise " + "ring distance; all distances must be finite." 
+ ) + if (arr < 0.0).any(): + raise ValueError( + "conley_metric callable returned negative entries for pairwise " + "ring distance; all distances must be non-negative." + ) + return arr + + +def _pairwise_ring_distances( + coords_units: np.ndarray, + coords_treated: np.ndarray, + metric: SpilloverMetric, +) -> np.ndarray: + """Compute (n_units, n_treated) pairwise distances under the chosen metric.""" + if callable(metric): + return _apply_callable_metric_pairwise(metric, coords_units, coords_treated) + if metric == "haversine": + return _haversine_km_pairwise(coords_units, coords_treated) + if metric == "euclidean": + return _euclidean_pairwise(coords_units, coords_treated) + raise ValueError( + f"Unknown conley_metric: {metric!r}. Expected 'haversine', 'euclidean', " + "or a callable returning a pairwise distance matrix." + ) + + +def _compute_nearest_treated_distance_static( + data: pd.DataFrame, + *, + unit: str, + coords: Tuple[str, str], + metric: SpilloverMetric, + treated_unit_ids: np.ndarray, + cutoff_km: Optional[float] = None, +) -> Tuple[np.ndarray, np.ndarray]: + """Return per-unit nearest-treated distance for the non-staggered case. + + The set of treated units is fixed (ever-treated), so distances are + unit-level constants and don't vary across periods. Caller broadcasts + to per-row when assembling ring covariates. + + Parameters + ---------- + data : pd.DataFrame + Panel data with one row per (unit, period). Used to extract + per-unit coords via :meth:`DataFrame.drop_duplicates` on ``unit``. + unit : str + Column name of the unit identifier. + coords : tuple of (str, str) + ``(lat_col, lon_col)``. + metric : "haversine" | "euclidean" | callable + Distance metric. For ``"haversine"``, ``coords`` is interpreted as + ``(lat, lon)`` in degrees. For ``"euclidean"``, ``coords`` is + planar. Callable receives two ``(n, 2)`` arrays and must return an + ``(n_a, n_b)`` finite non-negative distance matrix. 
treated_unit_ids : ndarray
+        IDs of ever-treated units (used as side B of pairwise distance).
+    cutoff_km : float, optional
+        If set, ``len(unit_index) > _CONLEY_SPARSE_N_THRESHOLD``, and the
+        metric is a built-in string ("haversine" / "euclidean"), the
+        sparse cKDTree path is used to find treated neighbors within
+        ``cutoff_km`` per unit; otherwise (including for any callable
+        metric) the dense (n_units × n_treated) matrix is built. Units
+        with no treated neighbor within ``cutoff_km`` receive
+        ``d_i = inf`` (they fall outside any ring and into the far-away
+        control group, matching dense-path behavior for a unit whose
+        nearest treated unit is infinitely far away).
+
+    Returns
+    -------
+    d_i : ndarray of shape (n_unique_units,)
+        ``d_i = min_{k in treated_unit_ids} d(i, k)`` per unique unit.
+    unit_index : ndarray of shape (n_unique_units,)
+        Unit identifiers in the same order as ``d_i``.
+    """
+    unit_coords_df = (
+        data[[unit, coords[0], coords[1]]]
+        .drop_duplicates(subset=[unit])
+        .set_index(unit)
+        .sort_index()
+    )
+    unit_index = np.asarray(unit_coords_df.index.values)
+    all_coords = np.asarray(unit_coords_df[[coords[0], coords[1]]].values, dtype=np.float64)
+    treated_set = set(treated_unit_ids.tolist())
+    treated_mask = np.array([uid in treated_set for uid in unit_index], dtype=bool)
+    treated_coords = all_coords[treated_mask]
+    if treated_coords.shape[0] == 0:
+        raise ValueError(
+            "_compute_nearest_treated_distance_static: no treated units present "
+            "in `data` matching `treated_unit_ids`."
+        )
+
+    n_units = all_coords.shape[0]
+    is_builtin_metric = metric in ("haversine", "euclidean")
+    if cutoff_km is not None and n_units > _CONLEY_SPARSE_N_THRESHOLD and is_builtin_metric:
+        d_i = _compute_nearest_treated_distance_sparse(
+            all_coords=all_coords,
+            treated_coords=treated_coords,
+            metric=metric,  # type: ignore[arg-type]
+            cutoff_km=float(cutoff_km),
+        )
+    else:
+        # Dense path: full pairwise matrix, then row-min.
+ dists = _pairwise_ring_distances(all_coords, treated_coords, metric) + d_i = dists.min(axis=1) + return d_i.astype(np.float64), unit_index + + +def _compute_nearest_treated_distance_sparse( + *, + all_coords: np.ndarray, + treated_coords: np.ndarray, + metric: Literal["haversine", "euclidean"], + cutoff_km: float, +) -> np.ndarray: + """Sparse cKDTree path for nearest-treated-distance computation. + + Used when ``n_units > _CONLEY_SPARSE_N_THRESHOLD`` AND the metric is a + built-in string. The tree is built on the treated subset (small) and + queried with each unit row. Units with no treated neighbor inside + ``cutoff_km`` get ``d_i = inf``, which places them in the far-away + control group on the downstream ring-membership step. + + For haversine: lat/lon are projected to 3-D unit-sphere Cartesian + coordinates; the chord-distance query radius is + ``2 * sin(arc / (2 * R_earth))`` with arc clamped at ``pi * R_earth`` + so cutoffs beyond a hemisphere don't shrink. Exact great-circle + distances are then recomputed via :func:`_haversine_km` for the in- + range matches and the per-row minimum is taken. + + For euclidean: planar L2 directly in cKDTree. + + Parameters + ---------- + all_coords : ndarray of shape (n_units, 2) + Coordinates for all units. + treated_coords : ndarray of shape (n_treated, 2) + Coordinates for ever-treated units. + metric : 'haversine' or 'euclidean' + Built-in metric only; callables fall back to the dense path. + cutoff_km : float + Maximum considered distance. Units beyond this get ``d_i = inf``. + + Returns + ------- + ndarray of shape (n_units,) + Nearest-treated distance per unit (inf when no neighbor in range). + """ + # Imported lazily to mirror conley.py's lazy-scipy pattern and keep + # module import cheap when the sparse path isn't exercised. + from scipy.spatial import cKDTree # noqa: WPS433 (deferred import) + + n_units = all_coords.shape[0] + if metric == "haversine": + # Project lat/lon (degrees) to 3-D unit-sphere Cartesian. 
+ lat_rad_all = np.radians(all_coords[:, 0]) + lon_rad_all = np.radians(all_coords[:, 1]) + unit_xyz = np.column_stack( + [ + np.cos(lat_rad_all) * np.cos(lon_rad_all), + np.cos(lat_rad_all) * np.sin(lon_rad_all), + np.sin(lat_rad_all), + ] + ) + lat_rad_tr = np.radians(treated_coords[:, 0]) + lon_rad_tr = np.radians(treated_coords[:, 1]) + tree_xyz = np.column_stack( + [ + np.cos(lat_rad_tr) * np.cos(lon_rad_tr), + np.cos(lat_rad_tr) * np.sin(lon_rad_tr), + np.sin(lat_rad_tr), + ] + ) + # Chord-distance radius for the query; clamp arc at pi (a half-revolution) + # so cutoffs > pi * R_earth do not shrink chord radius below the true reach. + arc_radians = min(cutoff_km / _CONLEY_EARTH_RADIUS_KM, np.pi) + query_r = 2.0 * np.sin(arc_radians / 2.0) + query_r *= 1.0 + 1e-12 # numerical safety margin + tree = cKDTree(tree_xyz) + # Query in chord space, recompute exact great-circle distance for matches. + neighbors = tree.query_ball_point(unit_xyz, r=query_r, p=2.0) + d_i = np.full(n_units, np.inf, dtype=np.float64) + for i, idxs in enumerate(neighbors): + if not idxs: + continue + # Exact great-circle distance for the in-range treated neighbors. + arr_idxs = np.asarray(idxs, dtype=np.intp) + d_subset = _haversine_km( + all_coords[i, 0], + all_coords[i, 1], + treated_coords[arr_idxs, 0], + treated_coords[arr_idxs, 1], + ) + d_i[i] = float(d_subset.min()) + return d_i + + # Euclidean: cKDTree handles directly. 
+ tree = cKDTree(treated_coords) + d_i = np.full(n_units, np.inf, dtype=np.float64) + neighbors = tree.query_ball_point(all_coords, r=cutoff_km, p=2.0) + for i, idxs in enumerate(neighbors): + if not idxs: + continue + arr_idxs = np.asarray(idxs, dtype=np.intp) + d_subset = _euclidean_pairwise(all_coords[i : i + 1], treated_coords[arr_idxs]) + d_i[i] = float(d_subset.min()) + return d_i + + +def _compute_nearest_treated_distance_staggered( + data: pd.DataFrame, + *, + unit: str, + time: str, + coords: Tuple[str, str], + metric: SpilloverMetric, + first_treat_by_unit: Dict[Any, Any], +) -> Tuple[np.ndarray, np.ndarray, np.ndarray]: + """Return per-row nearest-treated distance for the staggered case. + + For each (unit, period) observation, find the minimum distance to any + unit that is treated BY THE END of that period (``first_treat_k <= + t``). Ring membership in the staggered case is therefore unit-time + varying. + + Parameters + ---------- + data : pd.DataFrame + Panel data (one row per unit-period). + unit : str + Unit identifier column name. + time : str + Time period column name. + coords : tuple of (str, str) + ``(lat_col, lon_col)``. + metric : "haversine" | "euclidean" | callable + Distance metric. + first_treat_by_unit : dict + Mapping from unit identifier to onset time (or ``np.inf`` for + never-treated). Generated by :func:`_extract_treatment_onsets`. + + Notes + ----- + The staggered helper currently always uses dense pairwise distance per + cohort. A sparse cKDTree branch (mirroring the static helper) is queued + as a follow-up — see TODO.md. + + Returns + ------- + d_it : ndarray of shape (n_rows,) + Per-row nearest-treated distance, with ``inf`` for rows where no + unit has been treated yet by time t (early periods). + row_unit : ndarray of shape (n_rows,) + Aligned unit identifier per row (for downstream broadcasting). + row_time : ndarray of shape (n_rows,) + Aligned time identifier per row. 
+ """ + unit_coords_df = ( + data[[unit, coords[0], coords[1]]].drop_duplicates(subset=[unit]).set_index(unit) + ) + unit_index = np.asarray(unit_coords_df.index.values) + all_coords = np.asarray(unit_coords_df[[coords[0], coords[1]]].values, dtype=np.float64) + unit_to_pos = {uid: pos for pos, uid in enumerate(unit_index)} + + row_unit = np.asarray(data[unit].values) + row_time = np.asarray(data[time].values) + n_rows = len(row_unit) + d_it = np.full(n_rows, np.inf, dtype=np.float64) + + # Determine the cohort onset times that exist in the data (excluding never-treated). + unique_onsets = sorted({ft for ft in first_treat_by_unit.values() if np.isfinite(ft)}) + if not unique_onsets: + # Degenerate: no treated units. Caller should have rejected this + # in `_validate_spillover_inputs`, but defensively return inf. + return d_it, row_unit, row_time + + # Row's unit position. Invariant across cohort iterations — compute + # once outside the loop. + row_pos = np.array([unit_to_pos[uid] for uid in row_unit], dtype=np.intp) + + # For each unique onset time, compute (n_units, n_treated_by_then) pairwise + # distances ONCE, then assign to rows whose t >= that onset (carrying forward + # the minimum across cohorts). + for onset in unique_onsets: + treated_by_onset_ids = [uid for uid, ft in first_treat_by_unit.items() if ft <= onset] + treated_positions = np.array( + [unit_to_pos[uid] for uid in treated_by_onset_ids if uid in unit_to_pos], + dtype=np.intp, + ) + if treated_positions.size == 0: + continue + treated_coords = all_coords[treated_positions] + # Compute per-unit nearest distance to this cohort's treated set. + dists_to_cohort = _pairwise_ring_distances(all_coords, treated_coords, metric).min(axis=1) + # Update rows whose period t >= onset: take min of current d_it and the + # newly-available cohort distance. 
+ affected_rows = row_time >= onset + if not affected_rows.any(): + continue + row_cohort_dist = dists_to_cohort[row_pos] + # Only update rows where this cohort's distance is smaller than the + # current d_it (carries the running minimum across cohorts). + update_mask = affected_rows & (row_cohort_dist < d_it) + d_it[update_mask] = row_cohort_dist[update_mask] + + return d_it, row_unit, row_time + + +def _build_ring_indicators( + d_values: np.ndarray, + rings: List[float], +) -> np.ndarray: + """Build K boolean ring masks from distances and breakpoints. + + Convention (per Butts Equation 6 + plan Risks #2): half-open at the + top of each interior ring, CLOSED at the outermost upper edge so units + exactly at ``d_bar`` belong to the last ring (not the far-away group). + Far-away controls use a strict ``d_i > d_bar`` check (handled + elsewhere). Treated units have ``d_i = 0`` and fall in Ring_1 by + construction; their ring contribution is later zeroed by the + ``(1 - D_i)`` factor. + + Parameters + ---------- + d_values : ndarray + Distances (per-unit for non-staggered or per-row for staggered). + rings : list of float + Sorted breakpoints with ``len(rings) >= 2``. ``K = len(rings) - 1`` + rings are constructed. + + Returns + ------- + masks : ndarray of shape (len(d_values), K), bool + ``masks[i, j] = True`` if ``d_values[i]`` falls in ring ``j``. + + Raises + ------ + ValueError + ``rings`` has fewer than 2 elements, or is not strictly increasing. + """ + rings_arr = np.asarray(rings, dtype=np.float64) + if rings_arr.ndim != 1 or rings_arr.size < 2: + raise ValueError( + "rings must be a sorted list of at least 2 breakpoints " + f"(got shape {rings_arr.shape})." 
+ ) + if (np.diff(rings_arr) <= 0).any(): + raise ValueError("rings must be strictly increasing; got " f"{rings_arr.tolist()}.") + if (rings_arr < 0).any(): + raise ValueError("rings must be non-negative; got " f"{rings_arr.tolist()}.") + + n = d_values.shape[0] + K = rings_arr.size - 1 + masks = np.zeros((n, K), dtype=bool) + for j in range(K): + lo = rings_arr[j] + hi = rings_arr[j + 1] + if j == K - 1: + # Outermost ring: closed at d_bar so units at the boundary + # belong to this ring (not the far-away group). + masks[:, j] = (d_values >= lo) & (d_values <= hi) + else: + # Interior rings: half-open at top so the breakpoint between + # adjacent rings unambiguously falls in the next ring. + masks[:, j] = (d_values >= lo) & (d_values < hi) + return masks + + +def _ring_label(rings: List[float], j: int) -> str: + """Render the human-readable ring label for index ``j``. + + Convention matches :func:`_build_ring_indicators`: half-open at the + top of interior rings, closed at the outermost upper edge. + """ + K = len(rings) - 1 + lo = rings[j] + hi = rings[j + 1] + if j == K - 1: + return f"[{lo:g}, {hi:g}]" + return f"[{lo:g}, {hi:g})" + + +# ============================================================================= +# Treatment-timing helpers (Step 2) +# ============================================================================= + + +def _extract_treatment_onsets( + data: pd.DataFrame, + first_treat_col: str, + unit_col: str, + *, + treat_zero_as_never_treated: bool = True, +) -> Dict[Any, float]: + """Return a dict mapping each unit to its treatment onset time. + + Parameters + ---------- + treat_zero_as_never_treated : bool, default True + When True (default, matching Gardner / TwoStageDiD user convention), + ``first_treat = 0`` is treated as a never-treated sentinel + equivalent to ``np.inf``. 
Set to False for INTERNAL onset columns + produced by :func:`_convert_treatment_to_first_treat` from a + binary ``D`` column — there, ``0`` may legitimately be the + onset time on 0-indexed panels (a unit treated at the first + observed period gets ``first_treat = 0``). The auto-generated + column writes ``np.inf`` for never-treated, so the 0-as-sentinel + collision is avoided. + + Notes + ----- + If a unit has non-constant ``first_treat`` values across its rows, + ``ValueError`` is raised — SpilloverDiD requires the + absorbing-treatment assumption (one onset per unit). Mirrors + :class:`TwoStageDiD`'s warning behaviour, but escalates to a hard error + because the spillover identification math depends on each unit having a + single well-defined ``S_it`` trajectory. + """ + + def _normalize(v: float) -> float: + if np.isinf(v): + return np.inf + if treat_zero_as_never_treated and v == 0: + return np.inf + return float(v) + + onsets: Dict[Any, float] = {} + non_constant_units: List[Any] = [] + for unit_id, group in data.groupby(unit_col): + ft_unique = group[first_treat_col].dropna().unique().tolist() + normalized = {_normalize(v) for v in ft_unique} + if len(normalized) > 1: + non_constant_units.append(unit_id) + continue + if not normalized: + # All rows are NaN → treat as never-treated. + onsets[unit_id] = np.inf + continue + # Use the unique value, not iloc[0], to avoid being fooled by a + # leading-NaN row when the rest of the unit is consistently treated. + ft = next(iter(normalized)) + onsets[unit_id] = ft # already normalized: np.inf for never-treated, float otherwise + if non_constant_units: + sample = non_constant_units[:5] + suffix = f" (and {len(non_constant_units) - 5} more)" if len(non_constant_units) > 5 else "" + raise ValueError( + f"{len(non_constant_units)} unit(s) have non-constant " + f"'{first_treat_col}' values across rows (e.g. {sample}{suffix}). 
" + "SpilloverDiD requires the absorbing-treatment assumption " + "(one onset per unit, treatment never reverses). For " + "non-absorbing / reversible treatments, see " + "ChaisemartinDHaultfoeuille." + ) + return onsets + + +def _convert_treatment_to_first_treat( + data: pd.DataFrame, + treatment: str, + time: str, + unit: str, +) -> Tuple[pd.DataFrame, str]: + """Auto-convert a binary ``D_it`` column to a per-unit ``first_treat`` column. + + Returns a defensive-copy frame augmented with a new + ``"_spillover_first_treat"`` column whose value per unit is + ``min{t : D_it = 1}`` for ever-treated units and ``np.inf`` for + never-treated. The original ``treatment`` column is preserved. + + **Absorbing-treatment validation:** after extracting ``first_treat``, + each ever-treated unit's ``D_it`` is verified to be 1 at all rows with + ``t >= first_treat[unit]`` (treatment never reverses). Non-absorbing + patterns like ``[0, 1, 0]`` raise ``ValueError`` rather than being + silently coerced into ``first_treat = min(t | D_it = 1)``. + + Raises + ------ + ValueError + ``data`` does not contain a numeric ``treatment`` column or + ``time`` / ``unit`` columns; ``treatment`` has values outside + ``{0, 1}``; or treatment is non-absorbing for some unit. + """ + if treatment not in data.columns: + raise ValueError(f"treatment column '{treatment}' not in data.") + # NaN in treatment is not silently coerced — it would later be rebuilt + # from `first_treat` and could flip a row from "unknown" to "treated" + # or "control" with no warning. + nan_mask = data[treatment].isna() + if bool(nan_mask.any()): + n_nan = int(nan_mask.sum()) + raise ValueError( + f"treatment column '{treatment}' contains {n_nan} NaN value(s). 
" + "SpilloverDiD requires explicit 0/1 status on every row; " + "missing-treatment rows must be either imputed or dropped " + "before fitting (the auto-conversion path cannot silently " + "reclassify them since that would change tau_total and " + "delta_j without warning)." + ) + treat_vals = data[treatment].unique() + # Exact binary check — NOT `int(v) in (0, 1)` (which would accept 0.9, + # 1.1, etc. by rounding-down semantics and silently misclassify + # fractional rows into the control group). + if not all(v in (0, 0.0, 1, 1.0) for v in treat_vals): + raise ValueError( + f"treatment column '{treatment}' must contain only exact 0/1 " + f"values; got unique values: {sorted(treat_vals)}. Fractional " + "values (e.g. 0.9 or 1.1) are NOT silently coerced — fix the " + "data or thresholding upstream before passing to SpilloverDiD." + ) + + out = data.copy(deep=False) + treated_rows = out[out[treatment] == 1] + if treated_rows.empty: + out["_spillover_first_treat"] = np.inf + return out, "_spillover_first_treat" + + onset_by_unit = treated_rows.groupby(unit)[time].min() + + # Verify absorbing: for each ever-treated unit, D_it must be EXACTLY 1 + # at every row with t >= first_treat[unit]. NOT merely "not equal to 0" + # — that would silently accept e.g. NaN or other non-binary values that + # slipped past the binary check above (defense in depth). + reversing_units: List[Any] = [] + for u in onset_by_unit.index: + onset = onset_by_unit.loc[u] + unit_rows = out[(out[unit] == u) & (out[time] >= onset)] + if (unit_rows[treatment] != 1).any(): + reversing_units.append(u) + if reversing_units: + sample = reversing_units[:5] + suffix = f" (and {len(reversing_units) - 5} more)" if len(reversing_units) > 5 else "" + raise ValueError( + f"{len(reversing_units)} unit(s) have non-absorbing treatment " + f"patterns (treatment reverses to 0 after onset; e.g. units " + f"{sample}{suffix}). 
SpilloverDiD requires the absorbing-" + "treatment assumption — once a unit is treated, it stays " + "treated. For non-absorbing / reversible treatments, see " + "ChaisemartinDHaultfoeuille." + ) + + onset_lookup: Dict[Any, float] = { + uid: float(onset_by_unit.loc[uid]) if uid in onset_by_unit.index else np.inf + for uid in out[unit].unique() + } + out["_spillover_first_treat"] = out[unit].map(onset_lookup).astype(np.float64).values + return out, "_spillover_first_treat" + + +# ============================================================================= +# Two-stage Gardner inline (Step 3) +# ============================================================================= + +# Convergence tolerance for the iterative alternating-projection FE solver +# (Gauss-Seidel style; mirrors `TwoStageDiD._iterative_fe`). +_FE_ITER_MAX = 100 +_FE_ITER_TOL = 1e-10 + + +def _check_omega_0_connectivity( + *, + omega_0_mask: np.ndarray, + unit_codes_arr: np.ndarray, + time_codes_arr: np.ndarray, + units_in_omega_0: set, + n_times: int, + unit_uniques: List[Any], +) -> None: + """Raise ``ValueError`` if the Omega_0 bipartite graph is disconnected. + + Stage 1's iterative FE solver identifies ``(mu_i, lambda_t)`` only up to + component-specific constants per connected component of the bipartite + graph (supported units on one side, periods on the other; edge = + Omega_0 row at that (unit, period) cell). If the graph splits into + K > 1 unit-bearing components, residualization later combines + ``mu_i`` from one component with ``lambda_t`` from another, silently + corrupting ``y_tilde`` and downstream ``tau_total`` / ``delta_j``. + + Balanced panel + per-unit/per-period Omega_0 coverage is NECESSARY + but not SUFFICIENT — connectivity is the load-bearing identification + condition. 
Under the current absorbing-treatment + period-strict + + unit-warn-drop regime this case may be unreachable in practice (we + were unable to construct an example that survives the upstream + validators), but the check is defense-in-depth and future-proofs + Wave B extensions (event-study, survey-design integration, possible + reversible-treatment relaxations). + """ + from scipy.sparse import csr_matrix + from scipy.sparse.csgraph import connected_components + + supported_units_sorted = sorted(units_in_omega_0) + n_supp = len(supported_units_sorted) + if n_supp <= 1: + # No multi-component case possible with 0 or 1 supported units. + return + + supp_unit_to_idx = {code: i for i, code in enumerate(supported_units_sorted)} + + omega_unit_codes = unit_codes_arr[omega_0_mask] + omega_time_codes = time_codes_arr[omega_0_mask] + + # Every Omega_0 row's unit is by definition in `units_in_omega_0`, so + # all rows contribute edges to the supported subgraph. + edge_unit_idx = np.array( + [supp_unit_to_idx[int(c)] for c in omega_unit_codes], + dtype=np.int64, + ) + edge_time_offset = n_supp + np.asarray(omega_time_codes, dtype=np.int64) + + # Symmetric adjacency: nodes 0..n_supp-1 are units, n_supp..n_supp+n_times-1 + # are periods. Edge weights are 1 (presence only). + rows = np.concatenate([edge_unit_idx, edge_time_offset]) + cols = np.concatenate([edge_time_offset, edge_unit_idx]) + data_ones = np.ones(len(rows), dtype=np.int8) + adj = csr_matrix( + (data_ones, (rows, cols)), + shape=(n_supp + n_times, n_supp + n_times), + ) + + _, component_labels = connected_components(adj, directed=False) + + # Count components that contain at least one supported UNIT node. + # (Period nodes unreachable from any unit form trivial singletons but + # those would already be caught by the period-level Omega_0 check + # upstream; here we only fail when there are 2+ unit-bearing + # components.) 
+ unit_component_ids = set(int(c) for c in component_labels[:n_supp]) + n_unit_components = len(unit_component_ids) + + if n_unit_components <= 1: + return + + # Build informative error: name first few units per component. + component_units: Dict[int, List[Any]] = {} + for unit_pos, unit_code in enumerate(supported_units_sorted): + comp = int(component_labels[unit_pos]) + component_units.setdefault(comp, []).append(unit_uniques[unit_code]) + component_summary = "; ".join( + f"component {comp_id}: {list(units[:3])}" + + (f" (+{len(units) - 3} more)" if len(units) > 3 else "") + for comp_id, units in list(sorted(component_units.items()))[:3] + ) + raise ValueError( + f"Stage-1 fixed effects unidentified: the Omega_0 bipartite " + f"graph (supported units linked by shared untreated-and-" + f"unexposed periods) splits into {n_unit_components} " + f"disconnected components. Balanced panel and per-unit/per-" + f"period Omega_0 coverage are NECESSARY but not SUFFICIENT for " + f"joint identification — the iterative FE solver returns FE only " + f"up to component-specific constants, and residualization " + f"combines mu from one component with lambda from another, " + f"silently corrupting tau_total and delta_j. Examples: " + f"{component_summary}. To fix, ensure all supported units share " + f"at least one common Omega_0 period (e.g., add a far-away " + f"never-treated unit that observes the full time range)." + ) + + +def _iterative_fe_subset( + y_full: np.ndarray, + unit_codes_full: np.ndarray, + time_codes_full: np.ndarray, + omega_0_mask: np.ndarray, + *, + max_iter: int = _FE_ITER_MAX, + tol: float = _FE_ITER_TOL, +) -> Tuple[np.ndarray, np.ndarray, bool]: + """Stage-1 iterative-alternating-projection FE solver on the Butts subsample. + + Fits ``y[Omega_0] = mu_i + lambda_t + u`` on the untreated-and-unexposed + rows (``Omega_0_mask`` True). 
Returns FE arrays indexed by code, with + ``NaN`` at positions whose unit / time is not represented in the + subsample (rank-deficient cells). + + Mirrors ``TwoStageDiD._iterative_fe`` structurally but operates on + integer-coded factors via ``np.bincount`` for speed and skips the + survey-weights branch (Wave B MVP). + + Parameters + ---------- + y_full : ndarray of shape (n_rows,) + Outcome vector for ALL observations (Omega_0 + treated/exposed). + unit_codes_full : ndarray of shape (n_rows,) + Integer factor codes per row in ``[0, n_units)``. + time_codes_full : ndarray of shape (n_rows,) + Integer factor codes per row in ``[0, n_times)``. + omega_0_mask : ndarray of shape (n_rows,), bool + True for rows in the stage-1 fit subsample (D_it=0 AND S_it=0). + + Returns + ------- + unit_fe_arr : ndarray of shape (n_units,) + Unit FE indexed by code. ``NaN`` for units absent from Omega_0. + time_fe_arr : ndarray of shape (n_times,) + Time FE indexed by code. ``NaN`` for periods absent from Omega_0. + converged : bool + Whether the iterative solver reached ``tol`` within ``max_iter``. + """ + if omega_0_mask.sum() == 0: + raise ValueError( + "_iterative_fe_subset: Omega_0 (untreated-and-unexposed subsample) " + "is empty. Cannot fit stage-1 fixed effects. Check that some " + "control units have d_it > d_bar (Butts Assumption 5(ii))." + ) + + n_units = int(unit_codes_full.max()) + 1 + n_times = int(time_codes_full.max()) + 1 + + # Operate on the subset only (faster than masking each iteration). 
+ y_sub = y_full[omega_0_mask] + unit_sub = unit_codes_full[omega_0_mask] + time_sub = time_codes_full[omega_0_mask] + n_sub = len(y_sub) + + alpha = np.zeros(n_sub) + beta = np.zeros(n_sub) + converged = False + for _ in range(max_iter): + # beta[t] = mean over rows in time-group t of (y - alpha) + resid = y_sub - alpha + time_sums = np.bincount(time_sub, weights=resid, minlength=n_times) + time_counts = np.bincount(time_sub, minlength=n_times) + time_means = np.where(time_counts > 0, time_sums / np.maximum(time_counts, 1), 0.0) + beta_new = time_means[time_sub] + + # alpha[i] = mean over rows in unit-group i of (y - beta_new) + resid = y_sub - beta_new + unit_sums = np.bincount(unit_sub, weights=resid, minlength=n_units) + unit_counts = np.bincount(unit_sub, minlength=n_units) + unit_means = np.where(unit_counts > 0, unit_sums / np.maximum(unit_counts, 1), 0.0) + alpha_new = unit_means[unit_sub] + + max_change = max( + float(np.max(np.abs(alpha_new - alpha))) if n_sub > 0 else 0.0, + float(np.max(np.abs(beta_new - beta))) if n_sub > 0 else 0.0, + ) + alpha = alpha_new + beta = beta_new + if max_change < tol: + converged = True + break + + # Build FE arrays indexed by code; NaN for unseen units/periods. + unit_fe_arr = np.full(n_units, np.nan, dtype=np.float64) + time_fe_arr = np.full(n_times, np.nan, dtype=np.float64) + # For each code present in the subset, take any row's converged value + # (constant within group at convergence). Sort-by-code to make access + # deterministic. 
+ seen_unit_codes = np.unique(unit_sub) + for u_code in seen_unit_codes: + idx = np.flatnonzero(unit_sub == u_code)[0] + unit_fe_arr[u_code] = alpha[idx] + seen_time_codes = np.unique(time_sub) + for t_code in seen_time_codes: + idx = np.flatnonzero(time_sub == t_code)[0] + time_fe_arr[t_code] = beta[idx] + return unit_fe_arr, time_fe_arr, converged + + +def _residualize_butts( + y_full: np.ndarray, + unit_codes_full: np.ndarray, + time_codes_full: np.ndarray, + unit_fe_arr: np.ndarray, + time_fe_arr: np.ndarray, +) -> np.ndarray: + """Compute ``y_tilde = y - mu_hat[i] - lambda_hat[t]`` for ALL rows. + + Rows whose unit or period has ``NaN`` FE (rank-deficient cells from + stage 1) get ``NaN`` y_tilde and are masked out of stage 2. + """ + mu_per_row = unit_fe_arr[unit_codes_full] + lambda_per_row = time_fe_arr[time_codes_full] + return y_full - mu_per_row - lambda_per_row + + +# ============================================================================= +# Public estimator (skeleton — fit() implemented in Step 3) +# ============================================================================= + + +class SpilloverDiD: + """Ring-indicator spillover-aware DiD (Butts 2021). + + Standalone estimator implementing two-stage Gardner (2022) methodology + with ring-indicator covariates that identify the direct effect on + treated units (``tau_total``) alongside per-ring spillover effects on + near-control units (``delta_j``). Supports both panel non-staggered + timing and Section 5 staggered timing in a single ``fit()`` entry + point — non-staggered is the special case where all treated units + share an onset time. + + Parameters + ---------- + rings : list of float + Sorted distance breakpoints with at least 2 elements. ``K = + len(rings) - 1`` rings are constructed. + d_bar : float, optional + Far-away cutoff (Butts Assumption 5). Defaults to ``max(rings)``; + if explicitly set, must equal ``max(rings)``. 
Wave B MVP does not
+        support a ``d_bar`` strictly larger than the outermost ring edge
+        (units satisfying ``rings[-1] < d_i <= d_bar`` would fall in
+        neither a ring nor the far-away group, a "dead zone" with no
+        clean methodological interpretation). To use a smaller spillover
+        bandwidth, shrink the outermost ring edge instead.
+    vcov_type : str, default="hc1"
+        Variance estimator. Set to ``"conley"`` and supply
+        ``conley_coords``/``conley_cutoff_km``/``conley_lag_cutoff`` to
+        enable Conley spatial-HAC at stage 2 (recommended per paper
+        Section 3.1).
+    conley_coords : tuple of (str, str), optional
+        ``(lat_col, lon_col)`` column names. Used for ring construction
+        AND for the Conley vcov spatial kernel.
+    conley_metric : str or callable, default="haversine"
+        Distance metric used for both ring construction and the Conley
+        spatial kernel. See :mod:`diff_diff.conley` for the callable
+        contract.
+    conley_cutoff_km : float, optional
+        Conley spatial-HAC bandwidth. Required when ``vcov_type="conley"``.
+    conley_lag_cutoff : int, optional
+        Within-unit Bartlett max lag. Required when ``vcov_type="conley"``.
+        Use ``0`` to suppress the serial-component sandwich.
+    cluster : str, optional
+        Column name for cluster-robust variance; when paired with
+        ``vcov_type="conley"``, enables the combined Conley × cluster
+        product kernel.
+    alpha : float, default=0.05
+        Significance level for confidence intervals.
+    anticipation : int, default=0
+        Number of pre-treatment periods where effects may occur. Treatment
+        and ring-membership clocks both shift by ``-anticipation`` so the
+        stage-1 untreated-and-unexposed subsample correctly excludes
+        anticipation rows.
+    event_study : bool, default=False
+        If ``True``, emit per-event-time × ring coefficients (Butts Table
+        2 staggered specification). The result's ``spillover_effects``
+        DataFrame uses a ``MultiIndex`` over ``(ring, event_time)``.
+    horizon_max : int, optional
+        Maximum absolute event-study horizon.
Mirrors + :class:`diff_diff.two_stage.TwoStageDiD`. + rank_deficient_action : {"warn", "error", "silent"}, default="warn" + Action when the stage-2 design is rank-deficient. + + Attributes + ---------- + results_ : SpilloverDiDResults + Populated after :meth:`fit` completes. + is_fitted_ : bool + + Notes + ----- + The implementation uses two-stage Gardner methodology with the + time-varying ``S_it = S_i * 1{t >= t_treat}`` form (paper page 12, + just above Equation 5). Reading the literal unit-static ``(1 - D_it) * + S_i`` from Equation 5 yields a rank-deficient design under TWFE; + Section 5's Table 2 makes the time-varying form explicit. The + diff-diff implementation matches the paper's identification argument + once the ``S_it`` notation is read correctly. + + For non-staggered timing, Gardner identity → stage-2 point estimates + equal a single-stage TWFE with the time-varying spillover regressor. + """ + + def __init__( + self, + *, + rings: List[float], + d_bar: Optional[float] = None, + vcov_type: str = "hc1", + conley_coords: Optional[Tuple[str, str]] = None, + conley_metric: SpilloverMetric = "haversine", + conley_cutoff_km: Optional[float] = None, + conley_lag_cutoff: Optional[int] = None, + cluster: Optional[str] = None, + alpha: float = 0.05, + anticipation: int = 0, + event_study: bool = False, + horizon_max: Optional[int] = None, + rank_deficient_action: str = "warn", + ): + if rank_deficient_action not in ("warn", "error", "silent"): + raise ValueError( + f"rank_deficient_action must be 'warn', 'error', or 'silent', " + f"got '{rank_deficient_action}'" + ) + self.rings = rings + self.d_bar = d_bar + self.vcov_type = vcov_type + self.conley_coords = conley_coords + self.conley_metric = conley_metric + self.conley_cutoff_km = conley_cutoff_km + self.conley_lag_cutoff = conley_lag_cutoff + self.cluster = cluster + self.alpha = alpha + self.anticipation = anticipation + self.event_study = event_study + self.horizon_max = horizon_max + 
self.rank_deficient_action = rank_deficient_action + self.is_fitted_ = False + self.results_: Optional[Any] = None + + def get_params(self) -> Dict[str, Any]: + return { + "rings": self.rings, + "d_bar": self.d_bar, + "vcov_type": self.vcov_type, + "conley_coords": self.conley_coords, + "conley_metric": self.conley_metric, + "conley_cutoff_km": self.conley_cutoff_km, + "conley_lag_cutoff": self.conley_lag_cutoff, + "cluster": self.cluster, + "alpha": self.alpha, + "anticipation": self.anticipation, + "event_study": self.event_study, + "horizon_max": self.horizon_max, + "rank_deficient_action": self.rank_deficient_action, + } + + def set_params(self, **params: Any) -> "SpilloverDiD": + valid = set(self.get_params().keys()) + for key, value in params.items(): + if key not in valid: + raise ValueError( + f"Unknown parameter: {key!r}. Valid parameters: " f"{sorted(valid)}." + ) + setattr(self, key, value) + return self + + # ------------------------------------------------------------------------- + # Fit-time validators (Step 2) + # ------------------------------------------------------------------------- + + def _validate_spillover_inputs( + self, + data: pd.DataFrame, + treatment: Optional[str], + first_treat: Optional[str], + time: str, + unit: str, + outcome: str, + ) -> None: + """Front-door validation for SpilloverDiD.fit(). + + Runs BEFORE any stage-1 work. Catches malformed estimator state + (rings, d_bar), missing/conflicting timing kwargs (treatment XOR + first_treat), missing required columns, and Conley-specific + prerequisites. Resolves ``self._effective_d_bar`` as a side + effect so subsequent helpers can read it directly. + + Raises + ------ + ValueError + Any malformed input. Error messages name the offending kwarg + and (where applicable) the offending row count. + """ + # 1. rings: sorted list of >= 2 elements, non-negative, strictly increasing. 
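+        # Illustrative example (half-open ring intervals assumed here for
+        # exposition; see _build_ring_indicators for the exact boundary
+        # convention): rings=[0, 10, 30] passes validation and yields
+        # K = 2 rings over [0, 10) and [10, 30) with d_bar defaulting to
+        # 30, while rings=[10, 30] fails (rings[0] != 0) and
+        # rings=[0, 30, 10] fails (not strictly increasing).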
+ if not isinstance(self.rings, (list, tuple, np.ndarray)): + raise ValueError( + f"rings must be a list/tuple/array of distance breakpoints; " + f"got {type(self.rings).__name__}." + ) + rings_arr = np.asarray(self.rings, dtype=np.float64) + if rings_arr.ndim != 1 or rings_arr.size < 2: + raise ValueError( + "rings must contain at least 2 breakpoints; " + f"got {len(self.rings)} ({list(self.rings)})." + ) + if (rings_arr < 0).any(): + raise ValueError(f"rings must be non-negative; got {list(self.rings)}.") + if (np.diff(rings_arr) <= 0).any(): + raise ValueError(f"rings must be strictly increasing; got {list(self.rings)}.") + if rings_arr[0] != 0: + raise ValueError( + f"rings[0] must equal 0 to cover treated locations " + f"(d_it = 0 must belong to Ring 1); got rings[0] = " + f"{rings_arr[0]}. Rows with 0 <= d_it < rings[0] would " + "be flagged as exposed (S_it = 1) but receive zero " + "spillover regressors at stage 2, silently biasing the " + "estimator. To exclude very-close pairs, model that with " + "an explicit innermost ring covering [0, rings[0])." + ) + + # 2. d_bar: defaults to rings[-1]; if set explicitly must equal rings[-1] + # (avoid the dead zone where d_i in (rings[-1], d_bar] is neither + # in any ring nor far-away). + if self.d_bar is None: + self._effective_d_bar = float(rings_arr[-1]) + else: + if not np.isfinite(self.d_bar) or self.d_bar <= 0: + raise ValueError(f"d_bar must be positive and finite; got {self.d_bar}.") + if not np.isclose(self.d_bar, rings_arr[-1]): + raise ValueError( + f"d_bar ({self.d_bar}) must equal max(rings) ({rings_arr[-1]}); " + "to vary d_bar, vary the rings breakpoints (the outermost " + "edge is implicitly the spillover cutoff). Setting d_bar " + "different from rings[-1] would create a 'dead zone' " + "where units in (rings[-1], d_bar] are neither in any " + "ring nor in the far-away control group." + ) + self._effective_d_bar = float(self.d_bar) + + # 3. Exactly ONE of treatment / first_treat must be supplied. 
+ if treatment is None and first_treat is None: + raise ValueError( + "Exactly one of `treatment` (binary D_it column) or " + "`first_treat` (per-unit onset-time column) must be supplied." + ) + if treatment is not None and first_treat is not None: + raise ValueError( + "Provide either `treatment` or `first_treat`, not both. " + "`treatment` is auto-converted to `first_treat` internally." + ) + + # 4. Required columns exist in data (treat outcome the same way as + # other required columns — front-door error rather than late + # KeyError when `data[outcome]` is dereferenced). + required = [time, unit, outcome] + if treatment is not None: + required.append(treatment) + if first_treat is not None: + required.append(first_treat) + missing = [c for c in required if c not in data.columns] + if missing: + raise ValueError(f"Missing required columns in data: {missing}.") + + # 4a-bis. Outcome must be finite per-row. Non-finite outcomes + # propagate into stage-1 FE estimation and surface as non- + # convergence warnings + late solver failures rather than a + # targeted input error. Reject up front. + outcome_arr = np.asarray(data[outcome].values, dtype=np.float64) + if not np.isfinite(outcome_arr).all(): + n_bad = int((~np.isfinite(outcome_arr)).sum()) + raise ValueError( + f"outcome column '{outcome}' contains {n_bad} non-finite " + "value(s) (NaN / Inf). SpilloverDiD requires finite outcomes " + "for stage-1 FE estimation; impute or drop missing rows " + "before fitting." + ) + + # 4a-ter. Identifier columns (unit, time, optionally first_treat + # when user-supplied) must not contain NaN. Missing identifiers + # would fall through to opaque numpy / pandas errors (e.g. + # "negative elements" from np.bincount) rather than a targeted + # ValueError. Reject up front. 
+ for id_col in (unit, time): + id_nan_mask = data[id_col].isna() + if bool(id_nan_mask.any()): + n_nan = int(id_nan_mask.sum()) + raise ValueError( + f"identifier column '{id_col}' contains {n_nan} " + "missing value(s). SpilloverDiD requires valid " + "unit / time identifiers on every row; drop or " + "impute missing-identifier rows before fitting." + ) + # `first_treat` is checked only when user-supplied; the auto- + # generated path produces a clean column. + if first_treat is not None and first_treat in data.columns: + ft_nan_mask = data[first_treat].isna() + if bool(ft_nan_mask.any()): + n_nan = int(ft_nan_mask.sum()) + raise ValueError( + f"first_treat column '{first_treat}' contains {n_nan} " + "missing value(s). Use np.inf (or 0) for never-treated " + "units; do not leave NaN." + ) + + # 4b. One-row-per-(unit, time) cell panel contract. Duplicate cells + # would silently re-weight stage-1 FE estimation AND stage-2 OLS + # without any warning. Reject up front. + cell_counts = data.groupby([unit, time]).size() + dups = cell_counts[cell_counts > 1] + if len(dups) > 0: + sample = list(dups.index[:5]) + suffix = f" (and {len(dups) - 5} more)" if len(dups) > 5 else "" + raise ValueError( + f"{len(dups)} duplicate (unit, time) cell(s) detected " + f"(e.g. {sample}{suffix}). SpilloverDiD requires " + "one-row-per-(unit, time) panel data — duplicate cells " + "would silently re-weight both the stage-1 FE fit and the " + "stage-2 OLS. Aggregate to unique cells before fitting." + ) + + # 4c. Balanced-panel contract for the Wave B MVP. An unbalanced + # panel where the stage-1 (unit, time) FE bipartite graph induced + # by Omega_0 isn't connected produces unidentified residuals on + # treated rows. The exact-graph-connectivity check is queued as + # a follow-up; the MVP simply rejects panels where some unit + # doesn't observe every period. 
+ n_unique_times = data[time].nunique() + unit_period_counts = data.groupby(unit)[time].nunique() + underbalanced = unit_period_counts[unit_period_counts < n_unique_times] + if len(underbalanced) > 0: + sample = list(underbalanced.index[:5]) + suffix = f" (and {len(underbalanced) - 5} more)" if len(underbalanced) > 5 else "" + raise ValueError( + f"Unbalanced panel: {len(underbalanced)} unit(s) do not " + f"observe every period (panel has {n_unique_times} unique " + f"periods, affected units e.g. {sample}{suffix}). Wave B " + "MVP requires a balanced panel — an unbalanced (unit, time) " + "Omega_0 bipartite graph can produce unidentified residuals " + "for some treated rows even when every unit and every " + "period has at least one Omega_0 row. Balance the panel " + "(impute missing cells or drop affected units) before " + "fitting. Graph-connectivity-based identification is " + "queued as a follow-up extension." + ) + + # 5a. conley_coords is required ALWAYS — ring construction dereferences + # it on every fit() path, regardless of vcov_type. Validate up front + # rather than letting downstream code fail with AssertionError/KeyError. + if self.conley_coords is None: + raise ValueError( + "SpilloverDiD requires `conley_coords=(lat_col, lon_col)` " + "for ring construction, regardless of vcov_type." + ) + if not isinstance(self.conley_coords, (list, tuple)) or len(self.conley_coords) != 2: + raise ValueError( + "conley_coords must be a 2-tuple (lat_col, lon_col); " + f"got {self.conley_coords!r}." + ) + # Within-unit coord constancy: ring construction collapses coords to + # one row per unit via drop_duplicates(subset=[unit]). If a unit's + # lat/lon varies across rows the first observed value is silently + # used; reject up front rather than silently misclassify spillover + # exposure. 
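+        # Illustrative pre-fit remedy (hypothetical column names): collapse
+        # each unit's coordinates to a single centroid, e.g.
+        #     data[["lat", "lon"]] = (
+        #         data.groupby("county")[["lat", "lon"]].transform("mean")
+        #     )
+        # so the constancy check below passes with unit identity unchanged.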
+ coord_cols = list(self.conley_coords) + if unit in data.columns and all(c in data.columns for c in coord_cols): + per_unit_unique = data.groupby(unit)[coord_cols].nunique() + non_constant = per_unit_unique[(per_unit_unique > 1).any(axis=1)] + if len(non_constant) > 0: + sample = non_constant.index.tolist()[:5] + suffix = f" (and {len(non_constant) - 5} more)" if len(non_constant) > 5 else "" + raise ValueError( + f"{len(non_constant)} unit(s) have non-constant " + f"conley_coords ({coord_cols}) across rows (e.g. {sample}" + f"{suffix}). SpilloverDiD requires within-unit-constant " + "coordinates — ring construction collapses coords per " + "unit via drop_duplicates. Aggregate to a single (lat, " + "lon) per unit (e.g. via the unit's geographic centroid) " + "before fitting, or fix the data so coords are constant." + ) + for c in self.conley_coords: + if c not in data.columns: + raise ValueError(f"conley_coords column '{c}' not in data.") + # Coord finiteness check (per-row). + coord_vals = data[list(self.conley_coords)].values + coord_arr = np.asarray(coord_vals, dtype=np.float64) + if not np.isfinite(coord_arr).all(): + n_nonfinite = int((~np.isfinite(coord_arr)).any(axis=1).sum()) + raise ValueError( + f"conley_coords contain non-finite values in {n_nonfinite} row(s); " + "coordinates must be finite for distance computation." + ) + # Haversine lat/lon domain check: applies on EVERY vcov path (not just + # vcov_type='conley') because ring construction always uses + # conley_metric for distance computation. Out-of-range coords silently + # produce wrong ring assignment otherwise. + if self.conley_metric == "haversine": + lat_arr = coord_arr[:, 0] + lon_arr = coord_arr[:, 1] + if (lat_arr < -90.0).any() or (lat_arr > 90.0).any(): + bad_rows = int(((lat_arr < -90.0) | (lat_arr > 90.0)).sum()) + raise ValueError( + f"conley_coords latitude column '{coord_cols[0]}' contains " + f"{bad_rows} row(s) outside [-90, 90] degrees. 
Haversine " + "metric requires geographic lat/lon coords; if your coords " + "are already projected (planar), pass conley_metric='euclidean'." + ) + if (lon_arr < -180.0).any() or (lon_arr > 180.0).any(): + bad_rows = int(((lon_arr < -180.0) | (lon_arr > 180.0)).sum()) + raise ValueError( + f"conley_coords longitude column '{coord_cols[1]}' contains " + f"{bad_rows} row(s) outside [-180, 180] degrees. Haversine " + "metric requires geographic lat/lon coords; if your coords " + "are already projected (planar), pass conley_metric='euclidean'." + ) + + # 5b. cluster column existence + NaN check — applies on every vcov + # path, not just conley. Missing cluster ids would produce wrong + # SEs (NaN counted as its own cluster by np.unique() but dropped + # by pandas groupby() in the cluster meat). + if self.cluster is not None: + if self.cluster not in data.columns: + raise ValueError(f"cluster column '{self.cluster}' not in data.") + cluster_nan_mask = data[self.cluster].isna() + if bool(cluster_nan_mask.any()): + n_nan = int(cluster_nan_mask.sum()) + raise ValueError( + f"cluster column '{self.cluster}' contains {n_nan} " + "missing value(s). NaN cluster ids would silently " + "produce wrong clustered SEs (np.unique counts NaN as " + "its own cluster but pandas groupby drops it from the " + "cluster meat). Drop or impute missing cluster rows " + "before fitting." + ) + + # 5c. Conley-specific kwargs (only required when vcov_type='conley'). + if self.vcov_type == "conley": + if self.conley_cutoff_km is None or not ( + np.isfinite(self.conley_cutoff_km) and self.conley_cutoff_km > 0 + ): + raise ValueError( + "vcov_type='conley' requires conley_cutoff_km > 0 (finite); " + f"got {self.conley_cutoff_km}." + ) + if self.conley_lag_cutoff is None or self.conley_lag_cutoff < 0: + raise ValueError( + "vcov_type='conley' requires conley_lag_cutoff >= 0 (integer); " + f"got {self.conley_lag_cutoff}." + ) + + # 6. At least one treated unit must exist. 
+        if treatment is not None:
+            n_treated_obs = int((data[treatment] == 1).sum())
+            if n_treated_obs == 0:
+                raise ValueError(
+                    f"No treated observations found (column '{treatment}' "
+                    "is all 0/NaN). SpilloverDiD requires at least one treated unit."
+                )
+        else:
+            ft_vals = data[first_treat].astype(float).values  # type: ignore[index]
+            has_treated_units = bool((np.isfinite(ft_vals) & (ft_vals != 0)).any())
+            if not has_treated_units:
+                raise ValueError(
+                    f"No treated units found (column '{first_treat}' is "
+                    "all 0 / inf / NaN). SpilloverDiD requires at least one "
+                    "treated unit."
+                )
+
+    def _validate_far_away_exists(
+        self,
+        d_array: np.ndarray,
+        is_control_array: np.ndarray,
+    ) -> int:
+        """Verify Butts Assumption 5(ii): at least one (D=0, d > d_bar) observation.
+
+        Parameters
+        ----------
+        d_array : ndarray
+            Per-unit or per-row distances (caller chooses; the check is
+            count-based, not granularity-sensitive).
+        is_control_array : ndarray, bool
+            Aligned mask: True where the observation belongs to a control
+            unit (D_i = 0 for static, D_it = 0 for staggered).
+
+        Returns
+        -------
+        n_far_away : int
+            Number of far-away control observations.
+
+        Raises
+        ------
+        ValueError
+            No (D=0, d > d_bar) observations exist; Assumption 5(ii) fails.
+        """
+        d_bar = self._effective_d_bar
+        far_away_mask = (d_array > d_bar) & is_control_array
+        n_far_away = int(far_away_mask.sum())
+        if n_far_away < 1:
+            raise ValueError(
+                "No far-away control observations: every control unit has "
+                f"d_i <= d_bar = {d_bar}. Butts (2021) Assumption 5(ii) "
+                "requires the sample to contain control units strictly "
+                "further than d_bar from any treated unit. Either reduce "
+                "d_bar (via the outermost ring breakpoint), expand the sample, "
+                "or verify the coords/metric configuration."
+ ) + return n_far_away + + def fit( + self, + data: pd.DataFrame, + *, + outcome: str, + unit: str, + time: str, + treatment: Optional[str] = None, + first_treat: Optional[str] = None, + covariates: Optional[List[str]] = None, + survey_design: object = None, + ) -> SpilloverDiDResults: + """Fit the two-stage Gardner DiD with ring-indicator covariates. + + Methodology (Butts 2021 Section 5 + Gardner 2022): + + 1. Compute per-row spillover indicators from ``conley_coords``. + 2. Build stage-1 subsample ``Omega_0 = {D_it=0 AND S_it=0}`` + (untreated AND unexposed) — Butts' clean control group. + 3. Stage 1: fit ``Y_it = mu_i + lambda_t + u`` on ``Omega_0``. + 4. Residualize: ``Y_tilde = Y - mu_hat - lambda_hat`` for ALL rows. + 5. Stage 2: regress ``Y_tilde`` on ``[D_it, (1-D_it)*Ring_{it,j}]`` via + :func:`solve_ols`, threading the configured ``vcov_type``. + 6. Wrap as :class:`SpilloverDiDResults`. + + Notes + ----- + Wave B MVP: stage-2 variance is the standard solve_ols estimator + (HC1 / Conley / cluster). The Gardner GMM sandwich first-stage + uncertainty correction is NOT applied (planned follow-up; see + TODO and plan Risks #2). Variance is therefore approximate (likely + underestimated by a few percent in typical settings). + """ + if survey_design is not None: + raise NotImplementedError( + "SpilloverDiD does not yet support survey_design= ; planned " + "as a follow-up extension. See TODO.md." + ) + if self.event_study: + raise NotImplementedError( + "SpilloverDiD does not yet support event_study=True ; the " + "per-event-time × ring decomposition (Butts Table 2) is " + "planned as a follow-up extension. The base " + "(event_study=False) aggregate spec is fully supported and " + "handles both non-staggered and staggered timing." + ) + if self.horizon_max is not None: + raise NotImplementedError( + "SpilloverDiD does not yet support horizon_max= (used only " + "in event-study mode); planned as a follow-up extension." 
+ ) + # Validate `anticipation` up front: must be a non-negative integer. + # Accepting fractional or negative values would silently shift + # treatment timing and ring exposure beyond what the estimator's + # identification contract supports. + if not isinstance(self.anticipation, (int, np.integer)) or self.anticipation < 0: + raise ValueError( + f"anticipation must be a non-negative integer; got " + f"{self.anticipation!r} (type {type(self.anticipation).__name__})." + ) + if covariates is not None and len(covariates) > 0: + raise NotImplementedError( + "SpilloverDiD does not yet support covariates= in Wave B MVP. " + "The Gardner-style two-stage pattern requires covariate " + "effects to be estimated on the untreated-and-unexposed " + "subsample at stage 1 and subtracted from Y before stage 2 — " + "appending them only at stage 2 (without stage-1 " + "residualization) would silently bias tau_total / delta_j on " + "panels with time-varying covariates. The full covariate " + "path mirroring TwoStageDiD._fit_untreated_model is queued as " + "a follow-up extension. See TODO.md." + ) + if self.vcov_type in ("hc2", "hc2_bm"): + raise NotImplementedError( + f"SpilloverDiD does not yet support vcov_type='{self.vcov_type}'. " + "The current stage-2 inference uses a generic residual df " + "(n - effective_rank) for t-distribution lookups, but " + "hc2 / hc2_bm require per-coefficient Bell-McCaffrey / CR2 " + "degrees of freedom for correct p-values and CIs. Routing " + "stage 2 through LinearRegression (which supplies the " + "per-coefficient DOF metadata) is queued as a follow-up " + "extension. Use vcov_type='hc1', 'classical', 'conley', or " + "leave default; combine with cluster= for CR1." + ) + + # Step 0: defensive copy so the caller's DataFrame is never mutated. + data = data.copy(deep=False) + + # Step 0b: coerce `time` to numeric BEFORE any structural validation. 
+ # The validator's duplicate-cell and balanced-panel checks depend on + # period IDENTITY; mixed raw encodings like ['0', 0, '1', 1] would + # pass validation but collapse to duplicate periods after coercion. + # Coercing first ensures validation sees the actual numeric labels. + if time in data.columns: + try: + data = data.assign(**{time: pd.to_numeric(data[time])}) + except (TypeError, ValueError) as exc: + raise ValueError( + f"time column '{time}' must be numeric (or string-coercible " + f"to numeric). Got: {exc}. Encode periods as integers / " + "floats before passing to SpilloverDiD." + ) from exc + # User-supplied first_treat must also be coerced BEFORE validation + # so the NaN check and identity-based checks see the actual labels. + # Auto-generated `_spillover_first_treat` (from binary D) doesn't + # exist yet — it's created later by `_convert_treatment_to_first_treat`. + if first_treat is not None and first_treat in data.columns: + try: + data = data.assign(**{first_treat: pd.to_numeric(data[first_treat])}) + except (TypeError, ValueError) as exc: + raise ValueError( + f"first_treat column '{first_treat}' must be numeric (or " + f"string-coercible to numeric). Got: {exc}. Encode onset " + "times as integers / floats (or np.inf for never-treated) " + "before passing to SpilloverDiD." + ) from exc + + # Step 1: front-door validation (rings, d_bar, timing-kwargs XOR, + # coords, panel structure — all on COERCED time/first_treat labels). + self._validate_spillover_inputs(data, treatment, first_treat, time, unit, outcome) + + # Step 2: convert binary treatment to per-unit first_treat if needed. + # Track whether `first_treat` was AUTO-GENERATED (from a binary D + # column) vs USER-SUPPLIED (Gardner convention). The auto-generated + # column uses ONLY np.inf for never-treated (no 0-as-never-treated + # sentinel); preserving this distinction avoids silently + # reclassifying baseline-treated units (D=1 at t=0) as never-treated. 
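+        # Encoding sketch (illustrative): a binary D column with unit A
+        # treated from t=2 and unit B never treated auto-converts to
+        #     first_treat[A] = 2.0, first_treat[B] = np.inf
+        # whereas a USER-supplied first_treat column may mark never-treated
+        # units with either np.inf or 0 (Gardner convention); the
+        # auto-converted flag below switches off the 0-as-never-treated
+        # reading for generated columns.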
+ treatment_auto_converted = treatment is not None + if treatment is not None: + data, first_treat = _convert_treatment_to_first_treat(data, treatment, time, unit) + assert first_treat is not None # validator guarantees this + + # Step 3: factorize unit/time → integer codes (mirrors TwoStageDiD). + unit_vals = data[unit].values + time_vals = data[time].values + unit_codes_full, unit_uniques = pd.factorize(pd.Series(unit_vals), sort=True) + time_codes_full, time_uniques = pd.factorize(pd.Series(time_vals), sort=True) + + # Step 4: extract treatment onsets per unit; detect staggered. + first_treat_by_unit = _extract_treatment_onsets( + data, + first_treat, + unit, + treat_zero_as_never_treated=not treatment_auto_converted, + ) + finite_onsets = {ft for ft in first_treat_by_unit.values() if np.isfinite(ft)} + if not finite_onsets: + raise ValueError( + "No treated units found (all first_treat values are inf or 0). " + "SpilloverDiD requires at least one treated unit." + ) + is_staggered = len(finite_onsets) > 1 + + # Apply anticipation shift to onsets used for ring construction AND + # for the D_it indicator (treatment-effective onset). + effective_onsets = { + uid: (ft - self.anticipation if np.isfinite(ft) else ft) + for uid, ft in first_treat_by_unit.items() + } + + # Step 5: compute per-row d_it. For non-staggered (single common + # onset), use the cheaper static helper that builds the pairwise + # distance matrix once; for staggered, use the per-cohort helper + # that handles time-varying ring membership. + assert self.conley_coords is not None # validator-guaranteed + + # If conley_metric is a user callable, validate it against the full + # 6-check contract (shape / finite / non-negative / symmetric / + # zero-diagonal) on the per-unit (n, n) self-call BEFORE using it + # for ring construction. 
Without this, a callable with positive + # self-distance silently corrupts ring assignment (treated units + # at their own location should have d=0 → fall in Ring_1; positive + # self-distance pushes them out into a different ring). + if callable(self.conley_metric): + unit_coords_for_validation = ( + data[list(self.conley_coords)].drop_duplicates().values.astype(np.float64) + ) + _validate_callable_metric_result( + self.conley_metric(unit_coords_for_validation, unit_coords_for_validation), + unit_coords_for_validation.shape[0], + ) + + if is_staggered: + d_it_per_row, _, _ = _compute_nearest_treated_distance_staggered( + data, + unit=unit, + time=time, + coords=self.conley_coords, + metric=self.conley_metric, + first_treat_by_unit=effective_onsets, + ) + else: + # Non-staggered: single common onset. Build d_i per unit once, + # then broadcast to per-row AND zero out pre-treatment rows + # (matching the staggered helper's inf-at-pre-treatment + # convention so downstream ring + Omega_0 logic is timing- + # agnostic). + ever_treated_ids = np.array( + [uid for uid, ft in first_treat_by_unit.items() if np.isfinite(ft)], + dtype=object, + ) + d_i_per_unit, unit_index_static = _compute_nearest_treated_distance_static( + data, + unit=unit, + coords=self.conley_coords, + metric=self.conley_metric, + treated_unit_ids=ever_treated_ids, + # Pass `d_bar` as the cutoff so the cKDTree sparse path + # auto-activates when n_units > _CONLEY_SPARSE_N_THRESHOLD + # for built-in metrics. Units beyond d_bar get d_i = inf, + # which the downstream ring builder treats as far-away + # controls — same as the dense-path semantics. + cutoff_km=self._effective_d_bar, + ) + unit_to_d = {uid: float(d_i_per_unit[idx]) for idx, uid in enumerate(unit_index_static)} + d_it_per_row = np.array([unit_to_d.get(u, np.inf) for u in unit_vals]) + # Pre-treatment rows have d_it=inf (no unit treated yet). 
+ shared_onset = next(iter(finite_onsets)) + shared_effective_onset = shared_onset - self.anticipation + d_it_per_row = np.where( + np.asarray(time_vals, dtype=np.float64) < shared_effective_onset, + np.inf, + d_it_per_row, + ) + + # Step 6: build ring indicators per row (Butts Eq 6 time-varying form). + ring_masks = _build_ring_indicators(d_it_per_row, list(self.rings)) + K = ring_masks.shape[1] + + # Step 7: compute D_it per row (with anticipation shift). + D_it = np.zeros(len(data), dtype=np.float64) + for u_id, eff_ft in effective_onsets.items(): + if np.isfinite(eff_ft): + rows = (unit_vals == u_id) & (np.asarray(time_vals) >= eff_ft) + D_it[rows] = 1.0 + + # Step 7b: verify at least one observation is treated AFTER applying + # the anticipation shift. If all first_treat values are > max(time) + # in the panel (e.g. an "anticipation" of treatment that hasn't + # arrived yet), D_it is all zeros and the stage-2 design has no + # treatment variation. Fail fast with a clear identification error + # rather than crashing inside solve_ols. + if D_it.sum() == 0: + max_time = float(np.max(np.asarray(time_vals, dtype=np.float64))) + raise ValueError( + "No observation is treated in-sample after applying " + f"anticipation shift of {self.anticipation}. The earliest " + "effective onset is later than the latest observed period " + f"({max_time}), so D_it = 0 everywhere and tau_total is " + "unidentified. Either include post-onset periods in the " + "panel, reduce the anticipation lead, or verify the " + "first_treat column." + ) + + # Step 8: compute S_it = 1{d_it <= d_bar}. Treated-self rows have + # d_it=0 → S_it=1 (Omega_0 excludes them; they're treated anyway). + S_it = (d_it_per_row <= self._effective_d_bar).astype(np.float64) + + # Step 9: validate far-away controls (Butts Assumption 5(ii)). + # Use CURRENT-period untreated status, not never-treated-only. 
The + # paper defines Omega_0 row-wise as {D_it = 0 AND S_it = 0}, so + # not-yet-treated observations of eventually-treated units can also + # contribute to the far-away identifying group. This matters for + # all-eventually-treated staggered designs (no never-treated units). + is_control_row_now = D_it == 0 + n_far_away_obs = self._validate_far_away_exists(d_it_per_row, is_control_row_now) + + # Step 10: Butts Omega_0 mask = (D_it=0 AND S_it=0). + omega_0_mask = (D_it == 0) & (S_it == 0) + + # Step 10b: row-level Omega_0 identification check. + # + # Two regimes (round-16 codex review split): + # - PERIOD-level unsupported (no Omega_0 row at some t): time FE + # structurally unidentified. Dropping the period would remove + # ALL units' observations at that t, including the far-away + # rows needed for identification. Hard error. + # - UNIT-level unsupported (no Omega_0 row for some i): warn- + # and-drop. Unit FE for that i is NaN, residualization writes + # NaN on those rows, and the downstream finite_mask path at + # Step 14 excludes them from stage 2. Mirrors `TwoStageDiD`'s + # always-treated unit handling (`two_stage.py:294-336`) and + # Gardner's framework, which identifies effects from supported + # observations rather than requiring every unit estimable. 
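+        # Sketch of the two regimes (illustrative two-unit panel over
+        # periods t0, t1):
+        #   Omega_0 = {(A, t0), (B, t0), (B, t1)} -> every period is
+        #     supported; unit A's FE is identified from t0 alone, so the
+        #     fit proceeds.
+        #   Omega_0 = {(A, t0), (B, t0)} -> no Omega_0 row at t1, so
+        #     lambda_{t1} is unidentified -> hard error.
+        #   Omega_0 = {(B, t0), (B, t1)} -> unit A has no Omega_0 row ->
+        #     warn and exclude A's rows from stage 2.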
+ unit_codes_arr = np.asarray(unit_codes_full) + time_codes_arr = np.asarray(time_codes_full) + units_in_omega_0 = set(unit_codes_arr[omega_0_mask].tolist()) + times_in_omega_0 = set(time_codes_arr[omega_0_mask].tolist()) + all_unit_codes = set(unit_codes_arr.tolist()) + all_time_codes = set(time_codes_arr.tolist()) + unsupported_units = sorted(all_unit_codes - units_in_omega_0) + unsupported_periods = sorted(all_time_codes - times_in_omega_0) + + if unsupported_periods: + affected = [time_uniques[c] for c in unsupported_periods[:5]] + suffix = ( + f" (and {len(unsupported_periods) - 5} more)" + if len(unsupported_periods) > 5 + else "" + ) + raise ValueError( + f"Stage-1 fixed effects unidentified: " + f"{len(unsupported_periods)} period(s) have NO untreated-and-" + f"unexposed (Omega_0) rows — their time FE is unidentified. " + f"Examples: {affected}{suffix}. The Butts subsample " + "Omega_0 = {D_it = 0 AND S_it = 0} must contain at least one " + "row per period that appears in the data. Consider " + "tightening d_bar (so fewer rows are flagged as exposed " + "S_it = 1) or expanding the sample to include never-treated " + "or pre-treatment observations for the affected periods." + ) + + if unsupported_units: + affected = [unit_uniques[c] for c in unsupported_units[:5]] + suffix = ( + f" (and {len(unsupported_units) - 5} more)" if len(unsupported_units) > 5 else "" + ) + warnings.warn( + f"SpilloverDiD: {len(unsupported_units)} unit(s) have NO " + f"untreated-and-unexposed (Omega_0) rows — their unit FE " + f"is unidentified and their rows will be excluded from " + f"stage 2 estimation. Examples: {affected}{suffix}. To " + f"include these units, expand the sample to provide pre-" + f"treatment or untreated observations for them, or tighten " + f"d_bar so fewer rows are flagged as exposed (S_it = 1).", + UserWarning, + stacklevel=2, + ) + + # Step 10c: connected-component check on the Omega_0 bipartite graph. 
+ # + # Stage 1's iterative FE solver identifies (mu_i, lambda_t) only up + # to component-specific constants per connected component of the + # bipartite graph (supported units ↔ periods, edge = Omega_0 row). + # If the graph splits into K > 1 components, _residualize_butts then + # combines mu_i from one component with lambda_t from another, + # silently corrupting y_tilde and downstream tau_total / delta_j. + # Balanced panel + per-unit/per-period Omega_0 coverage is NECESSARY + # but not SUFFICIENT — connectivity is the load-bearing + # identification condition for stage 1. + _check_omega_0_connectivity( + omega_0_mask=omega_0_mask, + unit_codes_arr=unit_codes_arr, + time_codes_arr=time_codes_arr, + units_in_omega_0=units_in_omega_0, + n_times=len(time_uniques), + unit_uniques=unit_uniques, + ) + + # Step 11: stage 1 — fit FE on Omega_0. + y_full = np.asarray(data[outcome].values, dtype=np.float64) + unit_fe_arr, time_fe_arr, converged = _iterative_fe_subset( + y_full, + np.asarray(unit_codes_full), + np.asarray(time_codes_full), + omega_0_mask, + ) + if not converged: + warnings.warn( + "SpilloverDiD stage-1 iterative FE solver did not converge " + f"within {_FE_ITER_MAX} iterations (tol={_FE_ITER_TOL}). " + "Results may be unreliable.", + UserWarning, + stacklevel=2, + ) + stage1_n_obs = int(omega_0_mask.sum()) + + # Step 12: residualize ALL observations. + y_tilde = _residualize_butts( + y_full, + np.asarray(unit_codes_full), + np.asarray(time_codes_full), + unit_fe_arr, + time_fe_arr, + ) + + # Mask rank-deficient (NaN y_tilde) rows: rather than zero them out + # (which leaves them in the sample for HC1/CR1 n/(n-k) corrections), + # we SUBSET stage-2 arrays to the finite rows before solve_ols. This + # ensures the SE formulas use the actual estimation sample size. 
+ finite_mask = np.isfinite(y_tilde) + n_nan = int((~finite_mask).sum()) + if n_nan > 0: + warnings.warn( + f"SpilloverDiD: {n_nan} observation(s) excluded from stage 2 " + "due to rank-deficient stage-1 FE estimates (unit or period " + "absent from the untreated-and-unexposed subsample).", + UserWarning, + stacklevel=2, + ) + + # Step 13: build stage-2 design [D_it, (1-D_it)*Ring_{it,j}]. + ring_covariates = np.zeros((len(data), K), dtype=np.float64) + for j in range(K): + ring_covariates[:, j] = (1.0 - D_it) * ring_masks[:, j].astype(np.float64) + + X_2 = np.column_stack([D_it.reshape(-1, 1), ring_covariates]) + + ring_labels = [_ring_label(list(self.rings), j) for j in range(K)] + col_names_all = ["treatment"] + [f"_spillover_{lab}" for lab in ring_labels] + + # Step 14: subset arrays to the estimation sample (finite y_tilde rows). + # Apply to design, outcome, cluster ids, AND the Conley spatial/temporal + # auxiliary arrays so the HC1/CR1/Conley sample-size adjustments use the + # correct n. + cluster_ids_full = ( + np.asarray(data[self.cluster].values) if self.cluster is not None else None + ) + if n_nan > 0: + X_2_fit = X_2[finite_mask] + y_tilde_fit = y_tilde[finite_mask] + cluster_ids_fit = ( + cluster_ids_full[finite_mask] if cluster_ids_full is not None else None + ) + time_vals_fit = np.asarray(time_vals)[finite_mask] + unit_vals_fit = np.asarray(unit_vals)[finite_mask] + else: + X_2_fit = X_2 + y_tilde_fit = y_tilde + cluster_ids_fit = cluster_ids_full + time_vals_fit = np.asarray(time_vals) + unit_vals_fit = np.asarray(unit_vals) + + # Step 15: stage-2 OLS with configured vcov via solve_ols. 
+ solve_kwargs: Dict[str, Any] = { + "return_vcov": True, + "rank_deficient_action": self.rank_deficient_action, + "column_names": col_names_all, + "vcov_type": self.vcov_type, + "cluster_ids": cluster_ids_fit, + } + if self.vcov_type == "conley": + coord_array_full = np.asarray(data[list(self.conley_coords)].values, dtype=np.float64) + coord_array_fit = coord_array_full[finite_mask] if n_nan > 0 else coord_array_full + solve_kwargs.update( + { + "conley_coords": coord_array_fit, + "conley_cutoff_km": self.conley_cutoff_km, + "conley_metric": self.conley_metric, + "conley_time": time_vals_fit, + "conley_unit": unit_vals_fit, + "conley_lag_cutoff": self.conley_lag_cutoff, + } + ) + + coef, residuals, vcov = solve_ols(X_2_fit, y_tilde_fit, **solve_kwargs) # type: ignore[misc] + + # Step 16: extract coefficients and inference. + tau_total = float(coef[0]) + # Degrees of freedom for t-distribution inference. Use the EFFECTIVE + # rank after solve_ols drops rank-deficient (NaN) coefficient + # columns, NOT the raw column count. On rank-deficient stage-2 + # fits (e.g. an empty ring covariate dropped by solve_ols's QR), + # using raw `X_2_fit.shape[1]` would understate df_resid and + # silently inflate p-values and CI widths. + n_obs_eff = int(finite_mask.sum()) + k_effective = int(np.isfinite(coef).sum()) + df_resid = n_obs_eff - k_effective + if df_resid <= 0: + # Degenerate: no residual degrees of freedom. Force NaN + # inference by setting df_resid = 0 (safe_inference treats + # df = 0 as no usable degrees of freedom, returning NaN for + # t-stat / p-value / CI). Distinct from df_resid = None + # which would fall through to a normal-distribution + # approximation — misleading on a degenerate sample. + warnings.warn( + f"SpilloverDiD stage-2 residual df = {df_resid} (n_obs=" + f"{n_obs_eff}, effective_rank={k_effective}). 
Inference " + "(t-stat, p-value, CI) will be NaN.", + UserWarning, + stacklevel=2, + ) + df_resid = 0 + + # Clamp negative diagonals to 0 before sqrt: indefinite Conley or + # near-singular sandwich variances can produce numerically tiny + # negative values that would otherwise NaN the entire inference + # row. Matches the sibling-estimator convention + # (two_stage.py:1183, estimators.py:606, stacked_did.py:515). + tau_se = ( + float(np.sqrt(max(vcov[0, 0], 0.0))) + if vcov is not None and np.isfinite(vcov[0, 0]) + else float("nan") + ) + tau_t, tau_p, tau_ci = safe_inference(tau_total, tau_se, alpha=self.alpha, df=df_resid) + + # Per-ring inference. + ring_rows = [] + for j in range(K): + idx = 1 + j # 0 is treatment; rings follow. + coef_j = float(coef[idx]) + se_j = ( + float(np.sqrt(max(vcov[idx, idx], 0.0))) + if vcov is not None and np.isfinite(vcov[idx, idx]) + else float("nan") + ) + t_j, p_j, ci_j = safe_inference(coef_j, se_j, alpha=self.alpha, df=df_resid) + ring_rows.append( + { + "ring": ring_labels[j], + "coef": coef_j, + "se": se_j, + "t_stat": t_j, + "p_value": p_j, + "ci_low": ci_j[0], + "ci_high": ci_j[1], + } + ) + spillover_df = pd.DataFrame(ring_rows).set_index("ring") if ring_rows else None + + # Step 17: counts for the result class. + n_units_ever_in_ring: Dict[str, int] = {} + for j in range(K): + in_ring_units = data.loc[ring_masks[:, j], unit].nunique() + n_units_ever_in_ring[ring_labels[j]] = int(in_ring_units) + + # Step 18: assemble SpilloverDiDResults. n_obs / n_treated / n_control + # reflect the actual stage-2 estimation sample (after dropping NaN + # y_tilde rows), matching solve_ols's HC1/CR1 sample-size adjustments. + D_it_fit = D_it[finite_mask] if n_nan > 0 else D_it + + # Populate coefficients dict with ALL stage-2 coefficients (treatment + # + K rings) keyed by their col_names_all labels so consumers can + # align names to vcov rows/cols. 
"ATT" is exposed as an alias for the + # treatment slot to match the sibling-estimator convention. + coefficients_full: Dict[str, float] = {} + for i, name in enumerate(col_names_all): + val = float(coef[i]) if np.isfinite(coef[i]) else float("nan") + coefficients_full[name] = val + coefficients_full["ATT"] = tau_total + + result = SpilloverDiDResults( + att=tau_total, + se=tau_se, + t_stat=tau_t, + p_value=tau_p, + conf_int=tau_ci, + n_obs=n_obs_eff, + n_treated=int(D_it_fit.sum()), + n_control=int(len(D_it_fit) - D_it_fit.sum()), + alpha=self.alpha, + coefficients=coefficients_full, + vcov=vcov, + residuals=residuals, + r_squared=None, + inference_method="analytical", + n_bootstrap=None, + n_clusters=( + int(len(np.unique(cluster_ids_fit))) if cluster_ids_fit is not None else None + ), + vcov_type=self.vcov_type, + cluster_name=self.cluster, + conley_lag_cutoff=(self.conley_lag_cutoff if self.vcov_type == "conley" else None), + spillover_effects=spillover_df, + ring_breakpoints=list(self.rings), + d_bar=self._effective_d_bar, + n_units_ever_in_ring=n_units_ever_in_ring, + n_far_away_obs=int(n_far_away_obs), + is_staggered=is_staggered, + event_study=self.event_study, + stage1_n_obs=stage1_n_obs, + anticipation=self.anticipation, + ) + self.results_ = result + self.is_fitted_ = True + return result diff --git a/docs/api/_autosummary/diff_diff.SpilloverDiD.rst b/docs/api/_autosummary/diff_diff.SpilloverDiD.rst new file mode 100644 index 00000000..53e9ae1d --- /dev/null +++ b/docs/api/_autosummary/diff_diff.SpilloverDiD.rst @@ -0,0 +1,21 @@ +diff\_diff.SpilloverDiD +======================= + +.. currentmodule:: diff_diff + +.. autoclass:: SpilloverDiD + :no-members: + + + .. rubric:: Methods + + .. 
autosummary:: + + ~SpilloverDiD.__init__ + ~SpilloverDiD.fit + ~SpilloverDiD.get_params + ~SpilloverDiD.set_params + + + + diff --git a/docs/api/_autosummary/diff_diff.SpilloverDiDResults.rst b/docs/api/_autosummary/diff_diff.SpilloverDiDResults.rst new file mode 100644 index 00000000..905cfdc5 --- /dev/null +++ b/docs/api/_autosummary/diff_diff.SpilloverDiDResults.rst @@ -0,0 +1,61 @@ +diff\_diff.SpilloverDiDResults +============================== + +.. currentmodule:: diff_diff + +.. autoclass:: SpilloverDiDResults + :no-members: + + + .. rubric:: Methods + + .. autosummary:: + + ~SpilloverDiDResults.__init__ + ~SpilloverDiDResults.print_summary + ~SpilloverDiDResults.summary + ~SpilloverDiDResults.to_dataframe + ~SpilloverDiDResults.to_dict + + + + + .. rubric:: Attributes + + .. autosummary:: + + ~SpilloverDiDResults.alpha + ~SpilloverDiDResults.anticipation + ~SpilloverDiDResults.bootstrap_distribution + ~SpilloverDiDResults.cluster_name + ~SpilloverDiDResults.coef_var + ~SpilloverDiDResults.coefficients + ~SpilloverDiDResults.conley_lag_cutoff + ~SpilloverDiDResults.d_bar + ~SpilloverDiDResults.event_study + ~SpilloverDiDResults.fitted_values + ~SpilloverDiDResults.inference_method + ~SpilloverDiDResults.is_significant + ~SpilloverDiDResults.is_staggered + ~SpilloverDiDResults.n_bootstrap + ~SpilloverDiDResults.n_clusters + ~SpilloverDiDResults.n_far_away_obs + ~SpilloverDiDResults.n_units_ever_in_ring + ~SpilloverDiDResults.r_squared + ~SpilloverDiDResults.residuals + ~SpilloverDiDResults.ring_breakpoints + ~SpilloverDiDResults.significance_stars + ~SpilloverDiDResults.spillover_effects + ~SpilloverDiDResults.stage1_n_obs + ~SpilloverDiDResults.survey_metadata + ~SpilloverDiDResults.vcov + ~SpilloverDiDResults.vcov_type + ~SpilloverDiDResults.att + ~SpilloverDiDResults.se + ~SpilloverDiDResults.t_stat + ~SpilloverDiDResults.p_value + ~SpilloverDiDResults.conf_int + ~SpilloverDiDResults.n_obs + ~SpilloverDiDResults.n_treated + ~SpilloverDiDResults.n_control + 
diff --git a/docs/api/index.rst b/docs/api/index.rst index bdb1f086..7ff6b19d 100644 --- a/docs/api/index.rst +++ b/docs/api/index.rst @@ -27,6 +27,7 @@ Core estimator classes for DiD analysis: diff_diff.HeterogeneousAdoptionDiD diff_diff.EfficientDiD diff_diff.TwoStageDiD + diff_diff.SpilloverDiD diff_diff.WooldridgeDiD diff_diff.BaconDecomposition diff_diff.StaggeredTripleDifference @@ -64,6 +65,7 @@ Result containers returned by estimators: diff_diff.EDiDBootstrapResults diff_diff.TwoStageDiDResults diff_diff.TwoStageBootstrapResults + diff_diff.SpilloverDiDResults diff_diff.BaconDecompositionResults diff_diff.wooldridge_results.WooldridgeDiDResults diff_diff.Comparison2x2 @@ -311,6 +313,7 @@ Estimators had efficient_did two_stage + spillover wooldridge_etwfe bacon diff --git a/docs/api/spillover.rst b/docs/api/spillover.rst new file mode 100644 index 00000000..bb50f7d6 --- /dev/null +++ b/docs/api/spillover.rst @@ -0,0 +1,234 @@ +Spillover-Aware DiD (Butts 2021) +================================= + +Ring-indicator spillover-aware Difference-in-Differences estimator. + +This module implements the methodology from Butts, K. (2023, originally 2021), +"Difference-in-Differences with Spatial Spillovers" (arXiv:2105.03737v3). +The estimator augments two-stage Gardner (2022) DiD with ring-indicator +covariates that identify, alongside the direct effect on treated units +(``tau_total``), per-ring spillover effects on near-control units +(``delta_j``). + +The standard DiD estimator is biased for ``tau_total`` by +``-tau_spill(0)`` (paper Proposition 2.1 / Equations 1-2); the +ring-augmented regression removes this bias. The "far-away" cutoff +``d_bar`` defines the boundary beyond which spillovers vanish +(Assumption 5). 
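+The bias statement above can be written as a single display (a
+restatement of the sentence above in its own notation, not an
+additional result):
+
+.. math::
+
+   \operatorname{plim}\, \hat{\tau}_{DiD} = \tau_{total} - \tau_{spill}(0)
+
+so recovering ``tau_total`` requires estimating the spillover-on-control
+term explicitly, which is what the ring covariates do.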
+ +**When to use SpilloverDiD:** + +- Spatial-policy DiD settings where treatment may spill over onto nearby + control units (border-RD, place-based policy, neighborhood effects, + geographic policy boundaries) +- When the canonical DiD coefficient is suspected to be attenuated or + inflated by spillover bias and a far-away control group exists +- As a robustness check on TwoStageDiD / ImputationDiD when the + treatment has plausibly local geographic externalities + +**Reference:** Butts, K. (2023). Difference-in-Differences with Spatial +Spillovers. *arXiv:2105.03737v3*. Gardner, J. (2022). Two-stage +differences in differences. *arXiv:2207.05943*. + +.. module:: diff_diff.spillover + +SpilloverDiD +------------ + +Main estimator class. + +.. autoclass:: diff_diff.SpilloverDiD + :no-index: + :members: + :undoc-members: + :show-inheritance: + :inherited-members: + + .. rubric:: Methods + + .. autosummary:: + + ~SpilloverDiD.fit + ~SpilloverDiD.get_params + ~SpilloverDiD.set_params + +SpilloverDiDResults +------------------- + +Results container with per-ring spillover-effect table. + +.. autoclass:: diff_diff.SpilloverDiDResults + :no-index: + :members: + :undoc-members: + :show-inheritance: + + .. rubric:: Methods + + .. 
autosummary:: + + ~SpilloverDiDResults.summary + ~SpilloverDiDResults.to_dict + +Example Usage +------------- + +Non-staggered 2-period panel with synthetic spillovers:: + + from diff_diff import SpilloverDiD + + est = SpilloverDiD( + rings=[0, 50, 100, 200], # 3 rings: [0,50), [50,100), [100,200] + conley_coords=("lat", "lon"), + vcov_type="conley", + conley_cutoff_km=200.0, + conley_lag_cutoff=0, + ) + result = est.fit( + data=df, + outcome="y", + unit="unit", + time="time", + treatment="D", # binary; auto-converts to first_treat + ) + print(result.summary()) + # Total effect on treated (Butts tau_total): + print(f"tau_total = {result.att:.4f}") + # Per-ring spillover-on-control: + print(result.spillover_effects) + +Staggered timing (Gardner-convention first_treat column):: + + result = est.fit( + data=df, + outcome="y", + unit="unit", + time="time", + first_treat="first_treat", # 0 / inf = never-treated + ) + +Identification spec +------------------- + +The stage-2 regressor for ring j is the time-varying form + +.. math:: + + (1 - D_{it}) \cdot \mathrm{Ring}_{it,j} + +where ``Ring_{it,j}`` is the indicator that unit *i*'s nearest currently- +treated unit is in distance bin *j* at time *t*, and ``D_it`` is the +treatment indicator (1 if unit *i* is treated by time *t*). For +non-staggered timing, all treated units share one onset and ``Ring`` is +unit-static post-treatment / zero pre-treatment. For staggered timing, +``Ring_{it,j}`` is unit-time-varying (a unit's nearest treated neighbor +changes as more cohorts enter treatment). + +Stage 1 fits unit + time fixed effects on Butts' subsample +``Omega_0 = {D_it = 0 AND S_it = 0}`` (untreated AND unexposed) — the +clean far-away control group. Reading the literal unit-static +``(1 - D_it) * S_i`` from paper Equation 5 yields a rank-deficient +design under TWFE; the diff-diff implementation uses the time-varying +form (paper page 12's ``S_it`` notation, made explicit in Section 5 +Table 2). 
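The spec above can be made concrete with a toy construction (illustrative arrays only; ``loc``, ``first_treat``, and the loop are a sketch, not the library's internals):

```python
import numpy as np

# Toy sketch of the committed time-varying regressor (1 - D_it) * Ring_{it,j}
# under staggered timing. 4 units at 1-D locations; unit 0 treated at t=1,
# unit 1 at t=2, units 2-3 never treated.
loc = np.array([0.0, 60.0, 120.0, 400.0])        # one static location per unit
first_treat = np.array([1, 2, np.inf, np.inf])   # inf = never treated
rings = [0.0, 50.0, 100.0, 200.0]                # K = 3 bins; d_bar = 200
dist = np.abs(loc[:, None] - loc[None, :])       # pairwise distances

ring_reg = {}  # (t, j) -> regressor column
for t in range(3):
    D_t = (t >= first_treat).astype(float)       # D_it: treated by time t
    treated_now = np.flatnonzero(D_t)
    for j in range(len(rings) - 1):
        if treated_now.size == 0:
            # Pre-treatment period: no unit is exposed yet.
            col = np.zeros(len(loc))
        else:
            # d_it = distance to the NEAREST currently-treated unit.
            d_it = dist[:, treated_now].min(axis=1)
            in_ring = (d_it >= rings[j]) & (d_it < rings[j + 1])
            # Treated rows are zeroed by the (1 - D_it) factor.
            col = (1.0 - D_t) * in_ring
        ring_reg[(t, j)] = col

# Unit 2 (at 120) sits in ring [100, 200) at t=1 but moves to ring
# [50, 100) at t=2 once the second cohort (unit 1, at 60) enters.
```

Note how unit 2 moves inward across rings as the second cohort enters treatment; this is exactly the unit-time variation that the staggered specification relies on.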
+ +Estimator Comparison +-------------------- + +.. list-table:: SpilloverDiD vs. TwoStageDiD vs. TwoWayFixedEffects + :header-rows: 1 + :widths: 25 25 25 25 + + * - Feature + - SpilloverDiD + - TwoStageDiD + - TwoWayFixedEffects + * - Spillover handling + - Identifies per-ring delta_j + - None (assumes SUTVA) + - None (assumes SUTVA) + * - Methodology + - Two-stage Gardner + ring covariates + - Two-stage Gardner + - Single-stage TWFE + * - Staggered timing + - Yes + - Yes + - Biased under heterogeneity + * - Stage-1 subsample + - Butts: ``D=0 AND S=0`` (untreated AND unexposed) + - ``D=0`` (untreated) + - N/A (single stage) + * - Conley spatial-HAC SE + - Yes (via solve_ols at stage 2) + - Not yet supported + - Yes + * - Cluster-robust SE + - Yes (HC1 + CR1 via solve_ols) + - Yes (GMM sandwich + clusters) + - Yes + +Wave B MVP limitations +---------------------- + +The current implementation has the following documented limitations, +planned as follow-up enhancements: + +- **Gardner GMM first-stage correction at stage 2** — stage-2 variance + is the standard ``solve_ols`` HC1 / Conley / cluster estimator without + the influence-function adjustment for stage-1 FE estimation + uncertainty. The full GMM sandwich (Butts & Gardner 2022) is planned + as a follow-up; until then, reported SEs are biased downward + (underestimated by a few percent in typical settings) because they + omit the additional variance contribution from estimating the stage-1 + fixed effects. Confidence intervals are correspondingly too narrow + and p-values too small. Users should treat reported significance + conservatively until the GMM correction lands. +- **Event-study mode** — ``event_study=True`` raises + ``NotImplementedError``. The per-event-time × ring decomposition + (Butts Table 2 staggered specification) is queued for a follow-up PR. +- **Survey-design integration** — ``survey_design=`` raises + ``NotImplementedError``. 
+- **Count-of-treated-in-ring** — only the "nearest-treated ring" + specification is implemented. The "count" form re-introduces + functional-form dependence (paper Section 3.2 end) and is queued. +- **Data-driven d_bar selection** — Butts (2021b) / Butts (2023) JUE + Insight propose cross-validation under stronger parallel-trends + assumptions. Not in this PR. +- **HC2 / HC2_BM (Bell-McCaffrey / CR2) variance** — current stage-2 + inference uses a generic residual-df (n - effective rank) for + t-distribution lookups. ``vcov_type="hc2"`` / ``"hc2_bm"`` require + per-coefficient BM / CR2 DOF and raise ``NotImplementedError``. + Routing stage 2 through ``LinearRegression`` (which supplies the + per-coefficient DOF metadata) is queued. +- **Balanced panel required** — every unit must observe every period. + An unbalanced (unit, time) Ω₀ bipartite graph can produce disconnected + FE components and unidentified stage-1 residuals on treated rows. + Validator rejects unbalanced inputs. +- **Omega_0 row-level identification (period strict, unit warn-drop, plus connectivity)** — + Every period must have at least one Omega_0 row (else time FE for + that period is structurally unidentified; hard ``ValueError``). Units + lacking Omega_0 rows (e.g. baseline-treated units with ``D_it = 1`` at + every observed ``t``) are warned-and-dropped: their unit FE is NaN + and the downstream finite-mask path excludes them from stage 2. + Additionally, the supported-units bipartite graph (units linked by + shared Omega_0 periods) must form a single connected component; + ``K > 1`` components raise ``ValueError`` because the FE solver would + return only component-specific constants and residualization would + silently mix them across components. Mirrors ``TwoStageDiD``'s + always-treated unit handling on the warn-drop axis. +- **One row per (unit, time) cell required** — duplicate cells silently + re-weight both stage-1 FE estimation and stage-2 OLS. 
Validator + rejects duplicate cells; aggregate to unique cells before fitting. +- **rings[0] must equal 0** — the partition must cover treated + locations (``d_it = 0`` belongs to Ring 1). Validator rejects rings + that start at a nonzero inner edge to prevent a silent + exposed-but-unmodeled population in ``0 <= d_it < rings[0]``. +- **conley_coords must be constant within unit** — ring construction + uses a single (lat, lon) per unit (units are stationary point + locations). Validator rejects coordinates that vary across rows of + the same unit; aggregate to one location per unit (e.g. + ``df.groupby(unit)[["lat","lon"]].first()``) before fitting if your + raw data has moving coordinates. This requirement applies on every + fit path, not just ``vcov_type="conley"``, because the rings + themselves are always coordinate-derived. diff --git a/docs/doc-deps.yaml b/docs/doc-deps.yaml index 20770069..09d8aa6d 100644 --- a/docs/doc-deps.yaml +++ b/docs/doc-deps.yaml @@ -268,6 +268,28 @@ sources: - path: docs/choosing_estimator.rst type: user_guide + # ── SpilloverDiD (spillover group) ────────────────────────────────── + + diff_diff/spillover.py: + drift_risk: high + docs: + - path: docs/methodology/REGISTRY.md + section: "SpilloverDiD" + type: methodology + - path: docs/api/spillover.rst + type: api_reference + - path: README.md + section: "Estimators (one-line catalog entry)" + type: user_guide + - path: docs/references.rst + type: user_guide + - path: diff_diff/guides/llms-full.txt + section: "SpilloverDiD" + type: user_guide + - path: diff_diff/guides/llms.txt + section: "Estimators" + type: user_guide + # ── EfficientDiD (efficient_did group) ────────────────────────────── diff_diff/efficient_did.py: diff --git a/docs/index.rst b/docs/index.rst index 86755820..5c500530 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -177,6 +177,8 @@ Supported Estimators - Borusyak, Jaravel & Spiess (2024) imputation estimator * - :class:`~diff_diff.TwoStageDiD` - Gardner (2022) 
two-stage residualized estimator + * - :class:`~diff_diff.SpilloverDiD` + - Butts (2021) ring-indicator spillover-aware DiD * - :class:`~diff_diff.SyntheticDiD` - Synthetic DiD combining DiD and synthetic control * - :class:`~diff_diff.StackedDiD` diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index 099b8059..13995e77 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -25,6 +25,7 @@ This document provides the academic foundations and key implementation requireme - [StaggeredTripleDifference](#staggeredtripledifference) - [TROP](#trop) - [HeterogeneousAdoptionDiD](#heterogeneousadoptiondid) + - [SpilloverDiD](#spilloverdid) 4. [Diagnostics & Sensitivity](#diagnostics--sensitivity) - [PlaceboTests](#placebotests) - [BaconDecomposition](#bacondecomposition) @@ -2936,6 +2937,84 @@ should be a deliberate user choice. --- +## SpilloverDiD + +**Primary source:** Butts, K. (2023). Difference-in-Differences with Spatial Spillovers. *arXiv:2105.03737v3* (originally posted 2021). https://arxiv.org/abs/2105.03737 + +**Secondary sources:** +- Gardner, J. (2022). Two-stage differences in differences. *arXiv:2207.05943*. The stage-1+2 residualization pattern this estimator reuses. +- Conley, T. G. (1999). GMM Estimation with Cross-Sectional Dependence. *Journal of Econometrics* 92(1), 1-45. The spatial-HAC variance estimator recommended by Butts page 13 for inference with cutoff = `d_bar`. + +**Scope:** Spillover-aware Difference-in-Differences. Augments two-stage Gardner DiD with ring-indicator covariates that identify the direct effect on treated units (`tau_total`) alongside per-ring spillover effects on near-control units (`delta_j`). Handles both panel non-staggered timing and Section 5 staggered timing in a single estimator; non-staggered is the special case where all treated units share an onset time. 
+ +**Note:** This estimator is a **documented synthesis** of ingredients — no single published software package implements the exact recipe. `did2s` (R/Stata, Butts & Gardner 2022) implements the Gardner two-stage residualization but does NOT support ring covariates. The Butts (2021/2023) paper proposes the ring estimator in Equations 5/6/8 (non-staggered) and Section 5 / Table 2 (staggered) but does not ship reference software. The diff-diff implementation combines: (a) Butts (2021) Section 5 / Table 2 identification, (b) Gardner (2022) two-stage residualize-then-fit, (c) Wave A's Conley spatial-HAC vcov. + +**Identification spec (committed):** + +The stage-2 regressor for ring `j` is the **time-varying** form + +``` +(1 - D_it) * Ring_{it,j} +``` + +where `D_it` is the treatment indicator (1 if unit `i` is treated by time `t`) and `Ring_{it,j}` is the indicator that unit `i`'s nearest currently-treated unit (treated by time `t`) is in distance bin `j`. For non-staggered timing, `Ring_{it,j}` is unit-static post-treatment and zero pre-treatment (no unit treated yet); for staggered timing, `Ring_{it,j}` is unit-time-varying. + +**Note:** Reading the literal unit-static `(1 - D_it) * S_i` from paper Equation 5 yields a **rank-deficient design** under TWFE: `(1 - D_it) * S_i = S_i - D_it * S_i = S_i - D_it` (since `D_it = 1 ⇒ S_i = 1`), and `S_i` is absorbed by the unit FE `mu_i`, leaving `-D_it`. The two regressors are perfectly anti-collinear after FE absorption. The paper's identification argument (Proposition 2.3, Section 3.1 subsample language) only resolves when `S_it = S_i * 1{t >= first_treat}` is read as the **time-varying** form — which the paper defines on page 12 (the line directly above Equation 5) and which Section 5 Table 2 makes explicit (`S^k_{it}`, `Ring^k_{it,j}`). The implementation matches the paper's intent once the `S_it` notation is interpreted as time-varying. 
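A minimal numerical check of this collinearity claim (toy panel, illustrative names; not library code): with 4 units and 2 periods, the unit-static column adds no rank beyond the fixed effects, while the time-varying column does.

```python
import numpy as np

# Toy panel: 4 units x 2 periods; unit 0 treated at t=1, unit 1 exposed.
units = np.repeat(np.arange(4), 2)
times = np.tile(np.arange(2), 4)
D = ((units == 0) & (times == 1)).astype(float)   # D_it
S_i_row = (units == 1).astype(float)              # unit-static S_i
S_it_row = S_i_row * (times == 1)                 # time-varying S_it

unit_fe = (units[:, None] == np.arange(4)).astype(float)
time_fe = (times[:, None] == np.arange(2)).astype(float)
FE = np.column_stack([unit_fe, time_fe])

rank_base = np.linalg.matrix_rank(np.column_stack([FE, D]))
rank_static = np.linalg.matrix_rank(np.column_stack([FE, D, (1 - D) * S_i_row]))
rank_varying = np.linalg.matrix_rank(np.column_stack([FE, D, (1 - D) * S_it_row]))

# (1 - D_it) * S_i is unit-constant, hence absorbed: rank_static == rank_base.
# (1 - D_it) * S_it adds one identified column: rank_varying == rank_base + 1.
```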
+ +**Note:** Stage-1 fits unit + time FE on Butts' STRICTER subsample `Omega_0 = {D_it = 0 AND S_it = 0}` (untreated AND unexposed) — the clean far-away control group. This differs from `TwoStageDiD`'s `Omega_0 = {D_it = 0}` (untreated; includes near-controls in post-treatment periods). The stricter Butts subsample prevents spillover-contaminated near-controls from biasing the time FE; near-controls post-treatment carry `delta_j` variation that the ring covariates pick up at stage 2. + +**Note (Omega_0 row-level identification — period strict, unit warn-and-drop, plus connectivity):** Every period must have at least one Omega_0 row (else time FE is structurally unidentified for that period, and dropping it would lose all units' cross-time identification) — hard `ValueError`. Units lacking Omega_0 rows (e.g. baseline-treated units with `D_it = 1` at every observed `t`) are warned-and-dropped: their unit FE is NaN, residualization writes NaN on their rows, and the downstream finite-mask path at stage 2 excludes them from estimation. This mirrors `TwoStageDiD`'s always-treated unit handling (`two_stage.py:294-336`) and Gardner's framework, which identifies effects from supported observations rather than requiring every unit estimable. **Connectivity:** the supported-units bipartite graph (supported units linked by shared Omega_0 periods) must form a single connected component. If the graph splits into K > 1 components, the iterative FE solver identifies (`mu_i`, `lambda_t`) only up to component-specific constants, and residualization combines `mu_i` from one component with `lambda_t` from another — silently corrupting `y_tilde` and downstream `tau_total` / `delta_j`. Balanced panel + per-unit/per-period Omega_0 coverage is NECESSARY but NOT SUFFICIENT; connectivity is the load-bearing identification condition. 
Under the current absorbing-treatment regime the disconnected case is plausibly unreachable in practice (we were unable to construct an example surviving the upstream validators), but `_check_omega_0_connectivity` is in place as defense-in-depth and future-proofs Wave B follow-ups (event-study, survey-design integration, reversible-treatment relaxation if ever added). + +**Note (Gardner identity, non-staggered):** Under non-staggered timing, the two-stage Gardner residualize-then-fit with the Omega_0-restricted stage 1 is **empirically bit-identical** to the single-stage TWFE ring regression on the full sample using the time-varying form `y_it ~ mu_i + lambda_t + tau * D_it + sum_j delta_j * (1 - D_it) * Ring_{it,j}`. This is the non-staggered ring estimator from Butts Equations 4-6. The empirical equivalence is verified by a 20-seed deterministic regression test (`TestSpilloverDiDNonStaggeredFEEquivalence`) at `atol=1e-10`. The Omega_0 restriction is therefore innocent for the non-staggered point estimate — it only changes the variance composition (which is why the stage-1 GMM correction enters at stage 2 in the staggered case). Reported `tau_total` for non-staggered timing IS the Butts Eqs. 4-6 estimator. + +**Note:** Treated units have `d_it = 0` (their own location) and fall in Ring_1 geometrically, but the `(1 - D_it)` factor zeros their ring contribution. `n_units_ever_in_ring` counts include treated units in Ring_1 by construction. Under staggered timing this field counts each unit in EVERY ring it visits across periods (not a clean partition); under non-staggered timing it IS a partition. + +**Note (anticipation shift):** The `anticipation: int` constructor kwarg (default 0) shifts BOTH the treatment and ring-membership clocks by `-anticipation` so the stage-1 Omega_0 subsample correctly excludes the `anticipation` pre-treatment periods (which would otherwise contaminate the FE estimation with anticipation effects). 
Concretely, the effective treatment indicator becomes `D'_it = 1{t >= first_treat - anticipation}` and ring membership uses the corresponding shifted "currently-treated" set when computing `d_it` for staggered timing. Stage 2 then estimates `tau_total` and `delta_j` net of the anticipation window. Mirrors `TwoStageDiD`'s `anticipation` parameter semantics — recommended use is a small integer (1-3 periods) when domain knowledge suggests treatment effects begin before formal onset (e.g., policy announcements). Setting `anticipation > 0` AND providing a `first_treat` column where `first_treat - anticipation < min(time)` for any unit will mark that unit as treated from the panel's first observation, with the same baseline-treated handling as if it were always-treated (warn-and-drop via the Omega_0 unit-level check above). + +**Note:** No published R/Stata software exists for the two-period ring estimator (Butts Eqs. 5-6). Correctness anchored on (a) synthetic-DGP Monte Carlo identification tests on the **non-staggered** DGP (50-seed default + 200-seed `@pytest.mark.slow` variant — both recover `tau_total` AND `delta_1` within 0.02 absolute tolerance on Butts-Assumption-satisfying DGPs) plus a **staggered**-DGP MC test (30-seed, no slow variant) that anchors `tau_total` only within 0.04 absolute tolerance (looser bound because each staggered DGP is larger and noisier); per-ring `delta_jk` recovery on staggered DGPs and a 200-seed slow staggered variant are queued as follow-ups alongside `event_study=True` support; (b) reduce-to-TWFE limit (non-staggered, 20-seed deterministic bit-identity at `atol=1e-10` via `TestSpilloverDiDNonStaggeredFEEquivalence`); (c) Wave A's Conley sparse-vs-dense parity inherited via `solve_ols`. 
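A hedged sketch of the clock shift (toy arrays; `first_treat` and `anticipation` mirror the parameter names above, the rest is illustrative):

```python
import numpy as np

anticipation = 2
first_treat = np.array([5.0, 8.0, np.inf])  # inf = never treated
times = np.arange(10)

# Effective indicator D'_it = 1{t >= first_treat - anticipation}.
# np.inf - 2 stays inf, so never-treated units remain all-zero.
D_eff = (times[None, :] >= (first_treat - anticipation)[:, None]).astype(int)

# Unit 0 (onset 5) becomes effectively treated at t=3, so its t=3,4 rows
# leave the Omega_0 stage-1 subsample as described above.
```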
A reduce-to-`TwoStageDiD` limit was scoped during planning but not shipped in Wave B (the Omega_0 stricter subsample requires additional fixture work to align with `TwoStageDiD`'s `{D_it = 0}` subsample); queued as a follow-up alongside the Gardner GMM correction. + +**Assumptions (Butts 2021):** + +- **Assumption 1 (Random Sampling)** — i.i.d. panel. +- **Assumption 3 (Parallel Counterfactual Trends)** — counterfactual trends in absence of all treatment AND zero exposure do not depend on own treatment status. +- **Assumption 5 (Spillovers Are Local):** (i) `min_{j: D_j=1} d(i,j) > d_bar ⇒ h_i(D-vector) = 0` (spillovers vanish past `d_bar`); (ii) at least one treated unit AND one control unit exist with `min d > d_bar` (clean far-away control group). The validator enforces (ii) via `_validate_far_away_exists` with a strict `> d_bar` check; failure raises `ValueError` citing the assumption. +- **Assumption 6 (Total Effect Parallel Trends)** — counterfactual trends do not depend on `(D_i, S_i)`. Stronger than Assumption 3. +- **Assumption 7 (Spillover Effect Parallel Trends)** — counterfactual trends do not depend on `(D_i, S_i)` for `S_i ∈ {0, 1}`. Required to identify `gamma_0` / `delta_j`. +- **Assumption 8 (Parallel Counterfactual Trends, Staggered)** — additive unit + time FE structure on untreated/unexposed potential outcomes. Stronger than Assumption 3. + +**Variance (Wave B MVP — documented limitation):** + +The stage-2 variance is the standard `solve_ols` estimator (HC1 / Conley / cluster paths, all dispatched via `vcov_type`). The **Gardner GMM sandwich first-stage uncertainty correction is NOT applied** at stage 2 in this PR. The full GMM + Conley composition is queued as a follow-up enhancement that extends `two_stage.py::_compute_gmm_variance` to accept a Conley kernel matrix in place of HC1's identity at the influence-function outer-product step (see Wave B plan Risks #2 for the IF formula). 
The reported SE is therefore **biased downward** (underestimated by a few percent in typical settings) because it omits the additional variance contribution from estimating the stage-1 FE; confidence intervals are correspondingly too narrow and p-values too small. Treat reported significance conservatively until the GMM correction lands.
+
+**Edge cases (from paper Section 3.2 / Discussion):**
+
+| # | Edge case | Handling |
+|---|---|---|
+| 1 | No nearby control units (Assumption 5(ii) fails) | Hard error via `_validate_far_away_exists` |
+| 2 | `d_bar` too small (some affected units classified as far) | User responsibility; sensitivity analysis across `d_bar` values recommended (vary `rings[-1]`) |
+| 3 | `d_bar` too large (`S=0` far-away group has fewer observations) | Same as #2; bias-variance trade-off (paper pages 13-14) |
+| 4 | Single-ring `S_i` attenuation (many unaffected units) | Use multiple rings (Equation 6 multi-ring spec); supported by passing more breakpoints |
+| 5 | Spillovers extend past largest ring | User inspects outermost `delta_K` significance; if non-zero, extend the outermost ring |
+| 6 | Additive spillovers in count of nearby treated | `ring_method="count"` deferred (paper notes the count form re-introduces functional-form dependence) |
+| 7 | Staggered + negative Goodman-Bacon weights | Two-stage Gardner methodology avoids this (paper page 22) |
+
+**Restrictions / deferred features:**
+
+- `event_study=True` raises `NotImplementedError` — per-event-time × ring decomposition (Butts Table 2) queued for follow-up.
+- `survey_design=` raises `NotImplementedError` — planned follow-up.
+- `covariates=` raises `NotImplementedError` — Gardner-style stage-1 residualization not yet wired through; planned follow-up.
+- `ring_method="count"` not exposed — only the nearest-treated-ring specification.
+- `vcov_type` ∈ {`"hc2"`, `"hc2_bm"`} raises `NotImplementedError` — current stage-2 inference uses generic residual df rather than per-coefficient Bell-McCaffrey / CR2 DOF. Use `"hc1"`, `"classical"`, `"conley"`, or pair with `cluster=` for CR1. +- **`rings[0]` must equal 0** — the partition must cover treated locations (`d_it = 0` belongs to Ring 1). Rings starting at a nonzero inner edge would leave units in `0 <= d_it < rings[0]` as exposed-but-unmodeled, silently biasing the estimator. Validator rejects such inputs. +- **Balanced panel required (Wave B MVP)** — every unit must observe every period. An unbalanced (unit, time) Ω₀ bipartite graph can produce disconnected FE components and unidentified stage-1 residuals on treated rows. Exact graph-connectivity-based identification (which would relax this to a strictly weaker condition) is queued as a follow-up extension. Validator rejects unbalanced inputs. +- **One row per `(unit, time)` cell required** — duplicate cells silently re-weight stage-1 FE estimation AND stage-2 OLS. Validator rejects duplicate cells. +- Data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight) not exposed. +- Gardner GMM first-stage correction at stage 2 not applied (HC1/Conley/cluster only; documented limitation). + +**Implementation:** `diff_diff/spillover.py`. Public class `SpilloverDiD`; result class `SpilloverDiDResults(DiDResults)` at `diff_diff/results.py`. Tests at `tests/test_spillover.py`; DGP factories `tests/_dgp_utils.py::generate_butts_nonstaggered_dgp` / `generate_butts_staggered_dgp` (satisfy Butts Assumptions 1/3/5/7 by construction). + +--- + ## ConleySpatialHAC **Primary source:** Conley, T. G. (1999). GMM Estimation with Cross-Sectional Dependence. *Journal of Econometrics* 92(1), 1-45. 
DOI: 10.1016/S0304-4076(98)00084-0
diff --git a/docs/references.rst b/docs/references.rst
index 5dc3d603..a915a3ae 100644
--- a/docs/references.rst
+++ b/docs/references.rst
@@ -182,6 +182,10 @@ Multi-Period and Staggered Adoption
 - **Butts, K., & Gardner, J. (2022).** "did2s: Two-Stage Difference-in-Differences." *The R Journal*, 14(1), 162-173. https://doi.org/10.32614/RJ-2022-048
 
+- **Butts, K. (2023).** "Difference-in-Differences with Spatial Spillovers." *arXiv:2105.03737v3* (originally posted 2021). https://arxiv.org/abs/2105.03737
+
+  Introduces the ring-indicator estimator implemented in our ``SpilloverDiD`` class. Sections 2-3 cover non-staggered timing (Equations 5/6/8); Section 5 covers staggered timing via two-stage Gardner (Table 2). Section 3.1 (page 13) recommends Conley spatial-HAC for inference with cutoff = ``d_bar``.
+
 - **de Chaisemartin, C., & D'Haultfœuille, X. (2020).** "Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects." *American Economic Review*, 110(9), 2964-2996. https://doi.org/10.1257/aer.20181169
 
 - **de Chaisemartin, C., & D'Haultfœuille, X. (2022, revised 2024).** "Difference-in-Differences Estimators of Intertemporal Treatment Effects." *NBER Working Paper* 29873. https://www.nber.org/papers/w29873
diff --git a/tests/_dgp_utils.py b/tests/_dgp_utils.py
new file mode 100644
index 00000000..1f568748
--- /dev/null
+++ b/tests/_dgp_utils.py
@@ -0,0 +1,250 @@
+"""Synthetic data-generating processes for spillover-DiD tests.
+
+These factories produce panels satisfying Butts (2021) Assumptions 1, 3, 5,
+and 7 by construction. Used by ``tests/test_spillover.py`` to anchor
+identification claims (Wave B MVP correctness anchors).
+""" + +from typing import Optional + +import numpy as np +import pandas as pd + + +def generate_butts_nonstaggered_dgp( + *, + n_units: int = 200, + n_treated: int = 20, + n_near_control: int = 80, + n_far_control: int = 100, + n_periods: int = 2, + t_treat: int = 1, + tau_total: float = -0.07, + delta_1: float = -0.04, + d_bar: float = 100.0, + near_control_d_max: Optional[float] = None, + far_control_d_min: Optional[float] = None, + parallel_trends_shock_sd: float = 0.0, + error_sd: float = 0.05, + seed: int = 42, +) -> pd.DataFrame: + """Non-staggered panel with known direct + spillover effects. + + All treated units share onset at ``t_treat``. Layout: + + - Treated units cluster near (lat=0, lon=0). + - Near-controls placed within ``near_control_d_max`` (default + ``d_bar / 2`` km) so they're inside the spillover zone. + - Far-controls placed beyond ``far_control_d_min`` (default + ``2 * d_bar`` km). + + Potential-outcomes model (satisfies Assumption 6/7 by construction): + + .. code:: + + Y_it(0, 0) = mu_i + lambda_t + e_it + Y_i1(1, 0) - Y_i0(0, 0) = (lambda_1 - lambda_0) + tau_total + for treated units + Y_i1(0, S_i=1) - Y_i0(0, S_i=1) = (lambda_1 - lambda_0) + delta_1 + for near-control units (D_i=0, S_i=1) + Y_i1(0, S_i=0) - Y_i0(0, S_i=0) = (lambda_1 - lambda_0) + for far-control units (clean) + + Returns + ------- + DataFrame with columns: ``unit``, ``time``, ``D`` (binary at row level), + ``first_treat`` (per-unit onset; ``np.inf`` for never-treated), + ``y`` (outcome), ``lat``, ``lon``. + """ + if n_units != n_treated + n_near_control + n_far_control: + n_units = n_treated + n_near_control + n_far_control + + if near_control_d_max is None: + near_control_d_max = d_bar / 2.0 + if far_control_d_min is None: + far_control_d_min = 2.0 * d_bar + + rng = np.random.default_rng(seed) + + # ~111 km per latitude degree on the equator (haversine approx). 
+ KM_PER_DEG = 111.195 + near_d_max_deg = near_control_d_max / KM_PER_DEG + far_d_min_deg = far_control_d_min / KM_PER_DEG + + units = [f"u{i:04d}" for i in range(n_units)] + mu = rng.normal(0.0, 0.5, size=n_units) + # Time effects: linear trend + lambda_t = np.array([0.05 * t for t in range(n_periods)]) + if parallel_trends_shock_sd > 0: + lambda_t = lambda_t + rng.normal(0, parallel_trends_shock_sd, size=n_periods) + + coords = np.zeros((n_units, 2)) + is_treated = np.zeros(n_units, dtype=bool) + is_near = np.zeros(n_units, dtype=bool) + + # Treated: cluster very near origin (within 1 km) + for i in range(n_treated): + coords[i, 0] = rng.normal(0.0, 0.005) + coords[i, 1] = rng.normal(0.0, 0.005) + is_treated[i] = True + + # Near-control: within near_d_max_deg of origin, but not at origin + near_start = n_treated + for i in range(n_near_control): + idx = near_start + i + # Uniformly distributed in an annulus to avoid clustering + r_deg = rng.uniform(near_d_max_deg * 0.1, near_d_max_deg * 0.95) + theta = rng.uniform(0, 2 * np.pi) + coords[idx, 0] = r_deg * np.cos(theta) + coords[idx, 1] = r_deg * np.sin(theta) + is_near[idx] = True + + # Far-control: at far_d_min_deg+ from origin + far_start = n_treated + n_near_control + for i in range(n_far_control): + idx = far_start + i + r_deg = rng.uniform(far_d_min_deg, far_d_min_deg * 1.5) + theta = rng.uniform(0, 2 * np.pi) + coords[idx, 0] = r_deg * np.cos(theta) + coords[idx, 1] = r_deg * np.sin(theta) + + rows = [] + for i, u in enumerate(units): + ft = float(t_treat) if is_treated[i] else np.inf + for t in range(n_periods): + y_clean = mu[i] + lambda_t[t] + post = t >= t_treat + if is_treated[i] and post: + y = y_clean + tau_total + elif is_near[i] and post: + y = y_clean + delta_1 + else: + y = y_clean + y += rng.normal(0, error_sd) + rows.append( + { + "unit": u, + "time": t, + "lat": coords[i, 0], + "lon": coords[i, 1], + "D": int(is_treated[i] and post), + "first_treat": ft, + "y": y, + } + ) + return 
pd.DataFrame(rows) + + +def generate_butts_staggered_dgp( + *, + n_cohorts: int = 3, + units_per_cohort: int = 30, + n_far_control: int = 90, + n_periods: int = 6, + cohort_onsets: Optional[list] = None, + tau_total: float = -0.07, + delta_1: float = -0.04, + d_bar: float = 100.0, + error_sd: float = 0.05, + seed: int = 42, +) -> pd.DataFrame: + """Staggered panel with known per-cohort effects. + + Three cohorts treat at distinct times; far-control group never treated. + Each cohort is clustered at a distinct geographic center so spillover + is well-defined per cohort (units near cohort C1 receive spillover + starting at C1's onset, etc.). + + Returns the same column schema as :func:`generate_butts_nonstaggered_dgp`. + """ + if cohort_onsets is None: + cohort_onsets = [1, 2, 3][:n_cohorts] + + rng = np.random.default_rng(seed) + KM_PER_DEG = 111.195 + + rows = [] + coords_by_unit = {} + first_treat_by_unit = {} + is_treated_unit = {} + # Spillover-trigger onset per near-control unit (the cohort it belongs to). + spillover_onset_by_unit: dict = {} + + unit_idx = 0 + + # Cohort centers spaced ~5*d_bar apart so cohorts don't overlap. + cohort_spacing_deg = (5 * d_bar) / KM_PER_DEG + + for c_idx in range(n_cohorts): + center_lat = c_idx * cohort_spacing_deg + center_lon = 0.0 + onset = cohort_onsets[c_idx] + for i in range(units_per_cohort): + u = f"c{c_idx}_{unit_idx:04d}" + unit_idx += 1 + # 1/3 of cohort members are "treated" (cluster at center); + # 2/3 are "near-controls" within d_bar/2 of center. 
+ if i < units_per_cohort // 3: + coords_by_unit[u] = ( + center_lat + rng.normal(0, 0.003), + center_lon + rng.normal(0, 0.003), + ) + first_treat_by_unit[u] = float(onset) + is_treated_unit[u] = True + else: + r_deg = rng.uniform(0.05, (d_bar / 2.0) / KM_PER_DEG * 0.9) + theta = rng.uniform(0, 2 * np.pi) + coords_by_unit[u] = ( + center_lat + r_deg * np.cos(theta), + center_lon + r_deg * np.sin(theta), + ) + first_treat_by_unit[u] = np.inf + is_treated_unit[u] = False + # Near-control receives spillover when its cohort treats. + spillover_onset_by_unit[u] = float(onset) + + # Far-control: placed far from all cohort centers + far_center_lat = n_cohorts * cohort_spacing_deg + 5 * d_bar / KM_PER_DEG + for i in range(n_far_control): + u = f"far_{i:04d}" + unit_idx += 1 + r_deg = rng.uniform(0.0, 1.0) + theta = rng.uniform(0, 2 * np.pi) + coords_by_unit[u] = ( + far_center_lat + r_deg * np.cos(theta), + r_deg * np.sin(theta), + ) + first_treat_by_unit[u] = np.inf + is_treated_unit[u] = False + # Far-controls receive no spillover (no nearby cohort). 
+ + mu = {u: rng.normal(0, 0.5) for u in coords_by_unit} + lambda_t = np.array([0.05 * t for t in range(n_periods)]) + + for u, (lat, lon) in coords_by_unit.items(): + ft = first_treat_by_unit[u] + is_treat = is_treated_unit[u] + spillover_onset = spillover_onset_by_unit.get(u, np.inf) + for t in range(n_periods): + y_clean = mu[u] + lambda_t[t] + # Direct effect for own treated unit + if is_treat and t >= ft: + y = y_clean + tau_total + # Spillover on near-control once its cohort activates + elif (not is_treat) and t >= spillover_onset: + y = y_clean + delta_1 + else: + y = y_clean + y += rng.normal(0, error_sd) + rows.append( + { + "unit": u, + "time": t, + "lat": lat, + "lon": lon, + "D": int(is_treat and t >= ft), + "first_treat": ft, + "y": y, + } + ) + return pd.DataFrame(rows) diff --git a/tests/test_spillover.py b/tests/test_spillover.py new file mode 100644 index 00000000..c3892942 --- /dev/null +++ b/tests/test_spillover.py @@ -0,0 +1,2610 @@ +"""Tests for SpilloverDiD (Butts 2021 ring-indicator spillover-aware DiD). + +Step 1 surface: ring-construction helpers and the public class scaffold. +Step 2+ surfaces are added incrementally as the implementation lands. 
+""" + +import numpy as np +import pandas as pd +import pytest + +from diff_diff.spillover import ( + SpilloverDiD, + _apply_callable_metric_pairwise, + _build_ring_indicators, + _check_omega_0_connectivity, + _compute_nearest_treated_distance_sparse, + _compute_nearest_treated_distance_staggered, + _compute_nearest_treated_distance_static, + _convert_treatment_to_first_treat, + _euclidean_pairwise, + _extract_treatment_onsets, + _haversine_km_pairwise, + _pairwise_ring_distances, + _ring_label, +) +from tests._dgp_utils import ( + generate_butts_nonstaggered_dgp, + generate_butts_staggered_dgp, +) + +# ============================================================================= +# Pairwise-distance primitives +# ============================================================================= + + +class TestHaversinePairwise: + """Tests for _haversine_km_pairwise.""" + + def test_zero_distance_when_same_point(self): + coords = np.array([[40.7128, -74.0060]]) # NYC + result = _haversine_km_pairwise(coords, coords) + assert result.shape == (1, 1) + assert abs(result[0, 0]) < 1e-9 + + def test_known_pair_nyc_to_la(self): + # NYC (40.7128 N, 74.0060 W) to LA (34.0522 N, 118.2437 W) + # Reference great-circle distance ~ 3935.7 km (within 0.5% of any source) + nyc = np.array([[40.7128, -74.0060]]) + la = np.array([[34.0522, -118.2437]]) + result = _haversine_km_pairwise(nyc, la) + assert result.shape == (1, 1) + assert abs(result[0, 0] - 3935.7) < 5.0 + + def test_pairwise_matrix_shape(self): + coords_a = np.array([[40.0, -74.0], [34.0, -118.0]]) + coords_b = np.array([[51.5, -0.1], [35.7, 139.7], [40.7, -74.0]]) + result = _haversine_km_pairwise(coords_a, coords_b) + assert result.shape == (2, 3) + # All non-negative + assert (result >= 0).all() + + +class TestEuclideanPairwise: + """Tests for _euclidean_pairwise.""" + + def test_known_3_4_5(self): + a = np.array([[0.0, 0.0]]) + b = np.array([[3.0, 4.0]]) + result = _euclidean_pairwise(a, b) + assert abs(result[0, 0] - 
5.0) < 1e-12 + + def test_zero_distance_same_point(self): + coords = np.array([[1.5, 2.5], [3.0, 4.0]]) + result = _euclidean_pairwise(coords, coords) + np.testing.assert_allclose(np.diag(result), 0.0, atol=1e-12) + + +class TestPairwiseRingDistances: + """Tests for the _pairwise_ring_distances dispatch.""" + + def test_haversine_branch(self): + a = np.array([[0.0, 0.0]]) + b = np.array([[0.0, 0.0]]) + result = _pairwise_ring_distances(a, b, "haversine") + assert result.shape == (1, 1) + assert abs(result[0, 0]) < 1e-9 + + def test_euclidean_branch(self): + a = np.array([[0.0, 0.0]]) + b = np.array([[1.0, 0.0]]) + result = _pairwise_ring_distances(a, b, "euclidean") + assert abs(result[0, 0] - 1.0) < 1e-12 + + def test_callable_branch(self): + a = np.array([[0.0, 0.0], [1.0, 1.0]]) + b = np.array([[2.0, 2.0]]) + + def custom(x, y): + return np.full((x.shape[0], y.shape[0]), 7.5) + + result = _pairwise_ring_distances(a, b, custom) + assert result.shape == (2, 1) + np.testing.assert_allclose(result, 7.5) + + def test_unknown_metric_raises(self): + a = np.array([[0.0, 0.0]]) + b = np.array([[1.0, 1.0]]) + with pytest.raises(ValueError, match="Unknown conley_metric"): + _pairwise_ring_distances(a, b, "manhattan") + + +class TestApplyCallableMetricPairwise: + """Validation of user-supplied callable distance metrics.""" + + def test_wrong_shape_raises(self): + a = np.array([[0.0, 0.0], [1.0, 1.0]]) + b = np.array([[2.0, 2.0], [3.0, 3.0]]) + + def bad(x, y): + return np.zeros((1, 1)) + + with pytest.raises(ValueError, match="shape"): + _apply_callable_metric_pairwise(bad, a, b) + + def test_non_finite_raises(self): + a = np.array([[0.0, 0.0]]) + b = np.array([[1.0, 1.0]]) + + def bad(x, y): + return np.array([[np.inf]]) + + with pytest.raises(ValueError, match="non-finite"): + _apply_callable_metric_pairwise(bad, a, b) + + def test_negative_raises(self): + a = np.array([[0.0, 0.0]]) + b = np.array([[1.0, 1.0]]) + + def bad(x, y): + return np.array([[-1.0]]) + + with 
pytest.raises(ValueError, match="negative"): + _apply_callable_metric_pairwise(bad, a, b) + + +# ============================================================================= +# Static nearest-treated distance +# ============================================================================= + + +@pytest.fixture +def small_static_panel(): + """3 treated units near origin, 3 near-controls 25-100km out, 3 far at 500km.""" + rng = np.random.default_rng(42) + treated = [(0.0 + rng.normal(0, 0.05), 0.0 + rng.normal(0, 0.05)) for _ in range(3)] + near = [(0.4 + i * 0.05, 0.0) for i in range(3)] # ~44-55 km east + far = [(5.0 + i * 0.1, 0.0) for i in range(3)] # ~556+ km east + units = [] + coords = [] + treats = [] + for i, c in enumerate(treated): + units.append(f"T{i}") + coords.append(c) + treats.append(1) + for i, c in enumerate(near): + units.append(f"N{i}") + coords.append(c) + treats.append(0) + for i, c in enumerate(far): + units.append(f"F{i}") + coords.append(c) + treats.append(0) + # Two periods so the static panel has 2 rows per unit + rows = [] + for u, c, d in zip(units, coords, treats): + for t in (0, 1): + rows.append( + { + "unit": u, + "time": t, + "lat": c[0], + "lon": c[1], + "D": d * t, # turns on at t=1 for treated + } + ) + return pd.DataFrame(rows) + + +class TestComputeNearestTreatedDistanceStatic: + """Static (non-staggered) nearest-treated distance helper.""" + + def test_treated_units_have_zero_distance(self, small_static_panel): + treated_ids = np.array(["T0", "T1", "T2"]) + d_i, unit_index = _compute_nearest_treated_distance_static( + small_static_panel, + unit="unit", + coords=("lat", "lon"), + metric="haversine", + treated_unit_ids=treated_ids, + ) + for tid in treated_ids: + pos = np.where(unit_index == tid)[0][0] + # Treated units' nearest treated is themselves OR an adjacent T*; the + # static fixture clusters them within rng.normal(0, 0.05) deg ~6 km. 
+ assert d_i[pos] < 15.0 # all treated cluster within ~15 km + + def test_near_controls_below_far_controls(self, small_static_panel): + treated_ids = np.array(["T0", "T1", "T2"]) + d_i, unit_index = _compute_nearest_treated_distance_static( + small_static_panel, + unit="unit", + coords=("lat", "lon"), + metric="haversine", + treated_unit_ids=treated_ids, + ) + near_pos = [np.where(unit_index == f"N{i}")[0][0] for i in range(3)] + far_pos = [np.where(unit_index == f"F{i}")[0][0] for i in range(3)] + assert all(d_i[p] < 100.0 for p in near_pos) + assert all(d_i[p] > 500.0 for p in far_pos) + + def test_euclidean_metric(self, small_static_panel): + treated_ids = np.array(["T0", "T1", "T2"]) + d_i_h, _ = _compute_nearest_treated_distance_static( + small_static_panel, + unit="unit", + coords=("lat", "lon"), + metric="haversine", + treated_unit_ids=treated_ids, + ) + d_i_e, _ = _compute_nearest_treated_distance_static( + small_static_panel, + unit="unit", + coords=("lat", "lon"), + metric="euclidean", + treated_unit_ids=treated_ids, + ) + # Different units, but ordering should be consistent (near < far) + # so the rank of distances matches between metrics. 
+ order_h = np.argsort(d_i_h) + order_e = np.argsort(d_i_e) + np.testing.assert_array_equal(order_h, order_e) + + def test_no_treated_units_raises(self, small_static_panel): + with pytest.raises(ValueError, match="no treated units present"): + _compute_nearest_treated_distance_static( + small_static_panel, + unit="unit", + coords=("lat", "lon"), + metric="haversine", + treated_unit_ids=np.array(["nonexistent_unit"]), + ) + + def test_unit_index_is_sorted(self, small_static_panel): + treated_ids = np.array(["T0", "T1", "T2"]) + _, unit_index = _compute_nearest_treated_distance_static( + small_static_panel, + unit="unit", + coords=("lat", "lon"), + metric="haversine", + treated_unit_ids=treated_ids, + ) + # Sorted lexicographically: F0, F1, F2, N0, N1, N2, T0, T1, T2 + expected = ["F0", "F1", "F2", "N0", "N1", "N2", "T0", "T1", "T2"] + np.testing.assert_array_equal(unit_index, expected) + + +class TestComputeNearestTreatedDistanceSparse: + """Sparse cKDTree path for nearest-treated computation.""" + + def test_sparse_matches_dense_haversine(self, small_static_panel): + # Force sparse path on the small fixture by using a tight cutoff. + treated_ids = np.array(["T0", "T1", "T2"]) + unit_coords_df = ( + small_static_panel[["unit", "lat", "lon"]] + .drop_duplicates(subset="unit") + .set_index("unit") + .sort_index() + ) + all_coords = unit_coords_df[["lat", "lon"]].values.astype(np.float64) + treated_mask = np.array( + [uid in set(treated_ids.tolist()) for uid in unit_coords_df.index], + dtype=bool, + ) + treated_coords = all_coords[treated_mask] + # Sparse path with a 1000 km cutoff should agree with the dense path on + # all in-range units; far controls (>500 km but <1000 km from any + # treated) get their true nearest-treated distance. 
+ d_sparse = _compute_nearest_treated_distance_sparse( + all_coords=all_coords, + treated_coords=treated_coords, + metric="haversine", + cutoff_km=1000.0, + ) + d_dense = _haversine_km_pairwise(all_coords, treated_coords).min(axis=1) + # Mask: only compare entries within cutoff in dense (sparse returns inf otherwise). + in_range = d_dense <= 1000.0 * (1 + 1e-6) + np.testing.assert_allclose(d_sparse[in_range], d_dense[in_range], atol=1e-8) + + def test_sparse_inf_when_no_treated_in_range(self): + # Single unit at (50, 0); treated cluster at (0, 0). With cutoff 100km, + # great-circle ~5500 km exceeds it; expect inf. + all_coords = np.array([[50.0, 0.0]]) + treated_coords = np.array([[0.0, 0.0]]) + d = _compute_nearest_treated_distance_sparse( + all_coords=all_coords, + treated_coords=treated_coords, + metric="haversine", + cutoff_km=100.0, + ) + assert np.isinf(d[0]) + + def test_sparse_euclidean(self): + all_coords = np.array([[0.0, 0.0], [5.0, 0.0], [100.0, 0.0]]) + treated_coords = np.array([[0.0, 0.0]]) + d = _compute_nearest_treated_distance_sparse( + all_coords=all_coords, + treated_coords=treated_coords, + metric="euclidean", + cutoff_km=10.0, + ) + assert abs(d[0]) < 1e-12 + assert abs(d[1] - 5.0) < 1e-12 + assert np.isinf(d[2]) + + +# ============================================================================= +# Staggered nearest-treated distance +# ============================================================================= + + +@pytest.fixture +def staggered_panel(): + """Panel with two cohorts (t_treat=1 and t_treat=2) plus never-treated.""" + rows = [] + # Cohort A: 2 units treated at t=1 (near origin) + cohort_a = {"A0": (0.0, 0.0), "A1": (0.1, 0.0)} + # Cohort B: 2 units treated at t=2 (10 deg east of origin, ~1100 km away) + cohort_b = {"B0": (0.0, 10.0), "B1": (0.0, 10.1)} + # Never-treated: 1 unit far away + never = {"N0": (50.0, 0.0)} # very far north + first_treat = { + **{u: 1 for u in cohort_a}, + **{u: 2 for u in cohort_b}, + **{u: np.inf 
for u in never}, + } + coords = {**cohort_a, **cohort_b, **never} + for t in range(4): # periods 0..3 + for u, (lat, lon) in coords.items(): + rows.append({"unit": u, "time": t, "lat": lat, "lon": lon}) + df = pd.DataFrame(rows) + return df, first_treat + + +class TestComputeNearestTreatedDistanceStaggered: + """Staggered (time-varying) nearest-treated distance helper.""" + + def test_inf_pre_any_treatment(self, staggered_panel): + df, ft = staggered_panel + d_it, row_unit, row_time = _compute_nearest_treated_distance_staggered( + df, + unit="unit", + time="time", + coords=("lat", "lon"), + metric="haversine", + first_treat_by_unit=ft, + ) + # Period 0 has no treated units yet -> d_it = inf for all rows. + mask_t0 = row_time == 0 + assert np.isinf(d_it[mask_t0]).all() + + def test_cohort_a_active_at_t1(self, staggered_panel): + df, ft = staggered_panel + d_it, row_unit, row_time = _compute_nearest_treated_distance_staggered( + df, + unit="unit", + time="time", + coords=("lat", "lon"), + metric="haversine", + first_treat_by_unit=ft, + ) + mask_t1 = row_time == 1 + # Cohort A treats at t=1; B units should be ~1100km from A; A units near zero. + for u in ("A0", "A1"): + row = mask_t1 & (row_unit == u) + assert d_it[row][0] < 15.0 # within their own cohort + for u in ("B0", "B1"): + row = mask_t1 & (row_unit == u) + # B is ~1100km east of A; min distance to {A0, A1}. + # B0 at lon=10 -> 1112 km; B1 at lon=10.1 -> 1123 km. + assert 1100.0 < d_it[row][0] < 1130.0 + + def test_running_min_across_cohorts_at_t2(self, staggered_panel): + df, ft = staggered_panel + d_it, row_unit, row_time = _compute_nearest_treated_distance_staggered( + df, + unit="unit", + time="time", + coords=("lat", "lon"), + metric="haversine", + first_treat_by_unit=ft, + ) + mask_t2 = row_time == 2 + # At t=2, B0 and B1 are also treated; the nearest-treated set for B units is {A,B} + # but B is closer to itself -> nearly zero distance now. 
+ for u in ("B0", "B1"): + row = mask_t2 & (row_unit == u) + assert d_it[row][0] < 15.0 + + +# ============================================================================= +# Ring indicator construction +# ============================================================================= + + +class TestBuildRingIndicators: + """Tests for _build_ring_indicators.""" + + def test_three_rings_three_distances(self): + rings = [0.0, 50.0, 100.0, 200.0] # K=3 rings + d_values = np.array([25.0, 75.0, 150.0]) + masks = _build_ring_indicators(d_values, rings) + assert masks.shape == (3, 3) + # Row 0: distance 25 -> Ring 1 (0 to 50) + np.testing.assert_array_equal(masks[0], [True, False, False]) + # Row 1: distance 75 -> Ring 2 (50 to 100) + np.testing.assert_array_equal(masks[1], [False, True, False]) + # Row 2: distance 150 -> Ring 3 (100 to 200) + np.testing.assert_array_equal(masks[2], [False, False, True]) + + def test_interior_boundary_belongs_to_upper_ring(self): + """Unit exactly at the boundary between two interior rings.""" + rings = [0.0, 50.0, 100.0, 200.0] + d_values = np.array([50.0]) # exactly on the 50.0 boundary + masks = _build_ring_indicators(d_values, rings) + # 50.0 should fall in Ring 2 (the upper of the boundary pair) per the + # half-open-at-top convention. + np.testing.assert_array_equal(masks[0], [False, True, False]) + + def test_outermost_boundary_belongs_to_last_ring(self): + """Unit exactly at d_bar should fall in the outermost ring, not be far.""" + rings = [0.0, 50.0, 100.0, 200.0] + d_values = np.array([200.0]) # exactly at d_bar + masks = _build_ring_indicators(d_values, rings) + # 200.0 should fall in the OUTERMOST ring (closed-at-top convention). 
+ np.testing.assert_array_equal(masks[0], [False, False, True]) + + def test_distance_at_origin_lands_in_first_ring(self): + """Treated units have d_i = 0; they fall in Ring_1.""" + rings = [0.0, 50.0, 100.0, 200.0] + d_values = np.array([0.0]) + masks = _build_ring_indicators(d_values, rings) + np.testing.assert_array_equal(masks[0], [True, False, False]) + + def test_far_away_unit_in_no_ring(self): + """Distance beyond d_bar puts unit in NO ring.""" + rings = [0.0, 50.0, 100.0, 200.0] + d_values = np.array([300.0]) + masks = _build_ring_indicators(d_values, rings) + np.testing.assert_array_equal(masks[0], [False, False, False]) + + def test_single_ring(self): + """K=1 (single ring) case (Butts Equation 5 single-S_i form).""" + rings = [0.0, 100.0] + d_values = np.array([0.0, 50.0, 99.9, 100.0, 100.1]) + masks = _build_ring_indicators(d_values, rings) + assert masks.shape == (5, 1) + # First four are <= 100, last is > 100 (far-away). + np.testing.assert_array_equal(masks[:, 0], [True, True, True, True, False]) + + def test_too_few_breakpoints_raises(self): + with pytest.raises(ValueError, match="at least 2"): + _build_ring_indicators(np.array([0.0]), [50.0]) + + def test_non_increasing_raises(self): + with pytest.raises(ValueError, match="strictly increasing"): + _build_ring_indicators(np.array([0.0]), [50.0, 50.0, 100.0]) + + def test_negative_raises(self): + with pytest.raises(ValueError, match="non-negative"): + _build_ring_indicators(np.array([0.0]), [-10.0, 100.0]) + + +class TestRingLabel: + """Tests for _ring_label.""" + + def test_interior_ring_half_open(self): + rings = [0.0, 50.0, 100.0, 200.0] + assert _ring_label(rings, 0) == "[0, 50)" + assert _ring_label(rings, 1) == "[50, 100)" + + def test_outermost_ring_closed(self): + rings = [0.0, 50.0, 100.0, 200.0] + assert _ring_label(rings, 2) == "[100, 200]" + + def test_single_ring_closed_form(self): + rings = [0.0, 100.0] + assert _ring_label(rings, 0) == "[0, 100]" + + +# 
============================================================================= +# Public class skeleton +# ============================================================================= + + +class TestSpilloverDiDInitGetParamsSetParams: + """Constructor + sklearn-like get_params / set_params surface.""" + + def test_construction_with_defaults(self): + est = SpilloverDiD(rings=[0.0, 50.0, 100.0]) + assert est.rings == [0.0, 50.0, 100.0] + assert est.d_bar is None + assert est.vcov_type == "hc1" + assert est.is_fitted_ is False + + def test_construction_with_all_kwargs(self): + est = SpilloverDiD( + rings=[0.0, 50.0, 200.0], + d_bar=200.0, + vcov_type="conley", + conley_coords=("lat", "lon"), + conley_metric="haversine", + conley_cutoff_km=200.0, + conley_lag_cutoff=0, + cluster="region", + alpha=0.10, + anticipation=1, + event_study=True, + horizon_max=5, + rank_deficient_action="error", + ) + assert est.d_bar == 200.0 + assert est.vcov_type == "conley" + assert est.cluster == "region" + assert est.alpha == 0.10 + assert est.event_study is True + + def test_get_params_returns_constructor_args(self): + est = SpilloverDiD(rings=[0.0, 100.0]) + params = est.get_params() + # Check all constructor args present + expected = { + "rings", + "d_bar", + "vcov_type", + "conley_coords", + "conley_metric", + "conley_cutoff_km", + "conley_lag_cutoff", + "cluster", + "alpha", + "anticipation", + "event_study", + "horizon_max", + "rank_deficient_action", + } + assert set(params.keys()) == expected + + def test_set_params_updates_attribute(self): + est = SpilloverDiD(rings=[0.0, 50.0]) + est.set_params(d_bar=100.0, alpha=0.10) + assert est.d_bar == 100.0 + assert est.alpha == 0.10 + + def test_set_params_returns_self(self): + est = SpilloverDiD(rings=[0.0, 50.0]) + out = est.set_params(d_bar=100.0) + assert out is est + + def test_set_params_rejects_unknown_key(self): + est = SpilloverDiD(rings=[0.0, 50.0]) + with pytest.raises(ValueError, match="Unknown parameter"): + 
est.set_params(nonexistent_kwarg=42) + + def test_fit_survey_design_not_implemented(self): + """survey_design= is a deferred enhancement (Wave B MVP).""" + est = SpilloverDiD(rings=[0.0, 50.0], conley_coords=("lat", "lon")) + df = pd.DataFrame( + { + "unit": ["A", "A"], + "time": [0, 1], + "y": [1.0, 2.0], + "D": [0, 1], + "lat": [0.0, 0.0], + "lon": [0.0, 0.0], + } + ) + with pytest.raises(NotImplementedError, match="survey_design"): + est.fit( + df, + outcome="y", + unit="unit", + time="time", + treatment="D", + survey_design=object(), + ) + + +# ============================================================================= +# Step 3: Two-stage Gardner fit() integration +# ============================================================================= + + +def _make_butts_2period_dgp( + *, + n_treated: int = 10, + n_near_control: int = 30, + n_far_control: int = 30, + tau_total: float = -0.07, + delta_1: float = -0.04, + d_bar: float = 100.0, + seed: int = 42, +) -> pd.DataFrame: + """Build a 2-period panel with known direct + spillover effects. + + Layout: + - Treated units cluster near (lat=0, lon=0). + - Near-controls distributed within d_bar km. + - Far-controls placed ~2*d_bar km away (clean control group). + Outcomes (potential outcomes model): + - Y_it(0, 0) = mu_i + lambda_t + e_it (clean trend, common across all units) + - Treated unit at t=1: Y = mu_i + lambda_1 + tau_total + e_it + - Near-control at t=1: Y = mu_i + lambda_1 + delta_1 + e_it + - Far-control at t=1: Y = mu_i + lambda_1 + e_it + All units satisfy parallel trends (Butts Assumption 6/7). 
+    """
+    rng = np.random.default_rng(seed)
+    n_units = n_treated + n_near_control + n_far_control
+    units = [f"u{i:03d}" for i in range(n_units)]
+    mu = rng.normal(0.0, 0.5, size=n_units)
+
+    coords = []
+    is_treated = []
+    is_near = []
+    for i in range(n_treated):
+        # Cluster within ~5 km of origin
+        coords.append((rng.normal(0, 0.05), rng.normal(0, 0.05)))
+        is_treated.append(True)
+        is_near.append(False)
+    for i in range(n_near_control):
+        # Within d_bar (uniform in a band 10–80 km north)
+        lat = rng.uniform(0.1, 0.7)  # ~11–78 km north
+        lon = rng.uniform(-0.3, 0.3)  # spread east-west
+        coords.append((lat, lon))
+        is_treated.append(False)
+        is_near.append(True)
+    for i in range(n_far_control):
+        # Far-aways at 2*d_bar+ km
+        lat = rng.uniform(2.0, 3.0)  # ~220–330 km north
+        lon = rng.uniform(-0.5, 0.5)
+        coords.append((lat, lon))
+        is_treated.append(False)
+        is_near.append(False)
+
+    rows = []
+    lambda_t = [0.0, 0.1]  # common time trend
+    for i, u in enumerate(units):
+        for t in (0, 1):
+            y_clean = mu[i] + lambda_t[t]
+            if t == 1 and is_treated[i]:
+                y = y_clean + tau_total
+            elif t == 1 and is_near[i]:
+                y = y_clean + delta_1
+            else:
+                y = y_clean
+            y += rng.normal(0, 0.02)  # noise
+            rows.append(
+                {
+                    "unit": u,
+                    "time": t,
+                    "lat": coords[i][0],
+                    "lon": coords[i][1],
+                    "D": int(is_treated[i] and t == 1),
+                    "y": y,
+                }
+            )
+    return pd.DataFrame(rows)
outcome="y", unit="unit", time="time", treatment="D") + # Single-seed tolerance — looser than the 200-seed MC test in Step 5. + assert abs(result.att - (-0.07)) < 0.04 + + def test_recovers_ring_coefficient(self): + df = _make_butts_2period_dgp(seed=42, tau_total=-0.07, delta_1=-0.04) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + result = est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + assert result.spillover_effects is not None + ring_coef = result.spillover_effects.iloc[0]["coef"] + assert abs(ring_coef - (-0.04)) < 0.04 + + def test_result_has_expected_fields(self): + df = _make_butts_2period_dgp(seed=42) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + result = est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + assert result.ring_breakpoints == [0.0, 100.0] + assert result.d_bar == 100.0 + assert result.is_staggered is False + assert result.n_far_away_obs > 0 + assert result.stage1_n_obs > 0 + assert "[0, 100]" in result.n_units_ever_in_ring + + def test_summary_includes_ring_block(self): + df = _make_butts_2period_dgp(seed=42) + est = SpilloverDiD(rings=[0.0, 50.0, 100.0], conley_coords=("lat", "lon")) + result = est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + summary = result.summary() + assert "Spillover Effects" in summary + # Two ring labels: [0, 50) and [50, 100] + assert "[0, 50)" in summary or "[50, 100]" in summary + + def test_to_dict_serializes_spillover_effects(self): + df = _make_butts_2period_dgp(seed=42) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + result = est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + d = result.to_dict() + assert "spillover_effects" in d + assert d["spillover_effects"] is not None + assert "ring_breakpoints" in d + assert d["d_bar"] == 100.0 + + +class TestSpilloverDiDRawDataInvariant: + """Step 3: caller's DataFrame must not be mutated by fit().""" + + def 
test_caller_data_unchanged(self): + df = _make_butts_2period_dgp(seed=42) + original_cols = list(df.columns) + original_len = len(df) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + # Caller's DataFrame should not gain or lose columns/rows from fit(). + assert list(df.columns) == original_cols + assert len(df) == original_len + + +# ============================================================================= +# Step 5: Identification MC tests via _dgp_utils.py +# ============================================================================= + + +class TestSpilloverDiDIdentification: + """50-seed Monte Carlo: SpilloverDiD recovers known DGP within MC tolerance. + + Plan's Step 5 target was 200 seeds; this is a faster default that still + rejects gross misidentification. A 200-seed version marked `@pytest.mark.slow` + runs in CI's full suite (`pytest -m slow`). + """ + + def test_nonstaggered_recovers_tau_total(self): + att_estimates = [] + n_seeds = 50 + for s in range(n_seeds): + df = generate_butts_nonstaggered_dgp(tau_total=-0.07, delta_1=-0.04, seed=s) + result = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")).fit( + df, outcome="y", unit="unit", time="time", treatment="D" + ) + att_estimates.append(result.att) + mean_att = float(np.mean(att_estimates)) + # MC tolerance: mean over 50 seeds at error_sd=0.05, ~200 units → + # SE-of-mean ~ 0.05/sqrt(50 * 200) ~ 5e-4. Tolerance 0.02 leaves + # margin for DGP design noise. 
+ assert ( + abs(mean_att - (-0.07)) < 0.02 + ), f"non-staggered tau_total: expected -0.07, got {mean_att:.4f}" + + def test_nonstaggered_recovers_delta_1(self): + delta_estimates = [] + n_seeds = 50 + for s in range(n_seeds): + df = generate_butts_nonstaggered_dgp(tau_total=-0.07, delta_1=-0.04, seed=s) + result = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")).fit( + df, outcome="y", unit="unit", time="time", treatment="D" + ) + if result.spillover_effects is not None: + delta_estimates.append(result.spillover_effects.iloc[0]["coef"]) + mean_delta = float(np.mean(delta_estimates)) + assert ( + abs(mean_delta - (-0.04)) < 0.02 + ), f"non-staggered delta_1: expected -0.04, got {mean_delta:.4f}" + + @pytest.mark.slow + def test_nonstaggered_recovers_at_200_seeds(self): + """Plan-targeted 200-seed MC. Marked slow; run via `pytest -m slow`.""" + att_estimates = [] + delta_estimates = [] + for s in range(200): + df = generate_butts_nonstaggered_dgp(tau_total=-0.07, delta_1=-0.04, seed=s) + result = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")).fit( + df, outcome="y", unit="unit", time="time", treatment="D" + ) + att_estimates.append(result.att) + if result.spillover_effects is not None: + delta_estimates.append(result.spillover_effects.iloc[0]["coef"]) + assert abs(np.mean(att_estimates) - (-0.07)) < 0.02 + assert abs(np.mean(delta_estimates) - (-0.04)) < 0.02 + + def test_staggered_recovers_tau_total(self): + """Staggered MC with 30 seeds (smaller because each DGP is larger).""" + att_estimates = [] + for s in range(30): + df = generate_butts_staggered_dgp(tau_total=-0.07, delta_1=-0.04, seed=s) + result = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")).fit( + df, outcome="y", unit="unit", time="time", first_treat="first_treat" + ) + att_estimates.append(result.att) + mean_att = float(np.mean(att_estimates)) + # Staggered MC is noisier; allow a looser tolerance. 
+        assert (
+            abs(mean_att - (-0.07)) < 0.04
+        ), f"staggered tau_total: expected -0.07, got {mean_att:.4f}"
+
+
+# =============================================================================
+# Step 7: Conley integration end-to-end
+# =============================================================================
+
+
+class TestSpilloverDiDWithConley:
+    """Step 7: vcov_type='conley' flows through stage 2 cleanly."""
+
+    def test_conley_fit_runs(self):
+        df = generate_butts_nonstaggered_dgp(seed=42)
+        est = SpilloverDiD(
+            rings=[0.0, 100.0],
+            conley_coords=("lat", "lon"),
+            vcov_type="conley",
+            conley_cutoff_km=200.0,
+            conley_lag_cutoff=0,  # cross-sectional only (2-period panel)
+        )
+        result = est.fit(df, outcome="y", unit="unit", time="time", treatment="D")
+        assert result.vcov_type == "conley"
+        assert result.conley_lag_cutoff == 0
+        assert np.isfinite(result.se)
+
+    def test_conley_se_differs_from_hc1(self):
+        """Conley vs HC1: same ATT point estimate; Conley SE computes (wiring test)."""
+        df = generate_butts_nonstaggered_dgp(
+            seed=42, n_treated=20, n_near_control=80, n_far_control=100
+        )
+        est_hc1 = SpilloverDiD(
+            rings=[0.0, 100.0],
+            conley_coords=("lat", "lon"),
+            vcov_type="hc1",
+        )
+        result_hc1 = est_hc1.fit(df, outcome="y", unit="unit", time="time", treatment="D")
+        est_conley = SpilloverDiD(
+            rings=[0.0, 100.0],
+            conley_coords=("lat", "lon"),
+            vcov_type="conley",
+            conley_cutoff_km=200.0,
+            conley_lag_cutoff=0,
+        )
+        result_conley = est_conley.fit(df, outcome="y", unit="unit", time="time", treatment="D")
+        # ATT point estimate unchanged across vcov; SE may differ.
+        assert abs(result_hc1.att - result_conley.att) < 1e-10
+        # Conley SE may be larger or smaller than HC1 depending on spatial
+        # error correlation, so no directional assert is made here.
+ # (Synthetic DGP has independent errors so they may be very close; + # use a loose tolerance — primarily a wiring test.) + assert np.isfinite(result_conley.se) + + +# ============================================================================= +# Step 3 (continued): staggered smoke test +# ============================================================================= + + +class TestSpilloverDiDStaggeredFit: + """Step 3: staggered timing produces sensible results.""" + + def test_staggered_fit_runs(self): + # 3 cohorts, 4 periods + rng = np.random.default_rng(0) + rows = [] + cohort_onsets = {1: 1, 2: 2, "C": np.inf} + # 6 units per cohort placed near distinct centers + for cohort_id, onset in cohort_onsets.items(): + center_lat = 0.0 if cohort_id == 1 else (10.0 if cohort_id == 2 else 50.0) + for i in range(6): + u = f"{cohort_id}_{i}" + lat = center_lat + rng.normal(0, 0.05) + lon = rng.normal(0, 0.05) + first_treat = float(onset) if onset != np.inf else np.inf + for t in range(4): + rows.append( + { + "unit": u, + "time": t, + "lat": lat, + "lon": lon, + "first_treat": first_treat, + "y": 1.0 + + 0.1 * t + + (0.05 * (t >= first_treat) if np.isfinite(first_treat) else 0) + + rng.normal(0, 0.05), + } + ) + df = pd.DataFrame(rows) + est = SpilloverDiD( + rings=[0.0, 50.0], # ring covers 0-50 km; far cutoff at 50 + conley_coords=("lat", "lon"), + ) + result = est.fit(df, outcome="y", unit="unit", time="time", first_treat="first_treat") + assert result.is_staggered is True + assert np.isfinite(result.att) + + +# ============================================================================= +# Step 2: Timing conversion helpers +# ============================================================================= + + +class TestExtractTreatmentOnsets: + """Tests for _extract_treatment_onsets.""" + + def test_canonical_finite_onsets(self): + df = pd.DataFrame( + { + "unit": ["A", "A", "B", "B", "C", "C"], + "first_treat": [1, 1, 2, 2, np.inf, np.inf], + "time": [0, 1, 0, 1, 
0, 1], + } + ) + onsets = _extract_treatment_onsets(df, "first_treat", "unit") + assert onsets == {"A": 1.0, "B": 2.0, "C": np.inf} + + def test_zero_treated_as_never(self): + df = pd.DataFrame({"unit": ["A", "A"], "first_treat": [0, 0], "time": [0, 1]}) + onsets = _extract_treatment_onsets(df, "first_treat", "unit") + assert onsets == {"A": np.inf} + + def test_nan_treated_as_never(self): + df = pd.DataFrame({"unit": ["A", "A"], "first_treat": [np.nan, np.nan], "time": [0, 1]}) + onsets = _extract_treatment_onsets(df, "first_treat", "unit") + assert onsets == {"A": np.inf} + + +class TestConvertTreatmentToFirstTreat: + """Tests for _convert_treatment_to_first_treat.""" + + def test_basic_conversion(self): + # 2 units, 3 periods; A treated from t=1, B never-treated. + df = pd.DataFrame( + { + "unit": ["A"] * 3 + ["B"] * 3, + "time": [0, 1, 2, 0, 1, 2], + "D": [0, 1, 1, 0, 0, 0], + } + ) + out, col = _convert_treatment_to_first_treat(df, "D", "time", "unit") + assert col == "_spillover_first_treat" + a_rows = out[out["unit"] == "A"] + b_rows = out[out["unit"] == "B"] + assert (a_rows["_spillover_first_treat"] == 1.0).all() + assert np.isinf(b_rows["_spillover_first_treat"]).all() + + def test_no_treated_units_marks_all_inf(self): + df = pd.DataFrame({"unit": ["A", "A"], "time": [0, 1], "D": [0, 0]}) + out, _ = _convert_treatment_to_first_treat(df, "D", "time", "unit") + assert np.isinf(out["_spillover_first_treat"]).all() + + def test_missing_treatment_column_raises(self): + df = pd.DataFrame({"unit": ["A"], "time": [0]}) + with pytest.raises(ValueError, match="not in data"): + _convert_treatment_to_first_treat(df, "D", "time", "unit") + + def test_non_binary_treatment_raises(self): + df = pd.DataFrame({"unit": ["A", "A"], "time": [0, 1], "D": [0, 2]}) + with pytest.raises(ValueError, match="exact 0/1"): + _convert_treatment_to_first_treat(df, "D", "time", "unit") + + def test_caller_dataframe_unchanged(self): + df = pd.DataFrame({"unit": ["A", "A"], "time": [0, 1], 
"D": [0, 1]}) + original_cols = list(df.columns) + _convert_treatment_to_first_treat(df, "D", "time", "unit") + # The defensive copy + column add does NOT leak back to caller. + assert list(df.columns) == original_cols + + +# ============================================================================= +# Step 2: SpilloverDiD validators +# ============================================================================= + + +@pytest.fixture +def simple_panel(): + """Minimal valid 2-period panel for validator tests.""" + return pd.DataFrame( + { + "unit": ["A", "A", "B", "B", "C", "C", "D", "D"], + "time": [0, 1] * 4, + "lat": [0.0, 0.0, 0.1, 0.1, 5.0, 5.0, 5.1, 5.1], + "lon": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], + "D": [0, 1, 0, 1, 0, 0, 0, 0], + "first_treat": [1, 1, 1, 1, np.inf, np.inf, np.inf, np.inf], + "y": np.arange(8.0), + } + ) + + +class TestValidateSpilloverInputs: + """Behavioral validation of front-door checks.""" + + def test_valid_minimal_input_passes(self, simple_panel): + est = SpilloverDiD(rings=[0.0, 200.0], conley_coords=("lat", "lon")) + # Should not raise. 
+ est._validate_spillover_inputs(simple_panel, "D", None, "time", "unit", "y") + assert est._effective_d_bar == 200.0 + + def test_rings_too_short_raises(self, simple_panel): + est = SpilloverDiD(rings=[100.0]) + with pytest.raises(ValueError, match="at least 2 breakpoints"): + est._validate_spillover_inputs(simple_panel, "D", None, "time", "unit", "y") + + def test_rings_non_sorted_raises(self, simple_panel): + est = SpilloverDiD(rings=[50.0, 100.0, 75.0]) + with pytest.raises(ValueError, match="strictly increasing"): + est._validate_spillover_inputs(simple_panel, "D", None, "time", "unit", "y") + + def test_rings_negative_raises(self, simple_panel): + est = SpilloverDiD(rings=[-10.0, 100.0]) + with pytest.raises(ValueError, match="non-negative"): + est._validate_spillover_inputs(simple_panel, "D", None, "time", "unit", "y") + + def test_d_bar_mismatched_with_rings_raises(self, simple_panel): + est = SpilloverDiD(rings=[0.0, 100.0, 200.0], d_bar=150.0) + with pytest.raises(ValueError, match="d_bar.*must equal"): + est._validate_spillover_inputs(simple_panel, "D", None, "time", "unit", "y") + + def test_d_bar_equal_to_max_rings_accepted(self, simple_panel): + est = SpilloverDiD(rings=[0.0, 100.0, 200.0], d_bar=200.0, conley_coords=("lat", "lon")) + est._validate_spillover_inputs(simple_panel, "D", None, "time", "unit", "y") + assert est._effective_d_bar == 200.0 + + def test_d_bar_default_uses_max_rings(self, simple_panel): + est = SpilloverDiD(rings=[0.0, 50.0, 175.0], conley_coords=("lat", "lon")) + est._validate_spillover_inputs(simple_panel, "D", None, "time", "unit", "y") + assert est._effective_d_bar == 175.0 + + def test_treatment_and_first_treat_both_raise(self, simple_panel): + est = SpilloverDiD(rings=[0.0, 200.0]) + with pytest.raises(ValueError, match="either.*or"): + est._validate_spillover_inputs(simple_panel, "D", "first_treat", "time", "unit", "y") + + def test_neither_treatment_nor_first_treat_raises(self, simple_panel): + est = 
SpilloverDiD(rings=[0.0, 200.0]) + with pytest.raises(ValueError, match="Exactly one of"): + est._validate_spillover_inputs(simple_panel, None, None, "time", "unit", "y") + + def test_missing_required_column_raises(self, simple_panel): + est = SpilloverDiD(rings=[0.0, 200.0]) + with pytest.raises(ValueError, match="Missing required columns"): + est._validate_spillover_inputs( + simple_panel, "D", None, "time", "nonexistent_unit_col", "y" + ) + + def test_conley_requires_coords(self, simple_panel): + est = SpilloverDiD( + rings=[0.0, 200.0], + vcov_type="conley", + conley_cutoff_km=200.0, + conley_lag_cutoff=0, + ) + with pytest.raises(ValueError, match="conley_coords"): + est._validate_spillover_inputs(simple_panel, "D", None, "time", "unit", "y") + + def test_conley_coords_must_be_2_tuple(self, simple_panel): + est = SpilloverDiD( + rings=[0.0, 200.0], + vcov_type="conley", + conley_coords=("lat",), # type: ignore[arg-type] # only 1 element + conley_cutoff_km=200.0, + conley_lag_cutoff=0, + ) + with pytest.raises(ValueError, match="2-tuple"): + est._validate_spillover_inputs(simple_panel, "D", None, "time", "unit", "y") + + def test_conley_coord_column_missing_raises(self, simple_panel): + est = SpilloverDiD( + rings=[0.0, 200.0], + vcov_type="conley", + conley_coords=("lat", "missing_lon"), + conley_cutoff_km=200.0, + conley_lag_cutoff=0, + ) + with pytest.raises(ValueError, match="'missing_lon' not in data"): + est._validate_spillover_inputs(simple_panel, "D", None, "time", "unit", "y") + + def test_conley_requires_positive_cutoff(self, simple_panel): + est = SpilloverDiD( + rings=[0.0, 200.0], + vcov_type="conley", + conley_coords=("lat", "lon"), + conley_cutoff_km=-1.0, + conley_lag_cutoff=0, + ) + with pytest.raises(ValueError, match="conley_cutoff_km"): + est._validate_spillover_inputs(simple_panel, "D", None, "time", "unit", "y") + + def test_conley_requires_lag_cutoff(self, simple_panel): + est = SpilloverDiD( + rings=[0.0, 200.0], + vcov_type="conley", + 
conley_coords=("lat", "lon"), + conley_cutoff_km=200.0, + conley_lag_cutoff=None, + ) + with pytest.raises(ValueError, match="conley_lag_cutoff"): + est._validate_spillover_inputs(simple_panel, "D", None, "time", "unit", "y") + + def test_nan_coords_raise(self, simple_panel): + df = simple_panel.copy() + df.loc[0, "lat"] = np.nan + est = SpilloverDiD( + rings=[0.0, 200.0], + vcov_type="conley", + conley_coords=("lat", "lon"), + conley_cutoff_km=200.0, + conley_lag_cutoff=0, + ) + with pytest.raises(ValueError, match="non-finite"): + est._validate_spillover_inputs(df, "D", None, "time", "unit", "y") + + def test_no_treated_observations_raises(self, simple_panel): + df = simple_panel.copy() + df["D"] = 0 + est = SpilloverDiD(rings=[0.0, 200.0], conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match="No treated observations"): + est._validate_spillover_inputs(df, "D", None, "time", "unit", "y") + + def test_cluster_column_missing_raises(self, simple_panel): + est = SpilloverDiD( + rings=[0.0, 200.0], + vcov_type="conley", + conley_coords=("lat", "lon"), + conley_cutoff_km=200.0, + conley_lag_cutoff=0, + cluster="not_a_real_column", + ) + with pytest.raises(ValueError, match="cluster column"): + est._validate_spillover_inputs(simple_panel, "D", None, "time", "unit", "y") + + +class TestValidateFarAwayExists: + """Tests for SpilloverDiD._validate_far_away_exists.""" + + def test_returns_count_when_satisfied(self): + est = SpilloverDiD(rings=[0.0, 100.0]) + est._effective_d_bar = 100.0 + d = np.array([10.0, 50.0, 500.0, 1000.0]) + is_control = np.array([False, True, True, True]) + n = est._validate_far_away_exists(d, is_control) + assert n == 2 + + def test_raises_when_no_far_controls(self): + est = SpilloverDiD(rings=[0.0, 100.0]) + est._effective_d_bar = 100.0 + d = np.array([10.0, 50.0, 99.9, 100.0]) + is_control = np.array([False, True, True, True]) + with pytest.raises(ValueError, match="Assumption 5"): + est._validate_far_away_exists(d, is_control) + 
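The far-away check exercised above uses a strict inequality: a control sitting at exactly `d_bar` is still inside the spillover region and does not count. A standalone sketch of that logic (`count_far_controls` is a hypothetical mirror written for illustration, not the library's `_validate_far_away_exists` itself):

```python
import numpy as np

# Hypothetical mirror of the far-away check (Butts Assumption 5):
# a valid design needs at least one CONTROL unit strictly beyond d_bar.
def count_far_controls(distances, is_control, d_bar):
    far_controls = (distances > d_bar) & is_control  # strict: d == d_bar is "near"
    n = int(far_controls.sum())
    if n == 0:
        raise ValueError("Assumption 5: no control units strictly beyond d_bar")
    return n

d = np.array([10.0, 50.0, 500.0, 1000.0])
is_control = np.array([False, True, True, True])
print(count_far_controls(d, is_control, 100.0))  # -> 2

# A control at exactly d_bar is inside the spillover region, so this raises:
try:
    count_far_controls(np.array([10.0, 50.0, 99.9, 100.0]), is_control, 100.0)
except ValueError as exc:
    print("raised:", exc)
```

This matches the fixtures in the tests above: four distances with three controls, two of which lie strictly past `d_bar = 100.0`.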
+ def test_raises_when_far_units_all_treated(self): + """Only treated units beyond d_bar (impossible in non-staggered, but the + validator's job is to check the population that identifies the + counterfactual: controls strictly past d_bar).""" + est = SpilloverDiD(rings=[0.0, 100.0]) + est._effective_d_bar = 100.0 + d = np.array([200.0, 300.0, 50.0]) + is_control = np.array([False, False, True]) # only the close unit is control + with pytest.raises(ValueError, match="Assumption 5"): + est._validate_far_away_exists(d, is_control) + + +# ============================================================================= +# Codex-review regression tests (post-Wave-B-MVP first review) +# ============================================================================= + + +class TestSpilloverDiDCovariatesRejected: + """covariates= must raise NotImplementedError in Wave B MVP. + + Stage-1 covariate residualization (Gardner-style) is not yet wired + through; appending raw covariates only at stage 2 silently biases + tau_total / delta_j on panels with time-varying covariates. + """ + + def test_covariates_raises_not_implemented(self): + df = _make_butts_2period_dgp(seed=42) + df["x"] = np.random.default_rng(0).normal(size=len(df)) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + with pytest.raises(NotImplementedError, match="covariates"): + est.fit( + df, + outcome="y", + unit="unit", + time="time", + treatment="D", + covariates=["x"], + ) + + def test_empty_covariates_accepted(self): + """An empty covariates list is the same as no covariates — should NOT raise.""" + df = _make_butts_2period_dgp(seed=42) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + # Should not raise. 
+ est.fit(df, outcome="y", unit="unit", time="time", treatment="D", covariates=[]) + + +class TestSpilloverDiDAbsorbingTreatmentValidation: + """Reversible / non-absorbing treatment patterns must raise.""" + + def test_reversible_treatment_path_raises(self): + # A unit's treatment goes [0, 1, 0] across 3 periods. + rows = [] + rng = np.random.default_rng(0) + for u in ("treated_reversing", "ctrl_far"): + for t in range(3): + if u == "treated_reversing": + d_val = 1 if t == 1 else 0 + lat, lon = 0.0, 0.0 + else: + d_val = 0 + lat, lon = 5.0, 0.0 + rows.append( + { + "unit": u, + "time": t, + "lat": lat, + "lon": lon, + "D": d_val, + "y": rng.normal(), + } + ) + df = pd.DataFrame(rows) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match="non-absorbing"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + def test_non_constant_first_treat_raises(self): + # Same unit has two different first_treat values across rows. + rows = [] + for t in range(3): + rows.append( + { + "unit": "u1", + "time": t, + "lat": 0.0, + "lon": 0.0, + "first_treat": 1.0 if t < 2 else 2.0, # CHANGES at t=2 + "y": float(t), + } + ) + rows.append( + { + "unit": "u2_far", + "time": t, + "lat": 5.0, + "lon": 0.0, + "first_treat": np.inf, + "y": float(t), + } + ) + df = pd.DataFrame(rows) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match="non-constant"): + est.fit(df, outcome="y", unit="unit", time="time", first_treat="first_treat") + + +class TestSpilloverDiDAllEventuallyTreated: + """All-eventually-treated staggered designs (no never-treated units) + should work as long as some not-yet-treated rows are far-away controls. + The Codex review flagged the prior is_ever_control logic as too strict. + """ + + def test_all_eventually_treated_staggered(self): + # Two cohorts, no never-treated units. 
Far cohort treats at t=10 + # (well past the panel's t=3 horizon), so its early-period rows + # serve as far-away controls for the early cohort's treatment. + rng = np.random.default_rng(0) + # Per-unit coords sampled ONCE (within-unit constancy required). + early_coords = [(rng.normal(0, 0.005), rng.normal(0, 0.005)) for _ in range(8)] + late_coords = [(5.0 + rng.normal(0, 0.005), rng.normal(0, 0.005)) for _ in range(8)] + rows = [] + # Early cohort: treats at t=2, clustered at origin + for i, (lat, lon) in enumerate(early_coords): + u = f"early_{i}" + for t in range(4): + rows.append( + { + "unit": u, + "time": t, + "lat": lat, + "lon": lon, + "first_treat": 2.0, + "y": rng.normal() + 0.1 * t + (-0.07 if t >= 2 else 0.0), + } + ) + # Late cohort: treats at t=10 (far outside panel), placed FAR from early + for i, (lat, lon) in enumerate(late_coords): + u = f"late_{i}" + for t in range(4): + rows.append( + { + "unit": u, + "time": t, + "lat": lat, + "lon": lon, + "first_treat": 10.0, # never treats in panel + "y": rng.normal() + 0.1 * t, + } + ) + df = pd.DataFrame(rows) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + result = est.fit(df, outcome="y", unit="unit", time="time", first_treat="first_treat") + assert np.isfinite(result.att) + + +class TestSpilloverDiDConleyCoordsAlwaysRequired: + """conley_coords must be validated on every fit path, not just vcov_type=conley. + The Codex review noted the default-hc1 path was failing with AssertionError. 
+ """ + + def test_missing_conley_coords_on_hc1_path_raises_value_error(self): + df = _make_butts_2period_dgp(seed=42) + est = SpilloverDiD(rings=[0.0, 100.0]) # no conley_coords, default hc1 + with pytest.raises(ValueError, match="conley_coords"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + +class TestSpilloverDiDResultNObsMatchesEstimationSample: + """result.n_obs must equal the stage-2 estimation sample (after dropping + rows with non-finite y_tilde from rank-deficient stage-1 FE). + """ + + def test_n_obs_equals_finite_mask_count(self): + df = _make_butts_2period_dgp(seed=42) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + result = est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + # In a well-conditioned DGP no rows are dropped, so n_obs = len(df). + assert result.n_obs == len(df) + # n_treated + n_control == n_obs (no overlap, no leakage). + assert result.n_treated + result.n_control == result.n_obs + + +# ============================================================================= +# Codex review round-2 regression tests +# ============================================================================= + + +class TestSpilloverDiDOmega0IdentificationCheck: + """Stage-1 FE support: every unit and every period in the panel must have + at least one Omega_0 = {D_it=0 AND S_it=0} row, otherwise FE is + unidentified for that unit/period and stage-2 estimates would be + silently dropped. Round-2 codex review wanted up-front rejection. + """ + + def test_unsupported_period_raises(self): + # Panel where t=1 has zero Omega_0 support: every t=1 unit is + # either treated (D=1) or near (S=1 since d_i <= d_bar=200). + # The far-aways only contribute at t=0 (no t=1 row). + rng = np.random.default_rng(0) + # Per-unit coords sampled ONCE (within-unit-constancy required). 
+ treated_coords = [(rng.normal(0, 0.005), rng.normal(0, 0.005)) for _ in range(4)] + near_coords = [(rng.uniform(0.1, 0.5), rng.uniform(-0.3, 0.3)) for _ in range(4)] + far_coords = [(5.0 + rng.normal(0, 0.005), rng.normal(0, 0.005)) for _ in range(4)] + rows = [] + for i, (lat, lon) in enumerate(treated_coords): + for t in range(2): + rows.append( + { + "unit": f"T{i}", + "time": t, + "lat": lat, + "lon": lon, + "D": int(t == 1), + "y": rng.normal(), + } + ) + for i, (lat, lon) in enumerate(near_coords): + for t in range(2): + rows.append( + { + "unit": f"N{i}", + "time": t, + "lat": lat, + "lon": lon, + "D": 0, + "y": rng.normal(), + } + ) + # Far-aways at PRE only: no t=1 row → Omega_0 ∩ {t=1} is empty. + for i, (lat, lon) in enumerate(far_coords): + rows.append( + { + "unit": f"F{i}", + "time": 0, + "lat": lat, + "lon": lon, + "D": 0, + "y": rng.normal(), + } + ) + df = pd.DataFrame(rows) + est = SpilloverDiD(rings=[0.0, 200.0], d_bar=200.0, conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match="unidentified"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + +class TestSpilloverDiDConleyCoordsConstantPerUnit: + """conley_coords must be constant within each unit across rows. + Round-2 codex review noted ring construction collapses coords via + `drop_duplicates(subset=[unit])` — non-constant coords silently use + only the first row's location. + """ + + def test_time_varying_coords_raises(self): + rows = [] + rng = np.random.default_rng(0) + # u1 has different lat at t=0 vs t=1 + rows.append({"unit": "u1", "time": 0, "lat": 0.0, "lon": 0.0, "D": 0, "y": rng.normal()}) + rows.append( + { + "unit": "u1", + "time": 1, + "lat": 1.5, # changed! 
+ "lon": 0.0, + "D": 1, + "y": rng.normal(), + } + ) + # u2 is well-behaved (constant coords) + for t in (0, 1): + rows.append( + { + "unit": "u2", + "time": t, + "lat": 5.0, + "lon": 0.0, + "D": 0, + "y": rng.normal(), + } + ) + df = pd.DataFrame(rows) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match="non-constant"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + +# ============================================================================= +# Codex review round-3 regression tests +# ============================================================================= + + +class TestSpilloverDiDFractionalTreatmentRejected: + """Treatment column with fractional values (0.9, 1.1, etc.) must raise. + Round-3 codex review caught that `int(v) in (0, 1)` was rounding down + and silently misclassifying fractional rows. + """ + + @pytest.mark.parametrize("bad_value", [0.5, 0.9, 1.1, -0.1, 2.5]) + def test_fractional_treatment_raises(self, bad_value): + df = _make_butts_2period_dgp(seed=42).copy() + # Cast D to float64 before assigning the fractional value. Modern + # pandas (3.x) raises TypeError on int64-column fractional setitem + # BEFORE SpilloverDiD.fit() ever sees the input, so we promote the + # column dtype first to ensure the fractional value actually + # reaches the validator we're testing. + df["D"] = df["D"].astype(float) + first_treated_idx = df.index[df["D"] == 1][0] + df.loc[first_treated_idx, "D"] = bad_value + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match="exact 0/1"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + +class TestSpilloverDiDNoTreatedRowRaises: + """If all first_treat > max(time), D_it is all zeros after the + anticipation shift. Must raise a clear identification error rather + than crashing in solve_ols. 
+ """ + + def test_future_only_first_treat_raises(self): + rng = np.random.default_rng(0) + # All units treat at t=10, but panel only spans t=0..2. + rows = [] + for i in range(4): + lat, lon = rng.normal(0, 0.005), rng.normal(0, 0.005) + for t in range(3): + rows.append( + { + "unit": f"T{i}", + "time": t, + "lat": lat, + "lon": lon, + "first_treat": 10.0, # > max(time) = 2 + "y": rng.normal(), + } + ) + for i in range(4): + lat, lon = 5.0 + rng.normal(0, 0.005), rng.normal(0, 0.005) + for t in range(3): + rows.append( + { + "unit": f"F{i}", + "time": t, + "lat": lat, + "lon": lon, + "first_treat": np.inf, + "y": rng.normal(), + } + ) + df = pd.DataFrame(rows) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match="No observation is treated"): + est.fit(df, outcome="y", unit="unit", time="time", first_treat="first_treat") + + +class TestSpilloverDiDHaversineDomain: + """Haversine lat/lon range validation applies on EVERY vcov path + (not just vcov_type='conley'), because ring construction always + uses the configured metric. Round-3 codex review noted out-of-range + coords silently produced wrong ring assignment on hc1/cluster paths. + """ + + def test_out_of_range_lat_on_hc1_raises(self): + df = _make_butts_2period_dgp(seed=42).copy() + # Corrupt one row's lat to be > 90 (impossible geographic value). + df.loc[0, "lat"] = 95.0 + # Force the constancy check to ignore this corruption: also corrupt + # the unit's other row to the same value (constant per unit). 
+ unit_of_first = df.loc[0, "unit"] + df.loc[df["unit"] == unit_of_first, "lat"] = 95.0 + est = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + vcov_type="hc1", # NOT conley + ) + with pytest.raises(ValueError, match=r"latitude.*\[-90, 90\]"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + def test_out_of_range_lon_on_hc1_raises(self): + df = _make_butts_2period_dgp(seed=42).copy() + unit_of_first = df.loc[0, "unit"] + df.loc[df["unit"] == unit_of_first, "lon"] = 200.0 # > 180 + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match=r"longitude.*\[-180, 180\]"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + def test_euclidean_metric_skips_range_check(self): + """conley_metric='euclidean' must NOT enforce haversine ranges.""" + df = _make_butts_2period_dgp(seed=42).copy() + # Coordinates 95.0 / 200.0 are valid Euclidean but invalid haversine. + df.loc[df["unit"] == df.loc[0, "unit"], "lat"] = 95.0 + df.loc[df["unit"] == df.loc[0, "unit"], "lon"] = 200.0 + est = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + conley_metric="euclidean", + ) + # Should not raise. (May still fail downstream for other reasons — + # we just need to confirm the haversine range gate is metric-aware.) + try: + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + except ValueError as exc: + # Acceptable: a different (non-domain) error from downstream. + assert "[-90, 90]" not in str(exc) and "[-180, 180]" not in str(exc) + + +# ============================================================================= +# Codex review round-4 regression tests +# ============================================================================= + + +class TestSpilloverDiDZeroIndexedBaselineTreated: + """Auto-generated `_spillover_first_treat` (from binary `D`) must NOT + collapse `0` onto the never-treated sentinel. 
Round-4 codex review + caught that `_extract_treatment_onsets` was collapsing 0 → inf for + EVERY first_treat input, silently reclassifying baseline-treated + units (D=1 at t=0) as never-treated. + """ + + def test_baseline_treated_unit_at_t0_recognized(self): + """A baseline-treated unit (D=1 at t=0) used to silently become + never-treated because `_convert_treatment_to_first_treat` wrote + first_treat=0 and `_extract_treatment_onsets` collapsed 0 -> inf. + After the fix, u1 is correctly recognized as treated. Because u1 + has no Omega_0 rows (D=1 at all t), it triggers the round-16 + warn-and-drop path: the warning naming `u1_baseline` PROVES it was + recognized as treated (the OLD bug silently reclassified u1 to + "never treated", which would have passed the Omega_0 check + without warning and produced garbage estimates). + """ + rng = np.random.default_rng(0) + rows = [] + # u1: baseline-treated (D=1 at all t). No Omega_0 rows → warned- + # and-dropped from stage 2. + # u2: treated from t=1 (D=0 at t=0, D=1 at t=1). Far from u1. + # u3: untreated far-control. Provides Omega_0 support. + for t in (0, 1): + rows.append( + { + "unit": "u1_baseline", + "time": t, + "lat": 0.0, + "lon": 0.0, + "D": 1, + "y": rng.normal(), + } + ) + for t in (0, 1): + rows.append( + { + "unit": "u2_treated_t1", + "time": t, + "lat": 10.0, + "lon": 0.0, + "D": int(t == 1), + "y": rng.normal(), + } + ) + for t in (0, 1): + rows.append( + { + "unit": "u3_far_control", + "time": t, + "lat": 20.0, + "lon": 0.0, + "D": 0, + "y": rng.normal(), + } + ) + df = pd.DataFrame(rows) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + # PROOF u1 is recognized as treated: the warning names u1_baseline. + with pytest.warns(UserWarning, match="u1_baseline"): + result = est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + # u1's 2 rows excluded from stage 2; u2 (1 treated row at t=1) and + # u3 (2 control rows) remain. 
Verify n_treated reflects only the + # supported sample. + assert result.n_treated == 1, ( + f"After warn-drop of u1_baseline, expected 1 treated row " + f"(u2 at t=1); got n_treated={result.n_treated}" + ) + + def test_partial_unsupported_units_warn_and_drop(self): + """Round-16 codex review: units with no Omega_0 row should be + warned-and-dropped (matching TwoStageDiD's always-treated convention), + not block the full fit. The remaining supported sample fits normally. + """ + rng = np.random.default_rng(1) + rows = [] + # 4 baseline-treated units (no Omega_0 rows → all 4 warned-dropped). + for k in range(4): + for t in (0, 1): + rows.append( + { + "unit": f"baseline_{k}", + "time": t, + "lat": 0.0 + k * 0.001, + "lon": 0.0, + "D": 1, + "y": rng.normal(), + } + ) + # 3 validly-treated units (treated from t=1; far from baselines). + for k in range(3): + for t in (0, 1): + rows.append( + { + "unit": f"treated_t1_{k}", + "time": t, + "lat": 10.0 + k * 0.01, + "lon": 0.0, + "D": int(t == 1), + "y": rng.normal(), + } + ) + # 5 far-controls (full Omega_0 support). + for k in range(5): + for t in (0, 1): + rows.append( + { + "unit": f"far_control_{k}", + "time": t, + "lat": 20.0 + k * 0.01, + "lon": 0.0, + "D": 0, + "y": rng.normal(), + } + ) + df = pd.DataFrame(rows) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + with pytest.warns(UserWarning, match="4 unit\\(s\\) have NO"): + result = est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + # 4 baselines × 2 periods = 8 rows excluded. Remaining: 3 treated + + # 5 controls = 8 units × 2 periods = 16 rows. n_treated = 3 (one + # per treated unit at t=1). + assert result.n_treated == 3 + assert result.n_obs == 16 + + def test_unsupported_period_still_raises(self): + """Period-level Omega_0 unsupport remains a hard error (round-16 + codex split): dropping a period would remove all units' rows at + that t, losing the cross-time identification entirely. 
+ """ + rng = np.random.default_rng(2) + # Balanced panel where t=1 has NO Omega_0 rows: every unit at t=1 + # is either treated or near a treated unit. + rows = [] + # 2 treated units at t=1; 2 near-controls (within d_bar of treated) + # at both t. No far-controls → no Omega_0 row at t=1. + for k in range(2): + for t in (0, 1): + rows.append( + { + "unit": f"T{k}", + "time": t, + "lat": 0.0 + k * 0.001, + "lon": 0.0, + "D": int(t == 1), + "y": rng.normal(), + } + ) + # Near-controls at both periods. Pre: untreated and (no current + # treatment) → S=0 → Omega_0. Post: untreated but treated nearby → + # S=1 → NOT Omega_0. So t=1 has no Omega_0 row. + for k in range(2): + for t in (0, 1): + rows.append( + { + "unit": f"N{k}", + "time": t, + "lat": 0.1 + k * 0.001, + "lon": 0.0, + "D": 0, + "y": rng.normal(), + } + ) + df = pd.DataFrame(rows) + est = SpilloverDiD(rings=[0.0, 100.0], d_bar=100.0, conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match="period.*unidentified|unidentified.*period"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + def test_baseline_treated_with_far_control_pretreatment_works(self): + """A control unit at t<0 (pre-treatment for the baseline-treated) + provides the missing Omega_0 support, but baseline-treated units + still have no untreated rows. Confirm the system can FIT when + there's a clean control population, while still recognizing + baseline-treated units as treated.""" + rng = np.random.default_rng(0) + rows = [] + # u1 is treated from t=1 (NOT baseline) — gives Omega_0 support. 
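Editorial aside: the stricter Omega_0 rule these support tests revolve around (untreated AND unexposed, per the PR description) reduces to a two-mask intersection. `D` and `S` here are toy arrays for illustration, not the estimator's internals.

```python
import numpy as np

# Omega_0 keeps only rows that are untreated (D_it = 0) AND unexposed
# (S_it = 0). A near-control in the post period has D = 0 but S = 1,
# so it is excluded from stage 1 despite being untreated.
D = np.array([0, 0, 1, 0, 0])  # toy treatment indicator per row
S = np.array([0, 1, 1, 0, 0])  # toy exposure flag (treated unit within d_bar)
omega_0 = (D == 0) & (S == 0)
assert omega_0.tolist() == [True, False, False, True, True]
```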
+ for t in (0, 1): + rows.append( + { + "unit": "u1_t1_treated", + "time": t, + "lat": 0.0, + "lon": 0.0, + "D": int(t == 1), + "y": rng.normal(), + } + ) + for t in (0, 1): + rows.append( + { + "unit": "u2_far", + "time": t, + "lat": 5.0, + "lon": 0.0, + "D": 0, + "y": rng.normal(), + } + ) + df = pd.DataFrame(rows) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + result = est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + # n_treated should be 1 (u1 at t=1 only); n_control should be 3 + # (u1's t=0 row + u2's both rows). Critical: u1 is NOT silently + # reclassified. + assert result.n_treated == 1 + assert result.n_control == 3 + + +class TestSpilloverDiDCallableSelfDistance: + """Callable metrics on the ring-construction path must satisfy the + same self-distance / symmetry contract as conley's vcov path. + Round-4 codex review noted positive self-distance silently corrupted + ring assignment on hc1/cluster fits. + """ + + def test_positive_self_distance_callable_raises(self): + df = _make_butts_2period_dgp(seed=42) + + def bad_metric(a, b): + # Returns CONSTANT 7.5 — fails the zero-diagonal check on the + # (n, n) self-call validation. + return np.full((a.shape[0], b.shape[0]), 7.5) + + est = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + conley_metric=bad_metric, + ) + with pytest.raises(ValueError, match="diagonal"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + +# ============================================================================= +# Codex review round-5 regression tests +# ============================================================================= + + +class TestSpilloverDiDAnticipationPropagation: + """anticipation is in `__init__` / `get_params` — round-5 codex review + flagged that it wasn't surfaced on the SpilloverDiDResults / to_dict() + so downstream consumers couldn't reconstruct the fitted estimand. 
+ """ + + def test_anticipation_round_trips_to_result_and_dict(self): + df = _make_butts_2period_dgp(seed=42) + est = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + anticipation=0, + ) + result = est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + # anticipation is on the result object + assert result.anticipation == 0 + # anticipation is in to_dict() + d = result.to_dict() + assert "anticipation" in d + assert d["anticipation"] == 0 + + +class TestSpilloverDiDEffectiveRankDoF: + """Stage-2 residual df should use effective rank (after solve_ols drops + rank-deficient columns), not raw column count. Round-5 codex review + noted that using raw `X_2_fit.shape[1]` understates df_resid on + rank-deficient stage-2 fits and silently inflates p-values / CI widths. + """ + + def test_rank_deficient_design_uses_effective_rank(self): + # Construct a panel where the INNER ring [0, 50) has no controls + # so its stage-2 column is identically zero and solve_ols drops + # it. After the fix, df_resid uses the effective rank (k=2: + # treatment + outer ring [50, 200]), not the raw 3-column count. + rng = np.random.default_rng(42) + rows = [] + # 8 treated near origin + for i in range(8): + lat, lon = rng.normal(0, 0.005), rng.normal(0, 0.005) + for t in (0, 1): + rows.append( + { + "unit": f"T{i}", + "time": t, + "lat": lat, + "lon": lon, + "D": int(t == 1), + "y": rng.normal(), + } + ) + # 20 NEAR-controls in the OUTER ring [50, 200) only — at ~1.2° + # ≈ 133 km from origin (inside outer ring, outside inner ring). + for i in range(20): + lat, lon = 1.2 + rng.normal(0, 0.005), rng.normal(0, 0.005) + for t in (0, 1): + rows.append( + { + "unit": f"N{i}", + "time": t, + "lat": lat, + "lon": lon, + "D": 0, + "y": rng.normal(), + } + ) + # 20 far-controls beyond d_bar=200 → identify the counterfactual. 
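Editorial aside: the degrees-of-freedom arithmetic behind this class, shown on a toy design matrix rather than the stage-2 internals:

```python
import numpy as np

# A design with an identically-zero column (the empty inner ring) has
# rank k_raw - 1. Using the raw column count in n - k understates
# df_resid by one; the effective rank restores it.
X = np.column_stack([np.ones(12), np.zeros(12), np.arange(12.0)])
n, k_raw = X.shape                 # 12 rows, 3 raw columns
k_eff = np.linalg.matrix_rank(X)   # 2: the zero column contributes nothing
assert k_raw - k_eff == 1
assert (n - k_eff) == (n - k_raw) + 1  # effective df_resid is one larger
```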
+ for i in range(20): + lat, lon = 5.0 + rng.normal(0, 0.005), rng.normal(0, 0.005) + for t in (0, 1): + rows.append( + { + "unit": f"F{i}", + "time": t, + "lat": lat, + "lon": lon, + "D": 0, + "y": rng.normal(), + } + ) + df = pd.DataFrame(rows) + est = SpilloverDiD(rings=[0.0, 50.0, 200.0], conley_coords=("lat", "lon")) + result = est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + # Sanity: outer-ring coef + ATT both finite. + assert np.isfinite(result.att) + assert np.isfinite(result.spillover_effects.loc["[50, 200]", "coef"]) + # The empty inner ring's coef should be NaN (dropped by solve_ols). + assert np.isnan(result.spillover_effects.loc["[0, 50)", "coef"]) + + +# ============================================================================= +# Codex review round-6 regression tests +# ============================================================================= + + +class TestSpilloverDiDStringCodedTimeOnTreatmentPath: + """The `treatment=` path must coerce `time` to numeric BEFORE running + `_convert_treatment_to_first_treat`. Round-6 codex review noted that + string-coded numeric periods like ['0', '2', '10'] would sort + lexicographically ('10' < '2') and produce the wrong onset. + """ + + def test_string_coded_time_treatment_path(self): + # Unit u1: treated starting at time "2" (the SECOND period when + # sorted numerically). Lexicographic sort would mis-order "10" < "2" + # and assign first_treat = "10" (the alphabetic min of treated rows). 
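Editorial aside: the mis-ordering described in the comment above, demonstrated in isolation:

```python
periods = ["0", "2", "10"]
# Lexicographic order puts "10" before "2" because '1' < '2' as characters.
assert sorted(periods) == ["0", "10", "2"]
assert sorted(periods, key=float) == ["0", "2", "10"]

# Onset detection via min() over treated periods inherits the bug:
treated_periods = ["2", "10"]
assert min(treated_periods) == "10"            # lexicographic: wrong onset
assert min(treated_periods, key=float) == "2"  # numeric: correct onset
```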
+ rng = np.random.default_rng(42) + rows = [] + # u1: time periods "0", "2", "10" — treated at "2" and "10" + for t_str, t_num in [("0", 0), ("2", 2), ("10", 10)]: + rows.append( + { + "unit": "u1", + "time": t_str, + "lat": 0.0, + "lon": 0.0, + "D": 1 if t_num >= 2 else 0, + "y": rng.normal(), + } + ) + # u2: far-away never-treated + for t_str in ("0", "2", "10"): + rows.append( + { + "unit": "u2", + "time": t_str, + "lat": 5.0, + "lon": 0.0, + "D": 0, + "y": rng.normal(), + } + ) + df = pd.DataFrame(rows) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + # The bug previously was: lex sort would treat "10" as the smallest + # treated time, assigning u1 first_treat = "10" / 10.0 — outside + # the relevant comparison range. With the fix, first_treat = 2 + # (numeric), so D_it = 1 at the rows with time = "2" AND "10" → 2 + # treated rows. + result = est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + # PROOF that time was coerced to numeric BEFORE onset detection: + # u1 has D=1 at numeric time=2 and time=10 (sorted numerically); + # n_treated must therefore be 2. Under the lex-sort bug, u1's onset + # would be "10" (since "10" < "2" lexicographically) and only the + # t="10" row would be flagged → n_treated = 1. + assert result.n_treated == 2, ( + f"string-coded numeric time not coerced before onset detection: " + f"expected n_treated=2 (numeric sort, t in {{2, 10}}), got " + f"{result.n_treated}" + ) + + +class TestSpilloverDiDOutcomeColumnRequired: + """`outcome` should fail front-door with a ValueError, not late KeyError. + Round-6 codex review noted the validator skipped outcome. 
+ """ + + def test_missing_outcome_column_raises_value_error(self): + df = _make_butts_2period_dgp(seed=42).drop(columns=["y"]) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match="Missing required columns"): + est.fit( + df, + outcome="missing_outcome", + unit="unit", + time="time", + treatment="D", + ) + + +# ============================================================================= +# Codex review round-7 regression tests +# ============================================================================= + + +class TestSpilloverDiDPanelStructure: + """Reject duplicate (unit, time) cells AND unbalanced panels up front. + Round-7 codex review reproduced an identification failure on an + unbalanced panel where moving a far-away outcome silently shifted + ATT by ~100x. + """ + + def test_duplicate_unit_time_cell_raises(self): + # u1 has two rows at the same period — duplicate (unit, time) cell. + df = pd.DataFrame( + { + "unit": ["u1", "u1", "u1", "u2", "u2", "u2"], + "time": [0, 0, 1, 0, 1, 1], # duplicates at (u1, 0) and (u2, 1) + "lat": [0.0, 0.0, 0.0, 5.0, 5.0, 5.0], + "lon": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0], + "D": [0, 0, 1, 0, 0, 0], + "y": [1.0, 1.0, 2.0, 0.5, 0.6, 0.6], + } + ) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match="duplicate.*unit, time"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + def test_unbalanced_panel_raises(self): + # u1 has 2 periods but u2 has only 1 → unbalanced panel. 
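Editorial aside: the two structural gates this class exercises can be sketched with plain pandas. These are illustrative checks only; the estimator's own validator and its error wording may differ.

```python
import pandas as pd

# Gate 1: duplicate (unit, time) cells.
dup = pd.DataFrame({"unit": ["u1", "u1", "u2"], "time": [0, 0, 1]})
assert dup.duplicated(subset=["unit", "time"]).any()  # (u1, 0) appears twice

# Gate 2: unbalanced panel (units observed in different numbers of periods).
unbal = pd.DataFrame({"unit": ["u1", "u1", "u2"], "time": [0, 1, 0]})
periods_per_unit = unbal.groupby("unit")["time"].nunique()
assert periods_per_unit.nunique() > 1  # u1 spans 2 periods, u2 only 1
```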
+ df = pd.DataFrame( + { + "unit": ["u1", "u1", "u2"], + "time": [0, 1, 0], + "lat": [0.0, 0.0, 5.0], + "lon": [0.0, 0.0, 0.0], + "D": [0, 1, 0], + "y": [1.0, 2.0, 0.5], + } + ) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match="Unbalanced panel"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + +# ============================================================================= +# Codex review round-8 regression tests +# ============================================================================= + + +class TestSpilloverDiDRingsStartAtZero: + """rings[0] must equal 0; otherwise units in 0 <= d_it < rings[0] are + flagged exposed but get zero spillover regressors → silent bias. + """ + + def test_rings_starting_above_zero_raises(self): + est = SpilloverDiD(rings=[10.0, 50.0, 100.0], conley_coords=("lat", "lon")) + df = _make_butts_2period_dgp(seed=42) + with pytest.raises(ValueError, match="rings\\[0\\] must equal 0"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + +class TestSpilloverDiDHC2NotSupported: + """vcov_type='hc2' and 'hc2_bm' require per-coefficient BM/CR2 DOF + that the inline stage-2 inference doesn't provide. Round-8 codex + review caught that we'd silently return wrong p-values/CIs. + """ + + @pytest.mark.parametrize("vcov_type", ["hc2", "hc2_bm"]) + def test_hc2_paths_raise_not_implemented(self, vcov_type): + df = _make_butts_2period_dgp(seed=42) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon"), vcov_type=vcov_type) + with pytest.raises(NotImplementedError, match="hc2"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + +class TestSpilloverDiDRankDeficientActionValidation: + """rank_deficient_action must be one of {warn, error, silent}. + Mirrors the sibling constructor guards at two_stage.py:149 and + stacked_did.py. 
+ """ + + @pytest.mark.parametrize("bad_value", ["raise", "ignore", "", "WARN", "Error", None]) + def test_invalid_rank_deficient_action_raises_at_init(self, bad_value): + with pytest.raises(ValueError, match="rank_deficient_action must be"): + SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + rank_deficient_action=bad_value, + ) + + @pytest.mark.parametrize("good_value", ["warn", "error", "silent"]) + def test_valid_rank_deficient_action_accepted(self, good_value): + est = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + rank_deficient_action=good_value, + ) + assert est.rank_deficient_action == good_value + + +class TestSpilloverDiDOmega0Connectivity: + """Round-5 CI review P1: balanced panel + per-unit/per-period Omega_0 + coverage is NECESSARY but not SUFFICIENT for stage-1 FE + identification — the Omega_0 bipartite graph must also be CONNECTED. + If it splits into K > 1 components, the iterative FE solver returns + FE only up to component-specific constants, and residualization + combines mu_i from one component with lambda_t from another, + silently corrupting tau_total / delta_j. + + The check fires on the SUPPORTED-units subgraph (after unit-level + warn-and-drop). Under the current absorbing-treatment + period-strict + + unit-warn-drop regime the disconnected case may be unreachable in + practice via a real DGP — these tests unit-test the helper directly + with synthetic (unit_codes, time_codes, omega_0_mask) arrays so the + check is exercised even if no DGP can reach it through the public + `.fit()` path. + """ + + def test_disconnected_two_components_raises(self): + """Two units in periods {0, 1}, two more in periods {2, 3}; no + unit appears in both halves. Connectivity check must fail. + """ + # 4 supported units (codes 0..3), 4 periods (codes 0..3). + # Omega_0 rows: (u0, t0), (u0, t1), (u1, t0), (u1, t1), + # (u2, t2), (u2, t3), (u3, t2), (u3, t3). 
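Editorial aside: a standalone sketch of the component-counting idea behind the connectivity check. The `n_components` helper is hypothetical, not the library's `_check_omega_0_connectivity`; it treats units and periods as nodes of a bipartite graph with one edge per Omega_0 (unit, time) row.

```python
def n_components(edges, n_units, n_times):
    # Union-find over n_units unit nodes followed by n_times period nodes.
    parent = list(range(n_units + n_times))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, t in edges:
        parent[find(u)] = find(n_units + t)
    return len({find(x) for x in range(n_units + n_times)})


# Units {0, 1} share periods {0, 1}; units {2, 3} share {2, 3}: 2 components.
split = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 2), (2, 3), (3, 2), (3, 3)]
assert n_components(split, 4, 4) == 2

# A bridge unit 4 with Omega_0 rows in every period reconnects the graph.
bridged = split + [(4, 0), (4, 1), (4, 2), (4, 3)]
assert n_components(bridged, 5, 4) == 1
```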
+ unit_codes_arr = np.array([0, 0, 1, 1, 2, 2, 3, 3, 0, 1, 2, 3, 0, 1, 2, 3]) + time_codes_arr = np.array([0, 1, 0, 1, 2, 3, 2, 3, 2, 2, 0, 0, 3, 3, 1, 1]) + omega_0_mask = np.array( + [True, True, True, True, True, True, True, True] + + [False, False, False, False, False, False, False, False] + ) + with pytest.raises( + ValueError, match="disconnected components|Stage-1 fixed effects unidentified" + ): + _check_omega_0_connectivity( + omega_0_mask=omega_0_mask, + unit_codes_arr=unit_codes_arr, + time_codes_arr=time_codes_arr, + units_in_omega_0={0, 1, 2, 3}, + n_times=4, + unit_uniques=["u0", "u1", "u2", "u3"], + ) + + def test_connected_via_bridge_unit_succeeds(self): + """Add a single bridge unit that has Omega_0 rows in all periods — + the graph becomes connected and the check must pass. + """ + unit_codes_arr = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 4, 4]) + time_codes_arr = np.array([0, 1, 0, 1, 2, 3, 2, 3, 0, 1, 2, 3]) + omega_0_mask = np.array([True] * 12) + # u4 spans all 4 periods, connecting the two halves through itself. + _check_omega_0_connectivity( + omega_0_mask=omega_0_mask, + unit_codes_arr=unit_codes_arr, + time_codes_arr=time_codes_arr, + units_in_omega_0={0, 1, 2, 3, 4}, + n_times=4, + unit_uniques=["u0", "u1", "u2", "u3", "u4"], + ) # must not raise + + def test_single_supported_unit_short_circuits(self): + """n_supp <= 1 short-circuits — no multi-component case possible.""" + unit_codes_arr = np.array([0, 0]) + time_codes_arr = np.array([0, 1]) + omega_0_mask = np.array([True, True]) + _check_omega_0_connectivity( + omega_0_mask=omega_0_mask, + unit_codes_arr=unit_codes_arr, + time_codes_arr=time_codes_arr, + units_in_omega_0={0}, + n_times=2, + unit_uniques=["u0"], + ) # must not raise + + def test_three_components_error_message_names_units(self): + """Error message should name first few units per component for + actionable debugging. + """ + # 3 units, 3 periods, 3-way disconnection: (u0, t0), (u1, t1), (u2, t2). 
+ unit_codes_arr = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2]) + time_codes_arr = np.array([0, 1, 2, 1, 2, 0, 2, 0, 1]) + omega_0_mask = np.array([True, True, True] + [False] * 6) + with pytest.raises(ValueError) as exc_info: + _check_omega_0_connectivity( + omega_0_mask=omega_0_mask, + unit_codes_arr=unit_codes_arr, + time_codes_arr=time_codes_arr, + units_in_omega_0={0, 1, 2}, + n_times=3, + unit_uniques=["unit_A", "unit_B", "unit_C"], + ) + msg = str(exc_info.value) + assert "3 disconnected components" in msg + assert "unit_A" in msg or "unit_B" in msg or "unit_C" in msg + + def test_normal_butts_dgp_does_not_trigger(self): + """Positive case: a standard non-staggered Butts DGP must NOT + trigger the connectivity check. + """ + from tests._dgp_utils import generate_butts_nonstaggered_dgp + + df = generate_butts_nonstaggered_dgp(seed=0) + # Just verify .fit() succeeds — if connectivity check were + # over-eager, this would fail. + result = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")).fit( + df, outcome="y", unit="unit", time="time", treatment="D" + ) + assert result.att is not None + + +# ============================================================================= +# Codex review round-9 regression tests +# ============================================================================= + + +class TestSpilloverDiDAnticipationValidation: + """anticipation must be a non-negative integer. Round-9 codex review + caught that fractional / negative values silently shifted timing. 
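Editorial aside: a compact toy-number illustration of why non-integer anticipation must be rejected rather than applied:

```python
# Onset logic of the form `t >= first_treat - anticipation` stays on the
# integer period grid only for integer anticipation.
first_treat = 2
times = [0, 1, 2]
assert [t >= first_treat - 1 for t in times] == [False, True, True]

# anticipation = 0.5 puts the threshold at 1.5: period 1 silently stops
# being flagged, with no error and no warning. Hence the hard ValueError.
assert [t >= first_treat - 0.5 for t in times] == [False, False, True]
```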
+ """ + + @pytest.mark.parametrize("bad_value", [-1, 0.5, 1.5, -0.1]) + def test_invalid_anticipation_raises_treatment_path(self, bad_value): + df = _make_butts_2period_dgp(seed=42) + est = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + anticipation=bad_value, + ) + with pytest.raises(ValueError, match="anticipation"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + +class TestSpilloverDiDNonFiniteOutcomeRejected: + """outcome must be finite per-row; NaN/Inf raise a targeted ValueError.""" + + def test_nan_outcome_raises(self): + df = _make_butts_2period_dgp(seed=42).copy() + df.loc[0, "y"] = np.nan + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match="non-finite"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + def test_inf_outcome_raises(self): + df = _make_butts_2period_dgp(seed=42).copy() + df.loc[0, "y"] = np.inf + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match="non-finite"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + +# ============================================================================= +# Codex review round-10 regression tests +# ============================================================================= + + +class TestSpilloverDiDNaNTreatmentRejected: + """NaN in the binary treatment column must raise. + Round-10 codex review caught that `_convert_treatment_to_first_treat` + silently dropped NaN rows via `dropna()` before validation, then + rebuilt D_it from the inferred onset — coercing missing rows to + treated or control without warning. 
+ """ + + @pytest.mark.parametrize( + "d_pattern", + [ + [0, 1, float("nan")], + [float("nan"), 1, 1], + [0, float("nan"), 0], + ], + ) + def test_nan_in_treatment_helper_raises(self, d_pattern): + df = pd.DataFrame( + { + "unit": ["u1"] * 3, + "time": [0, 1, 2], + "D": d_pattern, + } + ) + with pytest.raises(ValueError, match="NaN"): + _convert_treatment_to_first_treat(df, "D", "time", "unit") + + def test_nan_in_treatment_end_to_end_raises(self): + df = _make_butts_2period_dgp(seed=42).copy() + first_treated_idx = df.index[df["D"] == 1][0] + df.loc[first_treated_idx, "D"] = np.nan + # Convert column dtype to float so NaN is preserved. + df["D"] = df["D"].astype(float) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match="NaN"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + +# ============================================================================= +# Codex review round-11 regression tests +# ============================================================================= + + +class TestSpilloverDiDClusterNaNRejected: + """NaN in cluster column must raise. Round-11 codex review caught + that missing cluster ids silently changed SEs and overstated + n_clusters because np.unique counts NaN as its own cluster but + pandas groupby drops it from the cluster meat. 
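Editorial aside: the disagreement the docstring describes is easy to reproduce directly; `np.unique` keeps NaN as a distinct value, while `groupby` (with its default `dropna=True`) drops the NaN group.

```python
import numpy as np
import pandas as pd

clusters = np.array([1.0, 1.0, 2.0, np.nan])
n_from_unique = len(np.unique(clusters))  # NaN counted as its own cluster

df = pd.DataFrame({"g": clusters, "x": [1.0, 2.0, 3.0, 4.0]})
n_from_groupby = df.groupby("g")["x"].sum().shape[0]  # NaN group dropped

assert n_from_unique == 3
assert n_from_groupby == 2  # the two bookkeeping paths disagree
```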
+ """ + + def test_numeric_nan_cluster_raises(self): + df = _make_butts_2period_dgp(seed=42).copy() + df["region"] = 1.0 + df.loc[df.index[0], "region"] = np.nan + est = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + cluster="region", + ) + with pytest.raises(ValueError, match="cluster.*missing"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + def test_object_nan_cluster_raises(self): + df = _make_butts_2period_dgp(seed=42).copy() + df["region"] = "A" + df.loc[df.index[0], "region"] = None # object-typed NaN + est = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + cluster="region", + ) + with pytest.raises(ValueError, match="cluster.*missing"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + +class TestSpilloverDiDIdentifierNaNRejected: + """unit / time / first_treat columns must not contain NaN — round-11 + codex review noted these fell through to opaque numpy / pandas + errors instead of targeted ValueErrors. 
+ """ + + def test_nan_unit_raises(self): + df = _make_butts_2period_dgp(seed=42).copy() + df.loc[df.index[0], "unit"] = None + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match="identifier column 'unit'"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + def test_nan_time_raises(self): + df = _make_butts_2period_dgp(seed=42).copy() + df["time"] = df["time"].astype(float) + df.loc[df.index[0], "time"] = np.nan + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match="identifier column 'time'"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + def test_nan_first_treat_raises(self): + df = _make_butts_2period_dgp(seed=42).copy() + df.loc[df.index[0], "first_treat"] = np.nan + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match="first_treat.*missing"): + est.fit(df, outcome="y", unit="unit", time="time", first_treat="first_treat") + + +# ============================================================================= +# Codex review round-12 regression tests +# ============================================================================= + + +class TestSpilloverDiDMixedRawTimeEncoding: + """Mixed encodings that collapse under pd.to_numeric must be caught + by the validator AFTER coercion. Round-12 codex review caught that + raw labels ['0', 0] (str + int) would pass duplicate-cell validation + on the raw labels then collapse to (0, 0) after pd.to_numeric, with + no warning. + """ + + def test_mixed_str_and_int_time_collapse_caught_as_duplicate(self): + # u1 has time entries '0' (str) and 0 (int). They collapse to 0 + # under pd.to_numeric → duplicate (u1, 0) cell. 
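Editorial aside: the collapse itself, in isolation. `'0'` (str) and `0` (int) are distinct raw labels but identical after coercion, which is why duplicate-cell validation must run on the coerced values.

```python
import pandas as pd

raw = pd.Series(["0", 0, 1], dtype=object)
coerced = pd.to_numeric(raw)
assert raw.nunique() == 3      # raw labels look fine: '0', 0, 1
assert coerced.nunique() == 2  # '0' and 0 collapse to the same period
```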
+ df = pd.DataFrame( + { + "unit": ["u1", "u1", "u1", "u2", "u2", "u2"], + "time": ["0", 0, 1, 0, 0, 1], + "lat": [0.0, 0.0, 0.0, 5.0, 5.0, 5.0], + "lon": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0], + "D": [0, 0, 1, 0, 0, 0], + "y": [1.0, 1.0, 2.0, 0.5, 0.6, 0.6], + } + ) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match="duplicate.*unit, time"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + def test_leading_zero_string_time_collapse_caught(self): + # '01' and 1 collapse under pd.to_numeric to (1, 1) → duplicate. + df = pd.DataFrame( + { + "unit": ["u1", "u1", "u1", "u2", "u2", "u2"], + "time": [0, "01", 1, 0, "01", 1], + "lat": [0.0, 0.0, 0.0, 5.0, 5.0, 5.0], + "lon": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0], + "D": [0, 0, 1, 0, 0, 0], + "y": [1.0, 1.0, 2.0, 0.5, 0.6, 0.6], + } + ) + est = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")) + with pytest.raises(ValueError, match="duplicate.*unit, time"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + +class TestSpilloverDiDNonStaggeredFEEquivalence: + """Round-14 codex review (P2): pin the Gardner identity empirically. + + Under non-staggered timing, two-stage Gardner with the Omega_0-restricted + stage 1 should produce the same `tau_total` as a single-stage TWFE ring + regression on the full sample with the time-varying ring covariate. This + is Butts Eqs. 4-6 / Proposition 2.3 (non-staggered identification). + + The Omega_0 restriction would BREAK Gardner identity in general (stage 1 + estimates FE on a subset, predicts onto observations outside the training + set), but on Butts-Assumption-satisfying DGPs it is empirically innocent + at floating-point precision. This test PINS that equivalence so any + methodology drift surfaces in CI rather than as a silent estimand shift. 
+ + Codex round 14 reported wildly divergent values (seed 3: +0.0238 from + SpilloverDiD vs -0.0735 from FE) but those numbers were unreproducible — + our 20-seed sweep confirms bit-identity at atol=1e-10. + """ + + @staticmethod + def _fit_butts_single_stage_fe_ring( + df, *, outcome, unit, time, treatment, rings, lat="lat", lon="lon" + ): + """Reference: single-stage TWFE ring regression on full sample. + + Y_it = mu_i + lambda_t + tau * D_it + sum_j delta_j * (1 - D_it) * Ring_{it,j} + + For non-staggered with shared onset t_treat, Ring_{it,j} = 1{d_i in + [rings_j, rings_{j+1})} * 1{t >= t_treat}. Uses library's solve_ols + for rank-deficient-safe pseudo-inverse. + """ + import math + import warnings + + from diff_diff.linalg import solve_ols + + rings = sorted(rings) + df = df.copy() + units = sorted(df[unit].unique()) + times = sorted(df[time].unique()) + unit_idx = {u: i for i, u in enumerate(units)} + time_idx = {t: i for i, t in enumerate(times)} + n = len(df) + + treated_set = set(df.loc[df[treatment] == 1, unit].unique()) + lat_map = df.groupby(unit)[lat].first().to_dict() + lon_map = df.groupby(unit)[lon].first().to_dict() + + def hav(u1, u2): + lat1, lon1 = math.radians(lat_map[u1]), math.radians(lon_map[u1]) + lat2, lon2 = math.radians(lat_map[u2]), math.radians(lon_map[u2]) + dlat = lat2 - lat1 + dlon = lon2 - lon1 + a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2 + return 2 * 6371.0 * math.asin(math.sqrt(a)) + + d_i = {} + for u in units: + if u in treated_set: + d_i[u] = 0.0 + else: + d_i[u] = min(hav(u, tt) for tt in treated_set) + + K = len(rings) - 1 + ring_of_unit = {} + for u in units: + d = d_i[u] + ring_of_unit[u] = -1 + for j in range(K): + if rings[j] <= d < rings[j + 1]: + ring_of_unit[u] = j + break + + t_treat = df.loc[df[treatment] == 1, time].min() + + n_u = len(units) + n_t = len(times) + n_reg = 1 + K + # Intercept + (n_u - 1) unit FE dummies + (n_t - 1) time FE dummies + n_reg regressors + X 
= np.zeros((n, 1 + (n_u - 1) + (n_t - 1) + n_reg)) + X[:, 0] = 1.0 + y = df[outcome].values.astype(float) + + for i, row in enumerate(df.itertuples(index=False)): + u = getattr(row, unit) + t = getattr(row, time) + D = getattr(row, treatment) + if unit_idx[u] > 0: + X[i, 1 + unit_idx[u] - 1] = 1.0 + if time_idx[t] > 0: + X[i, 1 + (n_u - 1) + time_idx[t] - 1] = 1.0 + X[i, 1 + (n_u - 1) + (n_t - 1) + 0] = D + ridx = ring_of_unit[u] + if ridx >= 0 and t >= t_treat: + X[i, 1 + (n_u - 1) + (n_t - 1) + 1 + ridx] = 1 - D + + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + beta, _, _ = solve_ols(X, y, vcov_type="hc1") + tau = beta[1 + (n_u - 1) + (n_t - 1) + 0] + return tau + + def test_nonstaggered_one_ring_matches_single_stage_fe_20_seeds(self): + """20-seed bit-identity sweep, 1 ring (rings=[0, 200]).""" + from tests._dgp_utils import generate_butts_nonstaggered_dgp + + diffs = [] + for seed in range(20): + df = generate_butts_nonstaggered_dgp(seed=seed, tau_total=-0.07, delta_1=-0.04) + est = SpilloverDiD(rings=[0, 200], d_bar=200.0, conley_coords=("lat", "lon")) + spill = est.fit(df, outcome="y", unit="unit", time="time", treatment="D").att + fe_tau = self._fit_butts_single_stage_fe_ring( + df, + outcome="y", + unit="unit", + time="time", + treatment="D", + rings=[0, 200], + ) + diffs.append(abs(spill - fe_tau)) + max_abs_diff = max(diffs) + assert max_abs_diff < 1e-10, ( + f"Gardner identity broken: max |SpilloverDiD - single-stage FE| " + f"= {max_abs_diff:.6e} across 20 seeds (expected < 1e-10)" + ) + + def test_nonstaggered_multi_ring_matches_single_stage_fe_10_seeds(self): + """10-seed bit-identity sweep with multi-ring spec.""" + from tests._dgp_utils import generate_butts_nonstaggered_dgp + + # DGP places near-controls in d ≤ d_bar/2 = 100km, so rings beyond 100 + # may be empty. Use rings=[0, 50, 200] which has near-controls in + # [0, 50) and possibly [50, 200). 
+ diffs = [] + for seed in range(10): + df = generate_butts_nonstaggered_dgp(seed=seed, tau_total=-0.07, delta_1=-0.04) + est = SpilloverDiD(rings=[0, 50, 200], d_bar=200.0, conley_coords=("lat", "lon")) + spill = est.fit(df, outcome="y", unit="unit", time="time", treatment="D").att + fe_tau = self._fit_butts_single_stage_fe_ring( + df, + outcome="y", + unit="unit", + time="time", + treatment="D", + rings=[0, 50, 200], + ) + diffs.append(abs(spill - fe_tau)) + max_abs_diff = max(diffs) + assert max_abs_diff < 1e-10, ( + f"Multi-ring Gardner identity broken: max diff " + f"= {max_abs_diff:.6e} across 10 seeds (expected < 1e-10)" + ) + + +class TestSpilloverDiDCoefficientsAlignToVcov: + """Round-15 codex review (P2): `coefficients` must expose ALL stage-2 + coefficients (treatment + K ring slots), not just `ATT`, so consumers + can align names to the `(1+K)×(1+K)` `vcov` rows/cols. The vcov + columns are `["treatment", "_spillover_", ...]`; the + coefficients dict mirrors those keys plus an `"ATT"` alias for the + treatment slot (sibling-estimator convention). 
+ """ + + def _fit_one(self, rings): + from tests._dgp_utils import generate_butts_nonstaggered_dgp + + df = generate_butts_nonstaggered_dgp(seed=42, tau_total=-0.07, delta_1=-0.04) + est = SpilloverDiD(rings=rings, d_bar=float(rings[-1]), conley_coords=("lat", "lon")) + return est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + def test_coefficients_has_treatment_and_ring_keys(self): + result = self._fit_one([0, 50, 200]) + assert "ATT" in result.coefficients + assert "treatment" in result.coefficients + ring_keys = [k for k in result.coefficients if k.startswith("_spillover_")] + assert ( + len(ring_keys) == 2 + ), f"Expected 2 ring coefficients, got {len(ring_keys)}: {ring_keys}" + + def test_att_alias_equals_treatment_slot(self): + result = self._fit_one([0, 100]) + assert result.coefficients["ATT"] == result.coefficients["treatment"] + assert result.coefficients["ATT"] == result.att + + def test_coefficients_length_matches_vcov_dimension(self): + result = self._fit_one([0, 50, 200]) + assert result.vcov.shape == (3, 3) + stage2_keys = [k for k in result.coefficients if k != "ATT"] + assert len(stage2_keys) == result.vcov.shape[0] + + def test_ring_coefficients_match_spillover_effects_dataframe(self): + result = self._fit_one([0, 50, 200]) + for ring_label, row in result.spillover_effects.iterrows(): + key = f"_spillover_{ring_label}" + assert key in result.coefficients, f"Missing key {key} in coefficients" + assert result.coefficients[key] == row["coef"], ( + f"Drift on {ring_label}: coefficients[{key}]=" + f"{result.coefficients[key]} vs spillover_effects.coef={row['coef']}" + ) From 1a71eaf59506e2551d4ae19b7dc4ac557a1eb942 Mon Sep 17 00:00:00 2001 From: igerber Date: Sat, 16 May 2026 06:51:31 -0400 Subject: [PATCH 2/3] SpilloverDiD: add anticipation behavior tests + staggered delta_1 recovery Two test-coverage gaps surfaced on the rebased SHA: 1. 
`anticipation` was previously only round-tripped on results and rejected for invalid values, with no behavior-level test asserting that it actually shifts D_it / Omega_0 membership / the ring-exposure clock. A regression there could silently move rows in/out of stage 1 and change tau_total / delta_j without failing CI. 2. Staggered MC anchored only `tau_total`, not `delta_1`. A staggered-only ring-assignment or spillover-coefficient regression with roughly-unchanged tau_total wouldn't be caught. Fixes: - tests/test_spillover.py: new `TestSpilloverDiDAnticipationBehavior` class (3 tests) with a hand-built 4-period panel (1 treated @ t=2, 1 near-control, 2 far-controls). Asserts on both `treatment=` and `first_treat=` paths that anticipation=1 (vs 0) yields n_treated+1, smaller stage1_n_obs, and a different att. Plus a parity check that the two fit paths produce identical results under the same anticipation setting (entry points are internally unified). - tests/test_spillover.py: `TestSpilloverDiDIdentification::test_staggered_recovers_tau_total` renamed to `..._tau_total_and_delta_1` and extended to also assert staggered delta_1 recovery within 0.03 absolute tolerance (mean over 30 seeds: -0.0398, std 0.0083). Per-event-time `delta_jk` decomposition on staggered DGPs is still queued alongside event_study=True support. - docs/methodology/REGISTRY.md, CHANGELOG.md: narrow the staggered MC anchor language to match (tau_total + delta_1, not tau_total only); bump test count 153 -> 156. All 156 tests pass; black + ruff clean.
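The anticipation semantics the new tests pin down can be sketched with a minimal helper (illustrative only — `effective_treatment` is a hypothetical name, not the estimator's internals): the effective indicator is `D'_it = 1{t >= first_treat - anticipation}`, so anticipation periods leave the stage-1 Omega_0 subsample and join the treated window.

```python
import numpy as np

def effective_treatment(time, first_treat, anticipation=0):
    """Shifted indicator D'_it = 1{t >= first_treat - anticipation}.

    With anticipation > 0, the anticipation pre-periods are promoted into
    the treated window, so they leave the stage-1 Omega_0 subsample.
    """
    time = np.asarray(time)
    return (time >= first_treat - anticipation).astype(int)

# 4-period unit treated at t=2, mirroring the hand-built test panel.
t = np.arange(4)
d0 = effective_treatment(t, first_treat=2, anticipation=0)
d1 = effective_treatment(t, first_treat=2, anticipation=1)
print(d0.tolist())  # [0, 0, 1, 1]
print(d1.tolist())  # [0, 1, 1, 1] -- t=1 promoted: one more treated row
```

This is exactly the +1-treated-row / smaller-Omega_0 behavior the three new tests assert on both fit paths.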
Co-Authored-By: Claude Opus 4.7 (1M context) --- CHANGELOG.md | 2 +- docs/methodology/REGISTRY.md | 2 +- tests/test_spillover.py | 152 ++++++++++++++++++++++++++++++++++- 3 files changed, 151 insertions(+), 5 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index b45f1ae1..07f8eab1 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,7 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] ### Added -- **`SpilloverDiD` — ring-indicator spillover-aware DiD (Butts 2021).** New standalone estimator at `diff_diff/spillover.py` implementing two-stage Gardner methodology with ring-indicator covariates that identify direct effect on treated (`tau_total`) alongside per-ring spillover effects on near-control units (`delta_j`). Documented synthesis of ingredients (no single published software covers the exact recipe — `did2s` implements Gardner two-stage without rings; the Butts ring estimator has no R/Stata package): Butts (2021) Section 5 / Table 2 identification, Gardner (2022) two-stage residualize-then-fit, and the Conley spatial-HAC vcov shipped in 3.3.3. Handles both panel non-staggered (Equations 5/6/8) and Section 5 staggered timing in one estimator — non-staggered is the special case where all treated units share an onset time. **API:** `SpilloverDiD(rings=[0, 50, 100, 200], conley_coords=("lat","lon"), ...).fit(data, outcome="y", unit="unit", time="t", treatment="D")` (binary D auto-converted to `first_treat`) or `.fit(..., first_treat="first_treat")` (Gardner convention). Result: `SpilloverDiDResults(DiDResults)` with `.att` = `tau_total`, `.spillover_effects` (per-ring `pd.DataFrame` with `coef`/`se`/`t_stat`/`p_value`/`ci_low`/`ci_high`), `.ring_breakpoints`, `.d_bar`, `.n_units_ever_in_ring`, `.n_far_away_obs`, `.is_staggered`. `.coefficients` exposes all `(1+K)` stage-2 entries (`"treatment"` + `"_spillover_"`) plus an `"ATT"` alias keyed to vcov columns. 
**Methodology spec (committed):** stage-2 regressor is the time-varying `(1 - D_it) * Ring_{it,j}` form (paper page 12's `S_it = S_i * 1{t >= t_treat}` notation; Section 5 Table 2's `S^k_{it}` / `Ring^k_{it,j}`). Reading the literal unit-static `(1 - D_it) * S_i` from Equation 5 is algebraically rank-deficient under TWFE (`(1-D_it) * S_i = S_i - D_it`, with `S_i` absorbed by `mu_i`, leaving `-D_it`); only the time-varying form supports the paper's identification (Proposition 2.3). Stage-1 subsample uses Butts' STRICTER `Omega_0 = {D_it = 0 AND S_it = 0}` (untreated AND unexposed), not TwoStageDiD's `{D_it = 0}` alone — this prevents spillover-contaminated near-controls in pre/post periods from biasing the time FE. **Gardner identity (non-staggered):** a 20-seed deterministic regression test pins `SpilloverDiD.att` against a direct single-stage TWFE ring regression on the full sample (`y ~ mu_i + lambda_t + tau * D_it + sum_j delta_j * (1 - D_it) * Ring_{it,j}`) at `atol=1e-10` — empirically bit-identical, so the reported non-staggered `tau_total` IS the Butts Eqs. 4-6 estimator. **Identification-check policy (period strict, unit warn-and-drop, plus connectivity):** every period must have at least one Omega_0 row (hard `ValueError` — dropping a period removes all units' cross-time identification). Units lacking Omega_0 rows (e.g. baseline-treated units with `D_it = 1` at every observed `t`) are warned-and-dropped: their unit FE is NaN, residualization writes NaN on their rows, and the downstream finite-mask path excludes them from stage 2 — mirrors `TwoStageDiD`'s always-treated convention. 
Additionally, the supported-units bipartite graph (units linked by shared Omega_0 periods) must form a single connected component; `K > 1` components raise `ValueError` because the FE solver would return only component-specific constants and residualization would silently mix them across components (defense-in-depth — under absorbing treatment the disconnected case may be unreachable through the upstream validators, but the check future-proofs Wave B follow-ups). **Public API restrictions (Wave B MVP):** `covariates=` raises `NotImplementedError` because Gardner-style two-stage requires covariate effects estimated on the untreated-and-unexposed subsample at stage 1 (appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates); non-absorbing / reversible treatment patterns (e.g. `[0, 1, 0]`) raise `ValueError` rather than being silently coerced into "treated from first 1 onward"; non-constant `first_treat` values across rows of the same unit raise `ValueError`; `conley_coords` is required on every fit path (not just `vcov_type="conley"`) because ring construction always uses it. **Far-away control identification:** uses CURRENT-period untreated status (`D_it = 0`) rather than never-treated-only, so all-eventually-treated staggered designs (no never-treated units) can identify the counterfactual via not-yet-treated far-away rows. **Variance (Wave B MVP):** stage-2 OLS variance via `solve_ols` (HC1 / Conley / cluster paths all flow through). The Gardner GMM first-stage uncertainty correction is NOT applied at stage 2 in this PR (documented limitation; planned follow-up extends `two_stage.py::_compute_gmm_variance` to accept a Conley kernel matrix in place of HC1's identity at the influence-function outer-product step). 
**Deferred features (planned follow-ups):** `event_study=True` per-event-time × ring coefficients (Butts Table 2), `survey_design=` integration, `ring_method="count"` (count-of-treated-in-ring), data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight), Gardner GMM first-stage correction at stage 2, sparse staggered ring-distance path. **Tests:** `tests/test_spillover.py` (153 tests across ring-construction primitives, validators, fit integration, raw-data invariant, identification MC — non-staggered DGP at 50 seeds + 200-seed `@pytest.mark.slow` variant recovers both `tau_total` and `delta_1`; staggered DGP at 30 seeds anchors `tau_total` only — Conley wiring, Gardner identity bit-identity, coefficients-vs-vcov alignment, warn-and-drop, rank_deficient_action validation, Omega_0 bipartite-graph connectivity). DGP factories `tests/_dgp_utils.py::generate_butts_nonstaggered_dgp` / `generate_butts_staggered_dgp` satisfy Butts Assumptions 1/3/5/7 by construction. +- **`SpilloverDiD` — ring-indicator spillover-aware DiD (Butts 2021).** New standalone estimator at `diff_diff/spillover.py` implementing two-stage Gardner methodology with ring-indicator covariates that identify direct effect on treated (`tau_total`) alongside per-ring spillover effects on near-control units (`delta_j`). Documented synthesis of ingredients (no single published software covers the exact recipe — `did2s` implements Gardner two-stage without rings; the Butts ring estimator has no R/Stata package): Butts (2021) Section 5 / Table 2 identification, Gardner (2022) two-stage residualize-then-fit, and the Conley spatial-HAC vcov shipped in 3.3.3. Handles both panel non-staggered (Equations 5/6/8) and Section 5 staggered timing in one estimator — non-staggered is the special case where all treated units share an onset time. 
**API:** `SpilloverDiD(rings=[0, 50, 100, 200], conley_coords=("lat","lon"), ...).fit(data, outcome="y", unit="unit", time="t", treatment="D")` (binary D auto-converted to `first_treat`) or `.fit(..., first_treat="first_treat")` (Gardner convention). Result: `SpilloverDiDResults(DiDResults)` with `.att` = `tau_total`, `.spillover_effects` (per-ring `pd.DataFrame` with `coef`/`se`/`t_stat`/`p_value`/`ci_low`/`ci_high`), `.ring_breakpoints`, `.d_bar`, `.n_units_ever_in_ring`, `.n_far_away_obs`, `.is_staggered`. `.coefficients` exposes all `(1+K)` stage-2 entries (`"treatment"` + `"_spillover_"`) plus an `"ATT"` alias keyed to vcov columns. **Methodology spec (committed):** stage-2 regressor is the time-varying `(1 - D_it) * Ring_{it,j}` form (paper page 12's `S_it = S_i * 1{t >= t_treat}` notation; Section 5 Table 2's `S^k_{it}` / `Ring^k_{it,j}`). Reading the literal unit-static `(1 - D_it) * S_i` from Equation 5 is algebraically rank-deficient under TWFE (`(1-D_it) * S_i = S_i - D_it`, with `S_i` absorbed by `mu_i`, leaving `-D_it`); only the time-varying form supports the paper's identification (Proposition 2.3). Stage-1 subsample uses Butts' STRICTER `Omega_0 = {D_it = 0 AND S_it = 0}` (untreated AND unexposed), not TwoStageDiD's `{D_it = 0}` alone — this prevents spillover-contaminated near-controls in pre/post periods from biasing the time FE. **Gardner identity (non-staggered):** a 20-seed deterministic regression test pins `SpilloverDiD.att` against a direct single-stage TWFE ring regression on the full sample (`y ~ mu_i + lambda_t + tau * D_it + sum_j delta_j * (1 - D_it) * Ring_{it,j}`) at `atol=1e-10` — empirically bit-identical, so the reported non-staggered `tau_total` IS the Butts Eqs. 4-6 estimator. **Identification-check policy (period strict, unit warn-and-drop, plus connectivity):** every period must have at least one Omega_0 row (hard `ValueError` — dropping a period removes all units' cross-time identification). Units lacking Omega_0 rows (e.g. 
baseline-treated units with `D_it = 1` at every observed `t`) are warned-and-dropped: their unit FE is NaN, residualization writes NaN on their rows, and the downstream finite-mask path excludes them from stage 2 — mirrors `TwoStageDiD`'s always-treated convention. Additionally, the supported-units bipartite graph (units linked by shared Omega_0 periods) must form a single connected component; `K > 1` components raise `ValueError` because the FE solver would return only component-specific constants and residualization would silently mix them across components (defense-in-depth — under absorbing treatment the disconnected case may be unreachable through the upstream validators, but the check future-proofs Wave B follow-ups). **Public API restrictions (Wave B MVP):** `covariates=` raises `NotImplementedError` because Gardner-style two-stage requires covariate effects estimated on the untreated-and-unexposed subsample at stage 1 (appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates); non-absorbing / reversible treatment patterns (e.g. `[0, 1, 0]`) raise `ValueError` rather than being silently coerced into "treated from first 1 onward"; non-constant `first_treat` values across rows of the same unit raise `ValueError`; `conley_coords` is required on every fit path (not just `vcov_type="conley"`) because ring construction always uses it. **Far-away control identification:** uses CURRENT-period untreated status (`D_it = 0`) rather than never-treated-only, so all-eventually-treated staggered designs (no never-treated units) can identify the counterfactual via not-yet-treated far-away rows. **Variance (Wave B MVP):** stage-2 OLS variance via `solve_ols` (HC1 / Conley / cluster paths all flow through). 
The Gardner GMM first-stage uncertainty correction is NOT applied at stage 2 in this PR (documented limitation; planned follow-up extends `two_stage.py::_compute_gmm_variance` to accept a Conley kernel matrix in place of HC1's identity at the influence-function outer-product step). **Deferred features (planned follow-ups):** `event_study=True` per-event-time × ring coefficients (Butts Table 2), `survey_design=` integration, `ring_method="count"` (count-of-treated-in-ring), data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight), Gardner GMM first-stage correction at stage 2, sparse staggered ring-distance path. **Tests:** `tests/test_spillover.py` (156 tests across ring-construction primitives, validators, fit integration, raw-data invariant, identification MC — non-staggered DGP at 50 seeds + 200-seed `@pytest.mark.slow` variant recovers both `tau_total` and `delta_1`; staggered DGP at 30 seeds anchors both `tau_total` and `delta_1` — Conley wiring, Gardner identity bit-identity, coefficients-vs-vcov alignment, warn-and-drop, rank_deficient_action validation, Omega_0 bipartite-graph connectivity, anticipation behavior on both fit paths). DGP factories `tests/_dgp_utils.py::generate_butts_nonstaggered_dgp` / `generate_butts_staggered_dgp` satisfy Butts Assumptions 1/3/5/7 by construction. - **`ChaisemartinDHaultfoeuille.predict_het` × `placebo`: R-parity on both global and per-path surfaces.** R-verified — `did_multiplegt_dyn(predict_het, placebo)` emits heterogeneity OLS results on backward (placebo) horizons via R's `DIDmultiplegtDYN:::did_multiplegt_main` placebo block (`effect = matrix(-i, ...)` rbind site); the same block runs per-by_level under `did_multiplegt_dyn(by_path, predict_het, placebo)`, so both global `res$results$predict_het` and per-by_level `res$by_level_i$results$predict_het` slots emit backward rows. 
R's predict_het syntax with `placebo > 0` requires the `c(-1)` sentinel in the horizon vector to trigger "compute heterogeneity for ALL forward (1..effects) AND ALL placebo (1..placebo) positions" — passing positive-only horizons errors with "specified numbers in predict_het that exceed the number of placebos". Python mirrors via `_compute_heterogeneity_test(..., placebo=L_max)` (set automatically from `self.placebo` at both global and per-path call sites in `fit()`) — the function iterates forward (1..L_max) and backward (-1..-L_max) horizons in a single loop with an explicit `out_idx < 0` eligibility guard for backward horizons whose `F_g` is too small (would otherwise silently misread `N_mat` via numpy negative indexing). `results.heterogeneity_effects` uses negative-int keys for backward horizons; `path_heterogeneity_effects` does the same per path. Placebo rows in `to_dataframe(level="by_path")` have non-NaN `het_*` columns when `placebo=True` and `heterogeneity=` are both set. **Survey gate (warn + skip):** `survey_design + placebo + heterogeneity` emits a `UserWarning` at fit-time and falls back to forward-horizon-only heterogeneity on both surfaces — the Binder TSL cell-period allocator's REGISTRY justification is tied to **post-period** attribution; backward-horizon attribution puts ψ_g mass on a pre-period cell, a separate library-extension claim that needs its own derivation. Forward-horizon `predict_het + survey_design` continues to work unchanged on both global and per-path surfaces. The function-level `_compute_heterogeneity_test` keeps a per-iteration `NotImplementedError` backstop for direct callers that bypass fit(). Pre-period allocator derivation deferred to a follow-up methodology PR (tracked in TODO.md). 
R parity confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityHeterogeneityWithPlacebo` (scenario 23, `multi_path_reversible_predict_het_with_placebo_global`, `placebo=2, effects=3, no by_path`) and `::TestDCDHDynRParityByPathHeterogeneityWithPlacebo` (scenario 22, same DGP plus `by_path=3`); pinned at `BETA_RTOL=1e-6` / `SE_RTOL=1e-5` for `beta` / `se` / `t_stat` / `n_obs` and `INFERENCE_RTOL=1e-4` for `p_value` / `conf_int` across 3 paths × (3 forward + 2 placebo) = 15 horizons + 1 global × 5 horizons. Cross-surface invariants regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPredictHetPlacebo` (placebo het column population, survey-gate warn+skip behavior, forward+survey anti-regression, `out_idx<0` eligibility guard, single-path telescope `path_heterogeneity_effects[(only_path,)] == heterogeneity_effects` bit-exactly, summary rendering, direct-call `NotImplementedError` backstop). Closes TODO #422. ### Changed diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index 13995e77..a177962b 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -2971,7 +2971,7 @@ where `D_it` is the treatment indicator (1 if unit `i` is treated by time `t`) a **Note (anticipation shift):** The `anticipation: int` constructor kwarg (default 0) shifts BOTH the treatment and ring-membership clocks by `-anticipation` so the stage-1 Omega_0 subsample correctly excludes the `anticipation` pre-treatment periods (which would otherwise contaminate the FE estimation with anticipation effects). Concretely, the effective treatment indicator becomes `D'_it = 1{t >= first_treat - anticipation}` and ring membership uses the corresponding shifted "currently-treated" set when computing `d_it` for staggered timing. Stage 2 then estimates `tau_total` and `delta_j` net of the anticipation window. 
Mirrors `TwoStageDiD`'s `anticipation` parameter semantics — recommended use is a small integer (1-3 periods) when domain knowledge suggests treatment effects begin before formal onset (e.g., policy announcements). Setting `anticipation > 0` AND providing a `first_treat` column where `first_treat - anticipation < min(time)` for any unit will mark that unit as treated from the panel's first observation, with the same baseline-treated handling as if it were always-treated (warn-and-drop via the Omega_0 unit-level check above). -**Note:** No published R/Stata software exists for the two-period ring estimator (Butts Eqs. 5-6). Correctness anchored on (a) synthetic-DGP Monte Carlo identification tests on the **non-staggered** DGP (50-seed default + 200-seed `@pytest.mark.slow` variant — both recover `tau_total` AND `delta_1` within 0.02 absolute tolerance on Butts-Assumption-satisfying DGPs) plus a **staggered**-DGP MC test (30-seed, no slow variant) that anchors `tau_total` only within 0.04 absolute tolerance (looser bound because each staggered DGP is larger and noisier); per-ring `delta_jk` recovery on staggered DGPs and a 200-seed slow staggered variant are queued as follow-ups alongside `event_study=True` support; (b) reduce-to-TWFE limit (non-staggered, 20-seed deterministic bit-identity at `atol=1e-10` via `TestSpilloverDiDNonStaggeredFEEquivalence`); (c) Wave A's Conley sparse-vs-dense parity inherited via `solve_ols`. A reduce-to-`TwoStageDiD` limit was scoped during planning but not shipped in Wave B (the Omega_0 stricter subsample requires additional fixture work to align with `TwoStageDiD`'s `{D_it = 0}` subsample); queued as a follow-up alongside the Gardner GMM correction. +**Note:** No published R/Stata software exists for the two-period ring estimator (Butts Eqs. 5-6). 
Correctness anchored on (a) synthetic-DGP Monte Carlo identification tests on the **non-staggered** DGP (50-seed default + 200-seed `@pytest.mark.slow` variant — both recover `tau_total` AND `delta_1` within 0.02 absolute tolerance on Butts-Assumption-satisfying DGPs) plus a **staggered**-DGP MC test (30-seed, no slow variant) that anchors both `tau_total` within 0.04 and `delta_1` within 0.03 absolute tolerance (looser bounds because each staggered DGP is larger and noisier); per-event-time `delta_jk` decomposition on staggered DGPs and a 200-seed slow staggered variant are queued as follow-ups alongside `event_study=True` support; (b) reduce-to-TWFE limit (non-staggered, 20-seed deterministic bit-identity at `atol=1e-10` via `TestSpilloverDiDNonStaggeredFEEquivalence`); (c) Wave A's Conley sparse-vs-dense parity inherited via `solve_ols`. A reduce-to-`TwoStageDiD` limit was scoped during planning but not shipped in Wave B (the Omega_0 stricter subsample requires additional fixture work to align with `TwoStageDiD`'s `{D_it = 0}` subsample); queued as a follow-up alongside the Gardner GMM correction. **Assumptions (Butts 2021):** diff --git a/tests/test_spillover.py b/tests/test_spillover.py index c3892942..42ed7f5f 100644 --- a/tests/test_spillover.py +++ b/tests/test_spillover.py @@ -810,20 +810,33 @@ def test_nonstaggered_recovers_at_200_seeds(self): assert abs(np.mean(att_estimates) - (-0.07)) < 0.02 assert abs(np.mean(delta_estimates) - (-0.04)) < 0.02 - def test_staggered_recovers_tau_total(self): - """Staggered MC with 30 seeds (smaller because each DGP is larger).""" + def test_staggered_recovers_tau_total_and_delta_1(self): + """Staggered MC with 30 seeds (smaller because each DGP is larger). + + Anchors BOTH `tau_total` and `delta_1` recovery on the staggered + DGP. Per-ring `delta_jk` (event-time decomposition) is deferred + alongside `event_study=True` support. 
+ """ att_estimates = [] + delta_estimates = [] for s in range(30): df = generate_butts_staggered_dgp(tau_total=-0.07, delta_1=-0.04, seed=s) result = SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon")).fit( df, outcome="y", unit="unit", time="time", first_treat="first_treat" ) att_estimates.append(result.att) + if result.spillover_effects is not None and len(result.spillover_effects) > 0: + delta_estimates.append(result.spillover_effects.iloc[0]["coef"]) mean_att = float(np.mean(att_estimates)) - # Staggered MC is noisier; allow a looser tolerance. + mean_delta = float(np.mean(delta_estimates)) + # Staggered MC is noisier than non-staggered; allow a looser + # tolerance (0.04 on tau_total, 0.03 on delta_1). assert ( abs(mean_att - (-0.07)) < 0.04 ), f"staggered tau_total: expected -0.07, got {mean_att:.4f}" + assert ( + abs(mean_delta - (-0.04)) < 0.03 + ), f"staggered delta_1: expected -0.04, got {mean_delta:.4f}" # ============================================================================= @@ -1855,6 +1868,139 @@ def test_anticipation_round_trips_to_result_and_dict(self): assert d["anticipation"] == 0 +class TestSpilloverDiDAnticipationBehavior: + """Round-7 CI review P1: anticipation must change the fitted estimand, + not just round-trip through the result object. It shifts BOTH the + treatment indicator and the ring-exposure clock by `-anticipation`, + moving rows in and out of Omega_0 and changing `tau_total` / `delta_j`. + Hand-built 4-period panel with one treated unit at t=2: anticipation=1 + promotes t=1 into the "treated" window, dropping that row from + Omega_0 and increasing n_treated by one period's worth of obs. + Verified on both fit entry paths (`treatment=` and `first_treat=`). + """ + + @staticmethod + def _make_4period_panel(): + rng = np.random.default_rng(42) + # 1 treated @ t=2, 1 near-control, 2 far-controls. 4 periods. 
+ specs = [ + ("treated", 0.0, [0, 0, 1, 1]), + ("near", 0.5, [0, 0, 0, 0]), + ("far1", 5.0, [0, 0, 0, 0]), + ("far2", 5.1, [0, 0, 0, 0]), + ] + rows = [] + for unit, lat, d_pattern in specs: + for t in range(4): + rows.append( + { + "unit": unit, + "time": t, + "lat": lat, + "lon": 0.0, + "D": d_pattern[t], + "y": rng.normal(), + } + ) + return pd.DataFrame(rows) + + def test_anticipation_shifts_omega_0_on_treatment_path(self): + """anticipation=1 on the binary `treatment=` path: the effective + treatment indicator slides one period earlier, so t=1 (formerly + Omega_0 for treated + near units) is dropped from stage 1 and + promoted into the "currently-treated / currently-exposed" zone. + Stage 1 sample shrinks; n_treated grows; att changes.""" + df = self._make_4period_panel() + r0 = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + anticipation=0, + ).fit(df, outcome="y", unit="unit", time="time", treatment="D") + r1 = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + anticipation=1, + ).fit(df, outcome="y", unit="unit", time="time", treatment="D") + # n_treated grows by 1 row (treated unit's t=1 row promoted under + # the shifted indicator). + assert r1.n_treated == r0.n_treated + 1, ( + f"anticipation=1 should add one treated period; " + f"got n_treated {r0.n_treated} -> {r1.n_treated}" + ) + # Stage 1 Omega_0 sample shrinks (rows promoted to treated/exposed + # leave Omega_0). + assert r1.stage1_n_obs < r0.stage1_n_obs, ( + f"anticipation=1 should shrink Omega_0; got stage1_n_obs " + f"{r0.stage1_n_obs} -> {r1.stage1_n_obs}" + ) + # And the estimand changes (different sample → different att). + assert r0.att != r1.att, f"anticipation=1 should change att; got {r0.att} == {r1.att}" + + def test_anticipation_shifts_omega_0_on_first_treat_path(self): + """anticipation=1 on the Gardner `first_treat=` path: same shift + applies. 
The first_treat column carries treatment onsets directly, + and anticipation subtracts from each onset for both the D_it + construction AND the ring-exposure (S_it) clock.""" + df = self._make_4period_panel() + # Convert binary D to first_treat column. + first_treat_map = {"treated": 2.0, "near": np.inf, "far1": np.inf, "far2": np.inf} + df_ft = df.copy() + df_ft["first_treat"] = df_ft["unit"].map(first_treat_map) + df_ft = df_ft.drop(columns=["D"]) + + r0 = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + anticipation=0, + ).fit(df_ft, outcome="y", unit="unit", time="time", first_treat="first_treat") + r1 = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + anticipation=1, + ).fit(df_ft, outcome="y", unit="unit", time="time", first_treat="first_treat") + assert r1.n_treated == r0.n_treated + 1, ( + f"first_treat path anticipation=1: expected +1 treated, " + f"got {r0.n_treated} -> {r1.n_treated}" + ) + assert r1.stage1_n_obs < r0.stage1_n_obs, ( + f"first_treat path anticipation=1: Omega_0 should shrink, " + f"got {r0.stage1_n_obs} -> {r1.stage1_n_obs}" + ) + assert r0.att != r1.att, ( + f"first_treat path anticipation=1: expected att to change, " f"got {r0.att} == {r1.att}" + ) + + def test_anticipation_shift_matches_across_fit_paths(self): + """Sanity-check that the `treatment=` and `first_treat=` paths + produce identical results under the same anticipation setting — + the two entry points are internally unified, so anticipation must + compose consistently with both.""" + df = self._make_4period_panel() + df_ft = df.copy() + df_ft["first_treat"] = df_ft["unit"].map( + {"treated": 2.0, "near": np.inf, "far1": np.inf, "far2": np.inf} + ) + df_ft = df_ft.drop(columns=["D"]) + + for ant in (0, 1): + r_d = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + anticipation=ant, + ).fit(df, outcome="y", unit="unit", time="time", treatment="D") + r_ft = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", 
"lon"), + anticipation=ant, + ).fit(df_ft, outcome="y", unit="unit", time="time", first_treat="first_treat") + assert r_d.att == r_ft.att, ( + f"anticipation={ant}: att mismatch between paths " f"({r_d.att} vs {r_ft.att})" + ) + assert ( + r_d.stage1_n_obs == r_ft.stage1_n_obs + ), f"anticipation={ant}: stage1_n_obs mismatch between paths" + + class TestSpilloverDiDEffectiveRankDoF: """Stage-2 residual df should use effective rank (after solve_ols drops rank-deficient columns), not raw column count. Round-5 codex review From 2b86d8c051ecec73c43ae601e601fc213bec3b93 Mon Sep 17 00:00:00 2001 From: igerber Date: Sat, 16 May 2026 07:05:53 -0400 Subject: [PATCH 3/3] SpilloverDiD: replace Conley smoke test with plumbing-verification MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Round-8 CI review flagged `test_conley_se_differs_from_hc1` as a test- coverage gap: the name promises a Conley-specific assertion but the body only checked finite SE + ATT invariance. A silent fallback to HC1 (e.g., if SpilloverDiD ever stopped threading the Conley kwargs through to `solve_ols`) would have passed. Replace with: - `test_conley_kwargs_threaded_to_solve_ols`: patches `solve_ols` at the import site, captures the kwargs of the stage-2 invocation, and asserts they include `vcov_type="conley"`, `conley_cutoff_km=200.0`, `conley_metric="haversine"`, `conley_lag_cutoff=0`, plus fit-time-derived `conley_coords` / `conley_time` / `conley_unit` arrays of the right shape. Any silent fallback to HC1 fails this. - `test_conley_att_invariant_vs_hc1`: extracted from the old test — vcov choice does not change ATT (residualization + OLS fit are independent of variance). Now stands as a clean invariant rather than pretending to verify Conley-specific output. Also bumps CHANGELOG test count 156 -> 157 and updates the Conley description to "plumbing (verifies solve_ols is called with vcov_type= 'conley' + Conley kwargs, no silent HC1 fallback)". 
All 157 tests pass; black + ruff clean. Co-Authored-By: Claude Opus 4.7 (1M context) --- CHANGELOG.md | 2 +- tests/test_spillover.py | 80 +++++++++++++++++++++++++++++++++-------- 2 files changed, 67 insertions(+), 15 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 07f8eab1..ad5135c2 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,7 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] ### Added -- **`SpilloverDiD` — ring-indicator spillover-aware DiD (Butts 2021).** New standalone estimator at `diff_diff/spillover.py` implementing two-stage Gardner methodology with ring-indicator covariates that identify direct effect on treated (`tau_total`) alongside per-ring spillover effects on near-control units (`delta_j`). Documented synthesis of ingredients (no single published software covers the exact recipe — `did2s` implements Gardner two-stage without rings; the Butts ring estimator has no R/Stata package): Butts (2021) Section 5 / Table 2 identification, Gardner (2022) two-stage residualize-then-fit, and the Conley spatial-HAC vcov shipped in 3.3.3. Handles both panel non-staggered (Equations 5/6/8) and Section 5 staggered timing in one estimator — non-staggered is the special case where all treated units share an onset time. **API:** `SpilloverDiD(rings=[0, 50, 100, 200], conley_coords=("lat","lon"), ...).fit(data, outcome="y", unit="unit", time="t", treatment="D")` (binary D auto-converted to `first_treat`) or `.fit(..., first_treat="first_treat")` (Gardner convention). Result: `SpilloverDiDResults(DiDResults)` with `.att` = `tau_total`, `.spillover_effects` (per-ring `pd.DataFrame` with `coef`/`se`/`t_stat`/`p_value`/`ci_low`/`ci_high`), `.ring_breakpoints`, `.d_bar`, `.n_units_ever_in_ring`, `.n_far_away_obs`, `.is_staggered`. `.coefficients` exposes all `(1+K)` stage-2 entries (`"treatment"` + `"_spillover_"`) plus an `"ATT"` alias keyed to vcov columns. 
**Methodology spec (committed):** stage-2 regressor is the time-varying `(1 - D_it) * Ring_{it,j}` form (paper page 12's `S_it = S_i * 1{t >= t_treat}` notation; Section 5 Table 2's `S^k_{it}` / `Ring^k_{it,j}`). Reading the literal unit-static `(1 - D_it) * S_i` from Equation 5 is algebraically rank-deficient under TWFE (`(1-D_it) * S_i = S_i - D_it`, with `S_i` absorbed by `mu_i`, leaving `-D_it`); only the time-varying form supports the paper's identification (Proposition 2.3). Stage-1 subsample uses Butts' STRICTER `Omega_0 = {D_it = 0 AND S_it = 0}` (untreated AND unexposed), not TwoStageDiD's `{D_it = 0}` alone — this prevents spillover-contaminated near-controls in pre/post periods from biasing the time FE. **Gardner identity (non-staggered):** a 20-seed deterministic regression test pins `SpilloverDiD.att` against a direct single-stage TWFE ring regression on the full sample (`y ~ mu_i + lambda_t + tau * D_it + sum_j delta_j * (1 - D_it) * Ring_{it,j}`) at `atol=1e-10` — empirically bit-identical, so the reported non-staggered `tau_total` IS the Butts Eqs. 4-6 estimator. **Identification-check policy (period strict, unit warn-and-drop, plus connectivity):** every period must have at least one Omega_0 row (hard `ValueError` — dropping a period removes all units' cross-time identification). Units lacking Omega_0 rows (e.g. baseline-treated units with `D_it = 1` at every observed `t`) are warned-and-dropped: their unit FE is NaN, residualization writes NaN on their rows, and the downstream finite-mask path excludes them from stage 2 — mirrors `TwoStageDiD`'s always-treated convention. 
Additionally, the supported-units bipartite graph (units linked by shared Omega_0 periods) must form a single connected component; `K > 1` components raise `ValueError` because the FE solver would return only component-specific constants and residualization would silently mix them across components (defense-in-depth — under absorbing treatment the disconnected case may be unreachable through the upstream validators, but the check future-proofs Wave B follow-ups). **Public API restrictions (Wave B MVP):** `covariates=` raises `NotImplementedError` because Gardner-style two-stage requires covariate effects estimated on the untreated-and-unexposed subsample at stage 1 (appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates); non-absorbing / reversible treatment patterns (e.g. `[0, 1, 0]`) raise `ValueError` rather than being silently coerced into "treated from first 1 onward"; non-constant `first_treat` values across rows of the same unit raise `ValueError`; `conley_coords` is required on every fit path (not just `vcov_type="conley"`) because ring construction always uses it. **Far-away control identification:** uses CURRENT-period untreated status (`D_it = 0`) rather than never-treated-only, so all-eventually-treated staggered designs (no never-treated units) can identify the counterfactual via not-yet-treated far-away rows. **Variance (Wave B MVP):** stage-2 OLS variance via `solve_ols` (HC1 / Conley / cluster paths all flow through). The Gardner GMM first-stage uncertainty correction is NOT applied at stage 2 in this PR (documented limitation; planned follow-up extends `two_stage.py::_compute_gmm_variance` to accept a Conley kernel matrix in place of HC1's identity at the influence-function outer-product step). 
**Deferred features (planned follow-ups):** `event_study=True` per-event-time × ring coefficients (Butts Table 2), `survey_design=` integration, `ring_method="count"` (count-of-treated-in-ring), data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight), Gardner GMM first-stage correction at stage 2, sparse staggered ring-distance path. **Tests:** `tests/test_spillover.py` (156 tests across ring-construction primitives, validators, fit integration, raw-data invariant, identification MC — non-staggered DGP at 50 seeds + 200-seed `@pytest.mark.slow` variant recovers both `tau_total` and `delta_1`; staggered DGP at 30 seeds anchors both `tau_total` and `delta_1` — Conley wiring, Gardner identity bit-identity, coefficients-vs-vcov alignment, warn-and-drop, rank_deficient_action validation, Omega_0 bipartite-graph connectivity, anticipation behavior on both fit paths). DGP factories `tests/_dgp_utils.py::generate_butts_nonstaggered_dgp` / `generate_butts_staggered_dgp` satisfy Butts Assumptions 1/3/5/7 by construction. +- **`SpilloverDiD` — ring-indicator spillover-aware DiD (Butts 2021).** New standalone estimator at `diff_diff/spillover.py` implementing two-stage Gardner methodology with ring-indicator covariates that identify direct effect on treated (`tau_total`) alongside per-ring spillover effects on near-control units (`delta_j`). Documented synthesis of ingredients (no single published software covers the exact recipe — `did2s` implements Gardner two-stage without rings; the Butts ring estimator has no R/Stata package): Butts (2021) Section 5 / Table 2 identification, Gardner (2022) two-stage residualize-then-fit, and the Conley spatial-HAC vcov shipped in 3.3.3. Handles both panel non-staggered (Equations 5/6/8) and Section 5 staggered timing in one estimator — non-staggered is the special case where all treated units share an onset time. 
**API:** `SpilloverDiD(rings=[0, 50, 100, 200], conley_coords=("lat","lon"), ...).fit(data, outcome="y", unit="unit", time="t", treatment="D")` (binary D auto-converted to `first_treat`) or `.fit(..., first_treat="first_treat")` (Gardner convention). Result: `SpilloverDiDResults(DiDResults)` with `.att` = `tau_total`, `.spillover_effects` (per-ring `pd.DataFrame` with `coef`/`se`/`t_stat`/`p_value`/`ci_low`/`ci_high`), `.ring_breakpoints`, `.d_bar`, `.n_units_ever_in_ring`, `.n_far_away_obs`, `.is_staggered`. `.coefficients` exposes all `(1+K)` stage-2 entries (`"treatment"` + `"_spillover_"`) plus an `"ATT"` alias keyed to vcov columns. **Methodology spec (committed):** stage-2 regressor is the time-varying `(1 - D_it) * Ring_{it,j}` form (paper page 12's `S_it = S_i * 1{t >= t_treat}` notation; Section 5 Table 2's `S^k_{it}` / `Ring^k_{it,j}`). The literal unit-static `(1 - D_it) * S_i` reading of Equation 5 is algebraically rank-deficient under TWFE (`(1-D_it) * S_i = S_i - D_it` since `D_it = 1` implies `S_i = 1`; `S_i` is absorbed by `mu_i`, leaving `-D_it`); only the time-varying form supports the paper's identification (Proposition 2.3). Stage-1 subsample uses Butts' STRICTER `Omega_0 = {D_it = 0 AND S_it = 0}` (untreated AND unexposed), not TwoStageDiD's `{D_it = 0}` alone — this prevents spillover-contaminated near-controls in pre/post periods from biasing the time FE. **Gardner identity (non-staggered):** a 20-seed deterministic regression test pins `SpilloverDiD.att` against a direct single-stage TWFE ring regression on the full sample (`y ~ mu_i + lambda_t + tau * D_it + sum_j delta_j * (1 - D_it) * Ring_{it,j}`) at `atol=1e-10` — empirically bit-identical, so the reported non-staggered `tau_total` IS the Butts Eqs. 4-6 estimator. **Identification-check policy (period strict, unit warn-and-drop, plus connectivity):** every period must have at least one Omega_0 row (hard `ValueError` — dropping a period removes all units' cross-time identification). Units lacking Omega_0 rows (e.g.
baseline-treated units with `D_it = 1` at every observed `t`) are warned-and-dropped: their unit FE is NaN, residualization writes NaN on their rows, and the downstream finite-mask path excludes them from stage 2 — mirrors `TwoStageDiD`'s always-treated convention. Additionally, the supported-units bipartite graph (units linked by shared Omega_0 periods) must form a single connected component; `K > 1` components raise `ValueError` because the FE solver would return only component-specific constants and residualization would silently mix them across components (defense-in-depth — under absorbing treatment the disconnected case may be unreachable through the upstream validators, but the check future-proofs Wave B follow-ups). **Public API restrictions (Wave B MVP):** `covariates=` raises `NotImplementedError` because Gardner-style two-stage requires covariate effects estimated on the untreated-and-unexposed subsample at stage 1 (appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates); non-absorbing / reversible treatment patterns (e.g. `[0, 1, 0]`) raise `ValueError` rather than being silently coerced into "treated from first 1 onward"; non-constant `first_treat` values across rows of the same unit raise `ValueError`; `conley_coords` is required on every fit path (not just `vcov_type="conley"`) because ring construction always uses it. **Far-away control identification:** uses CURRENT-period untreated status (`D_it = 0`) rather than never-treated-only, so all-eventually-treated staggered designs (no never-treated units) can identify the counterfactual via not-yet-treated far-away rows. **Variance (Wave B MVP):** stage-2 OLS variance via `solve_ols` (HC1 / Conley / cluster paths all flow through). 
The Gardner GMM first-stage uncertainty correction is NOT applied at stage 2 in this PR (documented limitation; planned follow-up extends `two_stage.py::_compute_gmm_variance` to accept a Conley kernel matrix in place of HC1's identity at the influence-function outer-product step). **Deferred features (planned follow-ups):** `event_study=True` per-event-time × ring coefficients (Butts Table 2), `survey_design=` integration, `ring_method="count"` (count-of-treated-in-ring), data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight), Gardner GMM first-stage correction at stage 2, sparse staggered ring-distance path. **Tests:** `tests/test_spillover.py` (157 tests across ring-construction primitives, validators, fit integration, raw-data invariant, identification MC — non-staggered DGP at 50 seeds + 200-seed `@pytest.mark.slow` variant recovers both `tau_total` and `delta_1`; staggered DGP at 30 seeds anchors both `tau_total` and `delta_1` — Conley plumbing (verifies `solve_ols` is called with `vcov_type="conley"` + Conley kwargs, no silent HC1 fallback), Gardner identity bit-identity, coefficients-vs-vcov alignment, warn-and-drop, rank_deficient_action validation, Omega_0 bipartite-graph connectivity, anticipation behavior on both fit paths). DGP factories `tests/_dgp_utils.py::generate_butts_nonstaggered_dgp` / `generate_butts_staggered_dgp` satisfy Butts Assumptions 1/3/5/7 by construction. - **`ChaisemartinDHaultfoeuille.predict_het` × `placebo`: R-parity on both global and per-path surfaces.** R-verified — `did_multiplegt_dyn(predict_het, placebo)` emits heterogeneity OLS results on backward (placebo) horizons via R's `DIDmultiplegtDYN:::did_multiplegt_main` placebo block (`effect = matrix(-i, ...)` rbind site); the same block runs per-by_level under `did_multiplegt_dyn(by_path, predict_het, placebo)`, so both global `res$results$predict_het` and per-by_level `res$by_level_i$results$predict_het` slots emit backward rows. 
R's predict_het syntax with `placebo > 0` requires the `c(-1)` sentinel in the horizon vector to trigger "compute heterogeneity for ALL forward (1..effects) AND ALL placebo (1..placebo) positions" — passing positive-only horizons errors with "specified numbers in predict_het that exceed the number of placebos". Python mirrors via `_compute_heterogeneity_test(..., placebo=L_max)` (set automatically from `self.placebo` at both global and per-path call sites in `fit()`) — the function iterates forward (1..L_max) and backward (-1..-L_max) horizons in a single loop with an explicit `out_idx < 0` eligibility guard for backward horizons whose `F_g` is too small (would otherwise silently misread `N_mat` via numpy negative indexing). `results.heterogeneity_effects` uses negative-int keys for backward horizons; `path_heterogeneity_effects` does the same per path. Placebo rows in `to_dataframe(level="by_path")` have non-NaN `het_*` columns when `placebo=True` and `heterogeneity=` are both set. **Survey gate (warn + skip):** `survey_design + placebo + heterogeneity` emits a `UserWarning` at fit-time and falls back to forward-horizon-only heterogeneity on both surfaces — the Binder TSL cell-period allocator's REGISTRY justification is tied to **post-period** attribution; backward-horizon attribution puts ψ_g mass on a pre-period cell, a separate library-extension claim that needs its own derivation. Forward-horizon `predict_het + survey_design` continues to work unchanged on both global and per-path surfaces. The function-level `_compute_heterogeneity_test` keeps a per-iteration `NotImplementedError` backstop for direct callers that bypass fit(). Pre-period allocator derivation deferred to a follow-up methodology PR (tracked in TODO.md). 
R parity confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityHeterogeneityWithPlacebo` (scenario 23, `multi_path_reversible_predict_het_with_placebo_global`, `placebo=2, effects=3, no by_path`) and `::TestDCDHDynRParityByPathHeterogeneityWithPlacebo` (scenario 22, same DGP plus `by_path=3`); pinned at `BETA_RTOL=1e-6` / `SE_RTOL=1e-5` for `beta` / `se` / `t_stat` / `n_obs` and `INFERENCE_RTOL=1e-4` for `p_value` / `conf_int` across 3 paths × (3 forward + 2 placebo) = 15 horizons + 1 global × 5 horizons. Cross-surface invariants regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPredictHetPlacebo` (placebo het column population, survey-gate warn+skip behavior, forward+survey anti-regression, `out_idx<0` eligibility guard, single-path telescope `path_heterogeneity_effects[(only_path,)] == heterogeneity_effects` bit-exactly, summary rendering, direct-call `NotImplementedError` backstop). Closes TODO #422. ### Changed diff --git a/tests/test_spillover.py b/tests/test_spillover.py index 42ed7f5f..17cd5238 100644 --- a/tests/test_spillover.py +++ b/tests/test_spillover.py @@ -866,32 +866,84 @@ def test_conley_fit_runs(self): assert result.conley_lag_cutoff == 0 assert np.isfinite(result.se) - def test_conley_se_differs_from_hc1(self): - """Conley SE differs from HC1 baseline (spatial correlation in errors).""" + def test_conley_kwargs_threaded_to_solve_ols(self): + """Round-8 CI review P3 (test coverage gap): the previous test was a + smoke test that only asserted finite SE + ATT invariance — a silent + fallback to HC1 would have passed. This test plumbing-verifies that + `solve_ols` is actually called with `vcov_type="conley"` AND the + Conley-specific kwargs (`conley_coords`, `conley_cutoff_km`, + `conley_metric`, `conley_time`, `conley_unit`, `conley_lag_cutoff`). 
+ """ + from unittest.mock import patch + df = generate_butts_nonstaggered_dgp( seed=42, n_treated=20, n_near_control=80, n_far_control=100 ) - est_hc1 = SpilloverDiD( + est = SpilloverDiD( rings=[0.0, 100.0], conley_coords=("lat", "lon"), - vcov_type="hc1", + vcov_type="conley", + conley_cutoff_km=200.0, + conley_metric="haversine", + conley_lag_cutoff=0, ) - result_hc1 = est_hc1.fit(df, outcome="y", unit="unit", time="time", treatment="D") - est_conley = SpilloverDiD( + # Patch solve_ols at the import site in spillover.py so we can + # observe the kwargs SpilloverDiD passes through at stage 2. + import diff_diff.spillover as spillover_mod + + captured: dict = {} + + original_solve_ols = spillover_mod.solve_ols + + def spy_solve_ols(*args, **kwargs): + # Capture the LAST call's kwargs (stage 2 is the last solve_ols + # invocation in fit()). + captured.clear() + captured.update(kwargs) + return original_solve_ols(*args, **kwargs) + + with patch.object(spillover_mod, "solve_ols", side_effect=spy_solve_ols): + result = est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + # Conley kwargs reached solve_ols (no silent HC1 fallback). + assert ( + captured.get("vcov_type") == "conley" + ), f"expected solve_ols vcov_type='conley', got {captured.get('vcov_type')!r}" + assert captured.get("conley_cutoff_km") == 200.0 + assert captured.get("conley_metric") == "haversine" + assert captured.get("conley_lag_cutoff") == 0 + # The fit-time-derived spatial / temporal arrays must be present and + # have the right shape. + coords = captured.get("conley_coords") + assert coords is not None and coords.shape == (result.n_obs, 2) + conley_time = captured.get("conley_time") + conley_unit = captured.get("conley_unit") + assert conley_time is not None and len(conley_time) == result.n_obs + assert conley_unit is not None and len(conley_unit) == result.n_obs + # And the reported SE is finite (the actual Conley computation + # completed end-to-end). 
+ assert np.isfinite(result.se) + + def test_conley_att_invariant_vs_hc1(self): + """Point-estimate invariance: vcov choice does not change ATT + (the residualization + OLS fit are independent of variance). + """ + df = generate_butts_nonstaggered_dgp( + seed=42, n_treated=20, n_near_control=80, n_far_control=100 + ) + result_hc1 = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + vcov_type="hc1", + ).fit(df, outcome="y", unit="unit", time="time", treatment="D") + result_conley = SpilloverDiD( rings=[0.0, 100.0], conley_coords=("lat", "lon"), vcov_type="conley", conley_cutoff_km=200.0, conley_lag_cutoff=0, - ) - result_conley = est_conley.fit(df, outcome="y", unit="unit", time="time", treatment="D") - # ATT point estimate unchanged across vcov; SE may differ. + ).fit(df, outcome="y", unit="unit", time="time", treatment="D") assert abs(result_hc1.att - result_conley.att) < 1e-10 - # Conley SE may be larger or smaller than HC1 depending on spatial - # error correlation; just assert it's not identical. - # (Synthetic DGP has independent errors so they may be very close; - # use a loose tolerance — primarily a wiring test.) - assert np.isfinite(result_conley.se) # =============================================================================
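Editor's addendum on the `rings=` parameter described in the CHANGELOG entry above: breakpoints like `[0, 50, 100, 200]` partition control units into half-open distance bands around the nearest treated unit, with units beyond the last breakpoint serving as far-away controls. A minimal sketch of that geometry under stated assumptions — `haversine_km` and `assign_ring` are hypothetical illustrations, not `diff_diff` API:

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

def assign_ring(dist_km, breakpoints):
    """Ring index j covers (breakpoints[j], breakpoints[j+1]];
    -1 marks units beyond the last breakpoint (far-away controls)."""
    for j in range(len(breakpoints) - 1):
        if breakpoints[j] < dist_km <= breakpoints[j + 1]:
            return j
    return -1

rings = [0.0, 50.0, 100.0, 200.0]
treated_coords = np.array([[40.0, -75.0]])                    # one treated unit
control_coords = [(40.3, -75.0), (41.5, -75.0), (48.0, -75.0)]
for lat, lon in control_coords:
    # Distance to the NEAREST treated unit drives ring membership.
    d = haversine_km(lat, lon, treated_coords[:, 0], treated_coords[:, 1]).min()
    print(assign_ring(d, rings))  # ~33 km -> 0, ~167 km -> 2, ~890 km -> -1
```

In the estimator itself the breakpoints come from `rings=` and the coordinates from `conley_coords`; the sketch only shows the distance-band bookkeeping.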
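The two-stage recipe the CHANGELOG entry commits to — stage 1 fits unit and time fixed effects on Butts' `Omega_0 = {D_it = 0 AND S_it = 0}` only, stage 2 regresses the residualized outcome on `D_it` and the ring term — is compact enough to sketch end-to-end on a toy non-staggered panel. This is a simplified illustration, not the library's implementation; the `two_stage_spillover` helper and its column names are assumptions:

```python
import numpy as np
import pandas as pd

def two_stage_spillover(df):
    """Stage 1: mu_i + lambda_t fit on Omega_0 = {D == 0 and S == 0};
    stage 2: OLS of the residualized y on D_it and the ring regressor."""
    FE = np.hstack([pd.get_dummies(df["unit"]).to_numpy(float),
                    pd.get_dummies(df["time"]).to_numpy(float)])
    y = df["y"].to_numpy()
    omega0 = ((df["D"] == 0) & (df["S"] == 0)).to_numpy()
    beta, *_ = np.linalg.lstsq(FE[omega0], y[omega0], rcond=None)
    resid = y - FE @ beta  # residualize ALL rows with the Omega_0-fitted FE
    X2 = np.column_stack([df["D"], df["ring_post"]]).astype(float)
    coef, *_ = np.linalg.lstsq(X2, resid, rcond=None)
    return {"tau_total": coef[0], "delta_1": coef[1]}

# Toy DGP: 20 treated, 20 ring controls, 20 far controls; onset at t = 3;
# true tau_total = 2.0, true delta_1 = 0.8.
rng = np.random.default_rng(0)
rows = []
for i in range(60):
    group = "treated" if i < 20 else ("ring" if i < 40 else "far")
    for t in range(6):
        post = int(t >= 3)
        D = post if group == "treated" else 0
        S = post if group in ("treated", "ring") else 0  # exposed post rows
        ring_post = post if group == "ring" else 0       # (1 - D_it) * Ring_it
        y = 0.5 * i + 0.3 * t + 2.0 * D + 0.8 * ring_post + rng.normal(scale=0.1)
        rows.append((i, t, y, D, S, ring_post))
df = pd.DataFrame(rows, columns=["unit", "time", "y", "D", "S", "ring_post"])
est = two_stage_spillover(df)  # recovers tau_total near 2.0, delta_1 near 0.8
```

Excluding the exposed ring-control post rows from stage 1 is exactly what keeps their spillover response out of the estimated `lambda_t`, which is the point of the stricter `Omega_0`.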
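Finally, the rank-deficiency claim behind the committed stage-2 regressor choice — that the unit-static `(1 - D_it) * S_i` collapses into the span of the fixed effects plus `D_it`, while the time-varying `(1 - D_it) * S_it` does not — can be checked numerically. A toy verification under the assumption (as in Butts' setup) that treated units themselves carry `S_i = 1`:

```python
import numpy as np

treated = np.array([1, 1, 0, 0, 0, 0])   # 2 treated, 2 ring, 2 far units
S_i = np.array([1, 1, 1, 1, 0, 0])       # within d_bar of treatment (treated included)
n_units, n_periods = 6, 4
post = (np.arange(n_periods) >= 2).astype(float)

unit_fe = np.kron(np.eye(n_units), np.ones((n_periods, 1)))  # mu_i dummies, (24, 6)
time_fe = np.kron(np.ones((n_units, 1)), np.eye(n_periods))  # lambda_t dummies, (24, 4)
D = (treated[:, None] * post[None, :]).ravel()               # D_it
S_static = np.repeat(S_i, n_periods).astype(float)           # unit-static S_i
S_tv = (S_i[:, None] * post[None, :]).ravel()                # S_it = S_i * 1{t >= t_treat}

base = np.linalg.matrix_rank(np.hstack([unit_fe, time_fe]))  # 9 = 6 + 4 - shared constant

# Static interaction adds NOTHING beyond D: (1 - D) * S_i = S_i - D, and
# S_i lies in the span of the unit dummies.
X_static = np.column_stack([unit_fe, time_fe, D, (1 - D) * S_static])
print(np.linalg.matrix_rank(X_static) - base)  # 1 -> the spillover column is redundant

# Time-varying interaction contributes a genuinely new direction.
X_tv = np.column_stack([unit_fe, time_fe, D, (1 - D) * S_tv])
print(np.linalg.matrix_rank(X_tv) - base)      # 2 -> D and the ring term both identified
```

The `matrix_rank` gap (1 vs 2 above baseline) is the whole argument: only the time-varying form leaves a spillover coefficient for TWFE to estimate.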