SpilloverDiD: ring-indicator spillover-aware DiD (Butts 2021) #446
igerber wants to merge 6 commits
New standalone estimator at `diff_diff/spillover.py` implementing
two-stage Gardner (2022) DiD with ring-indicator covariates that
identify, alongside the direct effect on treated (`tau_total`), per-ring
spillover effects on near-control units (`delta_j`). Reference: Butts, K.
(2023, originally 2021) "Difference-in-Differences with Spatial
Spillovers" arXiv:2105.03737v3; Gardner, J. (2022) "Two-stage
differences in differences" arXiv:2207.05943.
Handles panel non-staggered (paper Eqs 5/6/8) and Section 5 staggered
timing in one estimator — non-staggered is the special case where all
treated units share an onset time.
Methodology
-----------
- Stage-2 regressor: time-varying `(1 - D_it) * Ring_{it,j}` (paper
page 12's `S_it = S_i * 1{t >= t_treat}` notation; Section 5 Table 2's
`S^k_{it}` / `Ring^k_{it,j}`). The literal unit-static reading of
Equation 5, `(1 - D_it) * S_i`, is rank-deficient under TWFE: near-control
units have `D_it = 0` throughout, so the regressor reduces to the
unit-constant `S_i`, which the unit FE absorb. Only the time-varying
form supports the paper's identification (Prop 2.3).
- Stage-1 subsample: Butts' STRICTER `Omega_0 = {D_it = 0 AND S_it = 0}`
(untreated AND unexposed) — not TwoStageDiD's `{D_it = 0}` — prevents
spillover-contaminated near-controls from biasing the time FE.
- Gardner identity (non-staggered): empirically bit-identical to direct
single-stage TWFE ring regression on the full sample at atol=1e-10
(20-seed deterministic regression test). The reported non-staggered
`tau_total` IS the Butts Eqs. 4-6 estimator.
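A minimal sketch of the two constructions described above: the time-varying stage-2 ring regressors and the stricter `Omega_0` stage-1 mask. It is illustrative only and is not the code in `diff_diff/spillover.py`. The helper names (`ring_of`, `build_rows`), the half-open interval convention `(lower, upper]`, and the toy units are assumptions made for the example.

```python
def ring_of(distance, breakpoints):
    """Return the 0-based ring j with breakpoints[j] < distance <= breakpoints[j + 1].

    The (lower, upper] convention is an assumption of this sketch.
    """
    for j in range(len(breakpoints) - 1):
        if breakpoints[j] < distance <= breakpoints[j + 1]:
            return j
    return None  # distance 0 (a treated unit) or beyond the outermost ring


def build_rows(units, periods, onset, breakpoints):
    """Non-staggered case: one shared onset period for all treated units.

    units maps unit -> (ever_treated, distance_to_nearest_treated).
    """
    n_rings = len(breakpoints) - 1
    rows = []
    for unit, (ever_treated, distance) in units.items():
        j = None if ever_treated else ring_of(distance, breakpoints)
        for t in periods:
            post = t >= onset
            d_it = int(ever_treated and post)
            # Time-varying ring membership: switches on only once treatment starts.
            ring_it = [int(post and j == k) for k in range(n_rings)]
            # Stage-2 regressors (1 - D_it) * Ring_{it,j}.
            stage2 = [(1 - d_it) * r for r in ring_it]
            s_it = int(any(ring_it))
            rows.append({
                "unit": unit, "t": t, "D": d_it, "rings": stage2,
                # Stage-1 subsample: untreated AND unexposed.
                "omega_0": d_it == 0 and s_it == 0,
            })
    return rows


units = {"A": (True, 0.0), "B": (False, 30.0), "C": (False, 500.0)}
rows = build_rows(units, periods=[1, 2, 3], onset=2, breakpoints=[0, 50, 100, 200])
```

In this toy panel the near-control unit `B` contributes to stage 1 only in the pre-period, while the far-away unit `C` contributes in every period.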
API
---
    SpilloverDiD(
        rings=[0, 50, 100, 200],
        conley_coords=("lat", "lon"),
        vcov_type="conley",  # or "hc1" / cluster
        conley_cutoff_km=200.0,
        conley_lag_cutoff=0,
    ).fit(data, outcome="y", unit="unit", time="t", treatment="D")
Binary `D` auto-converts to a Gardner `first_treat` column; users with
canonical staggered data can pass `first_treat=` directly. Result is
`SpilloverDiDResults(DiDResults)` with `.att` = `tau_total`,
`.spillover_effects` (per-ring DataFrame with coef/se/t_stat/p_value/CI),
`.ring_breakpoints`, `.d_bar`, `.n_units_ever_in_ring`,
`.n_far_away_obs`, `.is_staggered`. `.coefficients` exposes all
`(1+K)` stage-2 entries keyed to vcov columns plus an `"ATT"` alias.
Identification-check policy
---------------------------
- Period level (structural): every period must have at least one Omega_0
row, else time FE for that period is unidentified — hard ValueError.
- Unit level (recoverable): units lacking Omega_0 rows (e.g. baseline-
treated units with `D_it = 1` at all observed `t`) are warned-and-
dropped; their unit FE is NaN, residualization writes NaN on their
rows, and the downstream finite-mask path excludes them from stage 2.
Mirrors TwoStageDiD's always-treated convention.
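The two-tier policy above can be sketched as follows. This is a hedged illustration, not the estimator's validator: the function name `check_omega_0_support`, the `(unit, period, in_omega_0)` row layout, and the message wording are all assumptions.

```python
import warnings


def check_omega_0_support(rows):
    """rows: list of (unit, period, in_omega_0) tuples.

    Raises if any period lacks an Omega_0 row; warns about and drops
    units that lack one. Returns the set of units that are kept.
    """
    units = {u for u, _, _ in rows}
    periods = {t for _, t, _ in rows}
    supported_units = {u for u, _, ok in rows if ok}
    supported_periods = {t for _, t, ok in rows if ok}

    # Period level (structural): the time FE would be unidentified.
    missing_periods = sorted(periods - supported_periods)
    if missing_periods:
        raise ValueError(f"No Omega_0 rows in periods: {missing_periods}")

    # Unit level (recoverable): warn and drop, e.g. always-treated units.
    dropped = sorted(units - supported_units)
    if dropped:
        warnings.warn(f"Dropping units with no Omega_0 rows: {dropped}")
    return supported_units
```

The asymmetry reflects the text above: a missing period breaks identification for every unit, whereas a missing unit only removes that unit's own rows.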
Variance (Wave B MVP)
---------------------
Stage-2 OLS variance via `solve_ols` — HC1, Conley spatial-HAC, and
cluster-robust paths all flow through. The Gardner GMM first-stage
uncertainty correction is NOT applied at stage 2 in this PR (documented
limitation; planned follow-up extends
`two_stage.py::_compute_gmm_variance` to accept a Conley kernel matrix
in place of HC1's identity at the influence-function outer-product
step). Reported SEs are conservative relative to the full GMM + Conley
sandwich.
Deferred features (planned follow-ups)
--------------------------------------
- `event_study=True` per-event-time × ring coefficients (Butts Table 2)
- `survey_design=` integration
- `ring_method="count"` (count-of-treated-in-ring)
- Data-driven `d_bar` selection (Butts 2021b / 2023 JUE Insight)
- Gardner GMM first-stage correction at stage 2
- Sparse staggered ring-distance path
- TwoStageDiD / SpilloverDiD shared-internals refactor
Tests
-----
139 tests at `tests/test_spillover.py` across ring-construction
primitives, validators, fit integration, raw-data invariant,
identification MC (50-seed default + 200-seed `@pytest.mark.slow`
variant), Conley wiring, Gardner identity bit-identity (20-seed
deterministic regression test against direct single-stage TWFE ring
regression), coefficients-vs-vcov column alignment, and Omega_0
warn-and-drop. DGP factories at
`tests/_dgp_utils.py::generate_butts_nonstaggered_dgp` /
`generate_butts_staggered_dgp` satisfy Butts Assumptions 1/3/5/7 by
construction.
Documentation
-------------
- `docs/methodology/REGISTRY.md` — new SpilloverDiD section adjacent to
ConleySpatialHAC with the methodology spec, edge-case table, and
documented deviations.
- `docs/api/spillover.rst` — API reference with Wave B MVP limitations.
- `diff_diff/guides/llms.txt` + `llms-full.txt` — agent-facing catalog
entries.
- `README.md` — one-line catalog entry under `## Estimators`.
- `docs/references.rst` — Butts (2021/2023) + Gardner (2022) citations.
- `docs/doc-deps.yaml` — `diff_diff/spillover.py` →
`[REGISTRY.md#spillover, docs/api/spillover.rst]`.
- `TODO.md` — deferred-features rows under "Tech Debt from Code
Reviews" for the planned follow-ups.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good: no unmitigated P0/P1 findings.
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
- Clamp `max(vcov[i, i], 0.0)` before sqrt for ATT and per-ring SE
extraction (spillover.py:L1740-L1762). Matches the sibling-estimator
convention at two_stage.py:1183, estimators.py:606, stacked_did.py:515.
Prevents numerically tiny negative diagonals from indefinite Conley
sandwiches or near-singular cases from NaN-ing the full inference row.
- Hoist row_pos out of the per-cohort loop in
_compute_nearest_treated_distance_staggered (spillover.py:L400-L425).
row_pos depends only on row_unit and unit_to_pos, both invariant across
the cohort iteration; one O(n_rows) array build instead of
O(n_rows × n_cohorts) on dense staggered fits.
- Add TODO.md row tracking the sparse cKDTree path for the staggered
helper as Wave B follow-up. Resolves the stale code-comment reference in
spillover.py:L365-L369.
139 tests pass; no behavior change on existing fixtures (the clamp is
defensive against unrealizable values; the hoist is a refactor; the TODO
is bookkeeping).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
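A small sketch of the clamp-before-sqrt pattern named in the first bullet. It uses plain nested lists and `math.sqrt` rather than the NumPy arrays the estimator presumably works with, and the function name `se_from_vcov` is hypothetical. With NumPy a negative diagonal would yield NaN; with `math.sqrt` it would raise `ValueError`. The clamp avoids both.

```python
import math


def se_from_vcov(vcov):
    """Standard errors from a variance-covariance matrix (list of lists).

    Diagonals are clamped at 0.0 so a tiny negative value produced by
    floating-point error maps to an SE of 0.0 instead of failing.
    """
    return [math.sqrt(max(vcov[i][i], 0.0)) for i in range(len(vcov))]


# Second diagonal entry is a numerically tiny negative number.
vcov = [[4.0, 0.1], [0.1, -1e-18]]
ses = se_from_vcov(vcov)
```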
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good: no unmitigated P0/P1 findings.
Executive Summary
Methodology
No unmitigated source-material mismatch found. The time-varying ring regressor, stricter …
Code Quality
Performance
No findings. The remaining spillover-specific hotspot, the dense staggered nearest-treated-distance path, is now properly tracked in TODO.md:L128-L128.
Maintainability
No findings.
Tech Debt
Security
No findings.
Documentation/Tests
|
…_action guard
- REGISTRY: add "Note (anticipation shift)" documenting the public
`anticipation: int` parameter's effect on both treatment and ring-
membership clocks. Matches the implementation semantics at
spillover.py:850-854 + 1441-1544, and mirrors TwoStageDiD's
anticipation parameter convention.
- REGISTRY: narrow the "correctness anchored on reduce-to-TWFE /
reduce-to-TwoStageDiD limits" claim. Only the reduce-to-TWFE limit
shipped (the 20-seed Gardner identity bit-identity test at
TestSpilloverDiDNonStaggeredFEEquivalence). The reduce-to-TwoStageDiD
limit was scoped during planning but not shipped — the Omega_0
stricter subsample requires fixture work to align with TwoStageDiD's
{D_it = 0} subsample. Queued as a follow-up.
- spillover.py: add `rank_deficient_action` constructor guard mirroring
two_stage.py:149-153 and stacked_did.py. Bad values now fail at
__init__ with a clear ValueError instead of deep inside solve_ols.
- tests: new TestSpilloverDiDRankDeficientActionValidation class
exercising 6 invalid values + 3 valid values.
148 tests pass (was 139); black + ruff clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
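The constructor guard described in this commit can be sketched roughly as below. The class name is a stand-in, and the accepted values `"warn"`, `"error"`, and `"silent"` are an assumption: the commit message says only that three values are valid, without listing them.

```python
# Assumed set of accepted values; the commit does not enumerate them.
_VALID_RANK_DEFICIENT_ACTIONS = ("warn", "error", "silent")


class EstimatorSketch:
    """Hypothetical stand-in for an estimator with an early argument check."""

    def __init__(self, rank_deficient_action="warn"):
        # Fail at construction time rather than deep inside the OLS solver.
        if rank_deficient_action not in _VALID_RANK_DEFICIENT_ACTIONS:
            raise ValueError(
                "rank_deficient_action must be one of "
                f"{_VALID_RANK_DEFICIENT_ACTIONS}, got {rank_deficient_action!r}"
            )
        self.rank_deficient_action = rank_deficient_action
```

Validating in `__init__` means a typo surfaces with a message that names the offending parameter, before any data is loaded or fitted.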
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good: no unmitigated P0/P1 findings.
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
…tion
- REGISTRY: narrow the MC anchor claim to reflect what's actually
shipped: non-staggered DGP tests pin both tau_total AND delta_1,
staggered DGP tests pin tau_total only; per-ring delta_jk recovery on
staggered DGPs queued alongside event_study=True support.
- llms-full.txt: add the missing
`covariates= raises NotImplementedError` limitation bullet so the
documented surface matches the runtime rejection (the fit signature
exposes covariates= but the estimator rejects it with a clear
explanation about Omega_0 stage-1 fitting).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good: no unmitigated P0/P1 findings.
Executive Summary
Methodology
Code Quality
No findings.
Performance
No findings.
Maintainability
No findings.
Tech Debt
Security
No findings.
Documentation/Tests
|
…d reality
The round-3 narrowing left the "50-seed default + 200-seed slow variant"
phrasing applying to both non-staggered AND staggered MC tests. Only
non-staggered actually has the 50/200 structure (and recovers both
tau_total and delta_1); staggered is a single 30-seed test that anchors
tau_total only with a looser 0.04 tolerance (each staggered DGP is
larger and noisier). Honest end-state for the docs.
Also bumps the CHANGELOG test count 139 -> 148 to reflect the
rank_deficient_action validation class added in the previous polish
commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Executive Summary
Methodology
Code Quality
No additional findings.
Performance
No additional findings.
Maintainability
No additional findings.
Tech Debt
Security
No findings.
Documentation/Tests
Path to Approval
|
…pth)
Stage-1's iterative FE solver identifies (mu_i, lambda_t) only up to
component-specific constants per connected component of the bipartite
graph (supported units linked by shared untreated-and-unexposed
periods). If the graph splits into K > 1 components, residualization
combines mu_i from one component with lambda_t from another, silently
corrupting y_tilde and downstream tau_total / delta_j. Balanced panel +
per-unit / per-period Omega_0 coverage is NECESSARY but not SUFFICIENT;
connectivity is the load-bearing identification condition.
Under the current absorbing-treatment + period-strict + unit-warn-drop
regime the disconnected case appears unreachable through `.fit()` (the
combination of validators forces a connected supported subgraph), but
the check is defense-in-depth and future-proofs the Wave B follow-ups
(event_study, survey_design integration, possible reversible-treatment
relaxations).
- diff_diff/spillover.py: new `_check_omega_0_connectivity` helper using
scipy.sparse.csgraph.connected_components; called immediately after the
unit-level warn-and-drop block (Step 10c). Operates on the
SUPPORTED-units subgraph (warn-dropped units are excluded so they don't
form trivial singletons).
- tests/test_spillover.py: TestSpilloverDiDOmega0Connectivity (5 tests)
unit-tests the helper directly with synthetic (unit_codes, time_codes,
omega_0_mask) arrays: disconnected 2-component case raises;
bridge-unit-connected case succeeds; n_supp <= 1 short-circuits;
3-component error names units; normal Butts DGP through `.fit()`
doesn't trigger the check.
- docs/methodology/REGISTRY.md, docs/api/spillover.rst, CHANGELOG.md:
extend the Omega_0 identification-policy note to include connectivity.
153 tests pass (was 148); black + ruff clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
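The connectivity condition can be illustrated with a dependency-free sketch. The commit's helper uses `scipy.sparse.csgraph.connected_components`; this example swaps in a plain union-find over the same bipartite unit/period graph, so it is not the shipped implementation. The function names and the error wording are hypothetical.

```python
def omega_0_components(unit_codes, time_codes, omega_0_mask):
    """Group supported units by connected component of the bipartite graph.

    A unit node and a period node are linked whenever that (unit, period)
    row lies in Omega_0. Units with no Omega_0 row never enter the graph.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, t, ok in zip(unit_codes, time_codes, omega_0_mask):
        if ok:
            root_u, root_t = find(("unit", u)), find(("time", t))
            if root_u != root_t:
                parent[root_u] = root_t

    groups = {}
    for node in list(parent):
        if node[0] == "unit":
            groups.setdefault(find(node), []).append(node[1])
    return sorted(sorted(g) for g in groups.values())


def check_connectivity(unit_codes, time_codes, omega_0_mask):
    """Raise if the supported subgraph splits into more than one component."""
    comps = omega_0_components(unit_codes, time_codes, omega_0_mask)
    if len(comps) > 1:
        raise ValueError(f"Omega_0 graph has {len(comps)} components: {comps}")
    return comps
```

For example, two units observed untreated in disjoint periods form two components, and adding a third unit that shares one period with each of them bridges the graph into a single component.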
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good: the prior …
Executive Summary
Methodology
Code Quality
No findings.
Performance
Maintainability
No findings.
Tech Debt
Security
No findings.
Documentation/Tests
No unmitigated findings. The public docs now describe the connectivity requirement and the period-strict/unit-warn-drop behavior, and the new tests cover the connectivity helper directly (…
Summary
`SpilloverDiD` (`diff_diff/spillover.py`) implementing two-stage Gardner DiD with ring-indicator covariates that identify both the direct effect on treated units (`tau_total`) and per-ring spillover effects on near-control units (`delta_j`). Handles non-staggered and Section 5 staggered timing in a single estimator.
Methodology references
- …`(1 - D_it) * Ring_{it,j}` form (paper page 12's `S_it = S_i * 1{t >= t_treat}` notation; Section 5 Table 2's `S^k_{it}` / `Ring^k_{it,j}`). The literal unit-static reading of Equation 5 is algebraically rank-deficient under TWFE; only the time-varying form supports the paper's identification (Proposition 2.3). Documented in `docs/methodology/REGISTRY.md` § SpilloverDiD.
- …`Omega_0 = {D_it = 0 AND S_it = 0}` (untreated AND unexposed) rather than `TwoStageDiD`'s `{D_it = 0}` (untreated only). Prevents spillover-contaminated near-controls from biasing the time FE.
- …`ValueError`: dropping a period removes all units' cross-time identification), unit warn-and-drop (mirrors `TwoStageDiD`'s always-treated convention; the downstream finite-mask path excludes the affected rows from stage 2).
- …`two_stage.py::_compute_gmm_variance`). Documented in REGISTRY + `TODO.md`.
- …`D_it = 0`) rather than never-treated-only, so all-eventually-treated staggered designs can identify the counterfactual via not-yet-treated far-away rows.
- …`did2s` implements Gardner two-stage without rings; no published R/Stata software implements the Butts ring estimator. Correctness anchored on (a) 20-seed deterministic regression test pinning `SpilloverDiD.att` against direct single-stage TWFE ring regression at `atol=1e-10` (the Gardner identity equivalence for non-staggered timing; empirically bit-identical, so the reported non-staggered `tau_total` IS the Butts Eqs. 4-6 estimator), (b) 50-seed Monte Carlo identification recovery on synthetic Butts-Assumption-satisfying DGPs (+ 200-seed `@pytest.mark.slow` variant), and (c) Conley sparse-vs-dense parity inherited from the 3.3.3 release.
Validation
- …`tests/test_spillover.py`):
- …`{0,1}` treatment, NaN rejection on cluster/unit/time/first_treat/treatment, balanced panel, duplicate cells, non-absorbing treatment, conley_coords within-unit-constant, callable metric self-distance contract, `hc2`/`hc2_bm` rejected, NaN in outcome rejected, mixed-encoding time collapse caught)
- …`tau_total` and `delta_j`; 200-seed slow variant)
- …`atol=1e-10`)
- …`tests/_dgp_utils.py`): `generate_butts_nonstaggered_dgp` / `generate_butts_staggered_dgp` satisfy Butts Assumptions 1/3/5/7 by construction.
- …`docs/api/spillover.rst`, `diff_diff/guides/llms.txt` + `llms-full.txt`, `docs/references.rst`, `docs/doc-deps.yaml`, README catalog entry, `TODO.md` rows for deferred follow-ups.
Security / privacy
Generated with Claude Code