SpilloverDiD: ring-indicator spillover-aware DiD (Butts 2021) by igerber · Pull Request #446 · igerber/diff-diff

igerber · 2026-05-15T21:21:28Z

Summary

New standalone estimator SpilloverDiD (diff_diff/spillover.py) implementing two-stage Gardner DiD with ring-indicator covariates that identify both the direct effect on treated units (tau_total) and per-ring spillover effects on near-control units (delta_j). Handles non-staggered and Section 5 staggered timing in a single estimator.

Methodology references

Method: Butts (2021/2023) ring-indicator spillover-aware DiD; Gardner (2022) two-stage residualize-then-fit; reuses the Conley (1999) spatial-HAC vcov shipped in 3.3.3.
Paper / source links:
- Butts, K. (2023). Difference-in-Differences with Spatial Spillovers. arXiv:2105.03737v3 — https://arxiv.org/abs/2105.03737
- Gardner, J. (2022). Two-stage differences in differences. arXiv:2207.05943 — https://arxiv.org/abs/2207.05943
- Conley, T. G. (1999). GMM Estimation with Cross-Sectional Dependence. Journal of Econometrics 92(1), 1-45.
Intentional deviations from source:
- Stage-2 regressor uses the time-varying (1 - D_it) * Ring_{it,j} form (paper page 12's S_it = S_i * 1{t >= t_treat} notation; Section 5 Table 2's S^k_{it} / Ring^k_{it,j}). The literal unit-static reading of Equation 5 is algebraically rank-deficient under TWFE; only the time-varying form supports the paper's identification (Proposition 2.3). Documented in docs/methodology/REGISTRY.md § SpilloverDiD.
- Stage-1 subsample uses Butts' stricter Omega_0 = {D_it = 0 AND S_it = 0} (untreated AND unexposed) rather than TwoStageDiD's {D_it = 0} (untreated only). Prevents spillover-contaminated near-controls from biasing the time FE.
- Omega_0 row-level identification check: period strict (hard ValueError — dropping a period removes all units' cross-time identification), unit warn-and-drop (mirrors TwoStageDiD's always-treated convention; the downstream finite-mask path excludes the affected rows from stage 2).
- Gardner GMM first-stage uncertainty correction is NOT applied at stage 2 in this PR (planned follow-up extends two_stage.py::_compute_gmm_variance). Documented in REGISTRY + TODO.md.
- Far-away control identification uses current-period untreated status (D_it = 0) rather than never-treated-only, so all-eventually-treated staggered designs can identify the counterfactual via not-yet-treated far-away rows.
No R parity benchmark: did2s implements Gardner two-stage without rings; no published R/Stata software implements the Butts ring estimator. Correctness anchored on (a) 20-seed deterministic regression test pinning SpilloverDiD.att against direct single-stage TWFE ring regression at atol=1e-10 (the Gardner identity equivalence for non-staggered timing — empirically bit-identical, so the reported non-staggered tau_total IS the Butts Eqs. 4-6 estimator), (b) 50-seed Monte Carlo identification recovery on synthetic Butts-Assumption-satisfying DGPs (+ 200-seed @pytest.mark.slow variant), and (c) Conley sparse-vs-dense parity inherited from the 3.3.3 release.

Validation

Tests added (139 total in tests/test_spillover.py):
- Ring-construction primitives (pairwise haversine/euclidean, callable metrics, static + staggered nearest-treated distance, sparse kd-tree path)
- Validators (rings sort/non-negative/start-at-0, d_bar consistency, treatment XOR first_treat, exact {0,1} treatment, NaN rejection on cluster/unit/time/first_treat/treatment, balanced panel, duplicate cells, non-absorbing treatment, conley_coords within-unit-constant, callable metric self-distance contract, hc2/hc2_bm rejected, NaN in outcome rejected, mixed-encoding time collapse caught)
- Fit integration (non-staggered + staggered, raw-data invariant, Conley wiring, baseline-treated unit recognition + warn-and-drop, partial-unsupported-units warn-and-drop, period-level hard ValueError)
- Identification (50-seed MC recovery of tau_total and delta_j; 200-seed slow variant)
- Gardner identity (20-seed bit-identity vs single-stage TWFE ring regression at atol=1e-10)
- Coefficients-vs-vcov column alignment + ATT alias
DGP factories (tests/_dgp_utils.py): generate_butts_nonstaggered_dgp / generate_butts_staggered_dgp satisfy Butts Assumptions 1/3/5/7 by construction.
No tutorial in this PR: T22 (Butts TVA tutorial) is deferred to a follow-up notebook PR per user-confirmed Wave B scope.
Documentation: REGISTRY section adjacent to ConleySpatialHAC, docs/api/spillover.rst, diff_diff/guides/llms.txt + llms-full.txt, docs/references.rst, docs/doc-deps.yaml, README catalog entry, TODO.md rows for deferred follow-ups.

Security / privacy

Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

New standalone estimator at `diff_diff/spillover.py` implementing two-stage Gardner (2022) DiD with ring-indicator covariates that identify, alongside the direct effect on treated (`tau_total`), per-ring spillover effects on near-control units (`delta_j`). Reference: Butts, K. (2023, originally 2021) "Difference-in-Differences with Spatial Spillovers" arXiv:2105.03737v3; Gardner, J. (2022) "Two-stage differences in differences" arXiv:2207.05943. Handles panel non-staggered (paper Eqs 5/6/8) and Section 5 staggered timing in one estimator — non-staggered is the special case where all treated units share an onset time. Methodology ----------- - Stage-2 regressor: time-varying `(1 - D_it) * Ring_{it,j}` (paper page 12's `S_it = S_i * 1{t >= t_treat}` notation; Section 5 Table 2's `S^k_{it}` / `Ring^k_{it,j}`). Reading the literal unit-static `(1 - D_it) * S_i` from Equation 5 is rank-deficient under TWFE; only the time-varying form supports the paper's identification (Prop 2.3). - Stage-1 subsample: Butts' STRICTER `Omega_0 = {D_it = 0 AND S_it = 0}` (untreated AND unexposed) — not TwoStageDiD's `{D_it = 0}` — prevents spillover-contaminated near-controls from biasing the time FE. - Gardner identity (non-staggered): empirically bit-identical to direct single-stage TWFE ring regression on the full sample at atol=1e-10 (20-seed deterministic regression test). The reported non-staggered `tau_total` IS the Butts Eqs. 4-6 estimator. API --- SpilloverDiD( rings=[0, 50, 100, 200], conley_coords=("lat", "lon"), vcov_type="conley", # or "hc1" / cluster conley_cutoff_km=200.0, conley_lag_cutoff=0, ).fit(data, outcome="y", unit="unit", time="t", treatment="D") Binary `D` auto-converts to a Gardner `first_treat` column; users with canonical staggered data can pass `first_treat=` directly. Result is `SpilloverDiDResults(DiDResults)` with `.att` = `tau_total`, `.spillover_effects` (per-ring DataFrame with coef/se/t_stat/p_value/CI), `.ring_breakpoints`, `.d_bar`, `.n_units_ever_in_ring`, `.n_far_away_obs`, `.is_staggered`. `.coefficients` exposes all `(1+K)` stage-2 entries keyed to vcov columns plus an `"ATT"` alias. Identification-check policy --------------------------- - Period level (structural): every period must have at least one Omega_0 row, else time FE for that period is unidentified — hard ValueError. - Unit level (recoverable): units lacking Omega_0 rows (e.g. baseline- treated units with `D_it = 1` at all observed `t`) are warned-and- dropped; their unit FE is NaN, residualization writes NaN on their rows, and the downstream finite-mask path excludes them from stage 2. Mirrors TwoStageDiD's always-treated convention. Variance (Wave B MVP) --------------------- Stage-2 OLS variance via `solve_ols` — HC1, Conley spatial-HAC, and cluster-robust paths all flow through. The Gardner GMM first-stage uncertainty correction is NOT applied at stage 2 in this PR (documented limitation; planned follow-up extends `two_stage.py::_compute_gmm_ variance` to accept a Conley kernel matrix in place of HC1's identity at the influence-function outer-product step). Reported SEs are conservative relative to the full GMM + Conley sandwich. Deferred features (planned follow-ups) -------------------------------------- - `event_study=True` per-event-time × ring coefficients (Butts Table 2) - `survey_design=` integration - `ring_method="count"` (count-of-treated-in-ring) - Data-driven `d_bar` selection (Butts 2021b / 2023 JUE Insight) - Gardner GMM first-stage correction at stage 2 - Sparse staggered ring-distance path - TwoStageDiD / SpilloverDiD shared-internals refactor Tests ----- 139 tests at `tests/test_spillover.py` across ring-construction primitives, validators, fit integration, raw-data invariant, identification MC (50-seed default + 200-seed `@pytest.mark.slow` variant), Conley wiring, Gardner identity bit-identity (20-seed deterministic regression test against direct single-stage TWFE ring regression), coefficients-vs-vcov column alignment, and Omega_0 warn- and-drop. DGP factories at `tests/_dgp_utils.py::generate_butts_ nonstaggered_dgp` / `generate_butts_staggered_dgp` satisfy Butts Assumptions 1/3/5/7 by construction. Documentation ------------- - `docs/methodology/REGISTRY.md` — new SpilloverDiD section adjacent to ConleySpatialHAC with the methodology spec, edge-case table, and documented deviations. - `docs/api/spillover.rst` — API reference with Wave B MVP limitations. - `diff_diff/guides/llms.txt` + `llms-full.txt` — agent-facing catalog entries. - `README.md` — one-line catalog entry under `## Estimators`. - `docs/references.rst` — Butts (2021/2023) + Gardner (2022) citations. - `docs/doc-deps.yaml` — `diff_diff/spillover.py` → `[REGISTRY.md#spillover, docs/api/spillover.rst]`. - `TODO.md` — deferred-features rows under "Tech Debt from Code Reviews" for the planned follow-ups. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-05-15T21:44:59Z

/ai-review

igerber · 2026-05-15T21:51:17Z

/ai-review

github-actions · 2026-05-15T21:57:35Z

🔁 AI review rerun (requested by @igerber)

Head SHA: fc48d69a654d4206da0cf3e5312d2a61ad67d321

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

Cross-checking against docs/methodology/REGISTRY.md did not surface an undocumented methodology deviation. The PR’s main departures from a literal Butts/Gardner implementation are explicitly documented in docs/methodology/REGISTRY.md:L2952-L2985.
The new estimator avoids the project’s inference anti-patterns: ATT and per-ring outputs both route through safe_inference() rather than inline t_stat = coef / se logic (diff_diff/spillover.py:L1745-L1757).
Input validation is strong on the edge cases called out in prior reviews: duplicate cells, unbalanced panels, NaN treatment/IDs/clusters, reversible treatment, unsupported hc2/hc2_bm, and missing far-away controls (diff_diff/spillover.py:L947-L1284).
Minor follow-ups only: SE extraction should clamp negative vcov diagonals for consistency, and the staggered-distance helper still has a small avoidable performance hotspot.
I could not execute the tests in this sandbox because the available Python environment does not have numpy/pandas; this is a static review only.

Methodology

P3 — Informational: the material methodology deviations are documented rather than hidden: time-varying ring regressor, stricter Omega_0, unit warn-and-drop, and omission of Gardner’s stage-1 GMM correction (docs/methodology/REGISTRY.md:L2952-L2985, TODO.md:L121-L127). Impact: no approval blocker under the repo policy; users still need to treat current analytical SEs as downward-biased until the tracked follow-up lands. Concrete fix: none required for this PR.

Code Quality

P3 — SpilloverDiD.fit() takes np.sqrt(vcov[i, i]) directly when building ATT and per-ring SEs (diff_diff/spillover.py:L1740-L1757). Sibling estimators usually clamp max(var, 0.0) first (diff_diff/two_stage.py:L1183-L1183, diff_diff/estimators.py:L606-L606, diff_diff/stacked_did.py:L515-L515). Impact: a numerically tiny negative diagonal from an indefinite Conley or near-singular sandwich can blank out the entire inference row with NaN instead of preserving a stable zero-floor SE. Concrete fix: wrap each diagonal in max(v, 0.0) before sqrt.

Performance

P3 — The staggered distance helper recomputes row_pos inside every cohort loop (diff_diff/spillover.py:L400-L425). Impact: avoidable extra O(n_rows × n_cohorts) Python work on already-dense staggered fits. Concrete fix: precompute row_pos once before the for onset in unique_onsets loop and reuse it.

Maintainability

No findings.

Tech Debt

P3 — The staggered helper says the sparse kd-tree branch is “queued as a follow-up — see TODO.md” (diff_diff/spillover.py:L365-L369), but the new spillover TODO block does not actually track that item (TODO.md:L121-L127). Impact: the deferred scaling work is easy to lose from the project’s review backlog. Concrete fix: add a TODO.md row for the sparse staggered ring-distance path, or remove the explicit TODO cross-reference from the code/docs.

Security

No findings.

Documentation/Tests

No findings on the diff itself. The new registry entry and the spillover test surface are both thorough (docs/methodology/REGISTRY.md:L2940-L3012, tests/test_spillover.py). Review limitation: I was not able to run the tests in this sandbox because the Python environment here is missing numpy/pandas.

- Clamp `max(vcov[i, i], 0.0)` before sqrt for ATT and per-ring SE extraction (spillover.py:L1740-L1762). Matches the sibling-estimator convention at two_stage.py:1183, estimators.py:606, stacked_did.py:515. Prevents numerically tiny negative diagonals from indefinite Conley sandwiches or near-singular cases from NaN-ing the full inference row. - Hoist row_pos out of the per-cohort loop in _compute_nearest_treated_distance_staggered (spillover.py:L400-L425). row_pos depends only on row_unit and unit_to_pos, both invariant across the cohort iteration; one O(n_rows) array build instead of O(n_rows × n_cohorts) on dense staggered fits. - Add TODO.md row tracking the sparse cKDTree path for the staggered helper as Wave B follow-up. Resolves the stale code-comment reference in spillover.py:L365-L369. 139 tests pass; no behavior change on existing fixtures (the clamp is defensive against unrealizable values; the hoist is a refactor; the TODO is bookkeeping). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-15T22:13:42Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 6f3ade490449ab5cec7ae2250ad0e6249a526ea4

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

Previous AI re-review items were addressed: the stage-2 SE path now clamps negative vcov diagonals before sqrt, the staggered helper precomputes row_pos once, and the sparse staggered path is now explicitly tracked in TODO.md (diff_diff/spillover.py:L400-L402, diff_diff/spillover.py:L1742-L1764, TODO.md:L121-L128).
At a high level, the estimator matches the cited Butts + Gardner synthesis: Butts frames the DiD target as jointly identifying direct and spillover effects under spatial spillovers, and Gardner supplies the two-stage DiD backbone reused here. (arxiv.org)
The major implementation departures from a literal paper reading are documented in the Methodology Registry, so under repo policy they are informational rather than blocking (docs/methodology/REGISTRY.md:L2952-L3010, TODO.md:L121-L128).
I did not find a new P0/P1 issue in control-group composition, NaN-safe inference, or public-parameter propagation on the shipped surface.
Remaining issues are minor: one registry gap for the public anticipation extension, one constructor-validation consistency issue, and one validation overclaim in REGISTRY.md.
Review limitation: I could not execute tests/test_spillover.py here because the sandbox Python environment lacks both numpy and pytest.

Methodology

No unmitigated source-material mismatch found. The time-varying ring regressor, stricter Omega_0, and omission of Gardner’s stage-1 GMM correction are explicitly recorded in the registry and TODO, so they are non-blocking per repo policy. At a high level, that is consistent with the cited Butts/Gardner setup. (arxiv.org)

P3 Impact: anticipation is a public methodology-affecting parameter, and the implementation clearly shifts both treatment timing and spillover timing with it, but the SpilloverDiD section of the Methodology Registry does not currently document that non-default estimand. Concrete fix: add a short **Note:** to docs/methodology/REGISTRY.md:L2940-L3010 mirroring the semantics already implemented in diff_diff/spillover.py:L850-L854 and diff_diff/spillover.py:L1441-L1544.

Code Quality

P3 Impact: SpilloverDiD.__init__ does not validate rank_deficient_action, unlike peer estimators, so a bad value fails later inside solve_ols instead of at the API boundary. Concrete fix: mirror the constructor guard used in diff_diff/two_stage.py:L137-L153 and diff_diff/stacked_did.py:L130-L148 inside diff_diff/spillover.py:L885-L916.

Performance

No findings. The remaining spillover-specific hotspot, the dense staggered nearest-treated-distance path, is now properly tracked in TODO.md:L128-L128.

Maintainability

No findings.

Tech Debt

P3 Impact: remaining deferrals for Gardner GMM correction, event-study mode, survey integration, and the sparse staggered path are properly tracked, so they are acceptable deferred work rather than blockers for this PR. Concrete fix: none required in this PR (TODO.md:L121-L128).

Security

No findings.

Documentation/Tests

P3 Impact: docs/methodology/REGISTRY.md:L2972-L2972 says correctness is anchored on a “reduce-to-TWFE / reduce-to-TwoStageDiD” limit, but the changed spillover tests only pin Monte Carlo recovery and single-stage FE identity; I do not see a TwoStageDiD reduction test in tests/test_spillover.py:L757-L806 or tests/test_spillover.py:L2271-L2420. Concrete fix: either add the missing TwoStageDiD reduction regression test or narrow the registry wording to the validations that actually shipped.

…_action guard - REGISTRY: add "Note (anticipation shift)" documenting the public `anticipation: int` parameter's effect on both treatment and ring- membership clocks. Matches the implementation semantics at spillover.py:850-854 + 1441-1544, and mirrors TwoStageDiD's anticipation parameter convention. - REGISTRY: narrow the "correctness anchored on reduce-to-TWFE / reduce-to-TwoStageDiD limits" claim. Only the reduce-to-TWFE limit shipped (the 20-seed Gardner identity bit-identity test at TestSpilloverDiDNonStaggeredFEEquivalence). The reduce-to-TwoStageDiD limit was scoped during planning but not shipped — the Omega_0 stricter subsample requires fixture work to align with TwoStageDiD's {D_it = 0} subsample. Queued as a follow-up. - spillover.py: add `rank_deficient_action` constructor guard mirroring two_stage.py:149-153 and stacked_did.py. Bad values now fail at __init__ with a clear ValueError instead of deep inside solve_ols. - tests: new TestSpilloverDiDRankDeficientActionValidation class exercising 6 invalid values + 3 valid values. 148 tests pass (was 139); black + ruff clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-15T22:40:43Z

🔁 AI review rerun (requested by @igerber)

Head SHA: b23aca4075acfc947ea3b0496521f190bca980c0

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

Previous re-review items are addressed: anticipation is now documented and serialized on results, rank_deficient_action is validated in __init__, and the registry no longer claims a shipped reduce-to-TwoStageDiD test (docs/methodology/REGISTRY.md:2972-2974, diff_diff/spillover.py:885-906, diff_diff/results.py:445-462).
Methodology alignment is acceptable: Butts’ spillover setup uses untreated-and-unexposed rows for the first-stage residualization in the staggered case, and Gardner’s two-stage framework requires inference that accounts for the generated first-stage regressor. The PR’s omission of that GMM correction is explicitly documented and tracked, so it is informational rather than blocking. citeturn3view0turn3view1turn5view0
I did not find a new control-group, NaN-inference, or parameter-propagation issue in the changed spillover code.
Remaining issues are documentation-only: one registry note still overstates the shipped validation surface, and one public guide still advertises covariates without warning that the estimator rejects it.
Review limitation: I could not execute tests/test_spillover.py here because the sandbox lacks both pytest and numpy.

Methodology

P3 Impact: Stage-2 SEs still omit Gardner’s first-stage uncertainty correction, which would be a methodology defect if undocumented. Here it is explicitly disclosed in the registry and tracked in TODO.md, so under repo policy it is non-blocking informational debt. Concrete fix: none required in this PR; keep the existing follow-up entry at TODO.md:121. citeturn3view0turn5view0
No unmitigated P0/P1 findings.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

P3 Impact: the deferred spillover items that remain after this PR are properly tracked in TODO.md (GMM correction, event_study, survey_design, count rings, data-driven d_bar, sparse staggered distance), so they are acceptable deferred work rather than blockers. Concrete fix: none required in this PR (TODO.md:121-128).

Security

No findings.

Documentation/Tests

P3 Impact: REGISTRY.md says the synthetic-DGP Monte Carlo anchor includes both 50-seed and 200-seed recovery of tau_total and delta_1, but the shipped tests only pin delta_1 on the non-staggered DGP; the staggered MC currently checks tau_total only. That overstates the validation actually merged. Concrete fix: either add staggered delta_1 recovery tests (and a slow variant) or narrow the registry note to the validation that is present (docs/methodology/REGISTRY.md:2974, tests/test_spillover.py:764-810, tests/test_spillover.py:812-822).
P3 Impact: the LLM guide exposes fit(..., covariates=...) in the public signature but does not list that covariates currently raises NotImplementedError, even though the estimator rejects it in code. That makes the documented surface broader than the real one. Concrete fix: add an explicit covariates= limitation bullet in the spillover guide section, at minimum in diff_diff/guides/llms-full.txt (diff_diff/guides/llms-full.txt:488-507, diff_diff/spillover.py:1353-1363).

…tion - REGISTRY: narrow the MC anchor claim to reflect what's actually shipped — non-staggered DGP tests pin both tau_total AND delta_1, staggered DGP tests pin tau_total only; per-ring delta_jk recovery on staggered DGPs queued alongside event_study=True support. - llms-full.txt: add the missing `covariates= raises NotImplementedError` limitation bullet so the documented surface matches the runtime rejection (the fit signature exposes covariates= but the estimator rejects it with a clear explanation about Omega_0 stage-1 fitting). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-15T22:55:16Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 9581b21cb586ab8a9e9a108d75baa2db4f4d7b51

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

Previous re-review items are mostly addressed: anticipation is now carried on results serialization, rank_deficient_action is validated at construction, and the public LLM guide now explicitly says covariates= is unsupported. See diff_diff/results.py:L402-L462, diff_diff/spillover.py:L885-L906, and diff_diff/guides/llms-full.txt:L501-L508.
Methodology alignment is acceptable against the registry: the time-varying ring regressor, stricter Omega_0, current-period untreated far-away logic, and omitted Gardner GMM correction are all explicitly documented or tracked, so they are non-blocking in this PR. See docs/methodology/REGISTRY.md:L2952-L3012 and TODO.md:L121-L128.
I did not find a new control-group, NaN-inference, parameter-propagation, or undocumented methodology defect in the changed estimator code.
One previous P3 documentation issue remains in narrowed form: the registry/changelog still overstate the merged staggered Monte Carlo coverage.
Review limitation: I could not execute tests/test_spillover.py here because this sandbox does not have numpy, pandas, scipy, or pytest.

Methodology

P3 Impact: Stage-2 SEs still omit Gardner’s first-stage GMM uncertainty correction, but that deviation is explicitly documented in the methodology registry and tracked in TODO.md, so under repo policy it is informational rather than blocking. Concrete fix: none required in this PR; keep the existing follow-up tracked in docs/methodology/REGISTRY.md:L2985-L3012 and TODO.md:L121-L128.
No unmitigated P0/P1 findings.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

P3 Impact: The main deferred spillover items (event_study, survey_design, ring_method="count", data-driven d_bar, sparse staggered distance, GMM correction) are properly tracked in TODO.md, so they are acceptable deferrals rather than blockers. Concrete fix: none required in this PR. See TODO.md:L121-L128.

Security

No findings.

Documentation/Tests

P3 Impact: The validation narrative still overstates the shipped staggered Monte Carlo coverage. REGISTRY.md says the staggered anchor is a 50-seed default plus a 200-seed slow variant, and the changelog summarizes the identification MC in the same way, but the merged test file only contains a 30-seed staggered tau_total test and no slow staggered companion. Concrete fix: either add the promised staggered 50-seed and 200-seed tests, or narrow the docs/changelog to the 30-seed staggered tau_total test that actually exists. See docs/methodology/REGISTRY.md:L2974-L2974, CHANGELOG.md:L11-L11, tests/test_spillover.py:L796-L810, and tests/test_spillover.py:L812-L825.

…d reality The round-3 narrowing left the "50-seed default + 200-seed slow variant" phrasing applying to both non-staggered AND staggered MC tests. Only non-staggered actually has the 50/200 structure (and recovers both tau_total and delta_1); staggered is a single 30-seed test that anchors tau_total only with a looser 0.04 tolerance (each staggered DGP is larger and noisier). Honest end-state for the docs. Also bumps the CHANGELOG test count 139 -> 148 to reflect the rank_deficient_action validation class added in the previous polish commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-15T23:17:24Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 23cdf0ef20c06416e512132595d9b2be6c53b485

Overall Assessment

⚠️ Needs changes — one unmitigated P1 remains in the new SpilloverDiD identification path.

Executive Summary

Previous re-review items look addressed: anticipation now round-trips on results serialization, rank_deficient_action is validated at construction, and the docs now match the shipped staggered MC coverage.
The estimator matches the documented registry deviations on the time-varying ring regressor, stricter Omega_0, current-period untreated far-away controls, and the omitted Gardner GMM correction.
P1: SpilloverDiD checks balancedness plus per-unit/per-period Omega_0 coverage, but it never checks whether the Omega_0 = {D_it = 0, S_it = 0} unit-time graph is actually connected. Some balanced panels will therefore pass validation even though stage-1 FE are not jointly identified, and tau_total / delta_j can be silently distorted.
The missing Gardner GMM correction remains P3 informational because it is explicitly documented in the methodology registry and tracked in TODO.md.
Review limitation: I could not execute the test suite here because pytest, numpy, pandas, and scipy are not installed in this environment.

Methodology

Severity P1
Impact: SpilloverDiD.fit() treats a balanced panel plus per-unit/per-period Omega_0 presence as sufficient support for the stage-1 FE model (diff_diff/spillover.py:1104, diff_diff/spillover.py:1563). That proxy is not enough. A balanced panel can still have a disconnected Omega_0 bipartite graph, in which case _iterative_fe_subset() only identifies FE up to component-specific constants while _residualize_butts() later combines unit FE from one component with time FE from another. That silently changes the residualized outcome used to estimate both tau_total and delta_j. The public docs currently describe exact graph-connectivity as a future relaxation of the balanced-panel gate (docs/api/spillover.rst:204, docs/methodology/REGISTRY.md:3009), but for this Gardner/Butts stage-1 design it is the actual live identification condition.
Concrete fix: after building omega_0_mask, construct the unit-time support graph on Omega_0 rows and raise ValueError unless all stage-2-supported rows lie in a single connected component. Add a regression test for a balanced-but-disconnected Omega_0 panel.
Severity P3
Impact: Stage-2 SEs still omit Gardner’s first-stage GMM uncertainty correction, but this deviation is explicitly documented and tracked (docs/methodology/REGISTRY.md:2987, TODO.md:121). Under repo policy this is informational, not blocking.
Concrete fix: none required in this PR.

Code Quality

No additional findings.

Performance

No additional findings.

Maintainability

No additional findings.

Tech Debt

Severity P3
Impact: The remaining spillover follow-ups (event_study, survey_design, ring_method="count", data-driven d_bar, sparse staggered distance path, GMM correction) are properly tracked in TODO.md (TODO.md:121).
Concrete fix: none required in this PR.

Security

No findings.

Documentation/Tests

Severity P2
Impact: The current tests cover missing-period Omega_0 support and unbalanced panels (tests/test_spillover.py:1371, tests/test_spillover.py:2004), but they do not cover the balanced-but-disconnected Omega_0 case behind the P1 above. That leaves the identification bug unguarded in CI.
Concrete fix: add a fixture where every unit and every period has at least one Omega_0 row, yet the Omega_0 graph splits into two disconnected components, and assert that fit() raises with a connectivity-specific error.

Path to Approval

Add an explicit connected-component check for the Omega_0 unit-time graph in SpilloverDiD.fit() and fail fast on disconnected support.
Add a regression test for a balanced panel that passes the current unit/period/balanced-panel guards but still has disconnected Omega_0 support.
Update docs/api/spillover.rst, docs/methodology/REGISTRY.md, and the [Unreleased] changelog entry so they state the actual identification requirement: balancedness alone is not sufficient; connected Omega_0 support is required.

…pth) Stage-1's iterative FE solver identifies (mu_i, lambda_t) only up to component-specific constants per connected component of the bipartite graph (supported units linked by shared untreated-and-unexposed periods). If the graph splits into K > 1 components, residualization combines mu_i from one component with lambda_t from another, silently corrupting y_tilde and downstream tau_total / delta_j. Balanced panel + per-unit / per-period Omega_0 coverage is NECESSARY but not SUFFICIENT; connectivity is the load-bearing identification condition. Under the current absorbing-treatment + period-strict + unit-warn-drop regime the disconnected case appears unreachable through `.fit()` (the combination of validators forces a connected supported subgraph), but the check is defense-in-depth and future-proofs the Wave B follow-ups (event_study, survey_design integration, possible reversible-treatment relaxations). - diff_diff/spillover.py: new `_check_omega_0_connectivity` helper using scipy.sparse.csgraph.connected_components; called immediately after the unit-level warn-and-drop block (Step 10c). Operates on the SUPPORTED-units subgraph (warn-dropped units are excluded so they don't form trivial singletons). - tests/test_spillover.py: TestSpilloverDiDOmega0Connectivity (5 tests) unit-tests the helper directly with synthetic (unit_codes, time_codes, omega_0_mask) arrays — disconnected 2-component case raises; bridge- unit-connected case succeeds; n_supp <= 1 short-circuits; 3-component error names units; normal Butts DGP through `.fit()` doesn't trigger the check. - docs/methodology/REGISTRY.md, docs/api/spillover.rst, CHANGELOG.md: extend the Omega_0 identification-policy note to include connectivity. 153 tests pass (was 148); black + ruff clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-15T23:39:00Z

🔁 AI review rerun (requested by @igerber)

Head SHA: c1ba56427cfc4592db51e087862356a13d6e5b8a

Overall Assessment

✅ Looks good — the prior P1 on disconnected Omega_0 support is resolved, and I did not find any new unmitigated P0/P1 issues in the changed spillover estimator path.

Executive Summary

The previous blocker is addressed: SpilloverDiD.fit() now explicitly checks connectedness of the supported Omega_0 unit-period graph before stage 1, and the failure mode is documented and tested (diff_diff/spillover.py:L682-L778, diff_diff/spillover.py:L1660-L1739, docs/methodology/REGISTRY.md:L2966-L2968, tests/test_spillover.py:L2101-L2208).
The implemented estimator structure matches the cited methodology: Butts’ staggered procedure estimates unit/time effects on untreated-and-unexposed observations and then regresses the residualized outcome on treatment and spillover dummies; Gardner likewise treats second-stage inference as generated-regressor inference. (kylebutts.com)
The remaining omission of Gardner’s first-stage GMM variance correction is still present, but it is explicitly documented in the Methodology Registry/API docs and tracked in TODO.md, so under the project rubric it is P3 informational rather than blocking (docs/methodology/REGISTRY.md:L2985-L3012, docs/api/spillover.rst:L177-L203, TODO.md:L121-L128). (kylebutts.com)
I did not identify a new silent correctness bug in control-group construction, ring weighting, NaN/inference handling, or parameter propagation in the changed files.
Review limitation: I could not execute pytest here because pytest is not installed in this environment.

Methodology

Severity P3 | Impact: Stage-2 SEs still omit Gardner’s first-stage GMM correction. That matters for inference in principle, but this deviation is explicitly documented and tracked, so it is mitigated under the review rubric rather than blocking (docs/methodology/REGISTRY.md:L2985-L3012, docs/api/spillover.rst:L177-L203, TODO.md:L121-L128). (kylebutts.com) Concrete fix: None required in this PR; follow-up already tracked.
No unmitigated methodology findings. The prior connectivity issue is now handled explicitly in code and covered by direct tests (diff_diff/spillover.py:L682-L778, diff_diff/spillover.py:L1660-L1739, tests/test_spillover.py:L2101-L2208).

Code Quality

No findings.

Performance

Severity P3 | Impact: _compute_nearest_treated_distance_staggered() still uses the dense per-cohort distance path, so large staggered panels may pay avoidable memory/runtime cost; this is already tracked in TODO.md:L128-L128. Concrete fix: None required in this PR; follow-up already tracked.

Maintainability

No findings.

Tech Debt

Severity P3 | Impact: The remaining Wave B spillover follow-ups (event_study, survey_design, count-based rings, data-driven d_bar, GMM correction, sparse staggered distance path, first-class Conley in TwoStageDiD) are explicitly tracked in TODO.md:L121-L128, so they are deferred work rather than blockers. Concrete fix: None required in this PR.

Security

No findings.

Documentation/Tests

No unmitigated findings. The public docs now describe the connectivity requirement and the period-strict/unit-warn-drop behavior, and the new tests cover the connectivity helper directly (docs/api/spillover.rst:L204-L230, docs/methodology/REGISTRY.md:L2966-L2974, tests/test_spillover.py:L2101-L2208). Review limitation: I could not run the suite locally because pytest is unavailable.

igerber added the ready-for-ci Triggers CI test workflows label May 16, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SpilloverDiD: ring-indicator spillover-aware DiD (Butts 2021)#446

SpilloverDiD: ring-indicator spillover-aware DiD (Butts 2021)#446
igerber wants to merge 6 commits into
mainfrom
spillover-conley-butts-wave-b

igerber commented May 15, 2026

Uh oh!

igerber commented May 15, 2026

Uh oh!

igerber commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented May 15, 2026

Summary

Methodology references

Validation

Security / privacy

Uh oh!

igerber commented May 15, 2026

Uh oh!

igerber commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant