Skip to content

SpilloverDiD: ring-indicator spillover-aware DiD (Butts 2021)#446

Open
igerber wants to merge 6 commits into
mainfrom
spillover-conley-butts-wave-b
Open

SpilloverDiD: ring-indicator spillover-aware DiD (Butts 2021)#446
igerber wants to merge 6 commits into
mainfrom
spillover-conley-butts-wave-b

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented May 15, 2026

Summary

  • New standalone estimator SpilloverDiD (diff_diff/spillover.py) implementing two-stage Gardner DiD with ring-indicator covariates that identify both the direct effect on treated units (tau_total) and per-ring spillover effects on near-control units (delta_j). Handles non-staggered and Section 5 staggered timing in a single estimator.

Methodology references

  • Method: Butts (2021/2023) ring-indicator spillover-aware DiD; Gardner (2022) two-stage residualize-then-fit; reuses the Conley (1999) spatial-HAC vcov shipped in 3.3.3.
  • Paper / source links:
    • Butts, K. (2023). Difference-in-Differences with Spatial Spillovers. arXiv:2105.03737v3https://arxiv.org/abs/2105.03737
    • Gardner, J. (2022). Two-stage differences in differences. arXiv:2207.05943https://arxiv.org/abs/2207.05943
    • Conley, T. G. (1999). GMM Estimation with Cross-Sectional Dependence. Journal of Econometrics 92(1), 1-45.
  • Intentional deviations from source:
    • Stage-2 regressor uses the time-varying (1 - D_it) * Ring_{it,j} form (paper page 12's S_it = S_i * 1{t >= t_treat} notation; Section 5 Table 2's S^k_{it} / Ring^k_{it,j}). The literal unit-static reading of Equation 5 is algebraically rank-deficient under TWFE; only the time-varying form supports the paper's identification (Proposition 2.3). Documented in docs/methodology/REGISTRY.md § SpilloverDiD.
    • Stage-1 subsample uses Butts' stricter Omega_0 = {D_it = 0 AND S_it = 0} (untreated AND unexposed) rather than TwoStageDiD's {D_it = 0} (untreated only). Prevents spillover-contaminated near-controls from biasing the time FE.
    • Omega_0 row-level identification check: period strict (hard ValueError — dropping a period removes all units' cross-time identification), unit warn-and-drop (mirrors TwoStageDiD's always-treated convention; the downstream finite-mask path excludes the affected rows from stage 2).
    • Gardner GMM first-stage uncertainty correction is NOT applied at stage 2 in this PR (planned follow-up extends two_stage.py::_compute_gmm_variance). Documented in REGISTRY + TODO.md.
    • Far-away control identification uses current-period untreated status (D_it = 0) rather than never-treated-only, so all-eventually-treated staggered designs can identify the counterfactual via not-yet-treated far-away rows.
  • No R parity benchmark: did2s implements Gardner two-stage without rings; no published R/Stata software implements the Butts ring estimator. Correctness anchored on (a) 20-seed deterministic regression test pinning SpilloverDiD.att against direct single-stage TWFE ring regression at atol=1e-10 (the Gardner identity equivalence for non-staggered timing — empirically bit-identical, so the reported non-staggered tau_total IS the Butts Eqs. 4-6 estimator), (b) 50-seed Monte Carlo identification recovery on synthetic Butts-Assumption-satisfying DGPs (+ 200-seed @pytest.mark.slow variant), and (c) Conley sparse-vs-dense parity inherited from the 3.3.3 release.

Validation

  • Tests added (139 total in tests/test_spillover.py):
    • Ring-construction primitives (pairwise haversine/euclidean, callable metrics, static + staggered nearest-treated distance, sparse kd-tree path)
    • Validators (rings sort/non-negative/start-at-0, d_bar consistency, treatment XOR first_treat, exact {0,1} treatment, NaN rejection on cluster/unit/time/first_treat/treatment, balanced panel, duplicate cells, non-absorbing treatment, conley_coords within-unit-constant, callable metric self-distance contract, hc2/hc2_bm rejected, NaN in outcome rejected, mixed-encoding time collapse caught)
    • Fit integration (non-staggered + staggered, raw-data invariant, Conley wiring, baseline-treated unit recognition + warn-and-drop, partial-unsupported-units warn-and-drop, period-level hard ValueError)
    • Identification (50-seed MC recovery of tau_total and delta_j; 200-seed slow variant)
    • Gardner identity (20-seed bit-identity vs single-stage TWFE ring regression at atol=1e-10)
    • Coefficients-vs-vcov column alignment + ATT alias
  • DGP factories (tests/_dgp_utils.py): generate_butts_nonstaggered_dgp / generate_butts_staggered_dgp satisfy Butts Assumptions 1/3/5/7 by construction.
  • No tutorial in this PR: T22 (Butts TVA tutorial) is deferred to a follow-up notebook PR per user-confirmed Wave B scope.
  • Documentation: REGISTRY section adjacent to ConleySpatialHAC, docs/api/spillover.rst, diff_diff/guides/llms.txt + llms-full.txt, docs/references.rst, docs/doc-deps.yaml, README catalog entry, TODO.md rows for deferred follow-ups.

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

New standalone estimator at `diff_diff/spillover.py` implementing
two-stage Gardner (2022) DiD with ring-indicator covariates that
identify, alongside the direct effect on treated (`tau_total`), per-ring
spillover effects on near-control units (`delta_j`). Reference: Butts, K.
(2023, originally 2021) "Difference-in-Differences with Spatial
Spillovers" arXiv:2105.03737v3; Gardner, J. (2022) "Two-stage
differences in differences" arXiv:2207.05943.

Handles panel non-staggered (paper Eqs 5/6/8) and Section 5 staggered
timing in one estimator — non-staggered is the special case where all
treated units share an onset time.

Methodology
-----------
- Stage-2 regressor: time-varying `(1 - D_it) * Ring_{it,j}` (paper
  page 12's `S_it = S_i * 1{t >= t_treat}` notation; Section 5 Table 2's
  `S^k_{it}` / `Ring^k_{it,j}`). Reading the literal unit-static `(1 -
  D_it) * S_i` from Equation 5 is rank-deficient under TWFE; only the
  time-varying form supports the paper's identification (Prop 2.3).
- Stage-1 subsample: Butts' STRICTER `Omega_0 = {D_it = 0 AND S_it = 0}`
  (untreated AND unexposed) — not TwoStageDiD's `{D_it = 0}` — prevents
  spillover-contaminated near-controls from biasing the time FE.
- Gardner identity (non-staggered): empirically bit-identical to direct
  single-stage TWFE ring regression on the full sample at atol=1e-10
  (20-seed deterministic regression test). The reported non-staggered
  `tau_total` IS the Butts Eqs. 4-6 estimator.

API
---
    SpilloverDiD(
        rings=[0, 50, 100, 200],
        conley_coords=("lat", "lon"),
        vcov_type="conley",            # or "hc1" / cluster
        conley_cutoff_km=200.0,
        conley_lag_cutoff=0,
    ).fit(data, outcome="y", unit="unit", time="t", treatment="D")

Binary `D` auto-converts to a Gardner `first_treat` column; users with
canonical staggered data can pass `first_treat=` directly. Result is
`SpilloverDiDResults(DiDResults)` with `.att` = `tau_total`,
`.spillover_effects` (per-ring DataFrame with coef/se/t_stat/p_value/CI),
`.ring_breakpoints`, `.d_bar`, `.n_units_ever_in_ring`,
`.n_far_away_obs`, `.is_staggered`. `.coefficients` exposes all
`(1+K)` stage-2 entries keyed to vcov columns plus an `"ATT"` alias.

Identification-check policy
---------------------------
- Period level (structural): every period must have at least one Omega_0
  row, else time FE for that period is unidentified — hard ValueError.
- Unit level (recoverable): units lacking Omega_0 rows (e.g. baseline-
  treated units with `D_it = 1` at all observed `t`) are warned-and-
  dropped; their unit FE is NaN, residualization writes NaN on their
  rows, and the downstream finite-mask path excludes them from stage 2.
  Mirrors TwoStageDiD's always-treated convention.

Variance (Wave B MVP)
---------------------
Stage-2 OLS variance via `solve_ols` — HC1, Conley spatial-HAC, and
cluster-robust paths all flow through. The Gardner GMM first-stage
uncertainty correction is NOT applied at stage 2 in this PR (documented
limitation; planned follow-up extends `two_stage.py::_compute_gmm_
variance` to accept a Conley kernel matrix in place of HC1's identity at
the influence-function outer-product step). Reported SEs are conservative
relative to the full GMM + Conley sandwich.

Deferred features (planned follow-ups)
--------------------------------------
- `event_study=True` per-event-time × ring coefficients (Butts Table 2)
- `survey_design=` integration
- `ring_method="count"` (count-of-treated-in-ring)
- Data-driven `d_bar` selection (Butts 2021b / 2023 JUE Insight)
- Gardner GMM first-stage correction at stage 2
- Sparse staggered ring-distance path
- TwoStageDiD / SpilloverDiD shared-internals refactor

Tests
-----
139 tests at `tests/test_spillover.py` across ring-construction
primitives, validators, fit integration, raw-data invariant,
identification MC (50-seed default + 200-seed `@pytest.mark.slow`
variant), Conley wiring, Gardner identity bit-identity (20-seed
deterministic regression test against direct single-stage TWFE ring
regression), coefficients-vs-vcov column alignment, and Omega_0 warn-
and-drop. DGP factories at `tests/_dgp_utils.py::generate_butts_
nonstaggered_dgp` / `generate_butts_staggered_dgp` satisfy Butts
Assumptions 1/3/5/7 by construction.

Documentation
-------------
- `docs/methodology/REGISTRY.md` — new SpilloverDiD section adjacent to
  ConleySpatialHAC with the methodology spec, edge-case table, and
  documented deviations.
- `docs/api/spillover.rst` — API reference with Wave B MVP limitations.
- `diff_diff/guides/llms.txt` + `llms-full.txt` — agent-facing catalog
  entries.
- `README.md` — one-line catalog entry under `## Estimators`.
- `docs/references.rst` — Butts (2021/2023) + Gardner (2022) citations.
- `docs/doc-deps.yaml` — `diff_diff/spillover.py` →
  `[REGISTRY.md#spillover, docs/api/spillover.rst]`.
- `TODO.md` — deferred-features rows under "Tech Debt from Code
  Reviews" for the planned follow-ups.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented May 15, 2026

/ai-review

1 similar comment
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented May 15, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: fc48d69a654d4206da0cf3e5312d2a61ad67d321


Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

  • Cross-checking against docs/methodology/REGISTRY.md did not surface an undocumented methodology deviation. The PR’s main departures from a literal Butts/Gardner implementation are explicitly documented in docs/methodology/REGISTRY.md:L2952-L2985.
  • The new estimator avoids the project’s inference anti-patterns: ATT and per-ring outputs both route through safe_inference() rather than inline t_stat = coef / se logic (diff_diff/spillover.py:L1745-L1757).
  • Input validation is strong on the edge cases called out in prior reviews: duplicate cells, unbalanced panels, NaN treatment/IDs/clusters, reversible treatment, unsupported hc2/hc2_bm, and missing far-away controls (diff_diff/spillover.py:L947-L1284).
  • Minor follow-ups only: SE extraction should clamp negative vcov diagonals for consistency, and the staggered-distance helper still has a small avoidable performance hotspot.
  • I could not execute the tests in this sandbox because the available Python environment does not have numpy/pandas; this is a static review only.

Methodology

  • P3 — Informational: the material methodology deviations are documented rather than hidden: time-varying ring regressor, stricter Omega_0, unit warn-and-drop, and omission of Gardner’s stage-1 GMM correction (docs/methodology/REGISTRY.md:L2952-L2985, TODO.md:L121-L127). Impact: no approval blocker under the repo policy; users still need to treat current analytical SEs as downward-biased until the tracked follow-up lands. Concrete fix: none required for this PR.

Code Quality

  • P3SpilloverDiD.fit() takes np.sqrt(vcov[i, i]) directly when building ATT and per-ring SEs (diff_diff/spillover.py:L1740-L1757). Sibling estimators usually clamp max(var, 0.0) first (diff_diff/two_stage.py:L1183-L1183, diff_diff/estimators.py:L606-L606, diff_diff/stacked_did.py:L515-L515). Impact: a numerically tiny negative diagonal from an indefinite Conley or near-singular sandwich can blank out the entire inference row with NaN instead of preserving a stable zero-floor SE. Concrete fix: wrap each diagonal in max(v, 0.0) before sqrt.

Performance

  • P3 — The staggered distance helper recomputes row_pos inside every cohort loop (diff_diff/spillover.py:L400-L425). Impact: avoidable extra O(n_rows × n_cohorts) Python work on already-dense staggered fits. Concrete fix: precompute row_pos once before the for onset in unique_onsets loop and reuse it.

Maintainability

  • No findings.

Tech Debt

  • P3 — The staggered helper says the sparse kd-tree branch is “queued as a follow-up — see TODO.md” (diff_diff/spillover.py:L365-L369), but the new spillover TODO block does not actually track that item (TODO.md:L121-L127). Impact: the deferred scaling work is easy to lose from the project’s review backlog. Concrete fix: add a TODO.md row for the sparse staggered ring-distance path, or remove the explicit TODO cross-reference from the code/docs.

Security

  • No findings.

Documentation/Tests

  • No findings on the diff itself. The new registry entry and the spillover test surface are both thorough (docs/methodology/REGISTRY.md:L2940-L3012, tests/test_spillover.py). Review limitation: I was not able to run the tests in this sandbox because the Python environment here is missing numpy/pandas.

- Clamp `max(vcov[i, i], 0.0)` before sqrt for ATT and per-ring SE
  extraction (spillover.py:L1740-L1762). Matches the sibling-estimator
  convention at two_stage.py:1183, estimators.py:606, stacked_did.py:515.
  Prevents numerically tiny negative diagonals from indefinite Conley
  sandwiches or near-singular cases from NaN-ing the full inference row.
- Hoist row_pos out of the per-cohort loop in
  _compute_nearest_treated_distance_staggered (spillover.py:L400-L425).
  row_pos depends only on row_unit and unit_to_pos, both invariant across
  the cohort iteration; one O(n_rows) array build instead of O(n_rows ×
  n_cohorts) on dense staggered fits.
- Add TODO.md row tracking the sparse cKDTree path for the staggered
  helper as Wave B follow-up. Resolves the stale code-comment reference
  in spillover.py:L365-L369.

139 tests pass; no behavior change on existing fixtures (the clamp is
defensive against unrealizable values; the hoist is a refactor; the TODO
is bookkeeping).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 6f3ade490449ab5cec7ae2250ad0e6249a526ea4


Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

  • Previous AI re-review items were addressed: the stage-2 SE path now clamps negative vcov diagonals before sqrt, the staggered helper precomputes row_pos once, and the sparse staggered path is now explicitly tracked in TODO.md (diff_diff/spillover.py:L400-L402, diff_diff/spillover.py:L1742-L1764, TODO.md:L121-L128).
  • At a high level, the estimator matches the cited Butts + Gardner synthesis: Butts frames the DiD target as jointly identifying direct and spillover effects under spatial spillovers, and Gardner supplies the two-stage DiD backbone reused here. (arxiv.org)
  • The major implementation departures from a literal paper reading are documented in the Methodology Registry, so under repo policy they are informational rather than blocking (docs/methodology/REGISTRY.md:L2952-L3010, TODO.md:L121-L128).
  • I did not find a new P0/P1 issue in control-group composition, NaN-safe inference, or public-parameter propagation on the shipped surface.
  • Remaining issues are minor: one registry gap for the public anticipation extension, one constructor-validation consistency issue, and one validation overclaim in REGISTRY.md.
  • Review limitation: I could not execute tests/test_spillover.py here because the sandbox Python environment lacks both numpy and pytest.

Methodology

No unmitigated source-material mismatch found. The time-varying ring regressor, stricter Omega_0, and omission of Gardner’s stage-1 GMM correction are explicitly recorded in the registry and TODO, so they are non-blocking per repo policy. At a high level, that is consistent with the cited Butts/Gardner setup. (arxiv.org)

Code Quality

Performance

No findings. The remaining spillover-specific hotspot, the dense staggered nearest-treated-distance path, is now properly tracked in TODO.md:L128-L128.

Maintainability

No findings.

Tech Debt

  • P3 Impact: remaining deferrals for Gardner GMM correction, event-study mode, survey integration, and the sparse staggered path are properly tracked, so they are acceptable deferred work rather than blockers for this PR. Concrete fix: none required in this PR (TODO.md:L121-L128).

Security

No findings.

Documentation/Tests

…_action guard

- REGISTRY: add "Note (anticipation shift)" documenting the public
  `anticipation: int` parameter's effect on both treatment and ring-
  membership clocks. Matches the implementation semantics at
  spillover.py:850-854 + 1441-1544, and mirrors TwoStageDiD's
  anticipation parameter convention.
- REGISTRY: narrow the "correctness anchored on reduce-to-TWFE /
  reduce-to-TwoStageDiD limits" claim. Only the reduce-to-TWFE limit
  shipped (the 20-seed Gardner identity bit-identity test at
  TestSpilloverDiDNonStaggeredFEEquivalence). The reduce-to-TwoStageDiD
  limit was scoped during planning but not shipped — the Omega_0
  stricter subsample requires fixture work to align with TwoStageDiD's
  {D_it = 0} subsample. Queued as a follow-up.
- spillover.py: add `rank_deficient_action` constructor guard mirroring
  two_stage.py:149-153 and stacked_did.py. Bad values now fail at
  __init__ with a clear ValueError instead of deep inside solve_ols.
- tests: new TestSpilloverDiDRankDeficientActionValidation class
  exercising 6 invalid values + 3 valid values.

148 tests pass (was 139); black + ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: b23aca4075acfc947ea3b0496521f190bca980c0


Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

  • Previous re-review items are addressed: anticipation is now documented and serialized on results, rank_deficient_action is validated in __init__, and the registry no longer claims a shipped reduce-to-TwoStageDiD test (docs/methodology/REGISTRY.md:2972-2974, diff_diff/spillover.py:885-906, diff_diff/results.py:445-462).
  • Methodology alignment is acceptable: Butts’ spillover setup uses untreated-and-unexposed rows for the first-stage residualization in the staggered case, and Gardner’s two-stage framework requires inference that accounts for the generated first-stage regressor. The PR’s omission of that GMM correction is explicitly documented and tracked, so it is informational rather than blocking. citeturn3view0turn3view1turn5view0
  • I did not find a new control-group, NaN-inference, or parameter-propagation issue in the changed spillover code.
  • Remaining issues are documentation-only: one registry note still overstates the shipped validation surface, and one public guide still advertises covariates without warning that the estimator rejects it.
  • Review limitation: I could not execute tests/test_spillover.py here because the sandbox lacks both pytest and numpy.

Methodology

  • P3 Impact: Stage-2 SEs still omit Gardner’s first-stage uncertainty correction, which would be a methodology defect if undocumented. Here it is explicitly disclosed in the registry and tracked in TODO.md, so under repo policy it is non-blocking informational debt. Concrete fix: none required in this PR; keep the existing follow-up entry at TODO.md:121. citeturn3view0turn5view0
  • No unmitigated P0/P1 findings.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • P3 Impact: the deferred spillover items that remain after this PR are properly tracked in TODO.md (GMM correction, event_study, survey_design, count rings, data-driven d_bar, sparse staggered distance), so they are acceptable deferred work rather than blockers. Concrete fix: none required in this PR (TODO.md:121-128).

Security

  • No findings.

Documentation/Tests

  • P3 Impact: REGISTRY.md says the synthetic-DGP Monte Carlo anchor includes both 50-seed and 200-seed recovery of tau_total and delta_1, but the shipped tests only pin delta_1 on the non-staggered DGP; the staggered MC currently checks tau_total only. That overstates the validation actually merged. Concrete fix: either add staggered delta_1 recovery tests (and a slow variant) or narrow the registry note to the validation that is present (docs/methodology/REGISTRY.md:2974, tests/test_spillover.py:764-810, tests/test_spillover.py:812-822).
  • P3 Impact: the LLM guide exposes fit(..., covariates=...) in the public signature but does not list that covariates currently raises NotImplementedError, even though the estimator rejects it in code. That makes the documented surface broader than the real one. Concrete fix: add an explicit covariates= limitation bullet in the spillover guide section, at minimum in diff_diff/guides/llms-full.txt (diff_diff/guides/llms-full.txt:488-507, diff_diff/spillover.py:1353-1363).

…tion

- REGISTRY: narrow the MC anchor claim to reflect what's actually
  shipped — non-staggered DGP tests pin both tau_total AND delta_1,
  staggered DGP tests pin tau_total only; per-ring delta_jk recovery
  on staggered DGPs queued alongside event_study=True support.
- llms-full.txt: add the missing `covariates= raises NotImplementedError`
  limitation bullet so the documented surface matches the runtime
  rejection (the fit signature exposes covariates= but the estimator
  rejects it with a clear explanation about Omega_0 stage-1 fitting).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 9581b21cb586ab8a9e9a108d75baa2db4f4d7b51


Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

  • Previous re-review items are mostly addressed: anticipation is now carried on results serialization, rank_deficient_action is validated at construction, and the public LLM guide now explicitly says covariates= is unsupported. See diff_diff/results.py:L402-L462, diff_diff/spillover.py:L885-L906, and diff_diff/guides/llms-full.txt:L501-L508.
  • Methodology alignment is acceptable against the registry: the time-varying ring regressor, stricter Omega_0, current-period untreated far-away logic, and omitted Gardner GMM correction are all explicitly documented or tracked, so they are non-blocking in this PR. See docs/methodology/REGISTRY.md:L2952-L3012 and TODO.md:L121-L128.
  • I did not find a new control-group, NaN-inference, parameter-propagation, or undocumented methodology defect in the changed estimator code.
  • One previous P3 documentation issue remains in narrowed form: the registry/changelog still overstate the merged staggered Monte Carlo coverage.
  • Review limitation: I could not execute tests/test_spillover.py here because this sandbox does not have numpy, pandas, scipy, or pytest.

Methodology

  • P3 Impact: Stage-2 SEs still omit Gardner’s first-stage GMM uncertainty correction, but that deviation is explicitly documented in the methodology registry and tracked in TODO.md, so under repo policy it is informational rather than blocking. Concrete fix: none required in this PR; keep the existing follow-up tracked in docs/methodology/REGISTRY.md:L2985-L3012 and TODO.md:L121-L128.
  • No unmitigated P0/P1 findings.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

  • P3 Impact: The main deferred spillover items (event_study, survey_design, ring_method="count", data-driven d_bar, sparse staggered distance, GMM correction) are properly tracked in TODO.md, so they are acceptable deferrals rather than blockers. Concrete fix: none required in this PR. See TODO.md:L121-L128.

Security

No findings.

Documentation/Tests

  • P3 Impact: The validation narrative still overstates the shipped staggered Monte Carlo coverage. REGISTRY.md says the staggered anchor is a 50-seed default plus a 200-seed slow variant, and the changelog summarizes the identification MC in the same way, but the merged test file only contains a 30-seed staggered tau_total test and no slow staggered companion. Concrete fix: either add the promised staggered 50-seed and 200-seed tests, or narrow the docs/changelog to the 30-seed staggered tau_total test that actually exists. See docs/methodology/REGISTRY.md:L2974-L2974, CHANGELOG.md:L11-L11, tests/test_spillover.py:L796-L810, and tests/test_spillover.py:L812-L825.

…d reality

The round-3 narrowing left the "50-seed default + 200-seed slow variant"
phrasing applying to both non-staggered AND staggered MC tests. Only
non-staggered actually has the 50/200 structure (and recovers both
tau_total and delta_1); staggered is a single 30-seed test that anchors
tau_total only with a looser 0.04 tolerance (each staggered DGP is
larger and noisier). Honest end-state for the docs.

Also bumps the CHANGELOG test count 139 -> 148 to reflect the
rank_deficient_action validation class added in the previous polish
commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 23cdf0ef20c06416e512132595d9b2be6c53b485


Overall Assessment

⚠️ Needs changes — one unmitigated P1 remains in the new SpilloverDiD identification path.

Executive Summary

  • Previous re-review items look addressed: anticipation now round-trips on results serialization, rank_deficient_action is validated at construction, and the docs now match the shipped staggered MC coverage.
  • The estimator matches the documented registry deviations on the time-varying ring regressor, stricter Omega_0, current-period untreated far-away controls, and the omitted Gardner GMM correction.
  • P1: SpilloverDiD checks balancedness plus per-unit/per-period Omega_0 coverage, but it never checks whether the Omega_0 = {D_it = 0, S_it = 0} unit-time graph is actually connected. Some balanced panels will therefore pass validation even though stage-1 FE are not jointly identified, and tau_total / delta_j can be silently distorted.
  • The missing Gardner GMM correction remains P3 informational because it is explicitly documented in the methodology registry and tracked in TODO.md.
  • Review limitation: I could not execute the test suite here because pytest, numpy, pandas, and scipy are not installed in this environment.

Methodology

  • Severity P1
    Impact: SpilloverDiD.fit() treats a balanced panel plus per-unit/per-period Omega_0 presence as sufficient support for the stage-1 FE model (diff_diff/spillover.py:1104, diff_diff/spillover.py:1563). That proxy is not enough. A balanced panel can still have a disconnected Omega_0 bipartite graph, in which case _iterative_fe_subset() only identifies FE up to component-specific constants while _residualize_butts() later combines unit FE from one component with time FE from another. That silently changes the residualized outcome used to estimate both tau_total and delta_j. The public docs currently describe exact graph-connectivity as a future relaxation of the balanced-panel gate (docs/api/spillover.rst:204, docs/methodology/REGISTRY.md:3009), but for this Gardner/Butts stage-1 design it is the actual live identification condition.
    Concrete fix: after building omega_0_mask, construct the unit-time support graph on Omega_0 rows and raise ValueError unless all stage-2-supported rows lie in a single connected component. Add a regression test for a balanced-but-disconnected Omega_0 panel.

  • Severity P3
    Impact: Stage-2 SEs still omit Gardner’s first-stage GMM uncertainty correction, but this deviation is explicitly documented and tracked (docs/methodology/REGISTRY.md:2987, TODO.md:121). Under repo policy this is informational, not blocking.
    Concrete fix: none required in this PR.

Code Quality

No additional findings.

Performance

No additional findings.

Maintainability

No additional findings.

Tech Debt

  • Severity P3
    Impact: The remaining spillover follow-ups (event_study, survey_design, ring_method="count", data-driven d_bar, sparse staggered distance path, GMM correction) are properly tracked in TODO.md (TODO.md:121).
    Concrete fix: none required in this PR.

Security

No findings.

Documentation/Tests

  • Severity P2
    Impact: The current tests cover missing-period Omega_0 support and unbalanced panels (tests/test_spillover.py:1371, tests/test_spillover.py:2004), but they do not cover the balanced-but-disconnected Omega_0 case behind the P1 above. That leaves the identification bug unguarded in CI.
    Concrete fix: add a fixture where every unit and every period has at least one Omega_0 row, yet the Omega_0 graph splits into two disconnected components, and assert that fit() raises with a connectivity-specific error.

Path to Approval

  1. Add an explicit connected-component check for the Omega_0 unit-time graph in SpilloverDiD.fit() and fail fast on disconnected support.
  2. Add a regression test for a balanced panel that passes the current unit/period/balanced-panel guards but still has disconnected Omega_0 support.
  3. Update docs/api/spillover.rst, docs/methodology/REGISTRY.md, and the [Unreleased] changelog entry so they state the actual identification requirement: balancedness alone is not sufficient; connected Omega_0 support is required.

…pth)

Stage-1's iterative FE solver identifies (mu_i, lambda_t) only up to
component-specific constants per connected component of the bipartite
graph (supported units linked by shared untreated-and-unexposed periods).
If the graph splits into K > 1 components, residualization combines
mu_i from one component with lambda_t from another, silently corrupting
y_tilde and downstream tau_total / delta_j. Balanced panel + per-unit /
per-period Omega_0 coverage is NECESSARY but not SUFFICIENT;
connectivity is the load-bearing identification condition.

Under the current absorbing-treatment + period-strict + unit-warn-drop
regime the disconnected case appears unreachable through `.fit()` (the
combination of validators forces a connected supported subgraph), but
the check is defense-in-depth and future-proofs the Wave B follow-ups
(event_study, survey_design integration, possible reversible-treatment
relaxations).

- diff_diff/spillover.py: new `_check_omega_0_connectivity` helper using
  scipy.sparse.csgraph.connected_components; called immediately after
  the unit-level warn-and-drop block (Step 10c). Operates on the
  SUPPORTED-units subgraph (warn-dropped units are excluded so they
  don't form trivial singletons).
- tests/test_spillover.py: TestSpilloverDiDOmega0Connectivity (5 tests)
  unit-tests the helper directly with synthetic (unit_codes, time_codes,
  omega_0_mask) arrays — disconnected 2-component case raises; bridge-
  unit-connected case succeeds; n_supp <= 1 short-circuits; 3-component
  error names units; normal Butts DGP through `.fit()` doesn't trigger
  the check.
- docs/methodology/REGISTRY.md, docs/api/spillover.rst, CHANGELOG.md:
  extend the Omega_0 identification-policy note to include connectivity.

153 tests pass (was 148); black + ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: c1ba56427cfc4592db51e087862356a13d6e5b8a


Overall Assessment

Looks good — the prior P1 on disconnected Omega_0 support is resolved, and I did not find any new unmitigated P0/P1 issues in the changed spillover estimator path.

Executive Summary

  • The previous blocker is addressed: SpilloverDiD.fit() now explicitly checks connectedness of the supported Omega_0 unit-period graph before stage 1, and the failure mode is documented and tested (diff_diff/spillover.py:L682-L778, diff_diff/spillover.py:L1660-L1739, docs/methodology/REGISTRY.md:L2966-L2968, tests/test_spillover.py:L2101-L2208).
  • The implemented estimator structure matches the cited methodology: Butts’ staggered procedure estimates unit/time effects on untreated-and-unexposed observations and then regresses the residualized outcome on treatment and spillover dummies; Gardner likewise treats second-stage inference as generated-regressor inference. (kylebutts.com)
  • The remaining omission of Gardner’s first-stage GMM variance correction is still present, but it is explicitly documented in the Methodology Registry/API docs and tracked in TODO.md, so under the project rubric it is P3 informational rather than blocking (docs/methodology/REGISTRY.md:L2985-L3012, docs/api/spillover.rst:L177-L203, TODO.md:L121-L128). (kylebutts.com)
  • I did not identify a new silent correctness bug in control-group construction, ring weighting, NaN/inference handling, or parameter propagation in the changed files.
  • Review limitation: I could not execute pytest here because pytest is not installed in this environment.

Methodology

  • Severity P3 | Impact: Stage-2 SEs still omit Gardner’s first-stage GMM correction. That matters for inference in principle, but this deviation is explicitly documented and tracked, so it is mitigated under the review rubric rather than blocking (docs/methodology/REGISTRY.md:L2985-L3012, docs/api/spillover.rst:L177-L203, TODO.md:L121-L128). (kylebutts.com) Concrete fix: None required in this PR; follow-up already tracked.
  • No unmitigated methodology findings. The prior connectivity issue is now handled explicitly in code and covered by direct tests (diff_diff/spillover.py:L682-L778, diff_diff/spillover.py:L1660-L1739, tests/test_spillover.py:L2101-L2208).

Code Quality

No findings.

Performance

  • Severity P3 | Impact: _compute_nearest_treated_distance_staggered() still uses the dense per-cohort distance path, so large staggered panels may pay avoidable memory/runtime cost; this is already tracked in TODO.md:L128-L128. Concrete fix: None required in this PR; follow-up already tracked.

Maintainability

No findings.

Tech Debt

  • Severity P3 | Impact: The remaining Wave B spillover follow-ups (event_study, survey_design, count-based rings, data-driven d_bar, GMM correction, sparse staggered distance path, first-class Conley in TwoStageDiD) are explicitly tracked in TODO.md:L121-L128, so they are deferred work rather than blockers. Concrete fix: None required in this PR.

Security

No findings.

Documentation/Tests

No unmitigated findings. The public docs now describe the connectivity requirement and the period-strict/unit-warn-drop behavior, and the new tests cover the connectivity helper directly (docs/api/spillover.rst:L204-L230, docs/methodology/REGISTRY.md:L2966-L2974, tests/test_spillover.py:L2101-L2208). Review limitation: I could not run the suite locally because pytest is unavailable.

@igerber igerber added the ready-for-ci Triggers CI test workflows label May 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant