dCDH heterogeneity: per-path + global placebo predict_het R-parity + df threading#449
dCDH heterogeneity: per-path + global placebo predict_het R-parity + df threading#449igerber wants to merge 5 commits into
Conversation
Closes TODO #422 + pilot-412 in a single PR (same surface, same R fixture, same parity story). Phase 0 probe verified R behavior: did_multiplegt_dyn(by_path, predict_het, placebo) emits per-path heterogeneity OLS results on backward (placebo) horizons via R's per-by_level dispatcher (DIDmultiplegtDYN:::did_multiplegt_main placebo block at the `effect = matrix(-i, ...)` rbind site). New scenario 22 in benchmarks/R/generate_dcdh_dynr_test_values.R captures this with predict_het=list("het_x", c(-1)) — the c(-1) sentinel triggers "compute heterogeneity for ALL forward (1..effects) AND ALL placebo (1..placebo) positions" per the R source path read at script time. Phase 1A implementation (non-survey): _compute_heterogeneity_test gains a placebo: int = 0 parameter and iterates forward (1..L_max) and backward (-1..-placebo) horizons in a single loop. Explicit `if out_idx < 0: continue` eligibility guard prevents numpy negative-index silent wrap on N_mat[g, out_idx] when F_g - 1 + l_h < 0. _compute_path_heterogeneity_test forwards the param; fit() passes placebo=L_max if self.placebo else 0 to both global and per-path call sites. to_dataframe(level="by_path") placebo rows now read het_* values from path_heterogeneity_effects negative-int keys (mirroring the existing path_placebo_event_study negative-key convention) instead of the pre-PR hardcoded NaN-fill. Survey gate: when survey_design is active AND placebo > 0 + heterogeneity is requested, _compute_heterogeneity_test raises NotImplementedError eagerly with a documented message. The Binder TSL cell-period allocator's REGISTRY justification is tied to post-period attribution; backward-horizon attribution puts ψ_g mass on a pre-period cell, which is a separate library-extension claim that needs its own derivation. Forward-horizon predict_het + survey continues to work unchanged. Pre-period allocator derivation tracked as a new follow-up TODO row. Phase 2 (df threading): _compute_heterogeneity_test now passes df = n_obs - n_params to safe_inference on the non-survey OLS path, matching R did_multiplegt_dyn(predict_het=...)'s t-distribution inference (qt(0.975, df.residual(model)) site). Pre-PR Python used df=None (Z critical), producing 0.1-2% rtol gaps on p_value/conf_int vs R. Existing forward-horizon parity tests now pin t/p/CI at INFERENCE_RTOL=1e-4 (was unpinned). Rank- deficient designs use design.shape[1] as df denominator (pre-drop column count); fully rank-deficient is NaN-short-circuited by the existing guard. Near-rank-deficient edge case tracked as a new Low TODO follow-up. R parity: scenario 22 (multi_path_reversible_predict_het_with_placebo, placebo=2, effects=3, by_path=3) pinned at BETA_RTOL=1e-6/SE_RTOL=1e-5 for beta/se/t_stat/n_obs and INFERENCE_RTOL=1e-4 for p_value/conf_int across 3 paths × (3 forward + 2 placebo) = 15 horizons. Cross-surface tests (TestByPathPredictHetPlacebo): placebo het column population, survey-gate NotImplementedError, forward+survey anti-regression, out_idx<0 eligibility guard, single-path telescope (path_heterogeneity_effects[(only_path,)] == heterogeneity_effects bit- exactly), summary rendering. The two existing local-invariant tests (test_*_inference_matches_safe_inference) refactored to verify SE-derivation wiring (t_stat=beta/se, conf_int symmetric around beta, p_value in [0,1]) without back-deriving n_params. REGISTRY: heterogeneity Z-vs-t deviation note replaced with positive "R parity (post-2026-05-15 df threading)" framing including the rank- deficient caveat. New "Per-path placebo heterogeneity" Note documents the R parity, syntax requirements (c(-1) sentinel), survey gate, and test anchors. CHANGELOG entry under [Unreleased]. llms-full.txt by_path entry extended with placebo het composition + survey-gate mention. API rst extended with the same. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…_het parity
Two changes that resolve scope/contract issues with the previous commit on
this branch:
1. Survey gate (scope correctness). The eager `if use_survey and placebo > 0:
raise NotImplementedError` at function entry of `_compute_heterogeneity_test`
fired on EVERY survey + heterogeneity call when fit() passed
`placebo=L_max if self.placebo else 0` with default `placebo=True`. That
broke the previously-supported forward-horizon survey + predict_het path,
contradicting the changelog's "Forward-horizon predict_het + survey_design
is supported" claim. Refactored to a per-iteration backstop inside the
horizon loop (raises only when actually about to compute a backward
iteration under survey). fit() at both global and per-path heterogeneity
call sites now wraps the placebo arg with a warn-and-skip: if
`survey_design + placebo + heterogeneity` are all active, emit a
UserWarning explaining backward-horizon predict_het is deferred under
survey designs and pass `placebo=0` so only forward-horizon heterogeneity
is computed. The function-level NotImplementedError remains as a defensive
backstop for direct callers that bypass fit().
2. Global placebo predict_het parity (scope completeness). The previous
commit extended `_compute_heterogeneity_test` to compute backward horizons,
which means `results.heterogeneity_effects` (global, non-by_path surface)
also gains negative-int keys when `placebo=True + heterogeneity`. The R-
parity coverage was scoped to per-path only (scenario 22). Added scenario
23 (`multi_path_reversible_predict_het_with_placebo_global`) that calls
`did_multiplegt_dyn(predict_het=list("het_x", c(-1)), placebo=2, effects=3)`
WITHOUT by_path; verified R emits forward (effect=1,2,3) AND placebo
(effect=-1,-2) rows on the global surface with 3 paths × 30 switchers each
= n_obs=90 per row. New parity test class
`TestDCDHDynRParityHeterogeneityWithPlacebo` pins all 6 inference fields
across 5 horizons at the same tolerances as the per-path version.
Test changes:
- `test_predict_het_placebo_survey_design_raises` renamed to
`test_predict_het_placebo_survey_design_warns_and_skips_backward`;
asserts UserWarning is emitted, no exception raised, and both
`heterogeneity_effects` and `path_heterogeneity_effects` contain only
positive-int keys (forward-only emission).
- New `test_compute_heterogeneity_test_direct_call_raises_on_backward_survey`
exercises the per-iteration backstop directly; locks the API contract
for any future internal call site that bypasses fit().
- New `test_parity_multi_path_reversible_predict_het_with_placebo_global`
pins scenario 23 R-parity.
Doc updates: REGISTRY heterogeneity Note clarifies "warn + skip" semantics
(was "raises NotImplementedError"), references both global and per-path
parity classes. CHANGELOG entry rewritten to reflect global + per-path
scope. llms-full.txt and API rst updated similarly.
Test count: 312 -> 314 (added 2; the renamed test still passes under its
new contract).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The top-level heterogeneity Note in REGISTRY.md still said "This
implementation provides only post-treatment regressions" — stale after
the placebo predict_het additions in this branch's earlier commits.
Updated to reflect:
1. Python now matches R on per-horizon placebo regressions (when
`placebo=True` + `heterogeneity=` are co-set) — refers readers to
the "Placebo predict_het" sub-note for the full contract (global +
per-path scope, survey warn+skip behavior, R-parity classes).
2. The remaining gap from R is the joint null Wald F-test across all
predict_het rows — Python emits per-horizon inference only.
3. The `controls` mutex stays unchanged.
Cross-references the existing render-time note in `_render_heterogeneity_section`
("Per-horizon regressions only (no joint F-test)") so the registry text
matches the user-facing summary.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
Two informational findings from the latest review: 1. Stale `to_dataframe(level="by_path")` docstring at `chaisemartin_dhaultfoeuille_results.py:1530-1558` still claimed placebo `het_*` columns are NaN. Updated to document the post-#422 contract: positive-horizon AND negative-horizon (placebo) rows are both populated when `placebo=True + heterogeneity=` are co-set; placebo rows under `survey_design` remain NaN with a fit-time UserWarning. 2. JSON golden-fixture type instability for empty `placebo_predict_het` / `placebo_horizons` slots. R's `jsonlite::toJSON` serializes plain `list()` as `[]` (array) but populated named lists as `{}` (object), so consumers iterating `.items()` on the slot saw different shapes across scenarios. Fixed both ends: - R-side: extractors initialize empty slots with `structure(list(), names = character(0))` which jsonlite serializes as `{}` even when empty. Verified across 4 scenarios (20, 21, 22, 23) — all `placebo_predict_het` / `placebo_horizons` slots now serialize as objects regardless of population. - Python-side: added `_as_dict` helper in the parity test module as a defensive backstop coercing any non-dict (None / [] / missing) to {}. Used at the two call sites that read optional placebo slots so consumers can call `.items()` uniformly. The golden JSON regenerated; type-stable across all scenarios (verified via `jq` on each scenario's predict_het type). 314 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
…tract Two stale in-code documentation surfaces from the latest review: 1. `chaisemartin_dhaultfoeuille.py:980-989` — the `heterogeneity` parameter docstring on `fit()` still said "post-treatment regressions only (no placebo regressions)". Updated to document the post-#422 contract: per-horizon OLS regressions on forward AND backward (placebo) horizons when `placebo=True`; survey_design composes with forward horizons but warns + skips backward horizons until the pre-period cell allocator is derived. 2. `chaisemartin_dhaultfoeuille_results.py:1279-1286` — the heterogeneity summary note didn't mention the survey forward-only fallback. Extended the note to cover the gating semantics so users reading `result.summary()` under `survey_design + heterogeneity` know what they're getting. Both surfaces now match the contract already documented in the API rst, REGISTRY, and CHANGELOG. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology Code Quality Performance Maintainability Tech Debt
Security Documentation/Tests |
Summary
Closes TODO #422 (per-path placebo
predict_hetemission) and pilot-412 (heterogeneity OLS df threading) in a single bundle. Same surface, same R fixture, single clean R-parity story.R-fixture probe (scenario 22) verified that
did_multiplegt_dyn(by_path, predict_het, placebo)emits per-by_level heterogeneity OLS rows on backward (placebo) horizons via R's per-by_level dispatcher (DIDmultiplegtDYN:::did_multiplegt_mainplacebo block at theeffect = matrix(-i, ...)rbind site). Scenario 23 confirmed the same emission on the global (non-by_path) surface. Both surfaces now mirror in Python.R syntax note:
did_multiplegt_dyn(predict_het=list(\"X\", c(...)), placebo=N)rejects positive-only horizons (c(1, 2, 3)errors with "specified numbers in predict_het that exceed the number of placebos" because R reuses the same horizon vector for both forward and backward indexing). Thec(-1)sentinel triggers "compute heterogeneity for ALL forward (1..effects) AND ALL placebo (1..placebo) positions" — both new fixtures use this syntax.Survey gate is warn + skip: when
survey_design + placebo + heterogeneityare co-set, fit() emits aUserWarningand falls back to forward-horizon-only heterogeneity. The Binder TSL cell-period allocator's REGISTRY justification is tied to post-period attribution; backward-horizon attribution puts ψ_g mass on a pre-period cell, which is a separate library-extension claim deferred to a follow-up methodology PR. Forward-horizonpredict_het + survey_designcontinues to work unchanged.Phase 2 df threading:
_compute_heterogeneity_testnow passesdf = n_obs - n_paramstosafe_inferenceon the non-survey OLS path (matches Rqt(0.975, df.residual(model))); pre-PR Python useddf=None(Z critical), producing 0.1-2% rtol gaps onp_value/conf_intvs R. Forward parity tests tightened from "unpinned" toINFERENCE_RTOL=1e-4. Rank-deficient designs usedesign.shape[1]as df denominator (pre-drop column count); fully rank-deficient is NaN-short-circuited by the existing guard. Near-rank-deficient edge case tracked as a Low TODO follow-up.Methodology references (required if estimator / math changes)
ChaisemartinDHaultfoeuille.predict_het×placebo×by_path/paths_of_interestDIDmultiplegtDYN 2.3.3did_multiplegt_mainplacebo + predict_het blockpredict_hetrows: R aggregates; Python emits per-horizon inference only. Documented in REGISTRY heterogeneity Note.n_obs - n_params(pre-drop column count) vs R'sdf.residual = n - rankpost-drop. Affects only near-rank-deficient designs thatsolve_olsretains rather than NaN-out. Tracked as Low TODO.Validation
tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityHeterogeneityWithPlacebo(scenario 23, global) +::TestDCDHDynRParityByPathHeterogeneityWithPlacebo(scenario 22, per-path × 3 paths) pinning all 6 inference fields atBETA_RTOL=1e-6/SE_RTOL=1e-5forbeta/se/t_stat/n_obsandINFERENCE_RTOL=1e-4forp_value/conf_int.TestDCDHDynRParityHeterogeneityandTestDCDHDynRParityByPathHeterogeneityextended to assertp_value/conf_intat the newINFERENCE_RTOL=1e-4(was unpinned pre-PR; replaced the Z-vs-t deviation note with the positive parity claim).tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPredictHetPlacebocovers placebo het column population, survey-gate warn+skip, forward+survey anti-regression,out_idx<0eligibility guard, single-path telescope (path_heterogeneity_effects[(only_path,)] == heterogeneity_effectsbit-exactly), summary rendering, and direct-callNotImplementedErrorbackstop.test_*_inference_matches_safe_inference→test_*_inference_local_invariantsto verify SE-derivation wiring (t_stat = beta/se, symmetricconf_int,p_valuein[0, 1]) without back-derivingn_params.benchmarks/data/dcdh_dynr_golden_values.json, regenerable viaRscript benchmarks/R/generate_dcdh_dynr_test_values.R.Security / privacy
🤖 Generated with Claude Code