docs: add retrospective paper reviews for TROP and Wooldridge ETWFE by igerber · Pull Request #443 · igerber/diff-diff

igerber · 2026-05-15T13:41:57Z

Summary

Adds two paper-review markdown files under `docs/methodology/papers/`, following the existing template. Both reviews are retrospective documentation for estimators already shipped in the library.

`athey-2025-review.md` (358 lines) — Athey, Imbens, Qu, Viviano (2025) "Triply Robust Panel Estimators" (arXiv:2508.21536). Backs `diff_diff/trop.py`.
`wooldridge-2023-review.md` (248 lines) — Wooldridge (2023) "Simple approaches to nonlinear difference-in-differences with panel data" (Econometrics Journal 26(3), doi:10.1093/ectj/utad016). Backs `diff_diff/wooldridge.py`.

Methodology references (required if estimator / math changes)

Method name(s): TROP, WooldridgeDiD (ETWFE)
Paper / source link(s): https://arxiv.org/abs/2508.21536 ; https://doi.org/10.1093/ectj/utad016
Any intentional deviations from the source (and why): none documented in this PR (it is documentation-only and adds reviews for already-shipped estimators; deviations would be tracked in REGISTRY.md alongside the implementation)

Validation

Tests added/updated: none (docs-only)
Backtest / simulation / notebook evidence (if applicable): n/a

Security / privacy

Confirm no secrets/PII in this PR: confirmed

Both reviews follow the existing template under docs/methodology/papers/ and back already-shipped estimators (diff_diff/trop.py, diff_diff/wooldridge.py).

- athey-2025-review.md: Athey, Imbens, Qu, Viviano (2025) "Triply Robust Panel Estimators" (arXiv:2508.21536)
- wooldridge-2023-review.md: Wooldridge (2023) "Simple approaches to nonlinear difference-in-differences with panel data" (Econometrics Journal 26(3), doi:10.1093/ectj/utad016)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-15T13:48:52Z

Overall Assessment

✅ Looks good

This is a docs-only PR, so there are no unmitigated P0/P1 findings. The main issues are P2/P3 documentation accuracy problems where the new paper-review files drift from the shipped registry/code.

Executive Summary

No estimator, weighting, variance, or inference code changed in this PR, so there is no blocker-level risk in changed executable paths.
The new review files contain several implementation-facing notes that no longer match the current library contract.
wooldridge-2023-review.md has one substantive source-interpretation error around the meaning of δ₂. (academic.oup.com)
athey-2025-review.md overstates current TROP support for treatment-pattern flexibility and uses outdated public method names/descriptions.
One Wooldridge aggregation difference is already documented in REGISTRY.md / TODO.md; that should be surfaced here as a deviation note, not treated as a defect.
athey-2025-review.md also commits a contributor-local absolute filesystem path.

Methodology

Severity: P2. Impact: docs/methodology/papers/wooldridge-2023-review.md:L49-L55 says that when G(z)=exp(z), δ₂ is the “log-odds ratio (logit) or log rate ratio (Poisson).” That conflates two different link-function interpretations. Wooldridge distinguishes the exponential mean case as a log difference / proportional effect, while the logistic mean gives a change in log-odds. Concrete fix: split this sentence by link function and mirror the paper’s wording. (academic.oup.com)
Severity: P2. Impact: docs/methodology/papers/athey-2025-review.md:L274-L280 is under “Implementation Notes” but says treatment switching on/off is supported. Current TROP requires an absorbing treatment indicator and rejects non-absorbing/event-style inputs; that contract is enforced in diff_diff/trop.py:L500-L527 and diff_diff/trop_global.py:L616-L630, and documented in docs/methodology/REGISTRY.md:L2002-L2015. Concrete fix: rewrite this as a paper-scope remark or explicitly say the shipped implementation requires absorbing treatment.
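
To make the first finding above concrete, the short derivation below shows what δ₂ measures under each mean function. The single-index notation (α collecting everything except the treatment term, D as the treatment indicator) is a simplification introduced here for illustration and is not the paper's full specification.

```latex
% Exponential mean, G(z) = exp(z): delta_2 is a log difference in means.
\begin{aligned}
E[Y \mid D] &= \exp(\alpha + \delta_2 D) \\
\log E[Y \mid D=1] - \log E[Y \mid D=0] &= \delta_2 \\
\frac{E[Y \mid D=1] - E[Y \mid D=0]}{E[Y \mid D=0]} &= \exp(\delta_2) - 1
\end{aligned}

% Logistic mean, G(z) = exp(z) / (1 + exp(z)): delta_2 is a change in log-odds.
\begin{aligned}
p_D &= \frac{\exp(\alpha + \delta_2 D)}{1 + \exp(\alpha + \delta_2 D)} \\
\log\frac{p_1}{1 - p_1} - \log\frac{p_0}{1 - p_0} &= \delta_2
\end{aligned}
```

Under these assumptions the exponential case yields a proportional effect of exp(δ₂) − 1, while only the logistic case yields a log-odds interpretation, which is why the two should not share one sentence.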

Code Quality

Severity: P2. Impact: docs/methodology/papers/wooldridge-2023-review.md:L188-L205 has stale implementation notes: it lists control_group default as "never_treated" and says Poisson still needs a new solver. Current code defaults to "not_yet_treated" and already uses solve_poisson in the Poisson path (diff_diff/wooldridge.py:L302-L306, diff_diff/wooldridge.py:L1085-L1124), consistent with docs/methodology/REGISTRY.md:L1376-L1388. Concrete fix: update the tuning-parameter table and implementation notes to the shipped API, or label them as historical/pre-implementation notes.
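
Drift of this kind (a documented default that no longer matches the shipped signature) can in principle be caught mechanically. The sketch below uses `inspect.signature` against a stand-in class; `StandInEstimator`, `find_default_drift`, and the `method="ols"` default are hypothetical names and values for illustration and do not reproduce the real `WooldridgeDiD` signature.

```python
import inspect


class StandInEstimator:
    """Hypothetical stand-in; not the real WooldridgeDiD class."""

    def __init__(self, method="ols", control_group="not_yet_treated"):
        self.method = method
        self.control_group = control_group


def find_default_drift(cls, documented_defaults):
    """Return {param: (documented, shipped)} for every mismatched default."""
    params = inspect.signature(cls.__init__).parameters
    drift = {}
    for name, documented in documented_defaults.items():
        if name not in params:
            drift[name] = (documented, "<missing parameter>")
            continue
        shipped = params[name].default
        if shipped != documented:
            drift[name] = (documented, shipped)
    return drift


# Default as the stale review table listed it.
stale_docs = {"control_group": "never_treated"}
drift = find_default_drift(StandInEstimator, stale_docs)
print(drift)
```

A check like this would only cover parameter defaults; prose claims such as "Poisson still needs a new solver" would still need a human or reviewer pass.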

Performance

No findings in changed scope.

Maintainability

Severity: P2. Impact: docs/methodology/papers/athey-2025-review.md:L312-L318 refers to "twostep" and "joint" methods and describes the latter as homogeneous-effect WLS. The public API is method="local" / method="global", and the global path computes residual-based treated-cell effects averaged into ATT (diff_diff/trop.py:L64-L78, diff_diff/trop_global.py:L554-L585, docs/methodology/REGISTRY.md:L2139-L2146). Concrete fix: rename these to local / global and align the description with the current estimator contract.

Tech Debt

Severity: P3. Impact: docs/methodology/papers/wooldridge-2023-review.md:L103-L108 summarizes the paper’s aggregation using cohort-share weights, which is correct for the paper, but current library behavior differs and that deviation is already documented in docs/methodology/REGISTRY.md:L1358-L1364 and tracked in TODO.md:L79-L81. Concrete fix: add a one-line “current implementation deviation” note pointing readers to the existing registry/TODO entry. (academic.oup.com)
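
To illustrate why the choice of aggregation weights matters, the sketch below compares two generic schemes on made-up numbers. All values, the balanced-panel assumption, and both function definitions are assumptions for illustration; neither function is claimed to reproduce the paper's formula or the library's code, whose exact definitions live in the registry entry cited above.

```python
# Hypothetical cohort-by-period ATT estimates; keys are (cohort, period).
att = {(2, 2): 1.0, (2, 3): 1.0, (2, 4): 1.0, (4, 4): 4.0}
# Hypothetical number of treated units per cohort (balanced panel assumed).
cohort_size = {2: 30, 4: 10}


def cell_weighted(att, cohort_size):
    """Weight every treated (cohort, period) cell by its unit count."""
    total = sum(cohort_size[g] for g, _ in att)
    return sum(cohort_size[g] * tau for (g, _), tau in att.items()) / total


def cohort_share_weighted(att, cohort_size):
    """Average within each cohort first, then weight by cohort share."""
    n = sum(cohort_size.values())
    result = 0.0
    for g, size in cohort_size.items():
        effects = [tau for (c, _), tau in att.items() if c == g]
        result += (size / n) * (sum(effects) / len(effects))
    return result


print(cell_weighted(att, cohort_size))
print(cohort_share_weighted(att, cohort_size))
```

In this toy setup the cell-weighted average gives early cohorts extra influence because they contribute more post-treatment cells, so the two summaries diverge whenever effects differ across cohorts.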

Security

Severity: P3. Impact: docs/methodology/papers/athey-2025-review.md:L5 embeds an absolute local path (/Users/igerber/...). That is non-portable and leaks contributor-local environment details. Concrete fix: replace it with a repo-relative path such as papers/2508.21536v2.pdf or the arXiv URL.
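
A leak like this is easy to screen for before committing. The following is a minimal sketch of such a scan; the regular expression, the function name `find_local_paths`, and the sample text (including the `/Users/example/` prefix) are all made up for illustration.

```python
import re

# Matches common user-home prefixes on macOS, Linux, and Windows.
LOCAL_PATH = re.compile(r"(?:/Users/|/home/|[A-Za-z]:\\Users\\)\S+")


def find_local_paths(text):
    """Return (line_number, matched_path) pairs for contributor-local paths."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in LOCAL_PATH.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


sample = "\n".join([
    "# Paper review",
    "**PDF:** /Users/example/papers/2508.21536v2.pdf",
    "**Source:** https://arxiv.org/abs/2508.21536",
])
print(find_local_paths(sample))
```

The pattern is deliberately crude: it would also flag a URL that happens to contain `/home/`, so hits are best treated as prompts for manual review.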

Documentation/Tests

No separate testing finding. For a docs-only PR, the absence of new tests is acceptable; the issues above are documentation-accuracy fixes, not test gaps.

athey-2025-review.md:
- Replace contributor-local absolute PDF path with arXiv URL
- Note shipped TROP requires absorbing treatment (paper Eq 13 generalization is out of scope for the current implementation)
- Rename "twostep"/"joint" to "local"/"global" and correct the global-method description to residual-based treated-cell effects averaged into ATT

wooldridge-2023-review.md:
- Split delta_2 interpretation by link function (exponential = log diff / proportional effect; logistic = change in log-odds)
- Update control_group default to "not_yet_treated" (matches wooldridge.py:305)
- Update implementation note: solve_poisson exists at linalg.py:3431 and is used in the Poisson path
- Add aggregation deviation note linking to REGISTRY and TODO entries

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-05-15T14:03:22Z

/ai-review

github-actions · 2026-05-15T14:10:12Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 948ee9a169c37712f5af594180e5c173c45eb474

Overall Assessment

✅ Looks good

Executive Summary

This is still a docs-only PR, so there are no changed estimator, weighting, variance, or inference code paths that could introduce a P0/P1 executable defect.
Re-review result: the earlier substantive drift items are mostly addressed. The new reviews now fix the Wooldridge δ₂ wording, surface the Wooldridge aggregation-weight deviation as documented/tracked, state TROP’s shipped absorbing-state restriction, use current local/global TROP naming, and remove the prior contributor-local absolute path.
One unmitigated P2 remains in the TROP review: the simulation-design summary says “7 real datasets,” but both the file’s own list and the paper’s Section 3/Table 1 description name six source datasets, which suggests that simulation applications were counted as datasets.
One unmitigated P2 remains in the Wooldridge review: its “Implementation Notes” understate current covariate support relative to the shipped API.
One minor P3 remains: the TROP review still contains an internal “Agent 2” authoring artifact.

Methodology

Severity: P2. docs/methodology/papers/athey-2025-review.md:L323-L331. Impact: the review says the simulations are based on “7 real datasets,” but the same block only enumerates six items, and the paper’s Section 3 / Table 1 description likewise names six source datasets (CPS, PWT, Germany, Basque, Smoking, Boatlift). In a methodology review, that is a direct source-material mismatch. Concrete fix: change the sentence to “6 real datasets,” or explicitly say “7 simulation applications” and split CPS logwage vs CPS urate out as separate applications.
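
The 6-versus-7 discrepancy comes down to counting datasets versus (dataset, outcome) pairs, as the small tally below shows. The dataset names and the two CPS outcomes come from this finding; treating every other dataset as contributing exactly one application is an assumption consistent with the count of seven, and outcomes not named in this thread are left as `None`.

```python
# (source dataset, outcome) pairs; outcomes not named in this thread stay None.
applications = [
    ("CPS", "logwage"),
    ("CPS", "urate"),
    ("PWT", None),
    ("Germany", None),
    ("Basque", None),
    ("Smoking", None),
    ("Boatlift", None),
]

# Collapsing to distinct dataset names removes the duplicated CPS entry.
source_datasets = {dataset for dataset, _ in applications}
print(len(source_datasets), "source datasets")
print(len(applications), "simulation applications")
```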

Code Quality

No findings.

Performance

No findings.

Maintainability

Severity: P3. docs/methodology/papers/athey-2025-review.md:L351-L354. Impact: the committed review still refers to “Agent 2’s extraction...” when discussing Equation 13. That is an internal authoring artifact, not an auditable source-based note, and it weakens the document’s credibility. Concrete fix: rewrite this as a neutral source-based note (“Equation 13’s penalty term should be rechecked against the PDF”) or remove it once verified.

Tech Debt

No findings. The Wooldridge aggregation-weight difference is now correctly surfaced as a documented implementation deviation and is already tracked in docs/methodology/REGISTRY.md:L1357-L1364 and TODO.md:L79-L80.

Security

No findings. The prior contributor-local path leak is fixed; docs/methodology/papers/athey-2025-review.md:L5 now uses a public arXiv reference instead of a local absolute filesystem path.

Documentation/Tests

Severity: P2. docs/methodology/papers/wooldridge-2023-review.md:L176-L198. Impact: the “Implementation Notes” present covariates as generic, time-constant-only inputs, but the shipped estimator exposes exovar, xtvar, and xgvar, and the registry explicitly documents time-varying covariates plus cohort/time interaction handling at diff_diff/wooldridge.py:L387-L411 and docs/methodology/REGISTRY.md:L1369-L1374. That understates current implementation capability and can mislead future contributors using this review as a library-contract reference. Concrete fix: distinguish paper notation from the shipped API, or add one sentence that time-varying covariates are supported via xtvar and cohort-interacted covariates via xgvar.
No separate test-gap finding. For a docs-only PR, the absence of new tests is acceptable.

athey-2025-review.md:
- Correct "7 real datasets" to "6 real datasets / 7 simulation applications" (CPS is used for both logwage and urate outcomes; paper Table 1 / Section 3 names 6 source datasets)
- Rewrite Equation 13 nuclear-norm gap note as a neutral source-based check (remove authoring artifact)

wooldridge-2023-review.md:
- Surface shipped covariate API (exovar / xtvar / xgvar incl. time-varying via xtvar with demean_covariates default) in Data Structure Requirements and Tuning Parameters table; cross-link to wooldridge.py:394-411 and REGISTRY.md "Covariates"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-15T14:29:59Z

🔁 AI review rerun (requested by @igerber)

Head SHA: b527f78eac40ff95f0a52710b1c012b357e29621

Overall Assessment

✅ Looks good

Executive Summary

Re-review result: the prior open issues called out in the last AI pass appear resolved. The TROP review now fixes the dataset/application wording at docs/methodology/papers/athey-2025-review.md:L325-L331, and the Wooldridge review now documents shipped covariate support at docs/methodology/papers/wooldridge-2023-review.md:L176-L179 and L198.
P2: the new Wooldridge review now contradicts itself on aggregation-weight sourcing. It cites “Eqs. 7.2-7.4” as if they were in the reviewed paper at docs/methodology/papers/wooldridge-2023-review.md:L105-L111, then later says the 2023 paper gives no explicit aggregation equations at docs/methodology/papers/wooldridge-2023-review.md:L236-L240.
P2: the Wooldridge implementation notes still blur paper notation and shipped API. They imply a required W_it input and a broader bootstrap surface than WooldridgeDiD actually exposes in code.
P3 informational: the Wooldridge aggregation-weight deviation itself is now properly documented and tracked in docs/methodology/REGISTRY.md:L1357-L1364 and TODO.md:L79-L80, so that deviation is not a blocker.

Methodology

P2 docs/methodology/papers/wooldridge-2023-review.md:L105-L111, docs/methodology/papers/wooldridge-2023-review.md:L236-L240, docs/methodology/REGISTRY.md:L1363-L1364. Impact: the file currently says the reviewed 2023 paper’s aggregation weights come from “Eqs. 7.2-7.4,” but later says the 2023 paper only describes the weights conceptually and gives no explicit equations. That weakens source traceability for the deviation note and makes it unclear whether the doc is citing Wooldridge (2023) or the later Wooldridge ETWFE source already used in the registry. Concrete fix: rewrite L111 to attribute the formal cohort-share equations to the later Wooldridge ETWFE source already referenced in REGISTRY.md, or else remove the equation-number claim and say the 2023 paper gives only the conceptual cohort-share weighting.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

P3 docs/methodology/papers/wooldridge-2023-review.md:L111, docs/methodology/REGISTRY.md:L1357-L1364, TODO.md:L79-L80. Impact: none. The cell-weight vs cohort-share aggregation difference is now correctly surfaced as a documented deviation and tracked technical debt, so it should not hold up approval. Concrete fix: none required in this PR.

Security

No findings.

Documentation/Tests

P2 docs/methodology/papers/wooldridge-2023-review.md:L99-L110, docs/methodology/papers/wooldridge-2023-review.md:L176-L198, diff_diff/wooldridge.py:L165-L189, diff_diff/wooldridge.py:L394-L411, diff_diff/wooldridge.py:L432-L445, docs/methodology/REGISTRY.md:L1410-L1418. Impact: the review mixes paper notation with the shipped library contract. In the paper, W_it is part of the model notation and panel bootstrap is a valid inference route; in the implementation, users pass cohort and the design matrix derives treatment cells internally, and n_bootstrap > 0 is only supported for method="ols" and rejected with survey_design. As written, the new review overstates the current API and inference surface for WooldridgeDiD. Concrete fix: in the Implementation Notes, explicitly separate “paper notation” from “shipped API”: say users provide cohort/first_treat, W_it is constructed internally from cohort and time, and bootstrap is currently OLS-only and unavailable with survey_design.
No separate test-gap finding. For a docs-only PR, the absence of new tests is acceptable.
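
The paper-notation versus shipped-API distinction can be sketched as follows: the treatment indicator W_it is derived from cohort and time instead of being supplied by the user, and bootstrap options are validated up front. Both functions are hypothetical illustrations of the contract described in this finding; they are not the library's `_build_interaction_matrix` or its actual validation code, and encoding never-treated units as cohort 0 and using `"poisson"` as a stand-in non-OLS method are assumptions made here.

```python
def treatment_indicator(cohort, time):
    """W_it = 1 once a unit's cohort (first treated period) has been reached.

    Assumes cohort == 0 encodes never-treated units (an illustrative
    convention, not necessarily the library's).
    """
    return int(cohort > 0 and time >= cohort)


def check_bootstrap_options(method, n_bootstrap, survey_design=None):
    """Hypothetical guard mirroring the restrictions described above."""
    if n_bootstrap > 0 and method != "ols":
        raise ValueError("n_bootstrap > 0 is only supported for method='ols'")
    if n_bootstrap > 0 and survey_design is not None:
        raise ValueError("n_bootstrap > 0 cannot be combined with survey_design")


# Two units: "a" is never treated, "b" is first treated in period 3.
rows = [(unit, cohort, t) for unit, cohort in [("a", 0), ("b", 3)] for t in (1, 2, 3, 4)]
w = {(unit, t): treatment_indicator(cohort, t) for unit, cohort, t in rows}
print(w)

check_bootstrap_options("ols", n_bootstrap=200)  # accepted
try:
    check_bootstrap_options("poisson", n_bootstrap=200)
except ValueError as err:
    print("rejected:", err)
```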

wooldridge-2023-review.md:
- Aggregation note: stop attributing "Eqs. 7.2-7.4" to the 2023 paper (the 2023 paper describes aggregation only conceptually in Section 3.1; the formal cohort-share equations are from W2025 per REGISTRY.md)
- Implementation Notes: separate paper notation from shipped API. Users provide cohort/first_treat; W_it is constructed internally from cohort + time via _build_interaction_matrix (wooldridge.py:165-189), not passed as a column
- Standard errors: add shipped-API restriction note: n_bootstrap > 0 is OLS-only (wooldridge.py:432-437) and rejected with survey_design (wooldridge.py:441-444)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-15T14:38:42Z

🔁 AI review rerun (requested by @igerber)

Head SHA: c92c2dcf439916c13b3437dc8fff70873ebeec03

Overall Assessment

✅ Looks good

Executive Summary

Re-review result: the prior Wooldridge documentation issues from the last AI pass appear resolved. The aggregation-weight deviation is now explicitly tied to REGISTRY.md/TODO.md, and the review now separates paper notation from the shipped cohort/bootstrap API surface.
P2 docs/methodology/papers/wooldridge-2023-review.md:L121-L129: the Section 7 extension bullets still need stronger caveats around exit and nonbinary-treatment generalizations. (academic.oup.com)
P2 docs/methodology/papers/athey-2025-review.md:L275-L281: the TROP review still blurs the paper’s balanced-panel setup with the shipped library’s unbalanced-panel support.
P3 informational: the Wooldridge aggregation-weight deviation is now properly documented and tracked, so it is not blocking.

Methodology

P2 docs/methodology/papers/wooldridge-2023-review.md:L121-L129. Impact: the new “Edge cases” bullets compress Section 7 too aggressively. In the published article, Section 7 is explicitly an extensions section; the exit case adds an extra restriction that future shocks to untreated potential outcomes cannot drive exit, and the multiple-treatment-level path is described as only relatively straightforward, not completely general, with the precise estimand left for future work. Concrete fix: rewrite the Treatment exit and Multiple treatment levels bullets as qualified extension notes and carry over those caveats. (academic.oup.com)

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

P3 docs/methodology/papers/wooldridge-2023-review.md:L106-L112, docs/methodology/REGISTRY.md:L1357-L1364, TODO.md:L79-L80. Impact: none. The cell-weight vs cohort-share aggregation difference is now correctly documented as a deviation and tracked technical debt. Concrete fix: none in this PR.

Security

No findings.

Documentation/Tests

P2 docs/methodology/papers/athey-2025-review.md:L275-L281, docs/methodology/papers/athey-2025-review.md:L359-L359, docs/methodology/REGISTRY.md:L2113-L2137, diff_diff/trop.py:L500-L557. Impact: the TROP review’s Implementation Notes open with a balanced-panel requirement even though the same review later says the shipped implementation supports unbalanced panels, and the registry/code already document that extension. That leaves the paper-vs-library contract ambiguous for readers using this review as implementation guidance. Concrete fix: relabel the opening bullet as a paper assumption (Paper assumption: balanced panel) or add an adjacent shipped-implementation note that diff_diff/trop.py accepts unbalanced panels with structural gaps.
No separate test-gap finding. For a docs-only PR, no test changes are required.

athey-2025-review.md:
- Relabel balanced-panel bullet as "Paper assumption" and add an adjacent shipped-implementation note that diff_diff/trop.py supports unbalanced panels with structural gaps (matches the later "Unbalanced panels" entry under Gaps and Uncertainties and the corresponding REGISTRY section)

wooldridge-2023-review.md:
- Treatment exit (Section 7.2) bullet: mark as extension and carry over the paper's additional restriction that future shocks to untreated potential outcomes cannot drive exit
- Multiple treatment levels (Section 7.4) bullet: mark as extension; note the paper describes it as relatively straightforward but not fully general, leaving the precise multi-level estimand to future work

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-15T14:49:26Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 6dbddf7d55fe61fdeced432145b9d5ba66c8d90f

Overall Assessment

✅ Looks good

Executive Summary

Methods reviewed: TROP and WooldridgeDiD (ETWFE).
Re-review result: the prior Wooldridge Section 7 caveat issue and TROP balanced-panel wording issue appear resolved in the new paper reviews. docs/methodology/papers/wooldridge-2023-review.md:L121-L129, docs/methodology/papers/athey-2025-review.md:L274-L282. (academic.oup.com)
P2 remaining: the new TROP review is still internally inconsistent on weight normalization and still carries an Eq. 13 “should be rechecked” placeholder while presenting itself as registry-copy-ready. docs/methodology/papers/athey-2025-review.md:L10-L13, docs/methodology/papers/athey-2025-review.md:L255-L268, docs/methodology/papers/athey-2025-review.md:L352-L355, docs/methodology/REGISTRY.md:L2052-L2059, docs/methodology/REGISTRY.md:L2126-L2129. (ar5iv.org)
The Wooldridge aggregation-weight deviation remains properly documented and tracked, so it is informational only. docs/methodology/papers/wooldridge-2023-review.md:L106-L112, docs/methodology/REGISTRY.md:L1357-L1364, TODO.md:L79-L80.
No P0/P1 issues found; this docs-only PR does not introduce code, inference, or security regressions.

Methodology

P2 docs/methodology/papers/athey-2025-review.md:L10-L13, docs/methodology/papers/athey-2025-review.md:L255-L268, docs/methodology/papers/athey-2025-review.md:L352-L355, docs/methodology/REGISTRY.md:L2052-L2059, docs/methodology/REGISTRY.md:L2126-L2129. Impact: the TROP review is framed as ready to copy into the methodology registry, but it still treats 1^T ω = 1^T θ = 1 as a checklist requirement while later saying normalization is unclear, and it leaves Eq. 13’s penalty as “should be rechecked.” The source paper presents exponential weight construction in Section 2 and a separate sum-to-one condition in the theory section, so this should be documented as an unresolved source ambiguity rather than a settled implementation requirement. Concrete fix: move weight normalization out of the checklist into an explicit ambiguity/note, verify Eq. 13 once against the source, and only then keep the registry-copy-ready framing. (ar5iv.org)
No other methodology findings. The Wooldridge Section 7 caveats now match the paper’s exit/endogeneity warning and its qualified framing for multi-level treatment extensions, and the TROP notes now cleanly separate the paper setup from the shipped unbalanced-panel extension. (academic.oup.com)
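
The normalization ambiguity in the first finding can be seen numerically: weights built from an exponential kernel do not sum to one unless they are explicitly rescaled. The sketch below uses a generic form w_k = exp(-λ·d_k) as a stand-in for the paper's Section 2 construction; the distances, the value of λ, and the function names are made up, and the paper's exact distance definitions are not reproduced here.

```python
import math


def exponential_weights(distances, lam):
    """Unnormalized weights w_k = exp(-lam * d_k)."""
    return [math.exp(-lam * d) for d in distances]


def normalize(weights):
    """Rescale weights so they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


distances = [0.0, 1.0, 2.0]  # hypothetical distances
raw = exponential_weights(distances, lam=0.5)
unit_sum = normalize(raw)

print(sum(raw))       # exceeds 1 here, since the zero-distance weight alone is 1
print(sum(unit_sum))  # approximately 1 after rescaling
```

Normalizing preserves the relative weights, so the two readings differ only in overall scale; whether that scale matters may depend on how the weights interact with other terms in the objective, such as a penalty, which is part of what the Eq. 13 recheck would need to settle.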

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings beyond the TROP documentation inconsistency above.

Tech Debt

P3 docs/methodology/papers/wooldridge-2023-review.md:L106-L112, docs/methodology/REGISTRY.md:L1357-L1364, TODO.md:L79-L80. Impact: none; the Wooldridge aggregation-weight deviation is explicitly documented and tracked. Concrete fix: none in this PR.

Security

No findings.

Documentation/Tests

No additional findings. This is a docs-only PR, so no test changes are required.

athey-2025-review.md:
- Reframe Methodology Registry Entry intro from "copy into REGISTRY" ready-to-promote to a working-draft framing that explicitly defers promotion until two source-ambiguity items (weight normalization, Eq. 13 penalty form) are resolved against the source
- Pull the weight-normalization line out of the Requirements Checklist (it was framed as a settled requirement); restate it as an open source-ambiguity cross-referencing Gap #5, with the current shipped implementation pinned to the Equation 2 (unnormalized) interpretation

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber wants to merge 6 commits into main from docs/paper-reviews-trop-etwfe
