docs: add retrospective paper reviews for TROP and Wooldridge ETWFE#443

Open
igerber wants to merge 6 commits into main from docs/paper-reviews-trop-etwfe

@igerber igerber commented May 15, 2026

Summary

Adds two paper-review markdown files under `docs/methodology/papers/`, following the existing template. Both reviews are retrospective documentation for estimators already shipped in the library.

  • `athey-2025-review.md` (358 lines) — Athey, Imbens, Qu, Viviano (2025) "Triply Robust Panel Estimators" (arXiv:2508.21536). Backs `diff_diff/trop.py`.
  • `wooldridge-2023-review.md` (248 lines) — Wooldridge (2023) "Simple approaches to nonlinear difference-in-differences with panel data" (Econometrics Journal 26(3), doi:10.1093/ectj/utad016). Backs `diff_diff/wooldridge.py`.

Methodology references (required if estimator / math changes)

  • Method name(s): TROP, WooldridgeDiD (ETWFE)
  • Paper / source link(s): https://arxiv.org/abs/2508.21536 ; https://doi.org/10.1093/ectj/utad016
  • Any intentional deviations from the source (and why): none documented in this PR (it is documentation-only and adds reviews for already-shipped estimators; deviations would be tracked in REGISTRY.md alongside the implementation)

Validation

  • Tests added/updated: none (docs-only)
  • Backtest / simulation / notebook evidence (if applicable): n/a

Security / privacy

  • Confirm no secrets/PII in this PR: confirmed

Both reviews follow the existing template under docs/methodology/papers/
and back already-shipped estimators (diff_diff/trop.py, diff_diff/wooldridge.py).

- athey-2025-review.md — Athey, Imbens, Qu, Viviano (2025) "Triply Robust
  Panel Estimators" (arXiv:2508.21536)
- wooldridge-2023-review.md — Wooldridge (2023) "Simple approaches to
  nonlinear difference-in-differences with panel data" (Econometrics
  Journal 26(3), doi:10.1093/ectj/utad016)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Overall Assessment

✅ Looks good

This is a docs-only PR, so there are no unmitigated P0/P1 findings. The main issues are P2/P3 documentation accuracy problems where the new paper-review files drift from the shipped registry/code.

Executive Summary

  • No estimator, weighting, variance, or inference code changed in this PR, so there is no blocker-level risk in changed executable paths.
  • The new review files contain several implementation-facing notes that no longer match the current library contract.
  • wooldridge-2023-review.md has one substantive source-interpretation error around the meaning of δ₂. (academic.oup.com)
  • athey-2025-review.md overstates current TROP support for treatment-pattern flexibility and uses outdated public method names/descriptions.
  • One Wooldridge aggregation difference is already documented in REGISTRY.md / TODO.md; that should be surfaced here as a deviation note, not treated as a defect.
  • athey-2025-review.md also commits a contributor-local absolute filesystem path.

Methodology

  • Severity: P2. Impact: docs/methodology/papers/wooldridge-2023-review.md:L49-L55 says that when G(z)=exp(z), δ₂ is the “log-odds ratio (logit) or log rate ratio (Poisson).” That conflates two different link-function interpretations. Wooldridge distinguishes the exponential mean case as a log difference / proportional effect, while the logistic mean gives a change in log-odds. Concrete fix: split this sentence by link function and mirror the paper’s wording. (academic.oup.com)
  • Severity: P2. Impact: docs/methodology/papers/athey-2025-review.md:L274-L280 is under “Implementation Notes” but says treatment switching on/off is supported. Current TROP requires an absorbing treatment indicator and rejects non-absorbing/event-style inputs; that contract is enforced in diff_diff/trop.py:L500-L527 and diff_diff/trop_global.py:L616-L630, and documented in docs/methodology/REGISTRY.md:L2002-L2015. Concrete fix: rewrite this as a paper-scope remark or explicitly say the shipped implementation requires absorbing treatment.
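The link-function distinction in the δ₂ finding can be written out in a few lines. This is a minimal sketch in simplified notation: the scalar index z and the mean function written as G(z + δ₂·W) are assumptions made for illustration, not the paper's exact equations.

```latex
% Exponential mean, G(z) = \exp(z): \delta_2 is a log difference in means,
% so \exp(\delta_2) - 1 is the proportional effect.
\log E[Y \mid W = 1] - \log E[Y \mid W = 0] = (z + \delta_2) - z = \delta_2,
\qquad
\frac{E[Y \mid W = 1]}{E[Y \mid W = 0]} - 1 = e^{\delta_2} - 1 .

% Logistic mean, G(z) = e^{z} / (1 + e^{z}): \delta_2 is a change in log-odds.
\log\frac{G(z + \delta_2)}{1 - G(z + \delta_2)} - \log\frac{G(z)}{1 - G(z)}
= (z + \delta_2) - z = \delta_2 .
```

The same coefficient therefore reads as a proportional effect under the exponential mean and as a log-odds shift under the logistic mean, which is why the two cases should not share one sentence.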

Code Quality

  • Severity: P2. Impact: docs/methodology/papers/wooldridge-2023-review.md:L188-L205 has stale implementation notes: it lists control_group default as "never_treated" and says Poisson still needs a new solver. Current code defaults to "not_yet_treated" and already uses solve_poisson in the Poisson path (diff_diff/wooldridge.py:L302-L306, diff_diff/wooldridge.py:L1085-L1124), consistent with docs/methodology/REGISTRY.md:L1376-L1388. Concrete fix: update the tuning-parameter table and implementation notes to the shipped API, or label them as historical/pre-implementation notes.

Performance

  • No findings in changed scope.

Maintainability

  • Severity: P2. Impact: docs/methodology/papers/athey-2025-review.md:L312-L318 refers to "twostep" and "joint" methods and describes the latter as homogeneous-effect WLS. The public API is method="local" / method="global", and the global path computes residual-based treated-cell effects averaged into ATT (diff_diff/trop.py:L64-L78, diff_diff/trop_global.py:L554-L585, docs/methodology/REGISTRY.md:L2139-L2146). Concrete fix: rename these to local / global and align the description with the current estimator contract.

Tech Debt

  • Severity: P3. Impact: docs/methodology/papers/wooldridge-2023-review.md:L103-L108 summarizes the paper’s aggregation using cohort-share weights, which is correct for the paper, but current library behavior differs and that deviation is already documented in docs/methodology/REGISTRY.md:L1358-L1364 and tracked in TODO.md:L79-L81. Concrete fix: add a one-line “current implementation deviation” note pointing readers to the existing registry/TODO entry. (academic.oup.com)
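To illustrate why the choice of aggregation weights matters, here is a toy sketch. The definitions of "cell-weighted" and "cohort-share" used below are one plausible reading, assumed for illustration only; the authoritative definitions are in the REGISTRY.md and TODO.md entries cited above. The function names and all numbers are hypothetical.

```python
# Toy cell-level effects: cohort -> treated-unit count and per-period effects.
# All numbers are made up for illustration.
cells = {
    "early": {"n_units": 10, "effects": [1.0, 1.0, 1.0]},  # three post periods
    "late": {"n_units": 10, "effects": [3.0]},              # one post period
}


def cell_weighted_att(cells):
    """Weight every (cohort, period) cell by its treated-unit count (assumed reading)."""
    num = sum(c["n_units"] * e for c in cells.values() for e in c["effects"])
    den = sum(c["n_units"] * len(c["effects"]) for c in cells.values())
    return num / den


def cohort_share_att(cells):
    """Average within each cohort, then weight by cohort share of treated units (assumed reading)."""
    total_units = sum(c["n_units"] for c in cells.values())
    return sum(
        (c["n_units"] / total_units) * (sum(c["effects"]) / len(c["effects"]))
        for c in cells.values()
    )


print(cell_weighted_att(cells))  # early cohort counts three times
print(cohort_share_att(cells))   # each cohort counts once, by share
```

Under these assumptions the early cohort contributes three cells to the cell-weighted average but only one cohort-level average to the cohort-share version, so the two summaries differ whenever effects vary across cohorts.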

Security

  • Severity: P3. Impact: docs/methodology/papers/athey-2025-review.md:L5 embeds an absolute local path (/Users/igerber/...). That is non-portable and leaks contributor-local environment details. Concrete fix: replace it with a repo-relative path such as papers/2508.21536v2.pdf or the arXiv URL.
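A check like the following could catch contributor-local paths before they are committed. It is a minimal sketch: the regular expression, the function name `find_local_paths`, and the sample path with user `alice` are all assumptions for illustration, not part of this repository's tooling.

```python
import re

# Hypothetical patterns for contributor-local absolute paths
# (macOS and Linux home directories, Windows user profiles).
LOCAL_PATH_RE = re.compile(
    r"(/Users/[^/\s]+/\S*|/home/[^/\s]+/\S*|[A-Za-z]:\\Users\\\S*)"
)


def find_local_paths(text):
    """Return (line_number, matched_path) pairs for suspected local paths."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in LOCAL_PATH_RE.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


sample = (
    "Title\n"
    "PDF: /Users/alice/papers/2508.21536v2.pdf\n"
    "Source: https://arxiv.org/abs/2508.21536\n"
)
print(find_local_paths(sample))
```

The arXiv URL on the third line is not flagged because it contains no home-directory prefix; only the local PDF path on line 2 is reported.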

Documentation/Tests

  • No separate testing finding. For a docs-only PR, the absence of new tests is acceptable; the issues above are documentation-accuracy fixes, not test gaps.

athey-2025-review.md:
- Replace contributor-local absolute PDF path with arXiv URL
- Note shipped TROP requires absorbing treatment (paper Eq 13 generalization
  is out of scope for the current implementation)
- Rename "twostep"/"joint" to "local"/"global" and correct the global-method
  description to residual-based treated-cell effects averaged into ATT

wooldridge-2023-review.md:
- Split delta_2 interpretation by link function (exponential = log diff /
  proportional effect; logistic = change in log-odds)
- Update control_group default to "not_yet_treated" (matches wooldridge.py:305)
- Update implementation note: solve_poisson exists at linalg.py:3431 and is
  used in the Poisson path
- Add aggregation deviation note linking to REGISTRY and TODO entries

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented May 15, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 948ee9a169c37712f5af594180e5c173c45eb474


Overall Assessment

✅ Looks good

Executive Summary

  • This is still a docs-only PR, so there are no changed estimator, weighting, variance, or inference code paths that could introduce a P0/P1 executable defect.
  • Re-review result: the earlier substantive drift items are mostly addressed. The new reviews now fix the Wooldridge δ₂ wording, surface the Wooldridge aggregation-weight deviation as documented/tracked, state TROP’s shipped absorbing-state restriction, use current local/global TROP naming, and remove the prior contributor-local absolute path.
  • One unmitigated P2 remains in the TROP review: the simulation-design summary says “7 real datasets,” but both the file’s own list and the paper’s Section 3/Table 1 description point to six source datasets, which suggests an applications-vs-datasets mix-up.
  • One unmitigated P2 remains in the Wooldridge review: its “Implementation Notes” understate current covariate support relative to the shipped API.
  • One minor P3 remains: the TROP review still contains an internal “Agent 2” authoring artifact.

Methodology

  • Severity: P2. docs/methodology/papers/athey-2025-review.md:L323-L331. Impact: the review says the simulations are based on “7 real datasets,” but the same block only enumerates six items, and the paper’s Section 3 / Table 1 description likewise names six source datasets (CPS, PWT, Germany, Basque, Smoking, Boatlift). In a methodology review, that is a direct source-material mismatch. Concrete fix: change the sentence to “6 real datasets,” or explicitly say “7 simulation applications” and split CPS logwage vs CPS urate out as separate applications.
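The datasets-versus-applications arithmetic can be checked mechanically. In this sketch the dataset names and the two CPS outcomes come from the finding above; the outcomes of the other datasets are not stated here, so they are left as `None` rather than guessed.

```python
# (dataset, outcome) pairs; only the CPS outcomes are named in the review,
# so the remaining outcomes are deliberately left unspecified.
applications = [
    ("CPS", "logwage"),
    ("CPS", "urate"),
    ("PWT", None),
    ("Germany", None),
    ("Basque", None),
    ("Smoking", None),
    ("Boatlift", None),
]

datasets = sorted({name for name, _ in applications})
print(len(datasets), len(applications))  # 6 7
```

Counting distinct dataset names gives six, while counting pairs gives seven, which matches the "6 real datasets / 7 simulation applications" wording proposed in the fix.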

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • Severity: P3. docs/methodology/papers/athey-2025-review.md:L351-L354. Impact: the committed review still refers to “Agent 2’s extraction...” when discussing Equation 13. That is an internal authoring artifact, not an auditable source-based note, and it weakens the document’s credibility. Concrete fix: rewrite this as a neutral source-based note (“Equation 13’s penalty term should be rechecked against the PDF”) or remove it once verified.

Tech Debt

  • No findings. The Wooldridge aggregation-weight difference is now correctly surfaced as a documented implementation deviation and is already tracked in docs/methodology/REGISTRY.md:L1357-L1364 and TODO.md:L79-L80.

Security

  • No findings. The prior contributor-local path leak is fixed; docs/methodology/papers/athey-2025-review.md:L5 now uses a public arXiv reference instead of a local absolute filesystem path.

Documentation/Tests

  • Severity: P2. docs/methodology/papers/wooldridge-2023-review.md:L176-L198. Impact: the “Implementation Notes” present covariates as generic, time-constant-only inputs, but the shipped estimator exposes exovar, xtvar, and xgvar, and the registry explicitly documents time-varying covariates plus cohort/time interaction handling at diff_diff/wooldridge.py:L387-L411 and docs/methodology/REGISTRY.md:L1369-L1374. That understates current implementation capability and can mislead future contributors using this review as a library-contract reference. Concrete fix: distinguish paper notation from the shipped API, or add one sentence that time-varying covariates are supported via xtvar and cohort-interacted covariates via xgvar.
  • No separate test-gap finding. For a docs-only PR, the absence of new tests is acceptable.

athey-2025-review.md:
- Correct "7 real datasets" to "6 real datasets / 7 simulation applications"
  (CPS is used for both logwage and urate outcomes; paper Table 1 / Section 3
  names 6 source datasets)
- Rewrite Equation 13 nuclear-norm gap note as a neutral source-based check
  (remove authoring artifact)

wooldridge-2023-review.md:
- Surface shipped covariate API (exovar / xtvar / xgvar incl. time-varying
  via xtvar with demean_covariates default) in Data Structure Requirements
  and Tuning Parameters table; cross-link to wooldridge.py:394-411 and
  REGISTRY.md "Covariates"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: b527f78eac40ff95f0a52710b1c012b357e29621


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review result: the prior open issues called out in the last AI pass appear resolved. The TROP review now fixes the dataset/application wording at docs/methodology/papers/athey-2025-review.md:L325-L331, and the Wooldridge review now documents shipped covariate support at docs/methodology/papers/wooldridge-2023-review.md:L176-L179 and L198.
  • P2: the new Wooldridge review now contradicts itself on aggregation-weight sourcing. It cites “Eqs. 7.2-7.4” as if they were in the reviewed paper at docs/methodology/papers/wooldridge-2023-review.md:L105-L111, then later says the 2023 paper gives no explicit aggregation equations at docs/methodology/papers/wooldridge-2023-review.md:L236-L240.
  • P2: the Wooldridge implementation notes still blur paper notation and shipped API. They imply a required W_it input and a broader bootstrap surface than WooldridgeDiD actually exposes in code.
  • P3 informational: the Wooldridge aggregation-weight deviation itself is now properly documented and tracked in docs/methodology/REGISTRY.md:L1357-L1364 and TODO.md:L79-L80, so that deviation is not a blocker.

Methodology

  • P2 docs/methodology/papers/wooldridge-2023-review.md:L105-L111, docs/methodology/papers/wooldridge-2023-review.md:L236-L240, docs/methodology/REGISTRY.md:L1363-L1364. Impact: the file currently says the reviewed 2023 paper’s aggregation weights come from “Eqs. 7.2-7.4,” but later says the 2023 paper only describes the weights conceptually and gives no explicit equations. That weakens source traceability for the deviation note and makes it unclear whether the doc is citing Wooldridge (2023) or the later Wooldridge ETWFE source already used in the registry. Concrete fix: rewrite L111 to attribute the formal cohort-share equations to the later Wooldridge ETWFE source already referenced in REGISTRY.md, or else remove the equation-number claim and say the 2023 paper gives only the conceptual cohort-share weighting.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • P3 docs/methodology/papers/wooldridge-2023-review.md:L111, docs/methodology/REGISTRY.md:L1357-L1364, TODO.md:L79-L80. Impact: none. The cell-weight vs cohort-share aggregation difference is now correctly surfaced as a documented deviation and tracked technical debt, so it should not hold up approval. Concrete fix: none required in this PR.

Security

  • No findings.

Documentation/Tests

  • P2 docs/methodology/papers/wooldridge-2023-review.md:L99-L110, docs/methodology/papers/wooldridge-2023-review.md:L176-L198, diff_diff/wooldridge.py:L165-L189, diff_diff/wooldridge.py:L394-L411, diff_diff/wooldridge.py:L432-L445, docs/methodology/REGISTRY.md:L1410-L1418. Impact: the review mixes paper notation with the shipped library contract. In the paper, W_it is part of the model notation and panel bootstrap is a valid inference route; in the implementation, users pass cohort and the design matrix derives treatment cells internally, and n_bootstrap > 0 is only supported for method="ols" and rejected with survey_design. As written, the new review overstates the current API and inference surface for WooldridgeDiD. Concrete fix: in the Implementation Notes, explicitly separate “paper notation” from “shipped API”: say users provide cohort/first_treat, W_it is constructed internally from cohort and time, and bootstrap is currently OLS-only and unavailable with survey_design.
  • No separate test-gap finding. For a docs-only PR, the absence of new tests is acceptable.
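The bootstrap restriction described in the finding above can be summarized as a small validation routine. This is a hedged sketch, not the code in diff_diff/wooldridge.py: the function name `validate_bootstrap_args`, its signature, and the error messages are hypothetical, and `"poisson"` is used only as an example of a non-OLS method.

```python
def validate_bootstrap_args(method, n_bootstrap=0, survey_design=None):
    """Reject bootstrap requests outside the surface the review describes.

    Hypothetical helper: bootstrap is treated as OLS-only and as
    incompatible with a survey design, per the review finding.
    """
    if n_bootstrap < 0:
        raise ValueError("n_bootstrap must be non-negative")
    if n_bootstrap > 0 and method != "ols":
        raise ValueError("n_bootstrap > 0 is only supported for method='ols'")
    if n_bootstrap > 0 and survey_design is not None:
        raise ValueError("n_bootstrap > 0 cannot be combined with survey_design")
    return True


def raises_value_error(**kwargs):
    """Small helper for the demo: report whether validation fails."""
    try:
        validate_bootstrap_args(**kwargs)
    except ValueError:
        return True
    return False


print(raises_value_error(method="ols", n_bootstrap=200))      # accepted
print(raises_value_error(method="poisson", n_bootstrap=200))  # rejected
```

Stating the restriction in this form in the Implementation Notes would make the gap between the paper's inference options and the shipped API explicit.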

wooldridge-2023-review.md:
- Aggregation note: stop attributing "Eqs. 7.2-7.4" to the 2023 paper
  (the 2023 paper describes aggregation only conceptually in Section 3.1;
  the formal cohort-share equations are from W2025 per REGISTRY.md)
- Implementation Notes: separate paper notation from shipped API. Users
  provide cohort/first_treat; W_it is constructed internally from cohort+
  time via _build_interaction_matrix (wooldridge.py:165-189), not passed
  as a column
- Standard errors: add shipped-API restriction note — n_bootstrap > 0 is
  OLS-only (wooldridge.py:432-437) and rejected with survey_design
  (wooldridge.py:441-444)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: c92c2dcf439916c13b3437dc8fff70873ebeec03


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review result: the prior Wooldridge documentation issues from the last AI pass appear resolved. The aggregation-weight deviation is now explicitly tied to REGISTRY.md/TODO.md, and the review now separates paper notation from the shipped cohort/bootstrap API surface.
  • P2 docs/methodology/papers/wooldridge-2023-review.md:L121-L129: the Section 7 extension bullets still need stronger caveats around exit and nonbinary-treatment generalizations. (academic.oup.com)
  • P2 docs/methodology/papers/athey-2025-review.md:L275-L281: the TROP review still blurs the paper’s balanced-panel setup with the shipped library’s unbalanced-panel support.
  • P3 informational: the Wooldridge aggregation-weight deviation is now properly documented and tracked, so it is not blocking.

Methodology

  • P2 docs/methodology/papers/wooldridge-2023-review.md:L121-L129. Impact: the new “Edge cases” bullets compress Section 7 too aggressively. In the published article, Section 7 is explicitly an extensions section; the exit case adds an extra restriction that future shocks to untreated potential outcomes cannot drive exit, and the multiple-treatment-level path is described as only relatively straightforward, not completely general, with the precise estimand left for future work. Concrete fix: rewrite the Treatment exit and Multiple treatment levels bullets as qualified extension notes and carry over those caveats. (academic.oup.com)

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • P3 docs/methodology/papers/wooldridge-2023-review.md:L106-L112, docs/methodology/REGISTRY.md:L1357-L1364, TODO.md:L79-L80. Impact: none. The cell-weight vs cohort-share aggregation difference is now correctly documented as a deviation and tracked technical debt. Concrete fix: none in this PR.

Security

  • No findings.

Documentation/Tests

  • P2 docs/methodology/papers/athey-2025-review.md:L275-L281, docs/methodology/papers/athey-2025-review.md:L359, docs/methodology/REGISTRY.md:L2113-L2137, diff_diff/trop.py:L500-L557. Impact: the TROP review’s Implementation Notes open with a balanced-panel requirement even though the same review later says the shipped implementation supports unbalanced panels, and the registry/code already document that extension. That leaves the paper-vs-library contract ambiguous for readers using this review as implementation guidance. Concrete fix: relabel the opening bullet as a paper assumption (Paper assumption: balanced panel) or add an adjacent shipped-implementation note that diff_diff/trop.py accepts unbalanced panels with structural gaps.
  • No separate test-gap finding. For a docs-only PR, no test changes are required.

athey-2025-review.md:
- Relabel balanced-panel bullet as "Paper assumption" and add an adjacent
  shipped-implementation note that diff_diff/trop.py supports unbalanced
  panels with structural gaps (matches the later "Unbalanced panels" entry
  under Gaps and Uncertainties and the corresponding REGISTRY section)

wooldridge-2023-review.md:
- Treatment exit (Section 7.2) bullet: mark as extension and carry over the
  paper's additional restriction that future shocks to untreated potential
  outcomes cannot drive exit
- Multiple treatment levels (Section 7.4) bullet: mark as extension; note
  the paper describes it as relatively straightforward but not fully
  general, leaving the precise multi-level estimand to future work

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 6dbddf7d55fe61fdeced432145b9d5ba66c8d90f


Overall Assessment

✅ Looks good

Executive Summary

  • Methods reviewed: TROP and WooldridgeDiD (ETWFE).
  • Re-review result: the prior Wooldridge Section 7 caveat issue and TROP balanced-panel wording issue appear resolved in the new paper reviews. docs/methodology/papers/wooldridge-2023-review.md:L121-L129, docs/methodology/papers/athey-2025-review.md:L274-L282. (academic.oup.com)
  • P2 remaining: the new TROP review is still internally inconsistent on weight normalization and still carries an Eq. 13 “should be rechecked” placeholder while presenting itself as registry-copy-ready. docs/methodology/papers/athey-2025-review.md:L10-L13, docs/methodology/papers/athey-2025-review.md:L255-L268, docs/methodology/papers/athey-2025-review.md:L352-L355, docs/methodology/REGISTRY.md:L2052-L2059, docs/methodology/REGISTRY.md:L2126-L2129. (ar5iv.org)
  • The Wooldridge aggregation-weight deviation remains properly documented and tracked, so it is informational only. docs/methodology/papers/wooldridge-2023-review.md:L106-L112, docs/methodology/REGISTRY.md:L1357-L1364, TODO.md:L79-L80.
  • No P0/P1 issues found; this docs-only PR does not introduce code, inference, or security regressions.

Methodology

  • P2 docs/methodology/papers/athey-2025-review.md:L10-L13, docs/methodology/papers/athey-2025-review.md:L255-L268, docs/methodology/papers/athey-2025-review.md:L352-L355, docs/methodology/REGISTRY.md:L2052-L2059, docs/methodology/REGISTRY.md:L2126-L2129. Impact: the TROP review is framed as ready to copy into the methodology registry, but it still treats 1^T ω = 1^T θ = 1 as a checklist requirement while later saying normalization is unclear, and it leaves Eq. 13’s penalty as “should be rechecked.” The source paper presents exponential weight construction in Section 2 and a separate sum-to-one condition in the theory section, so this should be documented as an unresolved source ambiguity rather than a settled implementation requirement. Concrete fix: move weight normalization out of the checklist into an explicit ambiguity/note, verify Eq. 13 once against the source, and only then keep the registry-copy-ready framing. (ar5iv.org)
  • No other methodology findings. The Wooldridge Section 7 caveats now match the paper’s exit/endogeneity warning and its qualified framing for multi-level treatment extensions, and the TROP notes now cleanly separate the paper setup from the shipped unbalanced-panel extension. (academic.oup.com)
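The practical difference between the two weight conventions in the finding above can be seen in a few lines. This is a generic sketch assuming exponential-decay weights of the form exp(-decay · distance); the function names, the decay value, and the distances are hypothetical and do not reproduce the paper's Equation 2 or the shipped implementation.

```python
import math


def exp_weights(distances, decay):
    """Unnormalized exponential-decay weights (assumed generic form)."""
    return [math.exp(-decay * d) for d in distances]


def normalize(weights):
    """Rescale weights so they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


raw = exp_weights([0, 1, 2, 3], decay=0.5)
unit = normalize(raw)

print(sum(raw))   # greater than 1: no sum-to-one constraint
print(sum(unit))  # 1 up to floating-point error
```

Normalization leaves the ratios between weights unchanged and only rescales their total, so the choice matters wherever the weighted loss is traded off against a separate penalty term. That is why the review should record the convention as an open source ambiguity rather than a settled requirement.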

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings beyond the TROP documentation inconsistency above.

Tech Debt

  • P3 docs/methodology/papers/wooldridge-2023-review.md:L106-L112, docs/methodology/REGISTRY.md:L1357-L1364, TODO.md:L79-L80. Impact: none; the Wooldridge aggregation-weight deviation is explicitly documented and tracked. Concrete fix: none in this PR.

Security

  • No findings.

Documentation/Tests

  • No additional findings. This is a docs-only PR, so no test changes are required.

athey-2025-review.md:
- Reframe the Methodology Registry Entry intro from a ready-to-promote
  "copy into REGISTRY" framing to a working-draft framing that explicitly
  defers promotion until two source-ambiguity items (weight normalization,
  Eq. 13 penalty form) are resolved against the source
- Pull the weight-normalization line out of the Requirements Checklist
  (it was framed as a settled requirement); restate it as an open
  source-ambiguity cross-referencing Gap #5, with the current shipped
  implementation pinned to the Equation 2 (unnormalized) interpretation

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>