From 3c74743c8d717d35400066812ab8c4b5921ab083 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Fri, 15 May 2026 09:41:40 -0400
Subject: [PATCH 1/8] docs: add retrospective paper reviews for TROP and
 Wooldridge ETWFE
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Both reviews follow the existing template under docs/methodology/papers/
and back already-shipped estimators (diff_diff/trop.py, diff_diff/wooldridge.py).

- athey-2025-review.md — Athey, Imbens, Qu, Viviano (2025) "Triply Robust
  Panel Estimators" (arXiv:2508.21536)
- wooldridge-2023-review.md — Wooldridge (2023) "Simple approaches to
  nonlinear difference-in-differences with panel data" (Econometrics
  Journal 26(3), doi:10.1093/ectj/utad016)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 docs/methodology/papers/athey-2025-review.md  | 358 ++++++++++++++++++
 .../papers/wooldridge-2023-review.md          | 248 ++++++++++++
 2 files changed, 606 insertions(+)
 create mode 100644 docs/methodology/papers/athey-2025-review.md
 create mode 100644 docs/methodology/papers/wooldridge-2023-review.md

diff --git a/docs/methodology/papers/athey-2025-review.md b/docs/methodology/papers/athey-2025-review.md
new file mode 100644
index 00000000..767ff161
--- /dev/null
+++ b/docs/methodology/papers/athey-2025-review.md
@@ -0,0 +1,358 @@
+# Paper Review: Triply Robust Panel Estimators
+
+**Authors:** Susan Athey, Guido Imbens, Zhaonan Qu, Davide Viviano
+**Citation:** Athey, S., Imbens, G.W., Qu, Z., & Viviano, D. (2025). Triply Robust Panel Estimators. *arXiv preprint arXiv:2508.21536v2*.
+**PDF reviewed:** /Users/igerber/diff-diff/papers/2508.21536v2.pdf
+**Review date:** 2026-02-08
+
+---
+
+## Methodology Registry Entry
+
+*Formatted to match docs/methodology/REGISTRY.md structure. Heading levels and labels align with existing entries -- copy the `## TROP` section into the appropriate category in the registry.*
+
+## TROP
+
+**Primary source:** Athey, S., Imbens, G.W., Qu, Z., & Viviano, D. (2025). Triply Robust Panel Estimators. arXiv:2508.21536. https://arxiv.org/abs/2508.21536
+
+**Key implementation requirements:**
+
+*Assumption checks / warnings:*
+- **Assumption 1** (Section 5, pages 20-21): Factor model structure on potential outcomes: `Y_it(0) = L_it + epsilon_it` where `L_it = Gamma_i^T Lambda_t` and `E[epsilon_it | L] = 0`. K (number of factors) is arbitrary and may grow with sample size.
+- **Assumption 2** (page 22): Regression adjustment estimator class satisfies `E[L_hat | L] = Gamma (I + B) Lambda^T` where B is a K x K bias matrix. Satisfied by PCA/truncated SVD, nuclear-norm penalized least squares, and general singular-value shrinkage estimators. B = 0 means correctly specified.
+- **Assumption 3** (Appendix, page 36): Factor model with covariates: `Y_it(0) = R_it + X_it beta + epsilon_it` where `R_it = Gamma Lambda^T`.
+- **Assumption 4** (Appendix, page 36): Estimator class with covariates admits bias decomposition into factor bias, coefficient bias, and vanishing approximation error.
+- Block treatment assignment: `W_it = 1{i > N_0} * 1{t > T_0}` (Assumption 1(i), expositional; general assignment patterns supported via Section 6).
+- No spillovers and no dynamic effects (Section 5.3).
+- Weight estimation error negligible when `N_0 >> N_1` and `T_0 >> T_1` (Section 5.3).
+- Weights sufficiently dispersed: `||theta||_2, ||omega||_2 = o(sigma_NT)` (Section 5.3).
+
+*Target parameter (Equation 1):*
+
+```
+           sum_{i=1}^{N} sum_{t=1}^{T} W_it (Y_it(1) - Y_it(0))
+tau = ---------------------------------------------------------------
+              sum_{i=1}^{N} sum_{t=1}^{T} W_it
+```
+
+This is the Average Treatment Effect on the Treated (ATT), averaged over all treated unit-period pairs.
+
+*Working model (Section 2.2, page 6):*
+
+```
+Y_it(0) = alpha_i + beta_t + L_it + epsilon_it,    E[epsilon_it | L] = 0
+```
+
+where:
+- `alpha_i` = unit fixed effects
+- `beta_t` = time fixed effects
+- `L_it = Gamma_i^T Lambda_t` = low-rank factor structure component
+- `epsilon_it` = idiosyncratic shocks
+
+*Estimator equation -- single treated unit (Equation 2, page 6):*
+
+```
+(alpha_hat, beta_hat, L_hat) = arg min_{alpha, beta, L}
+    sum_{j=1}^{N} sum_{s=1}^{T} theta_s^{i,t} omega_j^{i,t} (1 - W_{js}) (Y_{js} - alpha_j - beta_s - L_{js})^2
+    + lambda_nn ||L||_*
+```
+
+where `||L||_*` is the nuclear norm of L. The treatment effect estimate is then:
+
+```
+tau_hat_it = Y_it - alpha_hat_i - beta_hat_t - L_hat_it
+```
+
+*Per-observation weights (Equation 3, page 7):*
+
+Time weights (exponential decay in temporal distance):
+```
+theta_s^{i,t}(lambda) = exp(-lambda_time * dist_time(s, t))
+```
+where `dist_time(s, t) = |t - s|`.
+
+Unit weights (exponential decay in unit distance):
+```
+omega_j^{i,t}(lambda) = exp(-lambda_unit * dist_unit_{-t}(j, i))
+```
+where the unit distance is the RMSE of outcome differences over shared control periods:
+```
+dist_unit_{-t}(j, i) = sqrt(
+    sum_{u=1}^{T} 1{u != t} (1 - W_iu)(1 - W_ju)(Y_iu - Y_ju)^2
+    / sum_{u=1}^{T} 1{u != t} (1 - W_iu)(1 - W_ju)
+)
+```
+
+*General case -- multiple treated units (Equation 13, Section 6.1, pages 26-27):*
+
+Each treated unit-period (i, t) is estimated individually, as if it were the only treated unit:
+
+```
+tau_hat_it(lambda) = arg min_tau min_{alpha, beta, L}
+    sum_{j=1}^{N} sum_{s=1}^{T}
+    [(1 - W_{js}) + W_{js} I_{js}^{i,t}] omega_j^{i,t}(lambda) theta_s^{i,t}(lambda)
+    (Y_{js} - alpha_j - beta_s - L_{js} - tau_it * I_{js}^{i,t})^2
+    + lambda_nn ||L||_*
+```
+
+where `I_{js}^{i,t} = 1{(j,s) = (i,t)}`. The ATT is then:
+
+```
+tau_hat = (1 / sum_{i,t} W_it) * sum_{i,t} W_it tau_hat_it(lambda_hat)
+```
+
+*Balancing representation (Equation 10, page 22):*
+
+The estimated counterfactual can be decomposed as:
+```
+tau_hat_TROP(0, theta, omega) = L_hat_NT
+    + sum_{t<=T_0} theta_t (Y_Nt - L_hat_Nt)
+    + sum_{i<=N_0} omega_i (Y_iT - L_hat_iT)
+    - sum_{t<=T_0} sum_{i<=N_0} theta_t omega_i (Y_it - L_hat_it)
+```
+
+*With covariates (Equation 14, Section 6.2, page 28):*
+
+Parametrize `L_{js} = X_{js} beta_coef + R_{js}` where R is low-rank. The objective becomes:
+
+```
+tau_hat_{i*,t*} = arg min_tau min_{alpha, beta, beta_coef, R}
+    sum_{j=1}^{N} sum_{s=1}^{T} theta_s^{i*,t*} omega_j^{i*,t*}
+    (Y_{js} - alpha_j - beta_s - X_{js} beta_coef - R_{js} - tau * W_{js})^2
+    + lambda_nn ||R||_*
+```
+
+This relaxes the low-rank assumption on L, requiring only that the residual after covariates is low-rank.
+
+*Special cases (Section 2.2, page 6):*
+- `lambda_nn = infinity`, `omega_j = theta_s = 1` (uniform weights, no factor model) --> DID/TWFE
+- `omega_j = theta_s = 1`, `lambda_nn < infinity` (uniform weights, with factor model) --> Matrix Completion (MC)
+- `lambda_nn = infinity` with specific choices of unit and time weights --> SC and SDID
+
+*Standard errors (Section 5.3, pages 25-26):*
+- Default: Bootstrap variance estimation (Algorithm 3, page 27)
+- **No analytical standard errors** for the general case
+- For single treated unit: inference driven by variance of idiosyncratic shock `sigma_NT^2` (multiple procedures available in literature)
+- Bootstrap: Non-parametric (pairs) stratified bootstrap, resampling entire unit-level rows
+- Clustering: Implicitly at the unit level (resamples full unit trajectories)
+- Weight distribution: Not applicable (pairs bootstrap, not multiplier bootstrap)
+- Recommended iterations: Not explicitly stated; simulations use 1000 replications for Monte Carlo RMSE
+
+*Triple robustness property (Theorem 5.1, page 23):*
+
+Under Assumption 1, for fixed (non-data-dependent) weights theta, omega:
+```
+|E[tau_hat - tau | L]| <= ||Delta_u(omega, Gamma)||_2 * ||Delta_t(theta, Lambda)||_2 * ||B||_*
+```
+where:
+- `Delta_u(omega, Gamma) = Gamma_bar_0(omega) - Gamma_N` (unit imbalance in loadings)
+- `Delta_t(theta, Lambda) = Lambda_bar_0(theta) - Lambda_T` (time imbalance in factors)
+- `||B||_*` = spectral norm of regression adjustment bias matrix
+
+The bias is bounded by the **product** of three components. Strictly tighter than bounds for DID, SC, and SDID.
+
+*Corollary 1 (page 23):*
+
+Estimator is unbiased (`E[tau_hat - tau | L] = 0`) if **any one** of:
+- (a) Unit balance: `sum_{i<=N_0} omega_i Gamma_i = Gamma_N`
+- (b) Time balance: `sum_{s<=T_0} theta_s Lambda_s = Lambda_T`
+- (c) Correct regression adjustment: `B = 0_K`
+
+*Bias comparison with existing estimators (Equation 11, Section 5.2, page 24):*
+```
+B_DID  = (Gamma_N - Gamma_bar_0)^T (Lambda_T - Lambda_bar_0)       [no robustness]
+B_SC   = (Gamma_bar_0(omega) - Gamma_N)^T Lambda_T                  [singly robust]
+B_SDID = (Gamma_bar_0(omega) - Gamma_N)^T (Lambda_bar_0(theta) - Lambda_T)  [doubly robust]
+B_TROP <= ||unit_imbalance||_2 * ||time_imbalance||_2 * ||regression_bias||_*  [triply robust]
+```
+
+*Triple robustness with covariates (Theorem 8.1, Appendix, pages 36-37):*
+
+Under Assumptions 3 and 4:
+```
+|E[tau_hat_TROP - Y_NT(0) | X, R]| <= ||Delta_u||_2 * ||B||_* * ||Delta_t||_2
+                                      + ||g(theta, omega)||_2 * ||delta_beta||_2 + o(eta)
+```
+where `g(theta, omega) = sum_{i,t} M(theta, omega)_it X_it` is the covariate contrast vector and `delta_beta = E[beta_hat | X, R] - beta` is the coefficient estimation bias.
+
+*Convergence rates (Section 5.3, page 25):*
+- Weight estimation error: `epsilon_bar_0(theta, omega) = O_p(min{||theta||_2, ||omega||_2})`
+- When weights sufficiently dispersed: `tau_hat - E[tau | L] = epsilon_NT(1) + o_p(sigma_NT^2)`
+- With multiple post-treatment periods and CLT: `epsilon_NT(1) ~ N(0, sigma_NT^2)`
+
+*Edge cases:*
+- Single treated unit, single period: TROP outperforms competitors in about half of cases, within 25% in others (Table 2)
+- Very few control units (N_co ~ 10): TROP, SDID, and DIFP perform similarly; advantage emerges with larger control pools (Figure 2)
+- No interactive effects (only additive FE): DID performs competitively with TROP (Table 3, "No M" row); TROP adapts without underperforming
+- No additive fixed effects (only interactive): SC performs slightly better in some specs (Table 3, "No F" row)
+- Only noise (no factors): MC and DID can slightly outperform TROP due to regularization overhead (Table 3, "Only Noise" row)
+- Non-random treatment assignment: TWFE shows noticeable bias; TROP and SDID show substantially smaller/no bias (Appendix Table A2)
+- Bootstrap validity requires growing number of treated units (Algorithm 3)
+- Increasing pre-treatment periods can increase bias for DID-type estimators when treatment is at end of panel (Figure 2 right)
+
+*LOOCV tuning parameter selection (Equations 4-5, pages 7-8):*
+
+For each control observation (i, t) with W_it = 0, compute pseudo-treatment effect:
+```
+tau_hat_it^{loocv}(lambda) = arg min_tau min_{alpha, beta, L}
+    sum_{j,s: W_{js}=0} omega_j^{i,t}(lambda) theta_s^{i,t}(lambda)
+    (Y_{js} - alpha_j - beta_s - L_{js} - tau * I_{js}^{i,t})^2
+    + lambda_nn ||L||_*                                                   (Equation 4)
+```
+
+Select tuning parameters by minimizing:
+```
+Q(lambda) = sum_{i=1}^{N} sum_{t=1}^{T} (1 - W_it) (tau_hat_it(lambda))^2   (Equation 5)
+```
+
+Practical LOOCV procedure (footnote 2, page 8):
+1. Fix `lambda_nn = infinity`, `lambda_unit = 0`; optimize `lambda_time` over grid
+2. Fix `lambda_time = optimal`, `lambda_nn = infinity`; optimize `lambda_unit` over grid
+3. Fix `lambda_time = optimal`, `lambda_unit = optimal`; optimize `lambda_nn` over grid
+4. Use these as upper bounds for a finer joint grid
+5. Cycle through parameters, updating each while holding others at most recent optima
+
+*Algorithm 1 -- Single treated unit (page 9):*
+```
+Input: Grid G for (lambda_time, lambda_unit, lambda_nn), treatments W, outcomes Y
+1. For each lambda in G:
+   a. For each (i,t) with W_it = 0, estimate tau_hat_it(lambda) via Equation (4)
+   b. Compute Q(lambda) via Equation (5)
+2. Find lambda_hat = arg min_{lambda in G} Q(lambda)
+3. Compute tau_hat(lambda_hat) via Equation (2) with selected lambda_hat
+```
+
+*Algorithm 2 -- Multiple treated units (page 27):*
+```
+Input: Grid G for (lambda_time, lambda_unit, lambda_nn), treatments W, outcomes Y
+1. For each lambda in G:
+   a. For each (i,t) with W_it = 0, estimate tau_hat_it(lambda) via Equation (13)
+   b. Compute Q(lambda) = sum_{i,t} (1 - W_it)(tau_hat_it(lambda))^2
+2. Find lambda_hat = arg min_{lambda in G} Q(lambda)
+3. Compute tau_hat = (1/sum W_it) * sum_{i,t} W_it tau_hat_it(lambda_hat)
+```
+
+*Algorithm 3 -- Bootstrap variance estimation (page 27):*
+```
+Input: Y, W, B (number of bootstrap iterations)
+1. For b = 1 to B:
+   a. Construct bootstrap dataset (Y^(b), W^(b)) by:
+      - Sampling N_0 rows of (Y, W) WITH REPLACEMENT from control units
+      - Sampling N_1 rows of (Y, W) WITH REPLACEMENT from treated units
+   b. Compute TROP estimator tau_hat^(b) from (Y^(b), W^(b))
+2. Variance estimator:
+   V_hat_tau = (1/B) sum_{b=1}^{B} (tau_hat^(b))^2 - ((1/B) sum_{b=1}^{B} tau_hat^(b))^2
+```
+
+Note: Stratified bootstrap -- control and treated units resampled separately. Preserves within-unit temporal correlation. Validity requires growing number of treated units.
+
+**Reference implementation(s):**
+- Authors' replication code (forthcoming as of review date)
+- No specific software package mentioned in the paper
+- Simulation designs reference Arkhangelsky et al. (2019) SDID replication infrastructure
+- diff-diff: `diff_diff/trop.py` (existing implementation)
+
+**Requirements checklist:**
+- [ ] Factor model estimated via nuclear-norm penalized least squares with soft-threshold SVD (Equation 2)
+- [ ] Unit weights: `exp(-lambda_unit * RMSE_distance)` per Equation 3
+- [ ] Time weights: `exp(-lambda_time * |t - s|)` per Equation 3
+- [ ] LOOCV implemented for tuning parameter selection via Equation 5
+- [ ] LOOCV uses SUM of squared pseudo-treatment effects on control observations
+- [ ] Coordinate-wise grid search followed by cycling (footnote 2)
+- [ ] Per-observation treatment effect estimation for multiple treated units (Equation 13, Algorithm 2)
+- [ ] ATT averages over all treated unit-period pairs with equal weight
+- [ ] Stratified bootstrap preserving unit-level structure (Algorithm 3)
+- [ ] Covariate extension supported (Equation 14)
+- [ ] Special cases recoverable: DID, MC, SC, SDID via tuning parameters
+- [ ] Weight normalization: `1^T omega = 1^T theta = 1` (Section 5, page 20)
+- [ ] Heterogeneous treatment effects supported via per-observation estimation (Remark 6.1)
+
+---
+
+## Implementation Notes
+
+### Data Structure Requirements
+- **Balanced panel data**: N units x T time periods
+- **Outcome matrix**: Y (N x T), observed outcomes
+- **Treatment matrix**: W (N x T), binary treatment assignments where `W_it in {0, 1}`
+- **Covariates** (optional): X_it, observed covariates for each unit-period pair
+- Treatment must be an absorbing state for standard block assignment (W_it = 1{i > N_0} * 1{t > T_0})
+- General assignment patterns (treatment switching on/off) supported via Equation 13
+
+### Computational Considerations
+- **Main bottleneck**: LOOCV grid search -- for each grid point, every control observation requires a separate nuclear-norm penalized weighted least squares solve
+- **Per-observation estimation**: With multiple treated units, each `tau_hat_it` estimated separately (Algorithm 2). Computationally expensive with many treated observations.
+- **Coordinate-wise search** (footnote 2) reduces grid search from cubic to approximately linear in grid points per parameter
+- **Unit distance matrix**: Computing pairwise RMSE distances between all units requires O(N^2 T) operations. Can be parallelized (existing Rust acceleration: `compute_unit_distance_matrix()`, 4-8x speedup).
+- **LOOCV parallelization**: Each control observation's pseudo-treatment effect is independent, enabling parallelization across observations and grid points (existing Rust acceleration: `loocv_grid_search()`, 10-50x speedup).
+- **Bootstrap parallelization**: Each bootstrap replicate is independent (existing Rust acceleration: `bootstrap_trop_variance()`, 5-15x speedup).
+
+### Tuning Parameters
+
+| Parameter | Type | Default | Selection Method |
+|-----------|------|---------|-----------------|
+| `lambda_time` | float >= 0 | Data-driven via LOOCV | Exponential decay rate for time weights. 0 = uniform time weights. |
+| `lambda_unit` | float >= 0 | Data-driven via LOOCV | Exponential decay rate for unit weights. 0 = uniform unit weights. |
+| `lambda_nn` | float > 0 or infinity | Data-driven via LOOCV | Nuclear norm penalty. infinity = no factor model (DID/TWFE). |
+
+Empirical guidance from Table 2 (cross-validated values, T_post=10, N_tr=10):
+
+| Dataset | lambda_unit | lambda_time | lambda_nn |
+|---------|-------------|-------------|-----------|
+| CPS logwage | 0 | 0.1 | 0.9 |
+| CPS urate | 1.6 | 0.35 | 0.011 |
+| PWT | 0.3 | 0.4 | 0.006 |
+| Germany | 1.2 | 0.2 | 0.011 |
+| Basque | 0 | 0.35 | 0.006 |
+| Smoking | 0.25 | 0.4 | 0.011 |
+| Boatlift | 0.2 | 0.2 | 0.151 |
+
+Key finding (Section 4.3, pages 18-19): Substantial heterogeneity across applications in optimal tuning parameters. Removing regression adjustment (lambda_nn = infinity) or time weights (lambda_time = 0) increases RMSE significantly (up to 85% and 91% respectively). Removing unit weights (lambda_unit = 0) increases RMSE only modestly (max 10%).
+
+### Relation to Existing diff-diff Estimators
+- **Direct implementation**: `diff_diff/trop.py` implements the TROP estimator with both "twostep" (per-observation, Algorithm 2) and "joint" (WLS, homogeneous effect) methods.
+- **Existing Rust backend**: `rust/src/trop.rs` provides accelerated distance matrix computation, LOOCV grid search, and bootstrap variance estimation.
+- **Encompasses SyntheticDiD**: TROP with `lambda_nn = infinity` and specific weight choices reduces to SDID (`diff_diff/synthetic_did.py`).
+- **Encompasses TWFE**: TROP with `lambda_nn = infinity` and uniform weights reduces to DID/TWFE (`diff_diff/twfe.py`).
+- **Factor estimation**: The nuclear-norm penalized SVD in TROP is related to factor model estimation used in `SyntheticDiD`.
+- **Bootstrap structure**: The stratified pairs bootstrap (Algorithm 3) differs from the multiplier bootstrap used in `CallawaySantAnna` and `SunAbraham` -- it resamples entire unit rows rather than applying random weights.
+
+---
+
+## Simulation Design Details
+
+The paper uses semi-synthetic simulations (Section 3.1, pages 9-11) based on 7 real datasets:
+1. CPS (Current Population Survey): N=50, T=40
+2. PWT (Penn World Table): N=111, T=48
+3. Germany reunification: N=17, T=44
+4. Basque country: N=18, T=43
+5. Smoking: N=39, T=31
+6. Boatlift: N=44, T=19
+
+**Outcome DGP (Equation 6):** Rank-4 factor model with AR(2) autocorrelated errors, normalized to mean zero and unit variance. True treatment effects are zero (placebo studies).
+
+**Treatment assignment (Equation 8):** Logistic regression on latent factors, inducing confoundedness.
+
+**Key results (Table 1):** TROP outperforms all competitors in 20 of 21 specifications. Competitor RMSE inflation vs. best: DID up to 900%, SC up to 580%, MC up to 300%, SDID up to 90%.
+
+---
+
+## Gaps and Uncertainties
+
+1. **Analytical standard errors not provided**: The paper provides bootstrap inference (Algorithm 3) but no closed-form standard errors for the general case. For single treated units, the paper references "existing literature" for variance estimation without specifying which procedures (Section 5.3, page 25).
+
+2. **Bootstrap iteration count**: The paper does not recommend a specific number of bootstrap iterations. Simulations use 1000 replications for Monte Carlo RMSE, but this is for the simulation study, not a bootstrap recommendation (Section 3, page 10).
+
+3. **Efficient estimation under homogeneity**: Remark 6.1 (page 27) notes that under homogeneous treatment effects, alternative weighting of per-unit estimates could improve precision, but leaves this to future research.
+
+4. **Computational complexity**: Not explicitly discussed. The LOOCV grid search is described as the bottleneck, but no formal complexity analysis is provided.
+
+5. **Weight normalization**: Section 5 (page 20) states weights sum to one (`1^T omega = 1^T theta = 1`), but the weight specification in Equation 3 (page 7) uses unnormalized exponential weights. It is unclear whether normalization is applied before or after the optimization, or whether the theoretical results in Section 5 assume normalized weights while the practical algorithm uses unnormalized weights. The existing diff-diff implementation uses unnormalized weights in the optimization (matching Equation 2).
+
+6. **Nuclear norm penalty in Equation 13**: Agent 2's extraction of Equation 13 writes `lambda_nn ||L||_*^2` (squared nuclear norm) while Agent 1 and the Equation 2 formulation write `lambda_nn ||L||_*` (unsquared). The paper likely uses the unsquared nuclear norm consistently (matching Equation 2), but this should be verified against pages 26-27 of the PDF.
+
+7. **General assignment patterns**: Section 6.1 generalizes beyond block assignment, but the inference results (Section 5.3) and theoretical guarantees (Theorem 5.1) are stated under block assignment. The extent to which theoretical results carry over to general patterns is not fully characterized.
+
+8. **Rank selection**: The paper uses a fixed rank-4 approximation in simulations (Equation 6) but the theoretical framework allows arbitrary K. No guidance is provided for choosing K in practice beyond the nuclear norm penalty (which implicitly selects rank via soft-thresholding).
+
+9. **Unbalanced panels**: The paper assumes balanced panel data (N units x T periods). Extension to unbalanced panels is not discussed but is supported in the existing diff-diff implementation.
diff --git a/docs/methodology/papers/wooldridge-2023-review.md b/docs/methodology/papers/wooldridge-2023-review.md
new file mode 100644
index 00000000..ddc1f866
--- /dev/null
+++ b/docs/methodology/papers/wooldridge-2023-review.md
@@ -0,0 +1,248 @@
+# Paper Review: Simple approaches to nonlinear difference-in-differences with panel data
+
+**Authors:** Jeffrey M. Wooldridge
+**Citation:** Wooldridge, J. M. (2023). Simple approaches to nonlinear difference-in-differences with panel data. *The Econometrics Journal*, 26(3), C31–C66. https://doi.org/10.1093/ectj/utad016
+**PDF reviewed:** papers/wooldridge.pdf
+**Review date:** 2026-03-21
+
+---
+
+## Methodology Registry Entry
+
+*Formatted to match docs/methodology/REGISTRY.md structure.*
+
+## WooldridgeDiD (ETWFE)
+
+**Primary source:** Wooldridge, J. M. (2023). Simple approaches to nonlinear difference-in-differences with panel data. *The Econometrics Journal*, 26(3), C31–C66. https://doi.org/10.1093/ectj/utad016
+
+**Secondary source:** Wooldridge, J. M. (2021). Two-way fixed effects, the two-way Mundlak regression, and difference-in-differences estimators. SSRN Working Paper. https://doi.org/10.2139/ssrn.3906345
+
+**Key implementation requirements:**
+
+*Assumption checks / warnings:*
+- **No Anticipation (NA):** `E[Y_1(1) - Y_1(0) | D = 1] = 0` (Eq 2.2). Pre-treatment outcomes unaffected by future treatment.
+- **Linear Parallel Trends (LPT):** `E[Y_2(0)|D] - E[Y_1(0)|D] = γ_2` (Eq 2.5). Trend in untreated PO is the same across groups. Only valid for continuous/unbounded outcomes.
+- **Index Parallel Trends (IPT):** For a known strictly increasing `G(·)`, `E[Y_t(0)|D] = G(α + βD + γ_t)` (Eq 2.6-2.7). PT holds on the index inside `G(·)`, not on levels. This is the key nonlinear extension.
+- **Conditional NA, Staggered (CNAS):** `E[Y_t(g)|D_g = 1, X] = E[Y_t(∞)|D_g = 1, X]` for `t < g` (Eq 3.3).
+- **Conditional IPT, Staggered (CIPTS):** `E[Y_t(∞)|D_q,...,D_T, X] = G(α + Σ β_g D_g + Xκ + Σ (D_g · X) η_g + γ_t + Xπ_t)` (Eq 3.4). The change in the index does not depend on cohort assignment `D`, conditional on `X`.
+- **Overlap:** `P(D = 1|X = x) < 1` for all `x ∈ Supp(X)`. Equivalently, `Supp(X|D=1) ⊂ Supp(X|D=0)`.
+
+*Target parameter — two-period case (Equation 2.1):*
+
+    τ_2 = E[Y_2(1) - Y_2(0) | D = 1]
+
+*Target parameter — staggered case (Equation 3.2):*
+
+    τ_{gr} = E[Y_r(g) - Y_r(∞) | D_g = 1], r = g,...,T; g = q,...,T
+
+where `g` is the cohort (first treatment period), `r` is calendar time, `D_g` are cohort indicators, `D_∞ = 1` indicates never-treated.
+
+*Main estimator — two-period, no covariates (Equation 2.17):*
+
+    τ̂_2 = Ȳ_12 - G(α̂ + β̂ + γ̂_2)
+         = Ȳ_12 - G(G⁻¹(Ȳ_11) + (G⁻¹(Ȳ_02) - G⁻¹(Ȳ_01)))
+
+where `Ȳ_{dt}` is the sample average for group `d` in period `t`. Special cases:
+- Linear (`G(z) = z`): reduces to standard DiD `(Ȳ_12 - Ȳ_11) - (Ȳ_02 - Ȳ_01)` (Eq 2.18)
+- Exponential (`G(z) = exp(z)`): `τ̂_2 = Ȳ_12 - Ȳ_11 · (Ȳ_02/Ȳ_01)` (Eq 2.19)
+
+*Index-scale effect (Equation 2.20):*
+
+    δ_2 ≡ G⁻¹(E[Y_2(1)|D=1]) - G⁻¹(E[Y_2(0)|D=1])
+        = G⁻¹(E[Y_2(1)|D=1]) - (α + β + γ_2)
+
+When `G(z) = exp(z)`, `δ_2` is the log-odds ratio (logit) or log rate ratio (Poisson).
+
+*Imputation estimator — staggered with covariates (Procedure 1, Eq 3.10-3.11):*
+
+    Step 1: Using W_{it} = 0 observations, estimate (α, β_q,...,β_T, κ, η_q,...,η_T, γ_2,...,γ_T, π_2,...,π_T) by pooled QMLE in the LEF.
+
+    Step 2: For cohort g, impute counterfactual:
+        Ŷ_{igr}(∞) ≡ G(α̂ + β̂_g + X_i κ̂ + X_i η̂_g + γ̂_r + X_i π̂_r), r = g,...,T     (3.10)
+
+    Step 3: ATT estimator:
+        τ̂_{gr} = N_g⁻¹ Σ_i D_{ig} [Y_{ir} - Ŷ_{igr}(∞)]                                (3.11)
+
+*Pooled QMLE estimator — staggered with covariates (Procedure 2, Eq 3.12-3.15):*
+
+    E(Y_{it}|D_{iq},...,D_{iT}, X_i, W_i) = G[α + Σ β_g D_{ig} + X_i κ + Σ (D_{ig} · X_i) η_g
+        + Σ γ_s f s_t + Σ (f s_t · X_i) π_s
+        + ΣΣ δ_{gs} (W_{it} · D_{ig} · f s_t)
+        + ΣΣ (W_{it} · D_{ig} · f s_t · X̂_{ig}) ξ_{gs}]                                  (3.12)
+
+where `X̂_{ig} = X_i - X̄_g` are cohort-centred covariates, `W_{it} = D_{ig} · (f q_t + ... + f T_t)` is the time-varying treatment indicator, and `f s_t` are time period dummies.
+
+    ATT estimator from pooled method:
+        τ̃_{gr} = N_g⁻¹ Σ_i D_{ig} [G(α̃ + β̃_g + X_i κ̃ + X_i η̃_g + γ̃_r + X_i π̃_r + δ̃_{gr} + X̂_{ig} ξ̃_{gr})
+            - G(α̃ + β̃_g + X_i κ̃ + X_i η̃_g + γ̃_r + X_i π̃_r)]                         (3.15)
+
+This is the Average Structural Function (ASF) approach: `ATT(g,r) = mean_i[G(η_i + δ_{gr}) - G(η_i)]`.
+
+*Equivalence result (Proposition 3.1):*
+
+When `G⁻¹(·)` is the canonical link function for the chosen LEF density, and the pooled QMLE solution is unique:
+- Parameter estimates from imputation (Procedure 1) and pooled (Procedure 2) are identical
+- ATT estimates are identical: `τ̃_{gr} = τ̂_{gr}`
+
+This holds for: OLS (identity link), logit (logistic link with Bernoulli), Poisson QMLE (log link with Poisson). Proven in Appendix A.
+
+*Common timing simplification (Equation 3.17):*
+
+When all units are treated at the same time `g`, the model simplifies to:
+
+    E(Y_{it}|D_i, X_i, W_i) = G[α + βD_i + X_i κ + (D_i · X_i) η
+        + Σ γ_s f s_t + Σ (f s_t · X_i) π_s
+        + Σ δ_s (W_{it} · D_i · f s_t) + Σ (W_{it} · D_i · f s_t · X̂_{ig}) ξ_{gs}]       (3.17)
+
+*Standard errors (Section 3, p. C47):*
+- Delta method for individual `τ̂_{gr}` and aggregated ATTs (Wooldridge 2010, problem 12.17)
+- Panel bootstrap (resampling units) is also valid — allows for serial dependence and model misspecification
+- Cluster-robust SEs at unit level for pooled estimation
+- For nonlinear models: standard sandwich `(X'WX)⁻¹ meat (X'WX)⁻¹` where `W = diag(μ_i(1-μ_i))` for logit or `W = diag(μ_i)` for Poisson
+
+*Aggregation (Section 3.1, p. C50):*
+- **Simple (static):** weighted average of immediate effects `τ̃_{sg}` across cohorts `g = q,...,T`, weights = cohort proportions among all eventually treated
+- **Dynamic (event study):** weighted average of `τ̃_{g,g+h}` for horizon `h ≥ 0` across cohorts, weights = cohort proportions
+- **Calendar time:** weighted average across cohorts for each calendar period
+- **Group:** weighted average across periods for each cohort
+- SEs for all aggregations via delta method or panel bootstrap
+
+*Testing parallel trends (Section 4):*
+
+Two approaches:
+1. **Event-study test (Section 4.1):** Include pre-treatment interactions `D_{ig} · f s_t` for `s = {2,...,g-1}` and test joint significance. With canonical link, test is the same whether using `W_{it} = 0` subsample or pooling all observations.
+2. **Heterogeneous linear trends (Section 4.2):** Add cohort-specific linear trends `D_{ig} · t`. Conserves degrees of freedom vs event-study approach. Tests via cluster-robust Wald statistic.
+- Adding cohort-specific linear trends `D_{ig} · t` to the expanded equation (Eq 4.1) provides a sensible correction when PT is violated
+
+*Edge cases:*
+- **Binary outcomes:** Use logistic function `G(z) = Λ(z) = exp(z)/[1 + exp(z)]` with Bernoulli QLL. Fractional outcomes use same setup (Papke & Wooldridge 1996, 2008).
+- **Count/nonnegative outcomes:** Use exponential `G(z) = exp(z)` with Poisson QLL. Does not require Poisson distribution — only exponential conditional mean.
+- **Corner solutions (`Y ≥ 0` with mass at zero):** Poisson QMLE still valid. Exponential mean handles zeros naturally (p. C56-C57).
+- **All units eventually treated (Section 7.1):** Methods work with minor modification. Last-treated cohort (`g = T`) serves as control. ATTs defined relative to `Y_t(T)` rather than `Y_t(∞)`.
+- **Treatment exit (Section 7.2):** Expand cohort definition to `D_{gh}` where `g` = start period, `h` = end period. ATTs `τ_{ghr}` defined for `r = g,...,T`.
+- **Time-varying covariates (Section 7.3):** Replace `X_i` with `X_{it}` in Eq 3.12. Only valid if covariates are not influenced by the intervention.
+- **Multiple treatment levels (Section 7.4):** Replace binary `W_{it}` with set of treatment level indicators. Define cohorts by `D_{iga}` (first period `g`, initial level `a`).
+- **Incidental parameters:** For logit, unit-specific dummies cause incidental parameters problem. Pooled QMLE with cohort dummies (not unit dummies) avoids this. For Poisson with exponential mean, FE Poisson estimator does NOT suffer from incidental parameters (Wooldridge 1999) — this is a unique exception.
+- **No never-treated group:** When `g = T` is the last cohort, it serves as control. Cannot estimate ATT for this last cohort.
+
+*Algorithm (Procedure 2 — Pooled Estimation, recommended):*
+1. Pool all observations (treated and untreated, all periods)
+2. Construct design matrix: cohort dummies `D_{ig}`, time dummies `f s_t`, covariates `X_i`, cohort×covariate interactions `D_{ig} · X_i`, time×covariate interactions `f s_t · X_i`, treatment indicators `W_{it} · D_{ig} · f s_t`, treatment×centred-covariate interactions `W_{it} · D_{ig} · f s_t · X̂_{ig}`
+3. Estimate by pooled QMLE in the LEF (OLS for linear, Bernoulli QLL for logit, Poisson QLL for exponential)
+4. For each treated cell (g, r): compute ASF-based ATT via Eq 3.15 — average `G(η̂_i + δ̂_{gr}) - G(η̂_i)` over units in cohort `g`
+5. For linear case: `δ̂_{gr}` coefficients on treatment interactions are directly the ATTs
+6. Aggregate as desired (simple, dynamic/event-study, calendar, group)
+7. SEs via delta method or panel bootstrap (cluster at unit level)
+
+**Table 1. Canonical link and log likelihood pairings (p. C44):**
+
+| Conditional Mean | LEF Density | Use Case |
+|-----------------|-------------|----------|
+| Linear | Normal | Any response; leads to OLS |
+| Logistic | Bernoulli | Binary or fractional response |
+| Logistic | Binomial | Nonnegative response with known upper bound |
+| Logistic | Multinomial | Multinomial or multiple fractional response |
+| Exponential | Poisson | Nonnegative response (count, corner), no natural upper bound |
+
+**Reference implementation(s):**
+- Stata: `jwdid` package (Rios-Avila, 2021)
+- R: `etwfe` package (McDermott, 2023)
+- Simulations in paper performed in Stata 17
+
+**Requirements checklist:**
+- [ ] Pooled QMLE estimation with full saturated design matrix (Eq 3.12)
+- [ ] Three estimation methods: OLS (identity link), logit (Bernoulli QLL), Poisson (Poisson QLL)
+- [ ] ASF-based ATT computation for nonlinear models (Eq 3.15)
+- [ ] Imputation-based ATT for comparison/equivalence check (Eq 3.10-3.11)
+- [ ] Equivalence between imputation and pooled when using canonical link (Proposition 3.1)
+- [ ] Cohort×time interaction design matrix with centred covariates
+- [ ] Unit-level cluster-robust SEs
+- [ ] Delta-method SEs for aggregated ATTs
+- [ ] Panel bootstrap (unit resampling) as alternative inference
+- [ ] Four aggregation types: simple (static), dynamic (event study), calendar, group
+- [ ] Pre-treatment testing: event-study and heterogeneous trends approaches (Section 4)
+- [ ] Support for both never-treated and all-eventually-treated designs (Section 7.1)
+- [ ] Treatment exit support (Section 7.2)
+- [ ] Time-varying covariates (Section 7.3)
+- [ ] Index-scale effects `δ_{gr}` reported alongside level-scale ATTs `τ_{gr}` (Eq 3.16)
+
+---
+
+## Implementation Notes
+
+### Data Structure Requirements
+- Balanced or unbalanced panel: N units observed over T fixed time periods
+- Required: outcome `Y_{it}`, unit ID, time period, cohort indicator `D_g` (or `first_treat`), time-varying treatment `W_{it}`
+- Optional: time-constant covariates `X_i`
+- `W_{it}` is absorbing: once treated, always treated (no exit unless using Section 7.2 extension)
+- Cohorts defined by first treatment period `g ∈ {q, q+1, ..., T, ∞}`
+
+### Computational Considerations
+- Pooled estimation is a single regression over all N×T observations — O(N·T·K²) where K is number of parameters
+- K grows as (number of cohorts) × (number of post-treatment periods) for the interaction terms
+- For nonlinear models, IRLS convergence typically fast (< 25 iterations for logit/Poisson)
+- ASF computation requires averaging over all units in each cohort — O(N_g) per (g,r) cell
+- Delta-method gradient for nonlinear ATTs requires `G'(η̂_i + δ̂_{gr})` and `G'(η̂_i)` per unit
+- Parallelization: bootstrap iterations are embarrassingly parallel; ASF computation per cell is independent
+
+### Tuning Parameters
+
+| Parameter | Type | Default | Selection Method |
+|-----------|------|---------|-----------------|
+| `G(·)` | function | identity (OLS) | Dictated by outcome type: logistic for binary/fractional, exponential for counts/nonneg |
+| control_group | str | "never_treated" | Use "not_yet_treated" when no pure never-treated group |
+| anticipation | int | 0 | Domain knowledge; test with pre-treatment indicators |
+| covariates | list | None | Pre-intervention, time-constant variables only (Section 7.3 for time-varying) |
+
+### Relation to Existing diff-diff Estimators
+
+- **CallawaySantAnna:** OLS ETWFE ATT(g,t) estimates are equivalent to CS ATT(g,t) under linear PT with never-treated control (proven in Wooldridge 2021). However: CS uses only the period just prior to treatment as comparison, while ETWFE uses all pre-treatment periods — stronger assumption but more efficient. Key difference: ETWFE extends naturally to nonlinear outcomes; CS does not.
+- **ImputationDiD (BJS):** In the linear case, the imputation procedure (Procedure 1) is numerically identical to BJS imputation (Section 3.2, Proposition 3.1). With canonical link in LEF, pooled and imputation are equivalent.
+- **TwoStageDiD (Gardner):** Similar two-step logic (estimate FE on controls, impute counterfactuals). ETWFE pooled approach is a single-step alternative.
+- **SunAbraham:** Both produce interaction-weighted estimates. SA uses TWFE with saturated interactions; ETWFE generalises to nonlinear.
+- **EfficientDiD:** Chen, Sant'Anna & Xie (2025) achieve the semiparametric efficiency bound. ETWFE does not claim efficiency — but the pooled QMLE is typically more efficient than CS in simulations (SEs 45-69% smaller than CS for logit; 31-57% smaller for Poisson — Section 5).
+- **Existing `solve_logit` in `linalg.py`:** Currently used only for propensity scores in CallawaySantAnna. Can be reused for logit outcome estimation. A new `solve_poisson` IRLS solver would be needed for the Poisson path.
+- **`within_transform` in `utils.py`:** Can be used for the OLS path to absorb unit+time FE. NOT suitable for nonlinear paths — logit/Poisson require explicit cohort+time dummy columns (no within-transformation for nonlinear models due to incidental parameters).
+
+---
+
+## Simulation Findings (Section 5)
+
+### Binary outcomes with common timing (Section 5.1)
+- N = 500, T = 6, q = 4 (3 pre-treatment periods), 5000 MC replications
+- **Logit PMLE essentially unbiased** for all three SATTs regardless of trend strength
+- **POLS (linear) shows large downward bias** with strong aggregate trends (biases -0.15 to -0.29 vs true SATTs of 0.06-0.16)
+- **CS (2021) also shows nontrivial negative bias** in strong-trend scenarios
+- POLS SEs at least 45% larger than logit PMLE SEs; CS SEs at least 69% larger
+- PT tests (event-study and heterogeneous trends) fail to detect misspecification of the linear model — rejection rates around 5% even when LPT is violated
+
+### Nonnegative outcomes with staggered intervention (Section 5.2)
+- Poisson QMLE is clear winner when exponential mean is correct
+- **Poisson QMLE essentially unbiased** for all six SATTs
+- **Pooled OLS shows downward biases over 30%** in all cases, over 50% in some
+- **CS has less bias than POLS** but still very different from true SATTs
+- Poisson QMLE precision notably better than both alternatives (SDs 31-57% smaller than CS)
+- Event-study PT test rejects linear model 99.5% of the time; heterogeneous trends test rejects 99.9%
+- Corner solution case (`P(Y = 0) ≈ 0.37`): Poisson QMLE still essentially unbiased despite mass at zero
+
+### Key takeaway
+The simulations demonstrate that **when the outcome is binary or count-valued, using the correct nonlinear link function produces dramatically better estimates** than linear methods. The improvement is both in bias (nonlinear: ~0; linear: up to 50% bias) and precision (nonlinear SEs 30-70% smaller).
+
+---
+
+## Gaps and Uncertainties
+
+1. **Fractional logit implementation details.** The paper mentions fractional responses (Papke & Wooldridge 1996, 2008) and the Bernoulli QLL with known upper bound `B_{it}`, but does not provide explicit formulas for the bounded case. Implementation should follow standard fractional logit/probit references.
+
+2. **Aggregation weight formulas not fully explicit.** Section 3.1 (p. C50) describes aggregation conceptually ("weighted averages... where the weights are the proportions of the treated cohorts") but does not provide explicit weight formulas with equation numbers. The `jwdid_estat` Stata command documentation should be consulted for exact weight definitions.
+
+3. **Delta-method gradient for nonlinear ATTs.** The paper states "by applying the delta method with averaging" (p. C47, citing Wooldridge 2010 problem 12.17) but does not write out the explicit gradient. For implementation:
+   - For `τ̃_{gr}` from Eq 3.15: `∂τ̃_{gr}/∂δ_{gr} = N_g⁻¹ Σ_i D_{ig} G'(η̂_i + δ̂_{gr})`
+   - For other parameters `θ_k`: `∂τ̃_{gr}/∂θ_k = N_g⁻¹ Σ_i D_{ig} [G'(η̂_i + δ̂_{gr}) · ∂η̂_i/∂θ_k - G'(η̂_i) · ∂η̂_i/∂θ_k]`
+   - Verify these against numerical gradients during implementation.
+
+4. **Covariate centring.** The paper centres covariates at cohort means `X̂_{ig} = X_i - X̄_g` where `X̄_g = E(X_i|D_{ig} = 1)` (p. C48). In practice, use sample cohort means. The centring affects interpretation of `δ_{gs}` (makes it the ATT on the log-odds/log scale) but does not affect `τ_{gr}` estimates.
+
+5. **Poisson FE vs pooled QMLE.** Section 3.3 (p. C51-C52) notes that FE Poisson (with unit dummies) does NOT suffer from incidental parameters — a unique property of the exponential mean. The paper notes that without covariates, FE Poisson and pooled QMLE give identical `γ_s` and `δ_{gs}` estimates, but with covariates they differ. The pooled approach is recommended for simplicity.
+
+6. **Online Appendix.** The paper references an Online Appendix with detailed simulation results (p. C56: "From the findings reported in the Online Appendix..."). This appendix is not included in the main PDF and may contain additional implementation-relevant details.

From 948ee9a169c37712f5af594180e5c173c45eb474 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Fri, 15 May 2026 10:03:02 -0400
Subject: [PATCH 2/8] Address PR #443 R1 review (4 P2, 2 P3, docs accuracy)

athey-2025-review.md:
- Replace contributor-local absolute PDF path with arXiv URL
- Note shipped TROP requires absorbing treatment (paper Eq 13 generalization
  is out of scope for the current implementation)
- Rename "twostep"/"joint" to "local"/"global" and correct the global-method
  description to residual-based treated-cell effects averaged into ATT

wooldridge-2023-review.md:
- Split delta_2 interpretation by link function (exponential = log diff /
  proportional effect; logistic = change in log-odds)
- Update control_group default to "not_yet_treated" (matches wooldridge.py:305)
- Update implementation note: solve_poisson exists at linalg.py:3431 and is
  used in the Poisson path
- Add aggregation deviation note linking to REGISTRY and TODO entries

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 docs/methodology/papers/athey-2025-review.md      | 7 ++++---
 docs/methodology/papers/wooldridge-2023-review.md | 9 ++++++---
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/docs/methodology/papers/athey-2025-review.md b/docs/methodology/papers/athey-2025-review.md
index 767ff161..e7147284 100644
--- a/docs/methodology/papers/athey-2025-review.md
+++ b/docs/methodology/papers/athey-2025-review.md
@@ -2,7 +2,7 @@
 
 **Authors:** Susan Athey, Guido Imbens, Zhaonan Qu, Davide Viviano
 **Citation:** Athey, S., Imbens, G.W., Qu, Z., & Viviano, D. (2025). Triply Robust Panel Estimators. *arXiv preprint arXiv:2508.21536v2*.
-**PDF reviewed:** /Users/igerber/diff-diff/papers/2508.21536v2.pdf
+**PDF reviewed:** https://arxiv.org/abs/2508.21536 (v2)
 **Review date:** 2026-02-08
 
 ---
@@ -277,7 +277,8 @@ Note: Stratified bootstrap -- control and treated units resampled separately. Pr
 - **Treatment matrix**: W (N x T), binary treatment assignments where `W_it in {0, 1}`
 - **Covariates** (optional): X_it, observed covariates for each unit-period pair
 - Treatment must be an absorbing state for standard block assignment (W_it = 1{i > N_0} * 1{t > T_0})
-- General assignment patterns (treatment switching on/off) supported via Equation 13
+- **Paper scope (Equation 13):** the paper extends TROP to general assignment patterns including treatment switching on/off.
+- **Shipped implementation:** the current `diff_diff/trop.py` requires an absorbing-state treatment indicator and rejects non-absorbing/event-style inputs (gate in `diff_diff/trop.py:505-525`, also documented in `docs/methodology/REGISTRY.md` under TROP). Generalization to non-absorbing patterns is not in scope for the current implementation.
 
 ### Computational Considerations
 - **Main bottleneck**: LOOCV grid search -- for each grid point, every control observation requires a separate nuclear-norm penalized weighted least squares solve
@@ -310,7 +311,7 @@ Empirical guidance from Table 2 (cross-validated values, T_post=10, N_tr=10):
 Key finding (Section 4.3, pages 18-19): Substantial heterogeneity across applications in optimal tuning parameters. Removing regression adjustment (lambda_nn = infinity) or time weights (lambda_time = 0) increases RMSE significantly (up to 85% and 91% respectively). Removing unit weights (lambda_unit = 0) increases RMSE only modestly (max 10%).
 
 ### Relation to Existing diff-diff Estimators
-- **Direct implementation**: `diff_diff/trop.py` implements the TROP estimator with both "twostep" (per-observation, Algorithm 2) and "joint" (WLS, homogeneous effect) methods.
+- **Direct implementation**: `diff_diff/trop.py` implements the TROP estimator with two public methods (`diff_diff/trop.py:64-78`, validated at `diff_diff/trop.py:909`): `method="local"` is the per-treated-cell estimator (Algorithm 2), and `method="global"` fits a single weighted model on control observations and averages residual-based treated-cell effects into the ATT (`diff_diff/trop_global.py:554-585`).
 - **Existing Rust backend**: `rust/src/trop.rs` provides accelerated distance matrix computation, LOOCV grid search, and bootstrap variance estimation.
 - **Encompasses SyntheticDiD**: TROP with `lambda_nn = infinity` and specific weight choices reduces to SDID (`diff_diff/synthetic_did.py`).
 - **Encompasses TWFE**: TROP with `lambda_nn = infinity` and uniform weights reduces to DID/TWFE (`diff_diff/twfe.py`).
diff --git a/docs/methodology/papers/wooldridge-2023-review.md b/docs/methodology/papers/wooldridge-2023-review.md
index ddc1f866..5eef0f2f 100644
--- a/docs/methodology/papers/wooldridge-2023-review.md
+++ b/docs/methodology/papers/wooldridge-2023-review.md
@@ -51,7 +51,9 @@ where `Ȳ_{dt}` is the sample average for group `d` in period `t`. Special cases
     δ_2 ≡ G⁻¹(E[Y_2(1)|D=1]) - G⁻¹(E[Y_2(0)|D=1])
         = G⁻¹(E[Y_2(1)|D=1]) - (α + β + γ_2)
 
-When `G(z) = exp(z)`, `δ_2` is the log-odds ratio (logit) or log rate ratio (Poisson).
+Interpretation depends on the link function:
+- Exponential mean (`G(z) = exp(z)`, Poisson): `δ_2` is a log difference, i.e., a proportional/log rate effect on `E[Y|...]`.
+- Logistic mean (`G(z) = Λ(z) = 1/(1 + exp(-z))`, logit): `δ_2` is a change in log-odds of `E[Y|...]`.
 
 *Imputation estimator — staggered with covariates (Procedure 1, Eq 3.10-3.11):*
 
@@ -106,6 +108,7 @@ When all units are treated at the same time `g`, the model simplifies to:
 - **Calendar time:** weighted average across cohorts for each calendar period
 - **Group:** weighted average across periods for each cohort
 - SEs for all aggregations via delta method or panel bootstrap
+- **Note (current implementation deviation):** the shipped `WooldridgeDiD` aggregations use cell-level observation-count weights `n_{g,t}` (matching Stata `jwdid_estat`) rather than the paper's cohort-share weights from Eqs. 7.2-7.4. See `docs/methodology/REGISTRY.md` "Aggregations" under WooldridgeDiD and the corresponding line in `TODO.md` ("Tech Debt from Code Reviews") for the tracked deviation.
 
 *Testing parallel trends (Section 4):*
 
@@ -190,7 +193,7 @@ Two approaches:
 | Parameter | Type | Default | Selection Method |
 |-----------|------|---------|-----------------|
 | `G(·)` | function | identity (OLS) | Dictated by outcome type: logistic for binary/fractional, exponential for counts/nonneg |
-| control_group | str | "never_treated" | Use "not_yet_treated" when no pure never-treated group |
+| control_group | str | "not_yet_treated" (matches `diff_diff/wooldridge.py:305`) | Use "never_treated" only when a pure never-treated group is available and required |
 | anticipation | int | 0 | Domain knowledge; test with pre-treatment indicators |
 | covariates | list | None | Pre-intervention, time-constant variables only (Section 7.3 for time-varying) |
 
@@ -201,7 +204,7 @@ Two approaches:
 - **TwoStageDiD (Gardner):** Similar two-step logic (estimate FE on controls, impute counterfactuals). ETWFE pooled approach is a single-step alternative.
 - **SunAbraham:** Both produce interaction-weighted estimates. SA uses TWFE with saturated interactions; ETWFE generalises to nonlinear.
 - **EfficientDiD:** Chen, Sant'Anna & Xie (2025) achieve the semiparametric efficiency bound. ETWFE does not claim efficiency — but the pooled QMLE is typically more efficient than CS in simulations (SEs 45-69% smaller than CS for logit; 31-57% smaller for Poisson — Section 5).
-- **Existing `solve_logit` in `linalg.py`:** Currently used only for propensity scores in CallawaySantAnna. Can be reused for logit outcome estimation. A new `solve_poisson` IRLS solver would be needed for the Poisson path.
+- **Solvers in `linalg.py`:** `solve_logit` is reused for the logit outcome path; `solve_poisson` (`diff_diff/linalg.py:3431`) is the IRLS solver used by the Poisson path (`diff_diff/wooldridge.py:1085-1124`).
 - **`within_transform` in `utils.py`:** Can be used for the OLS path to absorb unit+time FE. NOT suitable for nonlinear paths — logit/Poisson require explicit cohort+time dummy columns (no within-transformation for nonlinear models due to incidental parameters).
 
 ---

From b527f78eac40ff95f0a52710b1c012b357e29621 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Fri, 15 May 2026 10:24:13 -0400
Subject: [PATCH 3/8] Address PR #443 R2 review (2 P2, 1 P3, docs accuracy)

athey-2025-review.md:
- Correct "7 real datasets" to "6 real datasets / 7 simulation applications"
  (CPS is used for both logwage and urate outcomes; paper Table 1 / Section 3
  names 6 source datasets)
- Rewrite Equation 13 nuclear-norm gap note as a neutral source-based check
  (remove authoring artifact)

wooldridge-2023-review.md:
- Surface shipped covariate API (exovar / xtvar / xgvar incl. time-varying
  via xtvar with demean_covariates default) in Data Structure Requirements
  and Tuning Parameters table; cross-link to wooldridge.py:394-411 and
  REGISTRY.md "Covariates"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 docs/methodology/papers/athey-2025-review.md      | 6 +++---
 docs/methodology/papers/wooldridge-2023-review.md | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/methodology/papers/athey-2025-review.md b/docs/methodology/papers/athey-2025-review.md
index e7147284..0b2e53f7 100644
--- a/docs/methodology/papers/athey-2025-review.md
+++ b/docs/methodology/papers/athey-2025-review.md
@@ -322,8 +322,8 @@ Key finding (Section 4.3, pages 18-19): Substantial heterogeneity across applica
 
 ## Simulation Design Details
 
-The paper uses semi-synthetic simulations (Section 3.1, pages 9-11) based on 7 real datasets:
-1. CPS (Current Population Survey): N=50, T=40
+The paper uses semi-synthetic simulations (Section 3.1, pages 9-11) based on 6 real datasets (7 simulation applications, since CPS is used for two outcomes: logwage and urate):
+1. CPS (Current Population Survey): N=50, T=40 — used for both CPS logwage and CPS urate applications
 2. PWT (Penn World Table): N=111, T=48
 3. Germany reunification: N=17, T=44
 4. Basque country: N=18, T=43
@@ -350,7 +350,7 @@ The paper uses semi-synthetic simulations (Section 3.1, pages 9-11) based on 7 r
 
 5. **Weight normalization**: Section 5 (page 20) states weights sum to one (`1^T omega = 1^T theta = 1`), but the weight specification in Equation 3 (page 7) uses unnormalized exponential weights. It is unclear whether normalization is applied before or after the optimization, or whether the theoretical results in Section 5 assume normalized weights while the practical algorithm uses unnormalized weights. The existing diff-diff implementation uses unnormalized weights in the optimization (matching Equation 2).
 
-6. **Nuclear norm penalty in Equation 13**: Agent 2's extraction of Equation 13 writes `lambda_nn ||L||_*^2` (squared nuclear norm) while Agent 1 and the Equation 2 formulation write `lambda_nn ||L||_*` (unsquared). The paper likely uses the unsquared nuclear norm consistently (matching Equation 2), but this should be verified against pages 26-27 of the PDF.
+6. **Nuclear norm penalty in Equation 13**: Equation 13's penalty term should be rechecked against pages 26-27 of the PDF to confirm whether it uses the unsquared nuclear norm `lambda_nn ||L||_*` (matching Equation 2) or a squared variant `lambda_nn ||L||_*^2`. The shipped implementation uses the unsquared form per Equation 2.
 
 7. **General assignment patterns**: Section 6.1 generalizes beyond block assignment, but the inference results (Section 5.3) and theoretical guarantees (Theorem 5.1) are stated under block assignment. The extent to which theoretical results carry over to general patterns is not fully characterized.
 
diff --git a/docs/methodology/papers/wooldridge-2023-review.md b/docs/methodology/papers/wooldridge-2023-review.md
index 5eef0f2f..8c59b383 100644
--- a/docs/methodology/papers/wooldridge-2023-review.md
+++ b/docs/methodology/papers/wooldridge-2023-review.md
@@ -176,7 +176,7 @@ Two approaches:
 ### Data Structure Requirements
 - Balanced or unbalanced panel: N units observed over T fixed time periods
 - Required: outcome `Y_{it}`, unit ID, time period, cohort indicator `D_g` (or `first_treat`), time-varying treatment `W_{it}`
-- Optional: time-constant covariates `X_i`
+- Optional covariates (paper notation `X_i` is time-constant; shipped `diff_diff/wooldridge.py:394-411` supports a richer set): `exovar` (time-invariant, no interaction or demeaning), `xtvar` (time-varying, demeaned within cohort×period cells when `demean_covariates=True`), `xgvar` (covariates interacted with each cohort indicator). See `docs/methodology/REGISTRY.md` under WooldridgeDiD "Covariates".
 - `W_{it}` is absorbing: once treated, always treated (no exit unless using Section 7.2 extension)
 - Cohorts defined by first treatment period `g ∈ {q, q+1, ..., T, ∞}`
 
@@ -195,7 +195,7 @@ Two approaches:
 | `G(·)` | function | identity (OLS) | Dictated by outcome type: logistic for binary/fractional, exponential for counts/nonneg |
 | control_group | str | "not_yet_treated" (matches `diff_diff/wooldridge.py:305`) | Use "never_treated" only when a pure never-treated group is available and required |
 | anticipation | int | 0 | Domain knowledge; test with pre-treatment indicators |
-| covariates | list | None | Pre-intervention, time-constant variables only (Section 7.3 for time-varying) |
+| exovar / xtvar / xgvar | list / list / list | None / None / None | `exovar` for time-invariant, `xtvar` for time-varying (Section 7.3; demeaned via `demean_covariates=True`, default), `xgvar` for cohort-interacted. See `diff_diff/wooldridge.py:394-411`. |
 
 ### Relation to Existing diff-diff Estimators
 

From c92c2dcf439916c13b3437dc8fff70873ebeec03 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Fri, 15 May 2026 10:33:18 -0400
Subject: [PATCH 4/8] Address PR #443 R3 review (2 P2, docs accuracy)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

wooldridge-2023-review.md:
- Aggregation note: stop attributing "Eqs. 7.2-7.4" to the 2023 paper
  (the 2023 paper describes aggregation only conceptually in Section 3.1;
  the formal cohort-share equations are from W2025 per REGISTRY.md)
- Implementation Notes: separate paper notation from shipped API. Users
  provide cohort/first_treat; W_it is constructed internally from cohort+
  time via _build_interaction_matrix (wooldridge.py:165-189), not passed
  as a column
- Standard errors: add shipped-API restriction note — n_bootstrap > 0 is
  OLS-only (wooldridge.py:432-437) and rejected with survey_design
  (wooldridge.py:441-444)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .../papers/wooldridge-2023-review.md          | 21 ++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/docs/methodology/papers/wooldridge-2023-review.md b/docs/methodology/papers/wooldridge-2023-review.md
index 8c59b383..4e24ea62 100644
--- a/docs/methodology/papers/wooldridge-2023-review.md
+++ b/docs/methodology/papers/wooldridge-2023-review.md
@@ -98,9 +98,10 @@ When all units are treated at the same time `g`, the model simplifies to:
 
 *Standard errors (Section 3, p. C47):*
 - Delta method for individual `τ̂_{gr}` and aggregated ATTs (Wooldridge 2010, problem 12.17)
-- Panel bootstrap (resampling units) is also valid — allows for serial dependence and model misspecification
+- Panel bootstrap (resampling units) is also valid in the paper's framework — allows for serial dependence and model misspecification
 - Cluster-robust SEs at unit level for pooled estimation
 - For nonlinear models: standard sandwich `(X'WX)⁻¹ meat (X'WX)⁻¹` where `W = diag(μ_i(1-μ_i))` for logit or `W = diag(μ_i)` for Poisson
+- **Note (shipped API restriction):** in `WooldridgeDiD`, `n_bootstrap > 0` is currently OLS-only (rejected for logit/Poisson at `diff_diff/wooldridge.py:432-437`) and rejected when `survey_design` is set (`diff_diff/wooldridge.py:441-444`). Use analytical SEs for nonlinear or survey paths.
 
 *Aggregation (Section 3.1, p. C50):*
 - **Simple (static):** weighted average of immediate effects `τ̃_{sg}` across cohorts `g = q,...,T`, weights = cohort proportions among all eventually treated
@@ -108,7 +109,7 @@ When all units are treated at the same time `g`, the model simplifies to:
 - **Calendar time:** weighted average across cohorts for each calendar period
 - **Group:** weighted average across periods for each cohort
 - SEs for all aggregations via delta method or panel bootstrap
-- **Note (current implementation deviation):** the shipped `WooldridgeDiD` aggregations use cell-level observation-count weights `n_{g,t}` (matching Stata `jwdid_estat`) rather than the paper's cohort-share weights from Eqs. 7.2-7.4. See `docs/methodology/REGISTRY.md` "Aggregations" under WooldridgeDiD and the corresponding line in `TODO.md` ("Tech Debt from Code Reviews") for the tracked deviation.
+- **Note (current implementation deviation):** the shipped `WooldridgeDiD` aggregations use cell-level observation-count weights `n_{g,t}` (matching Stata `jwdid_estat`) rather than the cohort-share weights described conceptually in Section 3.1. The 2023 paper does not provide explicit aggregation-weight equations; the formal cohort-share equations referenced in `docs/methodology/REGISTRY.md` ("W2025 Eqs. 7.2-7.4") are from a later Wooldridge ETWFE source. See `docs/methodology/REGISTRY.md` "Aggregations" under WooldridgeDiD and the corresponding line in `TODO.md` ("Tech Debt from Code Reviews") for the tracked deviation.
 
 *Testing parallel trends (Section 4):*
 
@@ -174,11 +175,17 @@ Two approaches:
 ## Implementation Notes
 
 ### Data Structure Requirements
-- Balanced or unbalanced panel: N units observed over T fixed time periods
-- Required: outcome `Y_{it}`, unit ID, time period, cohort indicator `D_g` (or `first_treat`), time-varying treatment `W_{it}`
-- Optional covariates (paper notation `X_i` is time-constant; shipped `diff_diff/wooldridge.py:394-411` supports a richer set): `exovar` (time-invariant, no interaction or demeaning), `xtvar` (time-varying, demeaned within cohort×period cells when `demean_covariates=True`), `xgvar` (covariates interacted with each cohort indicator). See `docs/methodology/REGISTRY.md` under WooldridgeDiD "Covariates".
-- `W_{it}` is absorbing: once treated, always treated (no exit unless using Section 7.2 extension)
-- Cohorts defined by first treatment period `g ∈ {q, q+1, ..., T, ∞}`
+
+*Paper notation:* `Y_{it}` (outcome), `D_g` (cohort indicator), `W_{it}` (time-varying treatment), `X_i` (time-invariant covariates).
+
+*Shipped API (`diff_diff/wooldridge.py:394-411`):* users provide outcome, unit ID, time, and `cohort` (or `first_treat`). The model derives `W_{it}` internally from `cohort` and `time` via `_build_interaction_matrix` (`diff_diff/wooldridge.py:165-189`) — users do NOT pass `W_{it}` as a column.
+
+*Covariates (richer than paper notation):* `exovar` (time-invariant, no interaction or demeaning), `xtvar` (time-varying, demeaned within cohort×period cells when `demean_covariates=True`), `xgvar` (covariates interacted with each cohort indicator). See `docs/methodology/REGISTRY.md` under WooldridgeDiD "Covariates".
+
+*Other contracts:*
+- Balanced or unbalanced panel: N units observed over T fixed time periods.
+- Treatment is absorbing: once treated, always treated (no exit unless using Section 7.2 extension).
+- Cohorts defined by first treatment period `g ∈ {q, q+1, ..., T, ∞}`.
 
 ### Computational Considerations
 - Pooled estimation is a single regression over all N×T observations — O(N·T·K²) where K is number of parameters

From 6dbddf7d55fe61fdeced432145b9d5ba66c8d90f Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Fri, 15 May 2026 10:43:32 -0400
Subject: [PATCH 5/8] Address PR #443 R4 review (2 P2, paper-vs-shipped
 clarity)

athey-2025-review.md:
- Relabel balanced-panel bullet as "Paper assumption" and add an adjacent
  shipped-implementation note that diff_diff/trop.py supports unbalanced
  panels with structural gaps (matches the later "Unbalanced panels" entry
  under Gaps and Uncertainties and the corresponding REGISTRY section)

wooldridge-2023-review.md:
- Treatment exit (Section 7.2) bullet: mark as extension and carry over the
  paper's additional restriction that future shocks to untreated potential
  outcomes cannot drive exit
- Multiple treatment levels (Section 7.4) bullet: mark as extension; note
  the paper describes it as relatively straightforward but not fully
  general, leaving the precise multi-level estimand to future work

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 docs/methodology/papers/athey-2025-review.md      | 3 ++-
 docs/methodology/papers/wooldridge-2023-review.md | 4 ++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/docs/methodology/papers/athey-2025-review.md b/docs/methodology/papers/athey-2025-review.md
index 0b2e53f7..23d0122f 100644
--- a/docs/methodology/papers/athey-2025-review.md
+++ b/docs/methodology/papers/athey-2025-review.md
@@ -272,7 +272,8 @@ Note: Stratified bootstrap -- control and treated units resampled separately. Pr
 ## Implementation Notes
 
 ### Data Structure Requirements
-- **Balanced panel data**: N units x T time periods
+- **Paper assumption — balanced panel:** N units x T time periods.
+- **Shipped implementation:** `diff_diff/trop.py` accepts unbalanced panels with structural gaps (see the "Unbalanced panels" item under Gaps and Uncertainties below and the corresponding section of `docs/methodology/REGISTRY.md` under TROP).
 - **Outcome matrix**: Y (N x T), observed outcomes
 - **Treatment matrix**: W (N x T), binary treatment assignments where `W_it in {0, 1}`
 - **Covariates** (optional): X_it, observed covariates for each unit-period pair
diff --git a/docs/methodology/papers/wooldridge-2023-review.md b/docs/methodology/papers/wooldridge-2023-review.md
index 4e24ea62..726eb1c1 100644
--- a/docs/methodology/papers/wooldridge-2023-review.md
+++ b/docs/methodology/papers/wooldridge-2023-review.md
@@ -123,9 +123,9 @@ Two approaches:
 - **Count/nonnegative outcomes:** Use exponential `G(z) = exp(z)` with Poisson QLL. Does not require Poisson distribution — only exponential conditional mean.
 - **Corner solutions (`Y ≥ 0` with mass at zero):** Poisson QMLE still valid. Exponential mean handles zeros naturally (p. C56-C57).
 - **All units eventually treated (Section 7.1):** Methods work with minor modification. Last-treated cohort (`g = T`) serves as control. ATTs defined relative to `Y_t(T)` rather than `Y_t(∞)`.
-- **Treatment exit (Section 7.2):** Expand cohort definition to `D_{gh}` where `g` = start period, `h` = end period. ATTs `τ_{ghr}` defined for `r = g,...,T`.
+- **Treatment exit (Section 7.2, extension):** Expand cohort definition to `D_{gh}` where `g` = start period, `h` = end period. ATTs `τ_{ghr}` defined for `r = g,...,T`. The paper notes this extension requires the additional restriction that future shocks to untreated potential outcomes do not drive exit; absent that condition, exit timing becomes endogenous and the standard ETWFE identification argument no longer carries over directly.
 - **Time-varying covariates (Section 7.3):** Replace `X_i` with `X_{it}` in Eq 3.12. Only valid if covariates are not influenced by the intervention.
-- **Multiple treatment levels (Section 7.4):** Replace binary `W_{it}` with set of treatment level indicators. Define cohorts by `D_{iga}` (first period `g`, initial level `a`).
+- **Multiple treatment levels (Section 7.4, extension):** Replace binary `W_{it}` with a set of treatment-level indicators; define cohorts by `D_{iga}` (first period `g`, initial level `a`). The paper describes this only as relatively straightforward, not fully general, and leaves the precise multi-level estimand to future work — so the exact ATT object under non-binary treatment is not pinned down in Wooldridge (2023).
 - **Incidental parameters:** For logit, unit-specific dummies cause incidental parameters problem. Pooled QMLE with cohort dummies (not unit dummies) avoids this. For Poisson with exponential mean, FE Poisson estimator does NOT suffer from incidental parameters (Wooldridge 1999) — this is a unique exception.
 - **No never-treated group:** When `g = T` is the last cohort, it serves as control. Cannot estimate ATT for this last cohort.
 

From 796decd1e7dc1dd039b55625410777c1932de447 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Fri, 15 May 2026 10:50:28 -0400
Subject: [PATCH 6/8] Address PR #443 R5 review (1 P2, TROP source-ambiguity
 consistency)

athey-2025-review.md:
- Reframe Methodology Registry Entry intro from "copy into REGISTRY"
  ready-to-promote to a working-draft framing that explicitly defers
  promotion until two source-ambiguity items (weight normalization,
  Eq. 13 penalty form) are resolved against the source
- Pull the weight-normalization line out of the Requirements Checklist
  (it was framed as a settled requirement); restate it as an open
  source-ambiguity cross-referencing Gap #5, with the current shipped
  implementation pinned to the Equation 2 (unnormalized) interpretation

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 docs/methodology/papers/athey-2025-review.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/methodology/papers/athey-2025-review.md b/docs/methodology/papers/athey-2025-review.md
index 23d0122f..05ad99b6 100644
--- a/docs/methodology/papers/athey-2025-review.md
+++ b/docs/methodology/papers/athey-2025-review.md
@@ -9,7 +9,7 @@
 
 ## Methodology Registry Entry
 
-*Formatted to match docs/methodology/REGISTRY.md structure. Heading levels and labels align with existing entries -- copy the `## TROP` section into the appropriate category in the registry.*
+*Working draft formatted to match docs/methodology/REGISTRY.md structure. Heading levels and labels align with existing entries. Two open source-ambiguity items remain (weight normalization and Eq. 13 penalty form — see "Gaps and Uncertainties" below); resolve those against the source before promoting this section into REGISTRY.md.*
 
 ## TROP
 
@@ -264,7 +264,7 @@ Note: Stratified bootstrap -- control and treated units resampled separately. Pr
 - [ ] Stratified bootstrap preserving unit-level structure (Algorithm 3)
 - [ ] Covariate extension supported (Equation 14)
 - [ ] Special cases recoverable: DID, MC, SC, SDID via tuning parameters
-- [ ] Weight normalization: `1^T omega = 1^T theta = 1` (Section 5, page 20)
+- Weight normalization (`1^T omega = 1^T theta = 1`, Section 5, page 20) — **open source-ambiguity item**; see Gap #5 below. Section 5 states sum-to-one, Equation 3 / Equation 2 use unnormalized exponential weights, and the shipped implementation matches Equation 2. Do not promote this to a requirement until the discrepancy is resolved against the source.
 - [ ] Heterogeneous treatment effects supported via per-observation estimation (Remark 6.1)
 
 ---

From b14b7d01ac598c80a1f5961d387c93692c590ffb Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Fri, 15 May 2026 11:01:10 -0400
Subject: [PATCH 7/8] Address PR #443 R6 review (2 P3 actionable, docs cleanup)

athey-2025-review.md:
- Mark Equation 13 nuclear-norm penalty as resolved (paper text confirms
  the same unsquared form as Equation 2)
- Tighten draft-framing intro: weight normalization is the only remaining
  open source-ambiguity item

wooldridge-2023-review.md:
- Replace nonexistent local PDF path with the Econometrics Journal DOI
  URL so the provenance trail is reproducible from the repo

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 docs/methodology/papers/athey-2025-review.md      | 4 ++--
 docs/methodology/papers/wooldridge-2023-review.md | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/methodology/papers/athey-2025-review.md b/docs/methodology/papers/athey-2025-review.md
index 05ad99b6..a9d946b4 100644
--- a/docs/methodology/papers/athey-2025-review.md
+++ b/docs/methodology/papers/athey-2025-review.md
@@ -9,7 +9,7 @@
 
 ## Methodology Registry Entry
 
-*Working draft formatted to match docs/methodology/REGISTRY.md structure. Heading levels and labels align with existing entries. Two open source-ambiguity items remain (weight normalization and Eq. 13 penalty form — see "Gaps and Uncertainties" below); resolve those against the source before promoting this section into REGISTRY.md.*
+*Working draft formatted to match docs/methodology/REGISTRY.md structure. Heading levels and labels align with existing entries. One open source-ambiguity item remains (weight normalization — see Gap #5 under "Gaps and Uncertainties" below); resolve it against the source before promoting this section into REGISTRY.md.*
 
 ## TROP
 
@@ -351,7 +351,7 @@ The paper uses semi-synthetic simulations (Section 3.1, pages 9-11) based on 6 r
 
 5. **Weight normalization**: Section 5 (page 20) states weights sum to one (`1^T omega = 1^T theta = 1`), but the weight specification in Equation 3 (page 7) uses unnormalized exponential weights. It is unclear whether normalization is applied before or after the optimization, or whether the theoretical results in Section 5 assume normalized weights while the practical algorithm uses unnormalized weights. The existing diff-diff implementation uses unnormalized weights in the optimization (matching Equation 2).
 
-6. **Nuclear norm penalty in Equation 13**: Equation 13's penalty term should be rechecked against pages 26-27 of the PDF to confirm whether it uses the unsquared nuclear norm `lambda_nn ||L||_*` (matching Equation 2) or a squared variant `lambda_nn ||L||_*^2`. The shipped implementation uses the unsquared form per Equation 2.
+6. **Nuclear norm penalty in Equation 13** (resolved): the source uses the same unsquared nuclear-norm penalty `lambda_nn ||L||_*` in Equation 13 as in Equation 2 (consistent with the rest of the draft and confirmed against the paper text). The shipped implementation matches.
 
 7. **General assignment patterns**: Section 6.1 generalizes beyond block assignment, but the inference results (Section 5.3) and theoretical guarantees (Theorem 5.1) are stated under block assignment. The extent to which theoretical results carry over to general patterns is not fully characterized.
 
diff --git a/docs/methodology/papers/wooldridge-2023-review.md b/docs/methodology/papers/wooldridge-2023-review.md
index 726eb1c1..3a99bffe 100644
--- a/docs/methodology/papers/wooldridge-2023-review.md
+++ b/docs/methodology/papers/wooldridge-2023-review.md
@@ -2,7 +2,7 @@
 
 **Authors:** Jeffrey M. Wooldridge
 **Citation:** Wooldridge, J. M. (2023). Simple approaches to nonlinear difference-in-differences with panel data. *The Econometrics Journal*, 26(3), C31–C66. https://doi.org/10.1093/ectj/utad016
-**PDF reviewed:** papers/wooldridge.pdf
+**PDF reviewed:** https://doi.org/10.1093/ectj/utad016 (Econometrics Journal — official article URL)
 **Review date:** 2026-03-21
 
 ---

From 0fb0d51994fca942bbc2c1b6bd2259acbd439421 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Fri, 15 May 2026 11:06:37 -0400
Subject: [PATCH 8/8] Address PR #443 R7 review (1 P3 actionable, provenance
 pin)

athey-2025-review.md:
- Replace generic arXiv abstract URL with a version-pinned v2 link so the
  reviewed artifact resolves to v2 (the current arXiv record now resolves
  to v3)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 docs/methodology/papers/athey-2025-review.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/methodology/papers/athey-2025-review.md b/docs/methodology/papers/athey-2025-review.md
index a9d946b4..6ca71516 100644
--- a/docs/methodology/papers/athey-2025-review.md
+++ b/docs/methodology/papers/athey-2025-review.md
@@ -2,7 +2,7 @@
 
 **Authors:** Susan Athey, Guido Imbens, Zhaonan Qu, Davide Viviano
 **Citation:** Athey, S., Imbens, G.W., Qu, Z., & Viviano, D. (2025). Triply Robust Panel Estimators. *arXiv preprint arXiv:2508.21536v2*.
-**PDF reviewed:** https://arxiv.org/abs/2508.21536 (v2)
+**PDF reviewed:** https://arxiv.org/abs/2508.21536v2 (version-pinned arXiv abstract for v2)
 **Review date:** 2026-02-08
 
 ---