TODO.md cleanup: drop SHIPPED rows, refresh stale refs, add Tier A/B/C/D backlog by igerber · Pull Request #447 · igerber/diff-diff

igerber · 2026-05-15T22:42:19Z

Summary

Wave 1 of a multi-wave tech-debt paydown. Docs-only — no estimator, test, or build code touched.

Delete 3 SHIPPED rows that were left marked Done in the Methodology/Correctness table (HAD trends_lin PR docs: HAD ecosystem completion (RTD audit Batch A) #389, HAD yatchew_hr_test(null="mean_independence") post-PR HAD Phase 4: trends_lin (Eq 17/18) + R-package end-to-end parity #392, HAD T22 tutorial PR Add Tutorial 22: Survey-Weighted HAD walkthrough #440).
Fix stray blank line that was splitting the Methodology/Correctness table into two fragments.
Correct the dCDH survey cell-period allocator row's PR-column PR 2 → #408 (the actual dcdh-by-path-survey-design merge per git log).
Refresh stale reference sections:
- Known Limitations line refs for estimators.py (the file grew from 1357 to 1991 lines; predict() is now at :890-911, the wild-bootstrap fallback comment at :1647).
- Large Module Files regenerated from wc -l diff_diff/*.py ≥1000 lines — adds 11 newly-eligible modules including chaisemartin_dhaultfoeuille.py (8636), had_pretests.py (4951), had.py (4593). Header sentence now makes the 3000 / 2000 / 1000 thresholds explicit instead of silently relaxing the documented < 1000 target.
- Standard Error Consistency rewritten — vcov_type has subsumed the proposed se_type knob; readers are pointed at the existing methodology row for the remaining 8-standalone-estimator threading work.
- Test Coverage modernized to reference pytest.importorskip rather than the stale "21 visualization tests" count.
Add new ### Prioritized Tech-Debt Backlog subsection: Tier A/B/C/D ordering by effort × risk, anchoring back to existing source-of-truth tables. Seven Tier A quick wins, twelve Tier B mid-size methodology items, nine Tier C derivation items, eight Tier D deferred/research items.

Net diff: +90 / -36 on a single file.

The new tier structure exists so subsequent waves can be picked off without re-litigating priority each turn. Wave 2 candidate = regenerate benchmarks/data/clubsandwich_cr2_golden.json from R (row 87 / Tier A).

Test plan

Visual sanity: open TODO.md in a markdown previewer and confirm the Methodology/Correctness table renders as a single contiguous table (not two fragments)
grep -c "^| .* | .* | .* | Done |" TODO.md → 0
grep -nE "estimators\.py:778-784|estimators\.py:567-588|\| PR 2 +\|" TODO.md → no matches
grep -nE "\| #408 \|" TODO.md → ≥1 match
grep -nE "^#### Tier [ABCD] —" TODO.md → 4 matches
git diff --name-only main..HEAD → only TODO.md

🤖 Generated with Claude Code

…backlog Wave 1 of multi-wave tech-debt paydown. Docs-only — no estimator code touched. Changes: - Delete 3 SHIPPED rows marked `Done`: HAD trends_lin (PR #389), HAD yatchew_hr_test(null="mean_independence") (post-PR #392), HAD Phase 5 follow-up tutorial T22 (PR #440). - Fix stray blank line inside Methodology/Correctness table (was rendering as two fragments). - Correct dCDH survey cell-period allocator row's PR-column "PR 2" → "#408" (actual dcdh-by-path-survey-design merge). - Refresh Known Limitations line refs to current state of estimators.py (`:778-784` → `:1647`; `:567-588` → `:890-911`). - Regenerate Large Module Files table from `wc -l diff_diff/*.py` ≥1000 lines. Add 11 newly-eligible modules (chaisemartin_dhaultfoeuille.py 8636, had_pretests.py 4951, had.py 4593, etc.). Update header sentence to make the 3000/2000/1000 thresholds explicit instead of silently relaxing the documented "< 1000 lines" target. - Rewrite Standard Error Consistency section — `vcov_type` has subsumed the proposed `se_type` knob; point readers at the existing methodology row for the remaining 8-standalone-estimator threading work. - Modernize Test Coverage section to reference `pytest.importorskip` rather than a stale "21 visualization tests" count. - Add new `### Prioritized Tech-Debt Backlog` subsection: Tier A/B/C/D ordering by effort × risk, anchoring back to existing source-of-truth table rows so the tables remain canonical. Seven Tier A quick wins, twelve Tier B mid-size methodology items, nine Tier C derivation items, eight Tier D deferred/research items. The new tier structure exists so subsequent waves can be picked off without re-litigating priority each turn. Wave 2 candidate is the clubsandwich_cr2_golden.json regeneration from R (row 87 / Tier A). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-15T22:47:44Z

Overall Assessment

✅ Looks good — no P0/P1 findings. The diff is docs-only, and I did not find any estimator-behavior change, undocumented methodology deviation, or statistical anti-pattern introduced by this PR. I did find two P2 documentation/tech-debt tracking issues in the new TODO.md content.

Executive Summary

No estimator, weighting, inference, or identification code changed, so there is no source-material blocker in the implementation.
P2: the refreshed “Large Module Files” table is incomplete and omits eight current >=1000-line modules.
P2: the rewritten SE-consistency note misstates the current public interface by treating cr1, cr2, and bootstrap as part of vcov_type.
The refreshed predict() reference and the visualization test-skip note appear accurate against the current tree.
No security or performance regressions are introduced by this diff.

Methodology

P2 Impact: The SE-consistency rewrite in TODO.md:L212-L214 does not match the actual variance API. vcov_type currently resolves to {"classical","hc1","hc2","hc2_bm","conley"} in diff_diff/linalg.py:L1202-L1221, hc2 is rejected on clustered fits in diff_diff/linalg.py:L1133-L1141, and bootstrap is configured separately via inference / n_bootstrap in diff_diff/estimators.py:L168-L177. The new text makes it sound like cr1, cr2, and bootstrap are vcov_type labels, which could misdirect future SE-threading work. Concrete fix: rewrite the note to reflect the actual API, e.g. vcov_type ∈ {classical,hc1,hc2,hc2_bm,conley}, with clustered hc1 yielding CR1 behavior and wild bootstrap remaining a separate inference="wild_bootstrap" path.

Code Quality

No findings in changed scope.

Performance

No findings in changed scope.

Maintainability

P2 Impact: The regenerated large-module inventory in TODO.md:L24-L56 is presented as the current state, but it omits eight live diff_diff/*.py files already above the stated >=1000-line threshold: _nprobust_port.py (1412), practitioner.py (1402), trop_global.py (1350), trop_local.py (1339), local_linear.py (1332), wooldridge.py (1305), chaisemartin_dhaultfoeuille_bootstrap.py (1175), and stacked_did.py (1050). That underreports the current maintainability surface and weakens the new tiering section’s claim to be drawing from up-to-date source-of-truth tables. Concrete fix: regenerate the table mechanically from wc -l diff_diff/*.py and include every >=1000 file, or explicitly narrow the section’s scope/title so those omissions are intentional and documented.

Tech Debt

No additional findings. The new Tier A/B/C/D backlog is internally anchored to existing TODO rows.

Security

No findings.

Documentation/Tests

No additional findings. The refreshed predict() limitation reference and the pytest.importorskip visualization-skip note match the current tree.

P2 — SE-consistency rewrite misstated the public API The rewrite listed cr1, cr2, bootstrap as vcov_type values. The actual validated set in linalg.py::_VALID_VCOV_TYPES is {"classical","hc1","hc2","hc2_bm","conley"}; cluster-robust variance is obtained via cluster= alongside the heteroscedasticity kind (hc1+cluster ⇒ CR1, hc2_bm+cluster ⇒ CR2 Bell-McCaffrey), and wild cluster bootstrap is a separate inference="wild_bootstrap" path on the same estimator. Rewrote the SE Consistency paragraph to match the actual API. P2 — Large Module Files table omitted 8 modules already ≥1000 lines The refreshed inventory in section 24-56 missed: _nprobust_port.py (1412), practitioner.py (1402), trop_global.py (1350), trop_local.py (1339), local_linear.py (1332), wooldridge.py (1305), chaisemartin_dhaultfoeuille_bootstrap.py (1175), stacked_did.py (1050). Mechanically regenerated from wc -l diff_diff/*.py >= 1000; all 35 current ≥1000-line modules now listed (verified via comm). Self-audit fix linalg.py Action cell read "Consider splitting (vcov surfaces) — unified backend, splitting would hurt cohesion" — self-contradictory. Reworded to "Consider splitting only if cohesion can be preserved" so the threshold rule and the cohesion constraint can both be honored. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-15T22:53:06Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 4e829a28db1049678abc7c8c5517e4b8cef04925

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings in the changed scope.

Executive Summary

Review scope is docs-only: TODO.md; no estimator, weighting, inference, or identification code changed.
Re-review confirms the prior large-module-table omission is fixed. TODO.md:L24-L65 now matches the live diff_diff/*.py >=1000-line inventory.
Re-review confirms the rewritten SE-consistency note now matches the actual vcov_type contract and Registry notes: TODO.md:L220-L222, diff_diff/linalg.py:L1097-L1172, diff_diff/estimators.py:L55-L83, diff_diff/twfe.py:L58-L91, docs/methodology/REGISTRY.md:L2539-L2542.
The refreshed limitation references are accurate: TODO.md:L15-L16, diff_diff/estimators.py:L890-L911, diff_diff/estimators.py:L1251-L1265, diff_diff/estimators.py:L1646-L1648.
The visualization-skip note is accurate against the current test suite: TODO.md:L239-L241, tests/test_visualization_plotly.py:L12-L12, tests/test_visualization_new.py:L19-L19, tests/test_visualization.py:L115-L122.

Methodology

Severity: P3-informational
Impact: The prior SE-consistency docs issue is resolved. The updated prose now reflects the real vcov_type surface and keeps documented deviations/tracked gaps where they belong, so I do not see any new undocumented methodology deviation in the changed scope (TODO.md:L220-L222, docs/methodology/REGISTRY.md:L2539-L2542, docs/methodology/REGISTRY.md:L678-L679).
Concrete fix: None.

Code Quality

No findings.

Performance

No findings.

Maintainability

Severity: P3-informational
Impact: The previous large-module inventory incompleteness is fixed. The refreshed table in TODO.md:L24-L65 matches the current tree exactly, so the new prioritization section is now anchored to a complete source-of-truth table.
Concrete fix: None.

Tech Debt

Severity: P3-informational
Impact: The corrected dCDH survey allocator reference now points at the already-documented survey IF expansion note, which keeps this as tracked technical debt rather than an undocumented deviation (TODO.md:L77-L78, docs/methodology/REGISTRY.md:L678-L679).
Concrete fix: None.

Security

No findings.

Documentation/Tests

Severity: P3-informational
Impact: The refreshed predict() / MultiPeriodDiD wild-bootstrap limitation refs and the pytest.importorskip visualization note are accurate against the live code/tests (TODO.md:L15-L16, diff_diff/estimators.py:L890-L911, diff_diff/estimators.py:L1251-L1265, tests/test_visualization_plotly.py:L12-L12, tests/test_visualization_new.py:L19-L19).
Concrete fix: None.

igerber added the ready-for-ci Triggers CI test workflows label May 15, 2026

igerber merged commit e798d0c into main May 15, 2026
5 of 6 checks passed

igerber deleted the todo-cleanup branch May 15, 2026 23:39

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TODO.md cleanup: drop SHIPPED rows, refresh stale refs, add Tier A/B/C/D backlog#447

TODO.md cleanup: drop SHIPPED rows, refresh stale refs, add Tier A/B/C/D backlog#447
igerber merged 2 commits into
mainfrom
todo-cleanup

igerber commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented May 15, 2026

Summary

Test plan

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant