Skip to content

TODO.md cleanup: drop SHIPPED rows, refresh stale refs, add Tier A/B/C/D backlog#447

Merged
igerber merged 2 commits into
mainfrom
todo-cleanup
May 15, 2026
Merged

TODO.md cleanup: drop SHIPPED rows, refresh stale refs, add Tier A/B/C/D backlog#447
igerber merged 2 commits into
mainfrom
todo-cleanup

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented May 15, 2026

Summary

Wave 1 of a multi-wave tech-debt paydown. Docs-only — no estimator, test, or build code touched.

  • Delete 3 SHIPPED rows that were left marked Done in the Methodology/Correctness table (HAD trends_lin PR docs: HAD ecosystem completion (RTD audit Batch A) #389, HAD yatchew_hr_test(null="mean_independence") post-PR HAD Phase 4: trends_lin (Eq 17/18) + R-package end-to-end parity #392, HAD T22 tutorial PR Add Tutorial 22: Survey-Weighted HAD walkthrough #440).
  • Fix stray blank line that was splitting the Methodology/Correctness table into two fragments.
  • Correct the dCDH survey cell-period allocator row's PR-column PR 2#408 (the actual dcdh-by-path-survey-design merge per git log).
  • Refresh stale reference sections:
    • Known Limitations line refs for estimators.py (the file grew from 1357 to 1991 lines; predict() is now at :890-911, the wild-bootstrap fallback comment at :1647).
    • Large Module Files regenerated from wc -l diff_diff/*.py ≥1000 lines — adds 11 newly-eligible modules including chaisemartin_dhaultfoeuille.py (8636), had_pretests.py (4951), had.py (4593). Header sentence now makes the 3000 / 2000 / 1000 thresholds explicit instead of silently relaxing the documented < 1000 target.
    • Standard Error Consistency rewritten — vcov_type has subsumed the proposed se_type knob; readers are pointed at the existing methodology row for the remaining 8-standalone-estimator threading work.
    • Test Coverage modernized to reference pytest.importorskip rather than the stale "21 visualization tests" count.
  • Add new ### Prioritized Tech-Debt Backlog subsection: Tier A/B/C/D ordering by effort × risk, anchoring back to existing source-of-truth tables. Seven Tier A quick wins, twelve Tier B mid-size methodology items, nine Tier C derivation items, eight Tier D deferred/research items.

Net diff: +90 / -36 on a single file.

The new tier structure exists so subsequent waves can be picked off without re-litigating priority each turn. Wave 2 candidate = regenerate benchmarks/data/clubsandwich_cr2_golden.json from R (row 87 / Tier A).

Test plan

  • Visual sanity: open TODO.md in a markdown previewer and confirm the Methodology/Correctness table renders as a single contiguous table (not two fragments)
  • grep -c "^| .* | .* | .* | Done |" TODO.md → 0
  • grep -nE "estimators\.py:778-784|estimators\.py:567-588|\| PR 2 +\|" TODO.md → no matches
  • grep -nE "\| #408 \|" TODO.md → ≥1 match
  • grep -nE "^#### Tier [ABCD] —" TODO.md → 4 matches
  • git diff --name-only main..HEAD → only TODO.md

🤖 Generated with Claude Code

…backlog

Wave 1 of multi-wave tech-debt paydown. Docs-only — no estimator code touched.

Changes:
- Delete 3 SHIPPED rows marked `Done`: HAD trends_lin (PR #389),
  HAD yatchew_hr_test(null="mean_independence") (post-PR #392),
  HAD Phase 5 follow-up tutorial T22 (PR #440).
- Fix stray blank line inside Methodology/Correctness table (was rendering
  as two fragments).
- Correct dCDH survey cell-period allocator row's PR-column "PR 2" → "#408"
  (actual dcdh-by-path-survey-design merge).
- Refresh Known Limitations line refs to current state of estimators.py
  (`:778-784` → `:1647`; `:567-588` → `:890-911`).
- Regenerate Large Module Files table from `wc -l diff_diff/*.py` ≥1000 lines.
  Add 11 newly-eligible modules (chaisemartin_dhaultfoeuille.py 8636,
  had_pretests.py 4951, had.py 4593, etc.). Update header sentence to make
  the 3000/2000/1000 thresholds explicit instead of silently relaxing the
  documented "< 1000 lines" target.
- Rewrite Standard Error Consistency section — `vcov_type` has subsumed the
  proposed `se_type` knob; point readers at the existing methodology row
  for the remaining 8-standalone-estimator threading work.
- Modernize Test Coverage section to reference `pytest.importorskip` rather
  than a stale "21 visualization tests" count.
- Add new `### Prioritized Tech-Debt Backlog` subsection: Tier A/B/C/D
  ordering by effort × risk, anchoring back to existing source-of-truth
  table rows so the tables remain canonical. Seven Tier A quick wins,
  twelve Tier B mid-size methodology items, nine Tier C derivation items,
  eight Tier D deferred/research items.

The new tier structure exists so subsequent waves can be picked off
without re-litigating priority each turn. Wave 2 candidate is the
clubsandwich_cr2_golden.json regeneration from R (row 87 / Tier A).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

Overall Assessment

✅ Looks good — no P0/P1 findings. The diff is docs-only, and I did not find any estimator-behavior change, undocumented methodology deviation, or statistical anti-pattern introduced by this PR. I did find two P2 documentation/tech-debt tracking issues in the new TODO.md content.

Executive Summary

  • No estimator, weighting, inference, or identification code changed, so there is no source-material blocker in the implementation.
  • P2: the refreshed “Large Module Files” table is incomplete and omits eight current >=1000-line modules.
  • P2: the rewritten SE-consistency note misstates the current public interface by treating cr1, cr2, and bootstrap as part of vcov_type.
  • The refreshed predict() reference and the visualization test-skip note appear accurate against the current tree.
  • No security or performance regressions are introduced by this diff.

Methodology

  • P2 Impact: The SE-consistency rewrite in TODO.md:L212-L214 does not match the actual variance API. vcov_type currently resolves to {"classical","hc1","hc2","hc2_bm","conley"} in diff_diff/linalg.py:L1202-L1221, hc2 is rejected on clustered fits in diff_diff/linalg.py:L1133-L1141, and bootstrap is configured separately via inference / n_bootstrap in diff_diff/estimators.py:L168-L177. The new text makes it sound like cr1, cr2, and bootstrap are vcov_type labels, which could misdirect future SE-threading work. Concrete fix: rewrite the note to reflect the actual API, e.g. vcov_type ∈ {classical,hc1,hc2,hc2_bm,conley}, with clustered hc1 yielding CR1 behavior and wild bootstrap remaining a separate inference="wild_bootstrap" path.

Code Quality

  • No findings in changed scope.

Performance

  • No findings in changed scope.

Maintainability

  • P2 Impact: The regenerated large-module inventory in TODO.md:L24-L56 is presented as the current state, but it omits eight live diff_diff/*.py files already above the stated >=1000-line threshold: _nprobust_port.py (1412), practitioner.py (1402), trop_global.py (1350), trop_local.py (1339), local_linear.py (1332), wooldridge.py (1305), chaisemartin_dhaultfoeuille_bootstrap.py (1175), and stacked_did.py (1050). That underreports the current maintainability surface and weakens the new tiering section’s claim to be drawing from up-to-date source-of-truth tables. Concrete fix: regenerate the table mechanically from wc -l diff_diff/*.py and include every >=1000 file, or explicitly narrow the section’s scope/title so those omissions are intentional and documented.

Tech Debt

  • No additional findings. The new Tier A/B/C/D backlog is internally anchored to existing TODO rows.

Security

  • No findings.

Documentation/Tests

  • No additional findings. The refreshed predict() limitation reference and the pytest.importorskip visualization-skip note match the current tree.

P2 — SE-consistency rewrite misstated the public API
  The rewrite listed cr1, cr2, bootstrap as vcov_type values. The actual
  validated set in linalg.py::_VALID_VCOV_TYPES is
  {"classical","hc1","hc2","hc2_bm","conley"}; cluster-robust variance is
  obtained via cluster= alongside the heteroscedasticity kind (hc1+cluster
  ⇒ CR1, hc2_bm+cluster ⇒ CR2 Bell-McCaffrey), and wild cluster bootstrap
  is a separate inference="wild_bootstrap" path on the same estimator.
  Rewrote the SE Consistency paragraph to match the actual API.

P2 — Large Module Files table omitted 8 modules already ≥1000 lines
  The refreshed inventory in section 24-56 missed: _nprobust_port.py
  (1412), practitioner.py (1402), trop_global.py (1350), trop_local.py
  (1339), local_linear.py (1332), wooldridge.py (1305),
  chaisemartin_dhaultfoeuille_bootstrap.py (1175), stacked_did.py (1050).
  Mechanically regenerated from wc -l diff_diff/*.py >= 1000; all 35
  current ≥1000-line modules now listed (verified via comm).

Self-audit fix
  linalg.py Action cell read "Consider splitting (vcov surfaces) — unified
  backend, splitting would hurt cohesion" — self-contradictory. Reworded to
  "Consider splitting only if cohesion can be preserved" so the threshold
  rule and the cohesion constraint can both be honored.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 4e829a28db1049678abc7c8c5517e4b8cef04925


Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings in the changed scope.

Executive Summary

  • Review scope is docs-only: TODO.md; no estimator, weighting, inference, or identification code changed.
  • Re-review confirms the prior large-module-table omission is fixed. TODO.md:L24-L65 now matches the live diff_diff/*.py >=1000-line inventory.
  • Re-review confirms the rewritten SE-consistency note now matches the actual vcov_type contract and Registry notes: TODO.md:L220-L222, diff_diff/linalg.py:L1097-L1172, diff_diff/estimators.py:L55-L83, diff_diff/twfe.py:L58-L91, docs/methodology/REGISTRY.md:L2539-L2542.
  • The refreshed limitation references are accurate: TODO.md:L15-L16, diff_diff/estimators.py:L890-L911, diff_diff/estimators.py:L1251-L1265, diff_diff/estimators.py:L1646-L1648.
  • The visualization-skip note is accurate against the current test suite: TODO.md:L239-L241, tests/test_visualization_plotly.py:L12-L12, tests/test_visualization_new.py:L19-L19, tests/test_visualization.py:L115-L122.

Methodology

  • Severity: P3-informational
  • Impact: The prior SE-consistency docs issue is resolved. The updated prose now reflects the real vcov_type surface and keeps documented deviations/tracked gaps where they belong, so I do not see any new undocumented methodology deviation in the changed scope (TODO.md:L220-L222, docs/methodology/REGISTRY.md:L2539-L2542, docs/methodology/REGISTRY.md:L678-L679).
  • Concrete fix: None.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • Severity: P3-informational
  • Impact: The previous large-module inventory incompleteness is fixed. The refreshed table in TODO.md:L24-L65 matches the current tree exactly, so the new prioritization section is now anchored to a complete source-of-truth table.
  • Concrete fix: None.

Tech Debt

  • Severity: P3-informational
  • Impact: The corrected dCDH survey allocator reference now points at the already-documented survey IF expansion note, which keeps this as tracked technical debt rather than an undocumented deviation (TODO.md:L77-L78, docs/methodology/REGISTRY.md:L678-L679).
  • Concrete fix: None.

Security

  • No findings.

Documentation/Tests

  • Severity: P3-informational
  • Impact: The refreshed predict() / MultiPeriodDiD wild-bootstrap limitation refs and the pytest.importorskip visualization note are accurate against the live code/tests (TODO.md:L15-L16, diff_diff/estimators.py:L890-L911, diff_diff/estimators.py:L1251-L1265, tests/test_visualization_plotly.py:L12-L12, tests/test_visualization_new.py:L19-L19).
  • Concrete fix: None.

@igerber igerber added the ready-for-ci Triggers CI test workflows label May 15, 2026
@igerber igerber merged commit e798d0c into main May 15, 2026
5 of 6 checks passed
@igerber igerber deleted the todo-cleanup branch May 15, 2026 23:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant