TODO.md cleanup: drop SHIPPED rows, refresh stale refs, add Tier A/B/C/D backlog #447
Merged
Conversation
Wave 1 of a multi-wave tech-debt paydown. Docs-only — no estimator code touched. Changes:

- Delete 3 SHIPPED rows marked `Done`: HAD `trends_lin` (PR #389), HAD `yatchew_hr_test(null="mean_independence")` (post-PR #392), HAD Phase 5 follow-up tutorial T22 (PR #440).
- Fix a stray blank line inside the Methodology/Correctness table (it was rendering as two fragments).
- Correct the dCDH survey cell-period allocator row's PR column: "PR 2" → "#408" (the actual `dcdh-by-path-survey-design` merge).
- Refresh Known Limitations line refs to the current state of `estimators.py` (`:778-784` → `:1647`; `:567-588` → `:890-911`).
- Regenerate the Large Module Files table from `wc -l diff_diff/*.py` ≥ 1000 lines. Add 11 newly-eligible modules (`chaisemartin_dhaultfoeuille.py` 8636, `had_pretests.py` 4951, `had.py` 4593, etc.). Update the header sentence to make the 3000/2000/1000 thresholds explicit instead of silently relaxing the documented "< 1000 lines" target.
- Rewrite the Standard Error Consistency section — `vcov_type` has subsumed the proposed `se_type` knob; point readers at the existing methodology row for the remaining 8-standalone-estimator threading work.
- Modernize the Test Coverage section to reference `pytest.importorskip` rather than a stale "21 visualization tests" count.
- Add a new `### Prioritized Tech-Debt Backlog` subsection: Tier A/B/C/D ordering by effort × risk, anchoring back to existing source-of-truth table rows so the tables remain canonical. Seven Tier A quick wins, twelve Tier B mid-size methodology items, nine Tier C derivation items, eight Tier D deferred/research items.

The new tier structure exists so subsequent waves can be picked off without re-litigating priority each turn. The Wave 2 candidate is the `clubsandwich_cr2_golden.json` regeneration from R (row 87 / Tier A).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P2 — SE-consistency rewrite misstated the public API

The rewrite listed `cr1`, `cr2`, `bootstrap` as `vcov_type` values. The actual validated set in `linalg.py::_VALID_VCOV_TYPES` is `{"classical", "hc1", "hc2", "hc2_bm", "conley"}`; cluster-robust variance is obtained via `cluster=` alongside the heteroscedasticity kind (`hc1` + `cluster` ⇒ CR1, `hc2_bm` + `cluster` ⇒ CR2 Bell-McCaffrey), and wild cluster bootstrap is a separate `inference="wild_bootstrap"` path on the same estimator. Rewrote the SE Consistency paragraph to match the actual API.
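A minimal sketch of the knob interaction described above. The helper `resolve_se_flavor` is hypothetical (it does not exist in `diff_diff`); it only restates the mapping from this review comment under the assumption that the three knobs combine independently:

```python
# Hypothetical sketch: restates the vcov_type / cluster / inference mapping
# described in the review. Not the actual diff_diff implementation.
_VALID_VCOV_TYPES = {"classical", "hc1", "hc2", "hc2_bm", "conley"}

def resolve_se_flavor(vcov_type, cluster=None, inference="analytic"):
    """Map the public knobs onto the effective standard-error flavor."""
    if vcov_type not in _VALID_VCOV_TYPES:
        raise ValueError(f"vcov_type must be one of {sorted(_VALID_VCOV_TYPES)}")
    if inference == "wild_bootstrap":
        # Wild cluster bootstrap is a separate inference path, not a vcov_type.
        return "wild_cluster_bootstrap"
    if cluster is not None:
        # cluster= upgrades the heteroscedasticity kind to its CR analogue.
        return {"hc1": "CR1", "hc2_bm": "CR2"}.get(vcov_type, vcov_type)
    return vcov_type
```

For example, `resolve_se_flavor("hc1", cluster="state")` yields `"CR1"`, while `bootstrap` or `cr2` passed as `vcov_type` raise, matching the validation the review describes.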
P2 — Large Module Files table omitted 8 modules already ≥1000 lines

The refreshed inventory (section lines 24-56) missed: `_nprobust_port.py` (1412), `practitioner.py` (1402), `trop_global.py` (1350), `trop_local.py` (1339), `local_linear.py` (1332), `wooldridge.py` (1305), `chaisemartin_dhaultfoeuille_bootstrap.py` (1175), `stacked_did.py` (1050). Mechanically regenerated from `wc -l diff_diff/*.py` (keeping files ≥ 1000 lines); all 35 current ≥1000-line modules are now listed (verified via `comm`).
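The regeneration step can be sketched as a `wc -l` pipe with an `awk` threshold filter. The `demo_diff_diff` directory and its stand-in modules below are illustrative, not the real repo layout:

```shell
# Illustrative reconstruction of the inventory command: build two stand-in
# modules, then keep only files at or above the 1000-line threshold.
mkdir -p demo_diff_diff
seq 1200 | sed 's/^/# filler /' > demo_diff_diff/big_module.py
seq 400  | sed 's/^/# filler /' > demo_diff_diff/small_module.py
# wc -l emits a trailing "total" row when given multiple files; exclude it.
wc -l demo_diff_diff/*.py | awk '$1 >= 1000 && $2 !~ /total/ {print $1, $2}'
```

Only `big_module.py` survives the filter, which is the shape of check the "verified via `comm`" step would diff against the table's rows.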
Self-audit fix
The `linalg.py` Action cell read "Consider splitting (vcov surfaces) — unified backend, splitting would hurt cohesion", which is self-contradictory. Reworded to "Consider splitting only if cohesion can be preserved" so the threshold rule and the cohesion constraint can both be honored.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
🔁 AI review rerun (requested by @igerber) Head SHA:
Summary
Wave 1 of a multi-wave tech-debt paydown. Docs-only — no estimator, test, or build code touched.
- Deleted 3 rows marked `Done` in the Methodology/Correctness table: HAD `trends_lin` (PR #389, "HAD ecosystem completion (RTD audit Batch A)"), HAD `yatchew_hr_test(null="mean_independence")` (post-PR #392, "HAD Phase 4: trends_lin (Eq 17/18) + R-package end-to-end parity"), HAD T22 tutorial (PR #440, "Add Tutorial 22: Survey-Weighted HAD walkthrough").
- Corrected "PR 2" → "#408" (the actual `dcdh-by-path-survey-design` merge per `git log`).
- Refreshed Known Limitations line refs for `estimators.py` (the file grew from 1357 to 1991 lines; `predict()` is now at `:890-911`, the wild-bootstrap fallback comment at `:1647`).
- Regenerated Large Module Files from `wc -l diff_diff/*.py` ≥ 1000 lines — adds 11 newly-eligible modules including `chaisemartin_dhaultfoeuille.py` (8636), `had_pretests.py` (4951), `had.py` (4593). The header sentence now makes the 3000 / 2000 / 1000 thresholds explicit instead of silently relaxing the documented `< 1000` target.
- Rewrote Standard Error Consistency — `vcov_type` has subsumed the proposed `se_type` knob; readers are pointed at the existing methodology row for the remaining 8-standalone-estimator threading work.
- Modernized Test Coverage to reference `pytest.importorskip` rather than the stale "21 visualization tests" count.
- Added a `### Prioritized Tech-Debt Backlog` subsection: Tier A/B/C/D ordering by effort × risk, anchoring back to existing source-of-truth tables. Seven Tier A quick wins, twelve Tier B mid-size methodology items, nine Tier C derivation items, eight Tier D deferred/research items.

Net diff: +90 / -36 on a single file.
The new tier structure exists so subsequent waves can be picked off without re-litigating priority each turn. Wave 2 candidate = regenerate `benchmarks/data/clubsandwich_cr2_golden.json` from R (row 87 / Tier A).

Test plan

- Open `TODO.md` in a markdown previewer and confirm the Methodology/Correctness table renders as a single contiguous table (not two fragments)
- `grep -c "^| .* | .* | .* | Done |" TODO.md` → 0
- `grep -nE "estimators\.py:778-784|estimators\.py:567-588|\| PR 2 +\|" TODO.md` → no matches
- `grep -nE "\| #408 \|" TODO.md` → ≥1 match
- `grep -nE "^#### Tier [ABCD] —" TODO.md` → 4 matches
- `git diff --name-only main..HEAD` → only `TODO.md`

🤖 Generated with Claude Code
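Two of the grep gates in the test plan can be smoke-tested against a tiny stand-in table before running them on the real file. The `sample_todo.md` below and its rows are illustrative only:

```shell
# Stand-in table: one open row carrying the corrected #408 PR reference,
# and no rows whose Status cell reads "Done".
cat > sample_todo.md <<'EOF'
| Item | Area | PR | Status |
| dCDH survey cell-period allocator | Methodology | #408 | Open |
EOF
# Gate 1: no shipped rows remain (grep -c prints 0; || true absorbs exit 1).
grep -c "^| .* | .* | .* | Done |" sample_todo.md || true
# Gate 2: the corrected PR cell is present (at least one match).
grep -nE "\| #408 \|" sample_todo.md
```

Running the same patterns against the real `TODO.md` should give 0 and ≥1 matches respectively, per the test plan.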