Skip to content

Expand LaTeX symbol coverage using codegen from vendored tables#7

Open
chitwitgit wants to merge 3 commits into
md2docx:mainfrom
chitwitgit:feat/expand-latex-symbol-coverage
Open

Expand LaTeX symbol coverage using codegen from vendored tables#7
chitwitgit wants to merge 3 commits into
md2docx:mainfrom
chitwitgit:feat/expand-latex-symbol-coverage

Conversation

@chitwitgit

@chitwitgit chitwitgit commented Jun 9, 2026

Copy link
Copy Markdown

Summary

Closes #6.

Replaces the hand-maintained LATEX_SYMBOLS map (~120 entries) in lib/src/index.ts with generated symbol tables to support substantially more LaTeX commands in DOCX output.

KaTeX v0.16.22 source snippets are used as a practical seed — they provide a large, well-structured baseline that is easy to vendor and codegen from. The goal is broad LaTeX command coverage, not parity with any particular renderer; other sources and manual overrides can be layered on later.

  • 567 symbol mappings, 52 aliases, 43 named functions, 21 macro-only overrides
  • Accent commands (\tilde, \bar, \vec, etc.) now produce OMML accent characters
  • Named functions (\sin, \log, etc.) render as literal text
  • Adds pnpm generate:katex to regenerate tables from vendored source
  • Narrows tsup entry to ./src/index.ts so generated data files bundle into the main export

Test plan

  • pnpm typecheck passes in lib/
  • pnpm build passes in lib/
  • pnpm test passes in lib/
  • Spot-check DOCX output for common symbols (\neq, \alpha, \sum, \hat{x})

Replace the hand-maintained symbol map with generated tables seeded from
KaTeX v0.16.22 source snippets, add accent and function handling, and
document regeneration via pnpm generate:katex.

Fixes md2docx#6
@deepsource-io

deepsource-io Bot commented Jun 9, 2026

Copy link
Copy Markdown

DeepSource Code Review

We reviewed changes in 7577e6f...6a09b9e on this pull request. Below is the summary for the review, and you can see the individual issues we found as inline review comments.

See full review on DeepSource ↗

PR Report Card

Overall Grade   Security  

Reliability  

Complexity  

Hygiene  

Code Review Summary

Analyzer Status Updated (UTC) Details
JavaScript Jun 10, 2026 2:19a.m. Review ↗

Important

AI Review is run only on demand for your team. We're only showing results of static analysis review right now. To trigger AI Review, comment @deepsourcebot review on this thread.

@chitwitgit

chitwitgit commented Jun 9, 2026

Copy link
Copy Markdown
Author

Summary

Follow-up to #7.

PR #7 replaces the hand-maintained symbol map with KaTeX-generated tables (~567 symbols). That covers most macros where the fix is simply resolving a command to the right Unicode glyph (e.g. \wedge).

The port still had gaps: some common macros need more than symbol lookup. N-ary operators must emit m:nary, not a ∫/∏ character in a text run. Layout commands like \binom and \stackrel need delimiter and limit structures. Font wrappers should render their argument, not the macro name. A few KaTeX macro-only symbols (\quad, \ne, \cdots) were also missing from the generated overrides.

This PR adds explicit handlers for those cases (~115 lines on top of #7).

What changed

lib/src/index.ts (+~110 lines)

Category Macros OMML
N-ary operators \prod, \int, \oint, \bigcup, \bigcap, \bigoplus, \bigotimes m:nary
Layout \binom{a}{b} m:d (round brackets + fraction)
Layout \stackrel{a}{b} m:limLoc + MathLimitUpper
Accents \overline, \widetilde createMathAccentCharacter
Font/text \mathrm, \mathit, \textbf, \textit, \underline, \overbrace, \underbrace argument content only (no literal macro name)
Skip \boxed, \boldsymbol no literal fallback
Other \newline space (non-empty OMML)

\sum unchanged — already handled via MathSum.

lib/scripts/generate-katex-data.ts (+4 lines)

Codegen fixes for macro-only symbols missed by simple alias parsing:

  • \quad, \qquad — fixed \\\\hskip regex for vendored KaTeX file
  • \ne (alias of \neq)
  • \cdots (via \@cdots)

Regenerated katexMeta.ts: 21 → 25 KATEX_SYMBOL_OVERRIDES.

Test plan

  • pnpm build passes in lib/
  • pnpm test passes in lib/
  • Spot-check in Word: \binom{n}{k}, \int_0^1, \stackrel{def}{=}, \mathrm{ABC}, \prod_{i=1}^n

@chitwitgit chitwitgit force-pushed the feat/expand-latex-symbol-coverage branch from feae6c4 to 977dd65 Compare June 9, 2026 08:46
Map n-ary operators, binom, stackrel, accents, and font wrappers to proper
Word OMML instead of Unicode fallbacks; fix quad/ne/cdots codegen overrides.
@chitwitgit chitwitgit force-pushed the feat/expand-latex-symbol-coverage branch from 977dd65 to f05e5b5 Compare June 9, 2026 08:50
Log console errors and omit unrenderable OMML instead of emitting empty <m:oMath> elements that break Microsoft Word.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Expand LaTeX symbol coverage beyond hand-maintained map

1 participant