Skip to content

fix: simplify directive scanner + handle JSX quotes#240

Closed
mogelbrod wants to merge 1 commit into
lingui:mainfrom
mogelbrod:simple-directive-scanner
Closed

fix: simplify directive scanner + handle JSX quotes#240
mogelbrod wants to merge 1 commit into
lingui:mainfrom
mogelbrod:simple-directive-scanner

Conversation

@mogelbrod

Copy link
Copy Markdown
Contributor

Alternative solution to #238 which sacrifices correctness for simplicity by using a regex scanning approach. The caveat is that lingui-set and lingui-reset directives within strings will be seen as valid directives and thus be applied. @timofei-iatsenko considered this to be an acceptable compromise (comment).

@mogelbrod mogelbrod changed the title fix: make directive scanner handle quotes in JSX correctly fix: simplift directive scanner + handle JSX quotes Jun 22, 2026
@codecov

codecov Bot commented Jun 22, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 96.07843% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.55%. Comparing base (300cf86) to head (cdde0e7).

Files with missing lines Patch % Lines
crates/lingui_macro/src/comment_directive/mod.rs 96.07% 2 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #240      +/-   ##
==========================================
- Coverage   94.95%   94.55%   -0.40%     
==========================================
  Files          10        9       -1     
  Lines        2597     2371     -226     
==========================================
- Hits         2466     2242     -224     
+ Misses        131      129       -2     
Flag Coverage Δ
unittests 94.55% <96.07%> (-0.40%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
crates/lingui_macro/src/comment_directive/mod.rs 94.56% <96.07%> (+0.58%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@mogelbrod mogelbrod changed the title fix: simplift directive scanner + handle JSX quotes fix: simplify directive scanner + handle JSX quotes Jun 22, 2026
@timofei-iatsenko

Copy link
Copy Markdown
Collaborator

Need to test how much this regex is making a transform slower. I'm using benchmark from this branch lingui/js-lingui#2560

And manually adding an additional scenario into benchmarks/src/scenarios/macro-transform.bench.ts

  bench.add("SWC - customized", async () => {
    await Promise.all(
      sourceFiles.map(async ({ filename, code }) => {
        const isTsx = filename.endsWith(".tsx")
        await swc.transform(code, {
          filename,
          jsc: {
            target: "esnext",
            transform: {
              react: {
                runtime: "preserve",
              },
            },
            parser: isTsx
              ? { syntax: "typescript", tsx: true }
              : { syntax: "typescript", tsx: false },
            experimental: {
              plugins: [
                [
                  "/Users/timofei.Iatsenko/Projects/swc-plugin/target/wasm32-wasip1/release/lingui_macro_plugin.wasm",
                  { descriptorFields: "all" } as Record<string, unknown>,
                ],
              ],
            },
          },
        })
      }),
    )
  })

Could you test this version in comparison to what is latest published in npm? (the branch may have not the latest)

@mogelbrod

mogelbrod commented Jun 22, 2026

Copy link
Copy Markdown
Contributor Author

I asked Claude to reproduce the macro-transform benchmark to measure the regex's cost.

The regex introduces a likely correctness regression and a ~12% transform slowdown.

Claude suggests to keep the custom scanner implementation:

The regex swap regresses both correctness and speed. The per-file instantiation model of WASM plugins is pathological for the regex crate — its one advantage (simplicity) can't amortize its compile cost or binary weight. I'd keep the source_scanner, or if the regex stays, at minimum fix the comment-boundary handling (don't capture past */) and weigh the +1.3 MB binary cost.

Setup

Three plugin .wasm builds, benchmarked via @swc/core@1.15.33 over a verbatim port of the lingui/js-lingui#2560 fixture generators (medium preset: 1000 files × 10 msgs, ~20% carrying lingui-set/lingui-reset directives), descriptorFields: "all", interleaved/shuffled rounds, output verified identical:

  • npm 6.4.0 — published @lingui/swc-plugin (source_scanner)
  • main300cf86 built from source (source_scanner)
  • branchcdde0e7 built from source (DIRECTIVE_RE regex)

Building main and the branch with the same local toolchain makes main → branch the clean isolation of the regex commit. npm 6.4.0 is a separate CI build (different opt flags) + 6 commits behind, so it's only a "what's published" reference, not the control.

🔴 Correctness regression (surfaced by the benchmark fixtures)

The branch fails to compile block-comment directives inside a JSX expression container — {/* lingui-set … */} / {/* lingui-reset */} — which main and npm 6.4.0 both handle:

x `lingui-reset` directive has invalid syntax: lingui-reset */}<Trans>…

{/* … */} is the only valid way to write a block-comment directive inside JSX, so this breaks directives in .tsx files. Root cause: the regex's ([^\n]*) group captures to end-of-line, and the trailing-*/ strip only works when */ is literally the last thing on the line — so the } (and any same-line code after a /* … */) gets swallowed into the params. The branch's tests only cover /* … */ standalone in JS context, so this slipped through. I had to strip 34 such lines from the .tsx fixtures just to get an apples-to-apples perf run.

🟡 Performance — 1000 files × 10 msgs, median of 30 interleaved rounds

plugin median vs main vs npm
npm 6.4.0 (scanner) 164.8 ms
main (scanner) 172.4 ms
branch (regex) 193.7 ms +12.3% +17.5%

The cost is per-file fixed overhead, not proportional to source size: re-running with 100 files × 100 msgs (same total content, 1/10th the files) drops the regex penalty to +4.5%. Two per-file contributors:

  • WASM binary grows +69.6% (1.88 MB → 3.18 MB) from the regex crate → slower module instantiation, which SWC does per file.
  • LazyLock<Regex> recompiles the regex once per WASM instance (i.e. once per file); Regex::new is expensive. The hand-written scanner had zero compile cost and no binary bloat.

Real projects resemble the "medium" preset (many small/medium files), so ~12% is the representative figure; it only shrinks for unusually few, huge files.

@timofei-iatsenko

timofei-iatsenko commented Jun 22, 2026

Copy link
Copy Markdown
Collaborator

What about doing this without regex, by just using a manual scanner?

  1. locate lingui-set/lingui-reset with simple substring match
  2. start matching char by char until it find the end of the comment

Should be much simplier and without using regexp. The current direction seems very complex as we tried to literally rewrite js parser by ourselves.

@timofei-iatsenko

Copy link
Copy Markdown
Collaborator

closed in favor of #242

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants