
F150: Sanitize Word/Office paste cruft + Clear-formatting button #3

Merged
cbroberg merged 1 commit into main from feat/f150-paste-sanitization
Jun 7, 2026

Conversation

@cbroberg commented Jun 7, 2026

What

Source-fix for Word/Office paste pollution in the richtext editor (epic cms-F150, story cms-F150.1).

Pasting from Word/Office leaked empty <span> / mso-* / <o:p> / <font> cruft into stored Markdown (the editor serialises with html:true). Public sites that render the Markdown without raw-HTML support escape that markup, so it shows up as literal <span> text on the page. Ground truth: sanneandersen.dk product "fordjelsen".
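
To make the failure mode concrete, here is a minimal sketch of what happens on a site that escapes raw HTML instead of interpreting it. The `escapeHtml` helper and the sample string are hypothetical stand-ins, not code or content from any of the affected sites.

```typescript
// Hypothetical stand-in for a renderer with raw-HTML support disabled:
// inline HTML in the Markdown source is escaped rather than interpreted.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Illustrative stored Markdown that still carries a leftover paste wrapper.
const storedMarkdown = "Good for <span>digestion</span>.";

// The escaped tags reach the visitor as visible "<span>" text.
const renderedHtml = `<p>${escapeHtml(storedMarkdown)}</p>`;
// renderedHtml === "<p>Good for &lt;span&gt;digestion&lt;/span&gt;.</p>"
```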

How

  • src/lib/paste-sanitizer.ts: sanitizeWordPasteHtml(html), a pure string transform (no DOM dependency → runs in the browser paste path AND is unit-testable in the repo's node vitest env). Strips Office conditional comments, <style>/<xml> blocks, XML-namespace tags, <font>, mso-* style props, class="Mso…", and unwraps noise-only <span> wrappers, while preserving real content, intentional markup (<u>, <strong>, <a>), and deliberate colour spans (TextStyle/Color).
  • Wired into rich-text-editor.tsx editorProps.transformPastedHTML → every paste is cleaned before ProseMirror parses.
  • Clear-formatting toolbar button (unsetAllMarks + clearNodes), data-testid="clear-formatting-button" (Lens-ready). Btn gains an optional testId prop (additive).
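
The three bullets above can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged code: the regexes, the rule that a span counts as noise once it has no attributes left, the `PasteProps`/`EditorLike` stand-in types, and the `focus()` call are all assumptions made so the sketch compiles without editor dependencies.

```typescript
// Unwrap <span> tags that carry no attributes; a stack pairs each closing
// tag with the keep/drop decision made for its opening tag.
function unwrapNoiseSpans(html: string): string {
  const keepStack: boolean[] = [];
  return html.replace(
    /<span\b([^>]*)>|<\/span\s*>/gi,
    (tag: string, attrs: string | undefined) => {
      if (tag.startsWith("</")) {
        return keepStack.pop() === false ? "" : tag;
      }
      const keep = (attrs ?? "").trim() !== "";
      keepStack.push(keep);
      return keep ? tag : "";
    },
  );
}

// Sketch of a string-only sanitizer (no DOM), in the spirit of the PR.
function sanitizeWordPasteHtml(html: string): string {
  let out = html;
  // Comments, including Office conditional comments, and bare <![if]> markers.
  out = out.replace(/<!--[\s\S]*?-->/g, "");
  out = out.replace(/<!\[(?:if|endif)[^>]*>/gi, "");
  // <style> and <xml> blocks together with their contents.
  out = out.replace(/<(style|xml)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "");
  // XML-namespace tags such as <o:p>; inner text is kept.
  out = out.replace(/<\/?[a-z][\w-]*:[^>]*>/gi, "");
  // <font> wrappers; inner content is kept.
  out = out.replace(/<\/?font\b[^>]*>/gi, "");
  // class="Mso..." attributes.
  out = out.replace(/\s+class\s*=\s*(?:"Mso[^"]*"|'Mso[^']*'|Mso[^\s>]*)/gi, "");
  // Drop mso-* declarations; remove the style attribute if nothing is left.
  // Assumes values contain no ';' (entity-encoded quotes would need decoding).
  out = out.replace(
    /\s+style\s*=\s*(["'])([\s\S]*?)\1/gi,
    (_m: string, q: string, css: string) => {
      const kept = css
        .split(";")
        .map((d) => d.trim())
        .filter((d) => d !== "" && !/^mso-/i.test(d));
      return kept.length > 0 ? ` style=${q}${kept.join("; ")}${q}` : "";
    },
  );
  // Spans left without attributes are noise; colour spans keep their style.
  return unwrapNoiseSpans(out);
}

// Structural stand-in for the slice of the editor props used for the wiring.
interface PasteProps {
  transformPastedHTML?: (html: string) => string;
}
const editorProps: PasteProps = {
  transformPastedHTML: (html) => sanitizeWordPasteHtml(html),
};

// Structural stand-ins for the command chain behind the toolbar button.
interface ChainLike {
  focus(): ChainLike;
  unsetAllMarks(): ChainLike;
  clearNodes(): ChainLike;
  run(): boolean;
}
interface EditorLike {
  chain(): ChainLike;
}
// focus() is an assumption; the PR only names unsetAllMarks + clearNodes.
function clearFormatting(editor: EditorLike): boolean {
  return editor.chain().focus().unsetAllMarks().clearNodes().run();
}
```

With this sketch, `<p class="MsoNormal"><span style="mso-spacerun:yes"><font face="Calibri">Hello</font></span> <span style="color: red">world</span><o:p></o:p></p>` comes out as `<p>Hello <span style="color: red">world</span></p>`: the Word wrapper is gone and the deliberate colour span survives. A regex pipeline like this cannot handle every malformed input, which is presumably why the real implementation is backed by edge-guard tests.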

Verification

  • 17 unit tests in src/lib/__tests__/paste-sanitizer.test.ts incl. the real prod payload + edge guards.
  • Full cms-admin suite green: 857/857.
  • tsc --noEmit: no new errors (my files clean; pre-existing mcp/webhook-dispatch errors untouched).
  • ⏳ Not yet clicked-through in-browser (live paste round-trip) — Chrome DevTools port was down. Logic is unit-proven on the exact prod payload.

Plan-doc: docs/features/F150-paste-formatting-sanitization.md (already on main via cardmem).

🤖 Generated with Claude Code

…ton (F150)

Pasting from Word/Office leaked empty <span>/mso-*/<o:p>/<font> cruft into
stored Markdown (the editor serialises with html:true), surfacing as literal
<span> text on public sites (ground truth: sanneandersen "fordjelsen"). Add a
pure, string-based sanitizer wired into the editor's transformPastedHTML so
every paste is cleaned before ProseMirror parses — fixing it at source for
every CMS site. Also add a Clear-formatting toolbar button (unsetAllMarks +
clearNodes) for explicit on-demand stripping.

- src/lib/paste-sanitizer.ts: sanitizeWordPasteHtml (no DOM dep, node-testable)
- 17 unit tests incl. the real prod payload + edge guards (color spans kept)
- Btn gains optional testId; clear-formatting-button data-testid (Lens-ready)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@cbroberg merged commit 4985d9a into main Jun 7, 2026
6 of 7 checks passed
@cbroberg deleted the feat/f150-paste-sanitization branch June 7, 2026 12:52
