F150: Sanitize Word/Office paste cruft + Clear-formatting button#3
Merged
…ton (F150)

Pasting from Word/Office leaked empty <span>/mso-*/<o:p>/<font> cruft into stored Markdown (the editor serialises with html:true), surfacing as literal <span> text on public sites (ground truth: sanneandersen "fordjelsen"). Add a pure, string-based sanitizer wired into the editor's transformPastedHTML so every paste is cleaned before ProseMirror parses, fixing it at source for every CMS site. Also add a Clear-formatting toolbar button (unsetAllMarks + clearNodes) for explicit on-demand stripping.

- src/lib/paste-sanitizer.ts: sanitizeWordPasteHtml (no DOM dep, node-testable)
- 17 unit tests incl. the real prod payload + edge guards (color spans kept)
- Btn gains optional testId; clear-formatting-button data-testid (Lens-ready)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
What
Source-fix for Word/Office paste pollution in the richtext editor (epic cms-F150, story cms-F150.1).
Pasting from Word/Office leaked empty <span>/mso-*/<o:p>/<font> cruft into stored Markdown (the editor serialises with html:true). On public sites that render the Markdown without raw-HTML support, the markup is escaped and shows as literal <span> text on the page. Ground truth: sanneandersen.dk product "fordjelsen".

How
- src/lib/paste-sanitizer.ts: sanitizeWordPasteHtml(html), a pure string transform (no DOM dependency → runs in the browser paste path AND is unit-testable in the repo's node vitest env). Strips Office conditional comments, <style>/<xml> blocks, XML-namespace tags, <font>, mso-* style props, class="Mso…", and unwraps noise-only <span> wrappers, while preserving real content, intentional markup (<u>, <strong>, <a>), and deliberate colour spans (TextStyle/Color).
- rich-text-editor.tsx: editorProps.transformPastedHTML → every paste is cleaned before ProseMirror parses.
- Clear-formatting toolbar button (unsetAllMarks + clearNodes), data-testid="clear-formatting-button" (Lens-ready).
- Btn gains an optional testId prop (additive).

Verification
- 17 unit tests in src/lib/__tests__/paste-sanitizer.test.ts incl. the real prod payload + edge guards.
- tsc --noEmit: no new errors (my files clean; pre-existing mcp/webhook-dispatch errors untouched).

Plan-doc: docs/features/F150-paste-formatting-sanitization.md (already on main via cardmem).

🤖 Generated with Claude Code
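
For readers without the diff open, here is a minimal sketch of the kind of string-only pass described under How. It is not the code in src/lib/paste-sanitizer.ts: the names sanitizeWordPasteHtmlSketch, stripMsoDeclarations, unwrapBareSpans and editorPropsSketch, and every regex, are illustrative assumptions. In particular it treats only attribute-less spans as "noise-only", which may be narrower than what the real sanitizer does.

```typescript
// Illustrative sketch only; all names and regexes here are assumptions,
// not the implementation shipped in src/lib/paste-sanitizer.ts.

// Drop mso-* declarations from a style value. Declarations are split on ";"
// but HTML entities such as &quot; are kept intact.
function stripMsoDeclarations(css: string): string {
  const declarations = css.match(/(?:&[#\w]+;|[^;])+/g) ?? [];
  return declarations
    .map((d) => d.trim())
    .filter((d) => d !== "" && !/^mso-/i.test(d))
    .join("; ");
}

// Unwrap spans that carry no attributes. A stack pairs each closing tag with
// its opener, so a styled span nested inside a bare one survives.
function unwrapBareSpans(html: string): string {
  const keepStack: boolean[] = [];
  return html.replace(
    /<(\/?)span\b([^>]*)>/gi,
    (tag: string, slash: string, attrs: string) => {
      if (slash === "") {
        const keep = attrs.trim() !== "";
        keepStack.push(keep);
        return keep ? tag : "";
      }
      // An unmatched closing tag is treated as noise and dropped.
      return keepStack.pop() === true ? tag : "";
    },
  );
}

function sanitizeWordPasteHtmlSketch(html: string): string {
  let out = html;
  // Comments, including Office conditional comments, and downlevel markers.
  out = out.replace(/<!--[\s\S]*?-->/g, "");
  out = out.replace(/<!\[(?:if|endif)[^\]]*\]>/gi, "");
  // Whole <style> and <xml> blocks.
  out = out.replace(/<(style|xml)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "");
  // XML-namespace tags such as <o:p>; their text content is kept.
  out = out.replace(/<\/?[a-z][\w-]*:[a-z][^>]*>/gi, "");
  // <font> wrappers; their content is kept.
  out = out.replace(/<\/?font\b[^>]*>/gi, "");
  // Inside each opening tag: drop class="Mso…" and mso-* style declarations.
  out = out.replace(/<[a-z][^>]*>/gi, (tag: string) =>
    tag
      .replace(/\s+class\s*=\s*(["'])\s*Mso[^"']*\1/gi, "")
      .replace(
        /\s+style\s*=\s*(["'])([\s\S]*?)\1/gi,
        (_m: string, quote: string, css: string) => {
          const kept = stripMsoDeclarations(css);
          return kept === "" ? "" : ` style=${quote}${kept}${quote}`;
        },
      ),
  );
  // Spans left without attributes are pure wrappers by now.
  return unwrapBareSpans(out);
}

// Plain object with the shape of the paste hook: ProseMirror's
// transformPastedHTML receives the pasted HTML and returns the HTML to parse.
const editorPropsSketch = {
  transformPastedHTML: (html: string): string =>
    sanitizeWordPasteHtmlSketch(html),
};
```

Because the pass is plain string manipulation, it can run in a node test environment with no DOM, which matches the testing approach described above. The trade-off is that regexes do not understand HTML: an attribute value containing ">" would defeat this sketch.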