
fix: filter transient toast/alert elements from HTML before markdown analysis#1666

Draft
anuj-adobe wants to merge 5 commits into main from feat/filter-toast-alert-elements

Conversation

anuj-adobe commented Jun 11, 2026

Contributor

Summary

  • Adds removal of `[role="alert"]`, `[aria-live="assertive"]`, `[aria-live="polite"]`, and `#toastContainer` elements in `filterHtmlNode` (the Cheerio/Node.js code path)
  • These elements are dynamically injected notification toasts that appear in headless scraped HTML but not in real browsers. They are typically triggered by bot-detection systems (e.g. ThreatMetrix/Signifyd) when headless Chrome is detected
  • Without this fix, transient error toasts like "We are currently experiencing system difficulties" pollute `markdown-diff.md` and affect LLM Optimizer content previews and citations

Root cause

Per MDN, `role="alert"` is a live region reserved for dynamically rendered, time-sensitive messages triggered by user interaction or application events, so by definition it is not static page content. These elements should never be part of content analysis or citations.

Canadian Tire (and likely others) uses Signifyd + ThreatMetrix for fraud detection. When headless Chrome is detected, a session update API call fails and a toast with `role="alert"` is injected via JS with the text "We are currently experiencing system difficulties...". The page still returns HTTP 200, so the scraper has no signal that anything went wrong, and the toast ends up in the markdown diff.

Since `filterHtmlNode` runs on both server-side and client-side HTML before markdown analysis, stripping these elements at this layer is the correct fix.
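For illustration only, here is a minimal, dependency-free sketch of the idea: drop matching elements from the tree before any text is extracted. It is not the PR's implementation (which works on Cheerio nodes inside `filterHtmlNode`); `pruneTree`, `textOf`, the `{ tag, attrs, children }` node shape, and the sample page content are all hypothetical stand-ins chosen so the example runs without a parser.

```javascript
// Hypothetical stand-in for the real Cheerio-based filter.
// A node is either a string (text) or { tag, attrs, children }.

/** Return a copy of `node` with every descendant element matching `shouldDrop` removed. */
function pruneTree(node, shouldDrop) {
  if (typeof node === 'string') return node; // text nodes are always kept
  const children = (node.children || [])
    .filter((child) => typeof child === 'string' || !shouldDrop(child))
    .map((child) => pruneTree(child, shouldDrop));
  return { ...node, children };
}

/** Concatenate all text in the tree, roughly what a markdown conversion would see. */
function textOf(node) {
  if (typeof node === 'string') return node;
  return (node.children || []).map(textOf).join(' ').trim();
}

// Made-up scraped page: real content plus a JS-injected error toast.
const page = {
  tag: 'body',
  attrs: {},
  children: [
    { tag: 'h1', attrs: {}, children: ['Winter Tires'] },
    {
      tag: 'div',
      attrs: { role: 'alert' },
      children: ['We are currently experiencing system difficulties'],
    },
  ],
};

// Assumed matcher covering only the role="alert" case described above.
const isAlert = (el) => (el.attrs || {}).role === 'alert';

const cleaned = pruneTree(page, isAlert);
console.log(textOf(cleaned)); // Winter Tires
```

Pruning before text extraction means the downstream markdown step needs no knowledge of toasts; the original `page` object is left untouched so the unfiltered HTML remains available for debugging.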

Test plan

🤖 Generated with Claude Code

anuj-adobe and others added 2 commits June 11, 2026 18:27
…analysis

JS-injected notification elements (role="alert", aria-live, #toastContainer)
from bot-detection systems (e.g. ThreatMetrix/Signifyd) appear in headless
scraped HTML but not in real browsers. This change removes them before markdown
analysis so they don't pollute content diffs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…and assertive only

aria-live="polite" is used broadly for legitimate dynamic content (search result
counts, filter updates) and would cause false positives. Restrict to role="alert"
and aria-live="assertive" which are W3C live regions reserved for urgent transient
notifications by spec.
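The narrowed rule from this commit message could be sketched as a small predicate over an element's attributes. This is an assumption-laden illustration, not the code in the PR: `isTransientNotification` is a hypothetical name, `attrs` is assumed to be a plain object of attribute values, and `#toastContainer` is left out because the commit message does not say how it is treated after the change.

```javascript
// Hypothetical predicate mirroring the rule described in the commit message:
// strip role="alert" and aria-live="assertive", keep aria-live="polite".
function isTransientNotification(attrs) {
  if (attrs.role === 'alert') return true;
  if (attrs['aria-live'] === 'assertive') return true;
  return false; // aria-live="polite" and everything else is kept
}

// An error toast is stripped.
console.log(isTransientNotification({ role: 'alert' })); // true
// A search-result counter announced politely is kept.
console.log(isTransientNotification({ 'aria-live': 'polite' })); // false
```

Keeping the polite case out of the predicate trades a few missed toasts for not deleting legitimate dynamic content such as result counts and filter updates.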

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions


This PR will trigger a patch release when merged.

anuj-adobe and others added 3 commits June 11, 2026 18:51
…/alert removal

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rser) path too

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…en comments

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>