fix: filter transient toast/alert elements from HTML before markdown analysis by anuj-adobe · Pull Request #1666 · adobe/spacecat-shared

anuj-adobe · 2026-06-11T12:57:23Z

Summary

- Adds removal of `[role="alert"]`, `[aria-live="assertive"]`, `[aria-live="polite"]`, and `#toastContainer` elements in `filterHtmlNode` (the Cheerio/Node.js code path).
- These elements are dynamically injected notification toasts that appear in headless scraped HTML but not in real browsers. They are typically triggered by bot-detection systems (e.g. ThreatMetrix/Signifyd) when headless Chrome is detected.
- Without this fix, transient error toasts like "We are currently experiencing system difficulties" pollute `markdown-diff.md` and affect LLM Optimizer content previews and citations.
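
The removal described in the summary could look roughly like the sketch below. It is not the actual `filterHtmlNode` code: the helper name `removeTransientNotifications` and the array name `SUMMARY_SELECTORS` are hypothetical, and `$` is assumed to be a Cheerio root function such as the one returned by `cheerio.load(html)`. The selector list is copied from the summary above.

```javascript
// Selectors listed in the PR summary (array name is hypothetical).
const SUMMARY_SELECTORS = [
  '[role="alert"]',
  '[aria-live="assertive"]',
  '[aria-live="polite"]',
  '#toastContainer',
];

// `$` is assumed to be a Cheerio root, e.g. const $ = cheerio.load(html).
// A comma-separated selector matches any of the listed patterns, and
// .remove() detaches every matched element from the parsed tree.
function removeTransientNotifications($, selectors = SUMMARY_SELECTORS) {
  $(selectors.join(', ')).remove();
  return $;
}

// Assumed usage:
//   const $ = cheerio.load(html);
//   removeTransientNotifications($);
//   const cleanedHtml = $.html();
```

Passing the selectors as a parameter keeps the list easy to narrow later without touching the removal logic.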

Root cause

Per MDN, `role="alert"` is a live region reserved for dynamically rendered, time-sensitive messages triggered by user interaction or application events — by definition not static page content. These elements should never be part of content analysis or citations.

Canadian Tire (and likely others) use Signifyd + ThreatMetrix for fraud detection. When headless Chrome is detected, a session update API call fails and a toast with `role="alert"` is injected via JS with the text "We are currently experiencing system difficulties...". The page still returns HTTP 200, so the scraper has no signal that anything went wrong and the toast ends up in the markdown diff.

Since `filterHtmlNode` runs on both server-side and client-side HTML before markdown analysis, stripping these elements at this layer is the correct fix.

Test plan

- Added test: `should remove toast and live-region alert elements` covering `role="alert"`, `aria-live="assertive"`, `aria-live="polite"`, and `#toastContainer`
- 78 tests passing
- Verified locally against `https://www.canadiantire.ca/en/cat/sports-recreation/bikes-accessories/bikes-DC0002129.html`: toast text no longer appears in `markdown-diff.md` or `server-side-html.md`

🤖 Generated with Claude Code

…analysis

JS-injected notification elements (role="alert", aria-live, #toastContainer) from bot-detection systems (e.g. ThreatMetrix/Signifyd) appear in headless scraped HTML but not in real browsers. This change removes them before markdown analysis so they don't pollute content diffs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…and assertive only

aria-live="polite" is used broadly for legitimate dynamic content (search result counts, filter updates), so removing it would cause false positives. Restrict removal to role="alert" and aria-live="assertive", which the W3C spec reserves for urgent transient notifications.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
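
To make the narrowed rule concrete, here is a small sketch of the decision as a predicate over an element's attributes. The function name `isUrgentLiveRegion` and the plain-object attribute shape are assumptions for illustration. The PR itself expresses the rule as CSS selectors, and this sketch uses the same exact-match comparison those selectors would.

```javascript
// Hypothetical predicate mirroring the narrowed rule from the commit message:
// strip role="alert" and aria-live="assertive", keep aria-live="polite".
// `attrs` is assumed to be a plain object of attribute name -> value.
function isUrgentLiveRegion(attrs) {
  return attrs.role === 'alert' || attrs['aria-live'] === 'assertive';
}

// Polite regions (search result counts, filter updates) are left in place.
console.log(isUrgentLiveRegion({ role: 'alert' }));           // true
console.log(isUrgentLiveRegion({ 'aria-live': 'assertive' })); // true
console.log(isUrgentLiveRegion({ 'aria-live': 'polite' }));    // false
```

In WAI-ARIA, `role="alert"` carries an implicit `aria-live="assertive"`, which is why the two are treated as one category here.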

github-actions · 2026-06-11T13:20:44Z

This PR will trigger a patch release when merged.

anuj-adobe and others added 2 commits June 11, 2026 18:27

anuj-adobe and others added 3 commits June 11, 2026 18:51

refactor: extract TRANSIENT_NOTIFICATION_SELECTORS constant for toast…

69a7a96

…/alert removal Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: apply TRANSIENT_NOTIFICATION_SELECTORS removal in browser (DOMPa…

4f872b9

…rser) path too Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: apply TRANSIENT_NOTIFICATION_SELECTORS in browser path and tight…

de5067d

…en comments Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
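
For the browser path mentioned in the last two commits, a shared selector constant might be applied to a parsed document along the lines of the sketch below. This is an assumption-laden illustration: the page does not show the constant's actual contents, so the two selectors here follow the narrowed rule from the earlier commit message, and the function name `removeTransientNotificationsFromDocument` is hypothetical. `doc` is assumed to be a `Document`, for example one produced by `new DOMParser().parseFromString(html, 'text/html')`.

```javascript
// Assumed contents; the real constant's value is not shown on this page.
const TRANSIENT_NOTIFICATION_SELECTORS = [
  '[role="alert"]',
  '[aria-live="assertive"]',
];

// `doc` is assumed to be a DOM Document (or any node with querySelectorAll).
// Each matched element is detached with Element.remove().
function removeTransientNotificationsFromDocument(
  doc,
  selectors = TRANSIENT_NOTIFICATION_SELECTORS,
) {
  doc.querySelectorAll(selectors.join(', ')).forEach((el) => el.remove());
  return doc;
}

// Assumed browser usage:
//   const doc = new DOMParser().parseFromString(html, 'text/html');
//   removeTransientNotificationsFromDocument(doc);
```

Keeping one constant for both the Cheerio and browser paths means the two code paths cannot drift apart in what they strip.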

anuj-adobe wants to merge 5 commits into main from feat/filter-toast-alert-elements
