Skip to content

fix(utils): added some patterns for aem_headless detection#1716

Open
anshulk-public wants to merge 6 commits into
mainfrom
feat/headless-patterns
Open

fix(utils): added some patterns for aem_headless detection#1716
anshulk-public wants to merge 6 commits into
mainfrom
feat/headless-patterns

Conversation

@anshulk-public

@anshulk-public anshulk-public commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Add GraphQL persisted query as a scored headless signal (/graphql/execute.json → +2 bonus), guarded against hybrid AEM CS sites that embed GraphQL endpoints while AEM still server-renders the page (detected via lc-hash clientlibs: /etc.clientlibs/foo.lc-[hash]-lc.min.css)
  • Add Sling Model Exporter path as a scored headless signal (/content/*/path.model.json → +2 bonus), guarded against AEM CS SPA Editor sites (lc-hash clientlibs) and AEM AMS sites (MD5-hash clientlibs: /etc.clientlibs/foo.min.[32-char-hex].css)

Guards

lc-hash clientlibs (/etc.clientlibs/foo.lc-[hash]-lc.min.css) are emitted exclusively by the AEM CS rendering pipeline — unforgeable proof AEM is server-rendering the page.

MD5-hash clientlibs (/etc.clientlibs/foo.min.[32-char-hex].css) are the AMS fingerprint format — presence rules out headless for the .model.json signal.

Test plan

  • npm test -w packages/spacecat-shared-utils passes (1180 tests)
  • Updated test: publish-p URL alone now correctly returns other (not aem_headless)
  • GraphQL and Sling Model Exporter detection tests pass
  • lc-hash guard tests pass

Add GraphQL persisted query and Sling Model Exporter as scored headless
signals with guards against hybrid CS sites (lc-hash clientlibs) and
AMS SPA Editor sites (MD5-hash clientlibs). Remove publish-p URL as a
headless bonus signal — asset delivery from AEM publish tier is not an
indicator of headless architecture.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@anshulk-public anshulk-public changed the title fix(utils): reduce false positives in aem_headless detection fix(utils): added some patterns for aem_headless detection Jun 24, 2026
@github-actions

Copy link
Copy Markdown

This PR will trigger a patch release when merged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@MysticatBot MysticatBot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @anshulk-public,

⚠ Degraded review - no spec document was found for this change (searched the PR links, the touched repos' docs, the architecture/guidelines docs, and linked Jira). This review covers code-level quality but could not validate the change against an agreed design, so confidence is reduced. Add a spec link (PR template section 4) and re-request review for a full-confidence pass.

Verdict: Request changes - guard logic is defeated by unconditional array scoring.
Complexity: HIGH - medium diff; dependency signal.
Changes: Adds GraphQL persisted query and Sling Model Exporter as scored headless detection signals with guards against hybrid AEM CS/AMS sites (3 files).
Note: CI checks are currently failing - resolve before merge.

Must fix before merge

  1. [Important] Double-counting defeats guard intent - patterns scored unconditionally in array AND conditionally in guarded blocks - packages/spacecat-shared-utils/src/aem.js:114 (details inline)
  2. [Important] Missing guard-suppression tests - no test verifies the guards actually prevent false positives - packages/spacecat-shared-utils/test/aem.test.js:353 (details inline)
Non-blocking (2): minor issues and suggestions
  • nit: No test exercises the .model.tidy.json regex variant ((?:\.tidy)? group is untested) - packages/spacecat-shared-utils/test/aem.test.js:345
  • nit: Test name "should NOT detect AEM Headless via publish-p URL alone" suggests a publish-p guard was added, but detection fails only because a single /content/dam/ match (score=1) is below threshold (2); consider renaming to reflect the actual mechanism - packages/spacecat-shared-utils/test/aem.test.js:310

Skill: pr-review | Model: us.anthropic.claude-opus-4-6-v1[1m] | Duration: 3m 0s | Cost: $4.53 | Commit: fd8d4d79d266774104005e088edee30f418f1050
If this code review was useful, please react with 👍. Otherwise, react with 👎.

Comment thread packages/spacecat-shared-utils/src/aem.js Outdated
Comment thread packages/spacecat-shared-utils/test/aem.test.js
@MysticatBot MysticatBot added ai-reviewed Reviewed by AI complexity:high High complexity PR labels Jun 24, 2026
anshulk-public and others added 2 commits June 24, 2026 14:13
Remove /graphql/execute.json and /content/...model.json from the
aemHeadlessPatterns loop so they do not score unconditionally. The
loop scoring bypassed the guards entirely — on a hybrid page with
both signals the headless score reached MIN_THRESHOLD from the loop
alone before the guarded blocks were even evaluated.

Bump the guarded bonuses from +1 to +2 so each signal is
self-sufficient against MIN_THRESHOLD when the guard does not fire.

Add three negative-path tests to lock in guard regression safety:
- GraphQL + lc-hash clientlibs -> not headless
- model.json + lc-hash clientlibs -> not headless
- model.json + MD5-hash clientlibs -> not headless

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@MysticatBot MysticatBot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @anshulk-public,

⚠ Degraded review - no spec document was found for this change (searched the PR links, the touched repos' docs, the architecture/guidelines docs, and linked Jira). This review covers code-level quality but could not validate the change against an agreed design, so confidence is reduced. Add a spec link (PR template section 4) and re-request review for a full-confidence pass.

Verdict: Approve - prior blocking findings fully addressed, guards are now effective and tested.
Complexity: MEDIUM - medium diff, single domain.
Changes: Moves GraphQL and Sling Model Exporter patterns from unconditional scoring to guarded blocks (+2 each), adds guard-suppression tests for hybrid CS/AMS sites (2 files).

Non-blocking (2): minor issues and suggestions
  • suggestion: The GraphQL guard only suppresses on lc-hash clientlibs (CS) but not MD5-hash (AMS), unlike the model.json guard which checks both. If intentional (AMS sites with GraphQL are genuinely headless), a brief comment clarifying the asymmetry would help future readers - packages/spacecat-shared-utils/src/aem.js:192
  • suggestion: The lc-hash clientlib regex is now duplicated across multiple guard blocks. Consider extracting to a named constant (e.g. const LC_HASH_CLIENTLIB_RE = /\/.../i) to reduce maintenance surface - packages/spacecat-shared-utils/src/aem.js:194

Previously flagged, now resolved

  • Double-counting fixed - patterns removed from unconditional aemHeadlessPatterns array, scoring via guarded blocks only
  • Guard-suppression tests added - 3 negative-path tests (GraphQL+lc-hash, model.json+lc-hash, model.json+MD5-hash) lock in guard behavior

Skill: pr-review | Model: us.anthropic.claude-opus-4-6-v1[1m] | Duration: 1m 8s | Cost: $2.99 | Commit: 19ae628484250376ec6924aaa2b40bd1920c81c6
If this code review was useful, please react with 👍. Otherwise, react with 👎.

@MysticatBot MysticatBot added complexity:medium AI-assessed PR complexity: MEDIUM and removed complexity:high High complexity PR labels Jun 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ai-reviewed Reviewed by AI complexity:medium AI-assessed PR complexity: MEDIUM

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants