fix(utils): added some patterns for aem_headless detection#1716
fix(utils): added some patterns for aem_headless detection#1716anshulk-public wants to merge 6 commits into
Conversation
Add GraphQL persisted query and Sling Model Exporter as scored headless signals with guards against hybrid CS sites (lc-hash clientlibs) and AMS SPA Editor sites (MD5-hash clientlibs). Remove publish-p URL as a headless bonus signal — asset delivery from AEM publish tier is not an indicator of headless architecture. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
This PR will trigger a patch release when merged. |
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Hey @anshulk-public,
⚠ Degraded review - no spec document was found for this change (searched the PR links, the touched repos' docs, the architecture/guidelines docs, and linked Jira). This review covers code-level quality but could not validate the change against an agreed design, so confidence is reduced. Add a spec link (PR template section 4) and re-request review for a full-confidence pass.
Verdict: Request changes - guard logic is defeated by unconditional array scoring.
Complexity: HIGH - medium diff; dependency signal.
Changes: Adds GraphQL persisted query and Sling Model Exporter as scored headless detection signals with guards against hybrid AEM CS/AMS sites (3 files).
Note: CI checks are currently failing - resolve before merge.
Must fix before merge
- [Important] Double-counting defeats guard intent - patterns scored unconditionally in array AND conditionally in guarded blocks -
packages/spacecat-shared-utils/src/aem.js:114(details inline) - [Important] Missing guard-suppression tests - no test verifies the guards actually prevent false positives -
packages/spacecat-shared-utils/test/aem.test.js:353(details inline)
Non-blocking (2): minor issues and suggestions
- nit: No test exercises the
.model.tidy.jsonregex variant ((?:\.tidy)?group is untested) -packages/spacecat-shared-utils/test/aem.test.js:345 - nit: Test name "should NOT detect AEM Headless via publish-p URL alone" suggests a publish-p guard was added, but detection fails only because a single
/content/dam/match (score=1) is below threshold (2); consider renaming to reflect the actual mechanism -packages/spacecat-shared-utils/test/aem.test.js:310
Skill: pr-review | Model: us.anthropic.claude-opus-4-6-v1[1m] | Duration: 3m 0s | Cost: $4.53 | Commit: fd8d4d79d266774104005e088edee30f418f1050
If this code review was useful, please react with 👍. Otherwise, react with 👎.
Remove /graphql/execute.json and /content/...model.json from the aemHeadlessPatterns loop so they do not score unconditionally. The loop scoring bypassed the guards entirely — on a hybrid page with both signals the headless score reached MIN_THRESHOLD from the loop alone before the guarded blocks were even evaluated. Bump the guarded bonuses from +1 to +2 so each signal is self-sufficient against MIN_THRESHOLD when the guard does not fire. Add three negative-path tests to lock in guard regression safety: - GraphQL + lc-hash clientlibs -> not headless - model.json + lc-hash clientlibs -> not headless - model.json + MD5-hash clientlibs -> not headless Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Hey @anshulk-public,
⚠ Degraded review - no spec document was found for this change (searched the PR links, the touched repos' docs, the architecture/guidelines docs, and linked Jira). This review covers code-level quality but could not validate the change against an agreed design, so confidence is reduced. Add a spec link (PR template section 4) and re-request review for a full-confidence pass.
Verdict: Approve - prior blocking findings fully addressed, guards are now effective and tested.
Complexity: MEDIUM - medium diff, single domain.
Changes: Moves GraphQL and Sling Model Exporter patterns from unconditional scoring to guarded blocks (+2 each), adds guard-suppression tests for hybrid CS/AMS sites (2 files).
Non-blocking (2): minor issues and suggestions
- suggestion: The GraphQL guard only suppresses on lc-hash clientlibs (CS) but not MD5-hash (AMS), unlike the model.json guard which checks both. If intentional (AMS sites with GraphQL are genuinely headless), a brief comment clarifying the asymmetry would help future readers -
packages/spacecat-shared-utils/src/aem.js:192 - suggestion: The lc-hash clientlib regex is now duplicated across multiple guard blocks. Consider extracting to a named constant (e.g.
const LC_HASH_CLIENTLIB_RE = /\/.../i) to reduce maintenance surface -packages/spacecat-shared-utils/src/aem.js:194
Previously flagged, now resolved
- Double-counting fixed - patterns removed from unconditional
aemHeadlessPatternsarray, scoring via guarded blocks only - Guard-suppression tests added - 3 negative-path tests (GraphQL+lc-hash, model.json+lc-hash, model.json+MD5-hash) lock in guard behavior
Skill: pr-review | Model: us.anthropic.claude-opus-4-6-v1[1m] | Duration: 1m 8s | Cost: $2.99 | Commit: 19ae628484250376ec6924aaa2b40bd1920c81c6
If this code review was useful, please react with 👍. Otherwise, react with 👎.
Summary
/graphql/execute.json→ +2 bonus), guarded against hybrid AEM CS sites that embed GraphQL endpoints while AEM still server-renders the page (detected vialc-hashclientlibs:/etc.clientlibs/foo.lc-[hash]-lc.min.css)/content/*/path.model.json→ +2 bonus), guarded against AEM CS SPA Editor sites (lc-hash clientlibs) and AEM AMS sites (MD5-hash clientlibs:/etc.clientlibs/foo.min.[32-char-hex].css)Guards
lc-hash clientlibs (
/etc.clientlibs/foo.lc-[hash]-lc.min.css) are emitted exclusively by the AEM CS rendering pipeline — unforgeable proof AEM is server-rendering the page.MD5-hash clientlibs (
/etc.clientlibs/foo.min.[32-char-hex].css) are the AMS fingerprint format — presence rules out headless for the.model.jsonsignal.Test plan
npm test -w packages/spacecat-shared-utilspasses (1180 tests)other(notaem_headless)