feat(utils): add isWithinSiteScope and filterBySiteScope URL helpers [LLMO-5748]#1714
Conversation
Adds two URL utility functions to spacecat-shared-utils for filtering URLs by a site's configured base path (e.g. bulk.com/uk). Handles domain-only scopes (no restriction), relative URLs, www normalization, port comparison, and trailing-slash false-positive guard. Refs: LLMO-5748 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
This PR will trigger a minor release when merged. |
There was a problem hiding this comment.
Hey @dhavkuma,
⚠ Degraded review - no spec document was found for this change (searched the PR links, the touched repos' docs, the architecture/guidelines docs, and linked Jira). This review covers code-level quality but could not validate the change against an agreed design, so confidence is reduced. Add a spec link (PR template section 4) and re-request review for a full-confidence pass.
Verdict: Request changes - two issues to address before merge.
Complexity: MEDIUM - medium diff, single utility addition.
Changes: Adds isWithinSiteScope and filterBySiteScope URL helpers to the shared-utils package with comprehensive tests (4 files).
Must fix before merge
- [Important] Path traversal bypass in relative URL matching via
..segments -packages/spacecat-shared-utils/src/url-helpers.js:482(details inline) - [Important]
filterBySiteScopecrashes on non-array input -packages/spacecat-shared-utils/src/url-helpers.js:507(details inline)
Non-blocking (3): minor issues and suggestions
- nit: JSDoc for
isWithinSiteScopesays "relative or absolute" but should clarify that relative means absolute-path reference (leading/) -packages/spacecat-shared-utils/src/url-helpers.js:458 - suggestion: Add test for multi-level subpath (e.g.
bulk.com/uk/en) to verify nested scope matching works correctly -packages/spacecat-shared-utils/test/url-helpers.test.js - suggestion: Add test for relative URL with query string (
/uk/page?lang=en) to pin existing correct behavior against future refactors -packages/spacecat-shared-utils/test/url-helpers.test.js
Skill: pr-review | Model: us.anthropic.claude-opus-4-6-v1[1m] | Duration: 3m 14s | Cost: $4.99 | Commit: 9864f74885ff63d69c261ecee52c05dd6217f673
If this code review was useful, please react with 👍. Otherwise, react with 👎.
|
|
||
| const basePathWithSlash = `${basePath}/`; | ||
|
|
||
| if (!url.startsWith('http://') && !url.startsWith('https://')) { |
There was a problem hiding this comment.
issue (blocking): Relative URLs containing .. segments bypass the scope check. For example, isWithinSiteScope("/uk/../admin/secret", "bulk.com/uk") returns true because the string literally starts with /uk/, but the resolved path is /admin/secret - outside the intended scope.
Since this function goes into a shared library where future consumers might pass unresolved paths, normalize the relative path before comparison:
if (!url.startsWith('http://') && !url.startsWith('https://')) {
const normalized = new URL(url, 'https://dummy.local').pathname;
return normalized.startsWith(basePathWithSlash) || normalized === basePath;
}| * @param {string} siteBaseUrl | ||
| * @returns {string[]} | ||
| */ | ||
| function filterBySiteScope(urls, siteBaseUrl) { |
There was a problem hiding this comment.
issue (blocking): urls.filter(...) throws TypeError if urls is null, undefined, or not an array. The sibling isWithinSiteScope handles falsy url gracefully (returns false) - this function should do the same for its collection argument.
Fix:
function filterBySiteScope(urls, siteBaseUrl) {
if (!Array.isArray(urls)) return [];
return urls.filter((url) => isWithinSiteScope(url, siteBaseUrl));
}…e scope helpers - Normalize relative URLs via new URL(url, dummy).pathname before scope comparison to prevent .. path traversal bypasses - Guard filterBySiteScope against null/undefined/non-array urls input - Clarify JSDoc: relative means absolute-path reference (leading /) - Add tests: path traversal, multi-level subpath, query string, non-array guard Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
….com:adobe/spacecat-shared into worktree-feat+LLMO-5748-site-scope-url-utils
There was a problem hiding this comment.
Hey @dhavkuma,
⚠ Degraded review - no spec document was found for this change (searched the PR links, the touched repos' docs, the architecture/guidelines docs, and linked Jira). This review covers code-level quality but could not validate the change against an agreed design, so confidence is reduced. Add a spec link (PR template section 4) and re-request review for a full-confidence pass.
Verdict: Approve - prior findings addressed, implementation is clean and well-tested.
Complexity: MEDIUM - medium diff, single utility addition.
Changes: Adds isWithinSiteScope and filterBySiteScope URL scope-filtering helpers to the shared-utils package with comprehensive tests (4 files).
Note: CI checks are currently failing - resolve before merge.
Non-blocking (2): minor issues and suggestions
- nit: Explicit default port (
:443/:80) is treated as different from omitted port -parsedUrl.portreturns'443'while omitted port returns'', sohttps://bulk.com:443/uk/pagewould be rejected againstbulk.com/uk. Unlikely in practice for audit URLs but worth a note. -packages/spacecat-shared-utils/src/url-helpers.js:496 - suggestion: Consider adding a test for case-sensitive path matching (e.g.
/UK/pagevs base/uk) to explicitly document the RFC 3986-correct strict comparison as a design decision. -packages/spacecat-shared-utils/test/url-helpers.test.js
Previously flagged, now resolved
- Path traversal bypass via
..segments now blocked by URL constructor normalization filterBySiteScopenon-array crash now handled withArray.isArrayguard
Skill: pr-review | Model: us.anthropic.claude-opus-4-6-v1[1m] | Duration: 1m 53s | Cost: $3.68 | Commit: 05ce3a530435a5f60b02f1761582cfd631ecc35d
If this code review was useful, please react with 👍. Otherwise, react with 👎.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Hey @dhavkuma,
⚠ Degraded review - no spec document was found for this change (searched the PR links, the touched repos' docs, the architecture/guidelines docs, and linked Jira). This review covers code-level quality but could not validate the change against an agreed design, so confidence is reduced. Add a spec link (PR template section 4) and re-request review for a full-confidence pass.
Verdict: Approve - prior approval confirmed; the only new commit is an internal monorepo lockfile bump with no code changes.
Complexity: MEDIUM - medium diff, single utility addition.
Changes: Adds isWithinSiteScope and filterBySiteScope URL scope-filtering helpers to the shared-utils package with comprehensive tests (5 files).
Non-blocking (2): minor issues and suggestions from prior review
- nit: Explicit default port (
:443/:80) is treated as different from omitted port -parsedUrl.portreturns'443'while omitted port returns'', sohttps://bulk.com:443/uk/pagewould be rejected againstbulk.com/uk. Unlikely in practice for audit URLs but worth a note. -packages/spacecat-shared-utils/src/url-helpers.js:496 - suggestion: Consider adding a test for case-sensitive path matching (e.g.
/UK/pagevs base/uk) to explicitly document the RFC 3986-correct strict comparison as a design decision. -packages/spacecat-shared-utils/test/url-helpers.test.js
Previously flagged, now resolved
- Path traversal bypass via
..segments now blocked by URL constructor normalization filterBySiteScopenon-array crash now handled withArray.isArrayguard
Skill: pr-review | Model: us.anthropic.claude-opus-4-6-v1[1m] | Duration: 8m 30s | Cost: $1.71 | Commit: e779bc87501e4b038034fe5fcfbaddc1853ec7c1
If this code review was useful, please react with 👍. Otherwise, react with 👎.
## [@adobe/spacecat-shared-utils-v1.121.0](https://github.com/adobe/spacecat-shared/compare/@adobe/spacecat-shared-utils-v1.120.1...@adobe/spacecat-shared-utils-v1.121.0) (2026-06-24) ### Features * **utils:** add isWithinSiteScope and filterBySiteScope URL helpers [LLMO-5748] ([#1714](#1714)) ([5511bf0](5511bf0))
|
🎉 This PR is included in version @adobe/spacecat-shared-utils-v1.121.0 🎉 The release is available on: Your semantic-release bot 📦🚀 |
Summary
isWithinSiteScope(url, siteBaseUrl)andfilterBySiteScope(urls, siteBaseUrl)to@adobe/spacecat-shared-utilssrc/url-helpers.jsalongside theprependSchema/stripWWWhelpers they depend on, and are re-exported fromsrc/index.jsBehaviour
siteBaseUrlis null / emptysiteBaseUrlhas no subpath (e.g.bulk.com)siteBaseUrlhas a subpath (e.g.bulk.com/uk)/uk/page)wwwprefix/uk/prefix used to prevent/ukmatching/ukrainefalsewithout throwingTest plan
test/url-helpers.test.jscovering all branches abovetest/index.test.jsexport snapshot updatedRefs: LLMO-5748
🤖 Generated with Claude Code