Skip to content

feat(utils): add isWithinSiteScope and filterBySiteScope URL helpers [LLMO-5748]#1714

Merged
dhavkuma merged 8 commits into
mainfrom
worktree-feat+LLMO-5748-site-scope-url-utils
Jun 24, 2026
Merged

feat(utils): add isWithinSiteScope and filterBySiteScope URL helpers [LLMO-5748]#1714
dhavkuma merged 8 commits into
mainfrom
worktree-feat+LLMO-5748-site-scope-url-utils

Conversation

@dhavkuma

Copy link
Copy Markdown
Contributor

Summary

  • Adds isWithinSiteScope(url, siteBaseUrl) and filterBySiteScope(urls, siteBaseUrl) to @adobe/spacecat-shared-utils
  • Both functions live in src/url-helpers.js alongside the prependSchema / stripWWW helpers they depend on, and are re-exported from src/index.js
  • Motivated by spacecat-audit-worker#2724, which introduced a local copy of this logic for the prerender audit — parking it here makes it reusable by both audit-worker and api-service

Behaviour

Input Result
siteBaseUrl is null / empty all URLs pass through (no restriction)
siteBaseUrl has no subpath (e.g. bulk.com) all URLs pass through
siteBaseUrl has a subpath (e.g. bulk.com/uk) only URLs within that subpath are kept
Relative URL (e.g. /uk/page) compared against the subpath directly
www prefix stripped before host comparison
Port mismatch URL is out of scope
Trailing-slash guard /uk/ prefix used to prevent /uk matching /ukraine
Malformed URL returns false without throwing

Test plan

  • 18 new unit tests in test/url-helpers.test.js covering all branches above
  • test/index.test.js export snapshot updated
  • 1180 tests passing, 0 failing
  • 100% line/statement coverage maintained
  • ESLint clean

Refs: LLMO-5748

🤖 Generated with Claude Code

dhavkuma and others added 2 commits June 24, 2026 01:35
Adds two URL utility functions to spacecat-shared-utils for filtering
URLs by a site's configured base path (e.g. bulk.com/uk). Handles
domain-only scopes (no restriction), relative URLs, www normalization,
port comparison, and trailing-slash false-positive guard.

Refs: LLMO-5748

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

This PR will trigger a minor release when merged.

@MysticatBot MysticatBot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @dhavkuma,

⚠ Degraded review - no spec document was found for this change (searched the PR links, the touched repos' docs, the architecture/guidelines docs, and linked Jira). This review covers code-level quality but could not validate the change against an agreed design, so confidence is reduced. Add a spec link (PR template section 4) and re-request review for a full-confidence pass.

Verdict: Request changes - two issues to address before merge.
Complexity: MEDIUM - medium diff, single utility addition.
Changes: Adds isWithinSiteScope and filterBySiteScope URL helpers to the shared-utils package with comprehensive tests (4 files).

Must fix before merge

  1. [Important] Path traversal bypass in relative URL matching via .. segments - packages/spacecat-shared-utils/src/url-helpers.js:482 (details inline)
  2. [Important] filterBySiteScope crashes on non-array input - packages/spacecat-shared-utils/src/url-helpers.js:507 (details inline)
Non-blocking (3): minor issues and suggestions
  • nit: JSDoc for isWithinSiteScope says "relative or absolute" but should clarify that relative means absolute-path reference (leading /) - packages/spacecat-shared-utils/src/url-helpers.js:458
  • suggestion: Add test for multi-level subpath (e.g. bulk.com/uk/en) to verify nested scope matching works correctly - packages/spacecat-shared-utils/test/url-helpers.test.js
  • suggestion: Add test for relative URL with query string (/uk/page?lang=en) to pin existing correct behavior against future refactors - packages/spacecat-shared-utils/test/url-helpers.test.js

Skill: pr-review | Model: us.anthropic.claude-opus-4-6-v1[1m] | Duration: 3m 14s | Cost: $4.99 | Commit: 9864f74885ff63d69c261ecee52c05dd6217f673
If this code review was useful, please react with 👍. Otherwise, react with 👎.


const basePathWithSlash = `${basePath}/`;

if (!url.startsWith('http://') && !url.startsWith('https://')) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (blocking): Relative URLs containing .. segments bypass the scope check. For example, isWithinSiteScope("/uk/../admin/secret", "bulk.com/uk") returns true because the string literally starts with /uk/, but the resolved path is /admin/secret - outside the intended scope.

Since this function goes into a shared library where future consumers might pass unresolved paths, normalize the relative path before comparison:

if (!url.startsWith('http://') && !url.startsWith('https://')) {
  const normalized = new URL(url, 'https://dummy.local').pathname;
  return normalized.startsWith(basePathWithSlash) || normalized === basePath;
}

* @param {string} siteBaseUrl
* @returns {string[]}
*/
function filterBySiteScope(urls, siteBaseUrl) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (blocking): urls.filter(...) throws TypeError if urls is null, undefined, or not an array. The sibling isWithinSiteScope handles falsy url gracefully (returns false) - this function should do the same for its collection argument.

Fix:

function filterBySiteScope(urls, siteBaseUrl) {
  if (!Array.isArray(urls)) return [];
  return urls.filter((url) => isWithinSiteScope(url, siteBaseUrl));
}

@MysticatBot MysticatBot added ai-reviewed Reviewed by AI complexity:medium AI-assessed PR complexity: MEDIUM labels Jun 23, 2026
dhavkuma and others added 3 commits June 24, 2026 10:44
…e scope helpers

- Normalize relative URLs via new URL(url, dummy).pathname before scope
  comparison to prevent .. path traversal bypasses
- Guard filterBySiteScope against null/undefined/non-array urls input
- Clarify JSDoc: relative means absolute-path reference (leading /)
- Add tests: path traversal, multi-level subpath, query string, non-array guard

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
….com:adobe/spacecat-shared into worktree-feat+LLMO-5748-site-scope-url-utils
@dhavkuma dhavkuma requested a review from MysticatBot June 24, 2026 05:15

@MysticatBot MysticatBot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @dhavkuma,

⚠ Degraded review - no spec document was found for this change (searched the PR links, the touched repos' docs, the architecture/guidelines docs, and linked Jira). This review covers code-level quality but could not validate the change against an agreed design, so confidence is reduced. Add a spec link (PR template section 4) and re-request review for a full-confidence pass.

Verdict: Approve - prior findings addressed, implementation is clean and well-tested.
Complexity: MEDIUM - medium diff, single utility addition.
Changes: Adds isWithinSiteScope and filterBySiteScope URL scope-filtering helpers to the shared-utils package with comprehensive tests (4 files).
Note: CI checks are currently failing - resolve before merge.

Non-blocking (2): minor issues and suggestions
  • nit: Explicit default port (:443/:80) is treated as different from omitted port - parsedUrl.port returns '443' while omitted port returns '', so https://bulk.com:443/uk/page would be rejected against bulk.com/uk. Unlikely in practice for audit URLs but worth a note. - packages/spacecat-shared-utils/src/url-helpers.js:496
  • suggestion: Consider adding a test for case-sensitive path matching (e.g. /UK/page vs base /uk) to explicitly document the RFC 3986-correct strict comparison as a design decision. - packages/spacecat-shared-utils/test/url-helpers.test.js

Previously flagged, now resolved

  • Path traversal bypass via .. segments now blocked by URL constructor normalization
  • filterBySiteScope non-array crash now handled with Array.isArray guard

Skill: pr-review | Model: us.anthropic.claude-opus-4-6-v1[1m] | Duration: 1m 53s | Cost: $3.68 | Commit: 05ce3a530435a5f60b02f1761582cfd631ecc35d
If this code review was useful, please react with 👍. Otherwise, react with 👎.

@dhavkuma dhavkuma enabled auto-merge (squash) June 24, 2026 18:11
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@dhavkuma dhavkuma requested a review from MysticatBot June 24, 2026 18:34

@MysticatBot MysticatBot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @dhavkuma,

⚠ Degraded review - no spec document was found for this change (searched the PR links, the touched repos' docs, the architecture/guidelines docs, and linked Jira). This review covers code-level quality but could not validate the change against an agreed design, so confidence is reduced. Add a spec link (PR template section 4) and re-request review for a full-confidence pass.

Verdict: Approve - prior approval confirmed; the only new commit is an internal monorepo lockfile bump with no code changes.
Complexity: MEDIUM - medium diff, single utility addition.
Changes: Adds isWithinSiteScope and filterBySiteScope URL scope-filtering helpers to the shared-utils package with comprehensive tests (5 files).

Non-blocking (2): minor issues and suggestions from prior review
  • nit: Explicit default port (:443/:80) is treated as different from omitted port - parsedUrl.port returns '443' while omitted port returns '', so https://bulk.com:443/uk/page would be rejected against bulk.com/uk. Unlikely in practice for audit URLs but worth a note. - packages/spacecat-shared-utils/src/url-helpers.js:496
  • suggestion: Consider adding a test for case-sensitive path matching (e.g. /UK/page vs base /uk) to explicitly document the RFC 3986-correct strict comparison as a design decision. - packages/spacecat-shared-utils/test/url-helpers.test.js

Previously flagged, now resolved

  • Path traversal bypass via .. segments now blocked by URL constructor normalization
  • filterBySiteScope non-array crash now handled with Array.isArray guard

Skill: pr-review | Model: us.anthropic.claude-opus-4-6-v1[1m] | Duration: 8m 30s | Cost: $1.71 | Commit: e779bc87501e4b038034fe5fcfbaddc1853ec7c1
If this code review was useful, please react with 👍. Otherwise, react with 👎.

@dhavkuma dhavkuma merged commit 5511bf0 into main Jun 24, 2026
5 checks passed
@dhavkuma dhavkuma deleted the worktree-feat+LLMO-5748-site-scope-url-utils branch June 24, 2026 18:42
solaris007 pushed a commit that referenced this pull request Jun 24, 2026
## [@adobe/spacecat-shared-utils-v1.121.0](https://github.com/adobe/spacecat-shared/compare/@adobe/spacecat-shared-utils-v1.120.1...@adobe/spacecat-shared-utils-v1.121.0) (2026-06-24)

### Features

* **utils:** add isWithinSiteScope and filterBySiteScope URL helpers [LLMO-5748] ([#1714](#1714)) ([5511bf0](5511bf0))
@solaris007

Copy link
Copy Markdown
Member

🎉 This PR is included in version @adobe/spacecat-shared-utils-v1.121.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ai-reviewed Reviewed by AI complexity:medium AI-assessed PR complexity: MEDIUM released

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants