Skip to content

Validate downloaded theme assets against extension and MIME allowlists#852

Merged
mikachan merged 39 commits into
trunkfrom
fix/validate-downloaded-theme-assets
Jun 19, 2026
Merged

Validate downloaded theme assets against extension and MIME allowlists#852
mikachan merged 39 commits into
trunkfrom
fix/validate-downloaded-theme-assets

Conversation

@mikachan

@mikachan mikachan commented Jun 16, 2026

Copy link
Copy Markdown
Member

Add a per-sink extension allowlist and post-download MIME validation to the four places the plugin calls download_url() and writes the response to disk: media downloads from template/pattern content (add_media_to_local), font downloads from user global styles (copy_font_assets_to_theme), and the zip-export equivalents of both (add_media_to_zip, add_activated_fonts_to_zip). Also rejects multi-extension polyglot URLs (evil.php.jpg).

To test, ensure the following tests pass:

  npm run test:unit:php:base -- --filter "Test_Create_Block_Theme_(Media|Fonts|Zip)"

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens Create Block Theme’s asset-downloading code paths by adding pre-download URL extension allowlists and post-download content-type validation for media and font assets, including the zip-export flows, to reduce the risk of downloading/extracting unsafe payloads.

Changes:

  • Add URL allowlist validators to reject disallowed and multi-extension polyglot URLs before calling download_url().
  • Add post-download validation: MIME allowlist for media and magic-byte validation for font files.
  • Add PHPUnit coverage for the new validators and for “skip without downloading” behavior; update contributor test commands in docs.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
includes/create-theme/theme-media.php Adds media URL allowlist + post-download MIME allowlist; wires checks into local media download flow.
includes/create-theme/theme-fonts.php Adds font URL allowlist + magic-byte validation; wires checks into font asset copying.
includes/create-theme/theme-zip.php Wires the same validations into zip export for media and activated fonts.
tests/test-theme-media.php Adds tests for media URL/file validation and for short-circuiting before HTTP.
tests/test-theme-fonts.php Adds tests for font URL/file validation and for short-circuiting before HTTP.
tests/test-theme-zip.php Adds zip-specific test ensuring disallowed media URLs don’t trigger HTTP.
tests/data/evil.php.txt Adds malicious PHP fixture used by validator tests.
AGENTS.md Updates documented npm commands for running PHP unit tests.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread includes/create-theme/theme-zip.php
Comment thread includes/create-theme/theme-zip.php
Comment thread includes/create-theme/theme-zip.php Outdated
Comment thread includes/create-theme/theme-media.php
Comment thread includes/create-theme/theme-media.php
Comment thread includes/create-theme/theme-fonts.php
Comment thread includes/create-theme/theme-zip.php Outdated
Comment thread includes/create-theme/theme-media.php Outdated
mikachan and others added 12 commits June 16, 2026 14:01
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Two recent Copilot suggestion commits accidentally removed structural
code while making single-line changes:
- c045da7 ("Pass $font_src to download_url") dropped the `} else {`
  and the `is_wp_error( $tmp_file )` guard.
- 617f6ec ("Stream downloaded media into zip") dropped the closing
  brace of the `foreach` in add_media_to_zip().

Both broke PHP parse on every CI matrix. Restoring the missing
statements so the file is valid again.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Two more Copilot suggestion commits dropped surrounding context lines:
- ad31fe4 ("Strip query string from filename before renaming downloaded
  media") removed the closing brace of the foreach loop in
  add_media_to_local(), causing fatal lint errors.
- 860f1bc ("Strip query string from font URL before deriving filename
  in zip") removed the `$font_family_dir_name = sanitize_title(...)`
  assignment, leaving an undefined variable used immediately below.
`$font_filename = basename( $font_face['src'] )` threw TypeError on
PHP 8+ when src was already an array (valid per Theme JSON spec) and
was redefined inside the inner src loop anyway. The adjacent
`$font_dir = wp_get_font_dir()` was likewise redundant — also
redefined inside the inner loop. Removing both.
wp_check_filetype_and_ext() was inconsistent with the URL allowlist:
SVG isn't in WP Core's default mime registry at all, and WP maps wmv
to video/x-ms-wmv / avi to video/avi while the post-check allowlist
had only video/x-msvideo. Legitimate SVG/WMV/AVI URLs passed preflight
and then silently failed post-download.

Switching media to magic-byte verification (matching the font sink)
removes the dependency on WP Core's mime registry and libmagic
versions. Recognised formats: JPEG, PNG, GIF, WebP, SVG, MP4/M4V/MOV/
3GP/3G2 (via ftyp), WebM, OGV, WMV (ASF), AVI (RIFF), MPEG.
@scruffian

Copy link
Copy Markdown
Contributor

I found 3 issues:

  1. The temp file is now deleted before ZipArchive has read it.
  2. URL rewriting still uses the raw URL basename, so query-string slashes can produce broken or dangerous-looking asset references even though the download sink now uses the parsed path. For example http://example.com/cat.jpg?/evil.php now becomes cat.jpg for validation/download, but exported template markup points at /assets/images/evil.php.
  3. is_allowed_media_file() does not verify that the downloaded bytes match the URL extension. e.g. a .jpg URL with SVG/MP4/WebM bytes passes and gets saved as .jpg.
    '
    I added unit tests to demonstrate the issues (but not fixes)

mikachan added 2 commits June 16, 2026 17:30
Three fixes for issues found via regression tests in 8da66ff:

1. is_allowed_media_file() now requires the downloaded bytes to match
   the URL extension specifically — a .jpg URL with SVG/MP4/anything-else
   bytes is rejected. Previously any allowed magic was accepted, so a
   browser MIME-sniffing the on-disk .jpg containing SVG could render
   it as SVG (with all the script-execution surface that brings).

2. make_relative_media_url() now derives the filename from the parsed
   URL path instead of raw basename(). A URL like cat.jpg?/evil.php
   previously produced `/assets/images/evil.php` in exported template
   markup because basename() treats query-string slashes as path
   separators.

3. add_media_to_zip() reads the downloaded bytes into memory and
   unlinks the tmp file BEFORE adding to the archive via
   addFromStringToTheme(). The previous `addFileToTheme() + unlink`
   sequence broke because ZipArchive defers reading files added via
   addFile() until close() — by which time the tmp file was gone.
@mikachan

Copy link
Copy Markdown
Member Author

Thanks! These should all be fixed now.

  1. add_media_to_zip() now reads the downloaded bytes into memory and unlinks the tmp file before adding via addFromStringToTheme().
  2. make_relative_media_url() now uses basename(wp_parse_url($url, PHP_URL_PATH)) instead of raw basename($url). The example cat.jpg?/evil.php now produces /assets/images/cat.jpg as expected.
  3. is_allowed_media_file() now switches on the URL extension and requires the bytes to match that specific format. A .jpg URL with SVG/MP4/anything-else bytes is rejected so we never persist a file whose extension lies about its content.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 9 changed files in this pull request and generated 4 comments.

Comment thread tests/test-theme-zip.php
Comment thread includes/create-theme/theme-zip.php
Comment thread includes/create-theme/theme-media.php
Comment thread includes/create-theme/theme-fonts.php Outdated
mikachan and others added 3 commits June 17, 2026 13:39
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
parse_url() omits the 'port' key when the URL has no explicit port
(e.g. `http://localhost/foo`). The retry-on-localhost block in both
add_media_to_zip() and add_media_to_local() read $parsed_url['port']
unconditionally, raising an undefined-index notice when the initial
download failed for such a URL. Guard with isset() on both keys.
Previously any recognised font magic was accepted, so a .woff2 URL
returning TTF/WOFF/OTF/EOT bytes would pass and be saved under the
.woff2 filename. This defeated the extension+MIME allowlist intent
and would produce broken assets in the exported theme/zip (browsers
would fail to load a WOFF2-named file containing TTF content).

The check now switches on the URL extension and validates the magic
matches THAT specific format. Mirrors the same fix already applied
to is_allowed_media_file().
@scruffian

Copy link
Copy Markdown
Contributor

is_allowed_media_file has an mpeg branch, but is_allowed_media_url and the folder mapping only allow mpg. A template containing a .mpeg video would get rewritten to a local asset URL, then skipped before download, leaving the exported theme pointing at a missing file.

Comment thread includes/create-theme/theme-zip.php Outdated
is_allowed_media_file() accepted both .mpg and .mpeg, but
is_allowed_media_url() and get_media_folder_path_from_url() only
listed .mpg. A template with a .mpeg video would get rewritten to a
local asset URL, then skipped before download — leaving the exported
theme pointing at a missing file.

Add .mpeg to the URL allowlist and the folder mapping so all three
lists agree (matching WP Core's wp_get_mime_types which maps
mpeg|mpg|mpe to video/mpeg).

Also drops a pre-existing duplicate 'ogv' entry in the folder mapping
that was noticed while editing.
@mikachan

Copy link
Copy Markdown
Member Author

is_allowed_media_file has an mpeg branch, but is_allowed_media_url and the folder mapping only allow mpg.

Good spot, thank you. This should be fixed in 375663c.

mikachan and others added 5 commits June 17, 2026 16:20
Same fix as add_media_to_zip: ZipArchive::addFile() defers reading
the file until close() is called, so unlinking the download_url tmp
file immediately after addFileToTheme() left an empty entry in the
final archive when the zip was finalised.

Read the bytes into memory, unlink, then addFromStringToTheme() —
mirrors the pattern in add_media_to_zip(). Only the remote-download
branch is affected; the local-copy branch points at a permanent
file in the font library so its addFileToTheme() call stays.
AVIF is a WordPress-supported image format but was missing from the
URL allowlist, folder mapping, and magic-byte check. The result:
make_template_images_local() rewrote .avif URLs to local references,
but the post-rewrite download was rejected — leaving exported themes
pointing at missing assets.

AVIF uses the ISO BMFF container (like MP4) with 'avif' or 'avis' as
the major brand at offset 8. Added the brand-specific magic check
alongside the URL/folder allowlist entries, plus tests for both AVIF
still-image ('avif') and image-sequence ('avis') brands.
Previously a font source that failed the URL allowlist or post-download
MIME check was skipped via continue, but the original $font_src stayed
in $font_face['src']. copy_activated_fonts_to_theme() then merged that
family into theme.json — leaving the user's activated font pointing at
an uncopied (and possibly disallowed) source while the user custom
settings were cleared.

Both copy_font_assets_to_theme() and add_activated_fonts_to_zip() now
build a fresh srcs list and only retain entries that successfully
copied or were already file:./ asset paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread includes/create-theme/theme-media.php Outdated
Some AVIF files (typically those emitted by HEIF-derived tooling) set
the major brand to mif1/miaf and only list avif/avis in the compatible
brands. Parsing only the major brand at offset 8 would false-reject
them.

Scan the full FileTypeBox brand list (major brand plus compatible
brands at offsets 16+) for any AVIF brand. HEIC files (no avif/avis
anywhere) are still rejected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread includes/create-theme/theme-zip.php
Comment thread includes/create-theme/theme-media.php Outdated
mikachan and others added 2 commits June 19, 2026 14:17
Both copy_font_assets_to_theme() and add_activated_fonts_to_zip()
treated a font src under the WP user-fonts directory as trusted —
copying / zipping the bytes without ever inspecting them. Anything
that ended up in that directory through a separate flow (a polyglot
upload via another plugin, manual write, etc.) would be promoted to
the exported theme as-is just because the URL extension was on the
allowlist.

Run the same is_allowed_font_file() check on the local source before
copying / zipping. If the bytes don't match the URL extension's font
format, drop the source from the returned families.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Plain str_replace on the whole template would substring-match a
shorter validated URL inside a longer rejected URL when one is a
prefix of the other (e.g. `photo.png` inside `photo.png.php`).
That silently rewrote the rejected URL to a missing local asset and
bypassed the validated-media guard.

Walk the parsed block tree instead: WP_HTML_Tag_Processor handles img
/ video src+poster and inline `background-image:url(...)`, while
direct array access handles block-comment JSON attrs. Replacements
only touch values that exactly match a validated URL.

The rewrites embed literal PHP open/close tags that must end up
verbatim in the export, but both set_attribute() and wp_json_encode()
(inside serialize_blocks) would escape `<` / `>` / quotes. Route the
rewrites through opaque, URL-shaped placeholders so they survive
both passes, then strtr() them back to the raw PHP at the end. The
placeholder is a root-relative path because set_attribute prefixes
schemeless values with `http://`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@scruffian scruffian left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚢
Thank you!

@mikachan mikachan merged commit da693bf into trunk Jun 19, 2026
13 checks passed
@mikachan mikachan deleted the fix/validate-downloaded-theme-assets branch June 19, 2026 14:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants