fix: match node:querystring on invalid-UTF-8 escapes (U+FFFD, not raw) by spokodev · Pull Request #94 · anonrig/fast-querystring

spokodev · 2026-06-24T01:05:13Z

Problem

parse() returns the raw, undecoded %XX text for syntactically valid
percent-escapes that decode to invalid UTF-8. The README positions this
module as a replacement for node:querystring ("replace the legacy
node:querystring module", "Supports both parse and stringify methods
from node:querystring"), and the test suite ships a parity test against it,
so node:querystring is the oracle. On this class of input the two diverge:

const querystring = require("node:querystring");
const fqs = require("fast-querystring");

querystring.parse("a=%C3"); // { a: '�' }  (replacement character)
fqs.parse("a=%C3");         // { a: '%C3' }      (raw, undecoded)  <-- bug

input	node:querystring	fast-querystring (before)
`a=%C3`	`{a:"�"}`	`{a:"%C3"}`
`a=%ff`	`{a:"�"}`	`{a:"%ff"}`
`a=%E4%B8`	`{a:"�"}`	`{a:"%E4%B8"}`
`a=%C0%80`	`{a:"��"}`	`{a:"%C0%80"}`

It affects keys as well as values, since both use the same decode path.
Valid multibyte (a=%E4%B8%AD -> 中) and non-hex escapes (a=%zx -> %zx)
already match node:querystring; only the invalid-UTF-8 class diverges.

Cause

lib/parse.js decodes via fast-decode-uri-component, which returns null
for invalid UTF-8. The fastDecode(x) || x fallback then keeps the raw %XX
text, whereas node:querystring substitutes U+FFFD via its lenient decoder.

Fix

A small fallback decoder (lib/internals/decode.js) reproduces what
node:querystring does on these tokens: when at least one decodable %XX
escape is present, each code unit is masked to a single byte and run through a
non-fatal UTF-8 decode (so invalid sequences become U+FFFD); when only
malformed escapes are present (e.g. %zz), the token is returned unchanged,
matching node:querystring.

The fallback runs only when fastDecode returns null (the rare invalid
case). On valid input fastDecode returns a string and the new code path is
never entered, so the fast path is unchanged.

Verification

Full suite green, plus added parity tests asserting parse() equals
node:querystring.parse() for the invalid-UTF-8 class in keys and values.
Differential of parse() vs node:querystring.parse() over 500k random
query strings: every previously-passing case still matches, and the entire
invalid-UTF-8 class now matches (0 cases that this change makes worse).
Over 1,000,000 ASCII-input query strings (the domain real query strings come
from, since non-ASCII is always percent-encoded) parse() now matches
node:querystring.parse() exactly.
Verified on Node 24 and Node 26.

parse() returned the raw, undecoded "%XX" text for syntactically valid percent-escapes that decode to invalid UTF-8 (e.g. parse("a=%C3") gave {a:"%C3"}). node:querystring, the module README positions this as a replacement for, substitutes U+FFFD instead ({a:"�"}). This affected both keys and values. fast-decode-uri-component returns null on invalid UTF-8, and the `fastDecode(x) || x` fallback kept the raw escape. Add a lenient fallback decoder that reproduces node:querystring exactly: when a decodable %XX escape is present, mask each code unit to one byte and run it through a non-fatal UTF-8 decode (yielding U+FFFD on bad sequences); when only malformed escapes are present, return the token unchanged. The fallback runs only when fastDecode returns null, so the valid-input fast path is unchanged.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: match node:querystring on invalid-UTF-8 escapes (U+FFFD, not raw)#94

fix: match node:querystring on invalid-UTF-8 escapes (U+FFFD, not raw)#94
spokodev wants to merge 1 commit into
anonrig:mainfrom
spokodev:fix/parse-invalid-utf8-replacement

spokodev commented Jun 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

spokodev commented Jun 24, 2026

Problem

Cause

Fix

Verification

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant