Skip to content

refactor(error): typed sub-enums in place of stringly-typed reason fields#23

Draft
zrosenbauer wants to merge 6 commits into
mainfrom
chore/cleanup-fixes
Draft

refactor(error): typed sub-enums in place of stringly-typed reason fields#23
zrosenbauer wants to merge 6 commits into
mainfrom
chore/cleanup-fixes

Conversation

@zrosenbauer

Copy link
Copy Markdown
Member

Summary

Replaces reason: String on three public error variants with typed sub-enums, and wires regex::Error as a proper #[source] on SchemaError::InvalidRegex. Display output is byte-identical to the prior format.

  • ParseError::MalformedTag.reasonkind: MalformedTagKind (5 variants)
  • ParseError::MalformedAttribute.reasonkind: MalformedAttrKind (4 variants)
  • SelectorError::Syntax.reasonkind: SyntaxKind (9 variants — distinct caps + structural Expected { what: &'static str })
  • SchemaError::InvalidRegex.reason: String#[source] source: regex::Error; SchemaError drops Eq (regex::Error is PartialEq but not Eq), retains PartialEq + Clone

Includes .changeset/typed-error-kinds.md (minor bump, breaking pre-1.0 per SemVer 4.0).

Why

Diagnostic prose buried in a String is opaque to programmatic consumers — they end up doing err.to_string().contains("…") to discriminate cases. With typed kinds, callers can match exhaustively on the failure mode. The #[source] wiring fixes the one place we were dropping a cause chain (regex::Error flattened via e.to_string()), so anyhow / eyre / tracing see the full chain.

Test plan

  • cargo fmt --check
  • cargo clippy --all-targets --all-features -- -D warnings
  • cargo test --all-features --workspace — 198 tests pass, incl. fixture_suite insta snapshots and malformed_tag_reports_useful_reason rstest (substring assertions on Display)
  • pnpm run build:debug && pnpm test in bindings/node — 33 vitest tests pass (Node error.message is unchanged because Display is preserved)
  • scripts/check-versions.sh — crate / npm version parity intact

Breaking?

Yes, for any consumer that constructed these variants directly or pattern-matched on the reason field. Consumers that only used matches!(e, ParseError::MalformedTag { .. }) or read Display output are unaffected.

zrosenbauer and others added 5 commits June 4, 2026 12:22
Three public error variants previously buried the actual failure mode
inside a `reason: String` interpolated at the call site, forcing
consumers to string-match diagnostic prose to discriminate cases:

- `ParseError::MalformedTag.reason` → `kind: MalformedTagKind`
  (ExpectedCloseAngle, UnterminatedOpenTag, ExpectedCloseSlashAngle,
   UnterminatedComment, UnterminatedCdata)
- `ParseError::MalformedAttribute.reason` → `kind: MalformedAttrKind`
  (UnexpectedNameStart, ExpectedEquals, ExpectedOpenQuote,
   UnterminatedValue)
- `SelectorError::Syntax.reason` → `kind: SyntaxKind`
  (UnionTooLarge, CompoundTooLong, TooManyPredicates,
   NotNestingTooDeep, UnsupportedPseudoClass, NthChildMustBeOneOrGreater,
   IntegerOutOfRange, ExpectedDigit, Expected{what: &'static str})

`SchemaError::InvalidRegex` now carries `#[source] source: regex::Error`
instead of stringifying the cause, so `Error::source()` walks the chain
for downstream `anyhow` / `eyre` / `tracing` consumers. `SchemaError`
drops its `Eq` derive (regex::Error is PartialEq but not Eq); PartialEq
is retained.

Display output is byte-identical to the prior format — the rstest
substring assertions and insta snapshots in fixture_suite pass
unchanged. Breaking only for consumers that constructed these variants
directly or matched on the `reason` field.

Co-Authored-By: Claude <noreply@anthropic.com>
Cover the load-bearing internals that black-box tests in tests/*.rs can't
reach directly: tokenizer's looks_like_tag_start prose-vs-tag dispatch,
the attr duplicate-tracker linear-to-hashset promotion at threshold,
CDATA trivia split bookkeeping, and parse's sibling-scoped duplicate-id
check + frame/root scope routing.

docs: add executable doctests across the public API

Convert the crate-level synopsis to a parse to select to mutate walkthrough
and add executable examples to parse, parse_owned, Selector::parse,
Markdown::update / replace_content / replace_in / replace_text / to_xml,
and validate. Lifts doctest count from 3 to 12 so docs.rs shows real
usage and CI catches signature drift at the public surface.
…ript

- examples/: new rust-simple, rust-advanced, node-simple, node-advanced
  scaffolds + samples; workspace members updated to match. Examples build
  via `cargo check --workspace`.
- bindings/node/src/lib.rs: expand design-notes header, document the
  clippy carve-outs, clarify JS regex flag handling for replace_in.
- crates/marxml/src/lib.rs: drop the stale `#![doc(html_root_url)]`
  placeholder (was pointing at 0.0.0).
- crates/marxml/src/selector/matcher.rs: explain the
  `#[allow(clippy::too_many_arguments)]` on walk() — packing the scratch
  into a struct would add a borrow indirection on the hot recursion path.
- docs/ARCHITECTURE.md: minor wording.
- scripts/reserve-npm-names.sh: deleted; one-time bootstrap, no longer
  needed now that trusted publishing is configured for every sub-package.

Co-Authored-By: Claude <noreply@anthropic.com>
… docs

Outcome of a multi-reviewer audit (6 Claude agents + 1 Codex independent
run) against Rust 1.95 / Rust API Guidelines / napi-rs v3. Internal
cleanup, hot-path microopts, doc/code parity, and one public API
tightening on the validation surface. See the changeset for the
downstream-facing summary; companion contributor doc added.

Public API (minor):
- ValidationError::InvalidAttr.reason: String -> kind: InvalidAttrKind.
  The new enum carries the allowed-enum list or the regex pattern as
  structured fields callers can match on. Display output is
  byte-identical; only consumers that destructured `reason` break.
  Re-exported at the crate root.

Node binding:
- IntoNapi extension trait centralizes crate->napi::Error mapping. Call
  sites read `.into_napi()?` instead of repeating .map_err(|e| Error::new(
  Status::InvalidArg, e.to_string())) closures. Orphan rule blocks a
  direct From impl; the trait stands in.
- toJson JSDoc no longer claims "no string round-trip" (the wrapper
  hides one inside marxml.mjs — being honest about it).
- replace_in_content rustdoc now describes the RegexBuilder setter
  mechanism (the prior (?flags:…) prefix claim was wrong about
  mechanism though correct about observable behavior).

Internal renames + cleanup:
- mutate::try_replace_content / try_replace_in -> splice_content_report
  / splice_regex_report. The try_* prefix is reserved for fallible ops;
  these always return MutationReport. Public re-exports
  (replace_content_report, replace_in_report) unchanged.
- Dropped the dead splice_regex one-line wrapper. Callers go through
  splice_regex_with directly.

Hot-path microopts (no benchmark required — straight wins):
- mutate::rewrite_open_tag allocates with capacity sized from
  tag + existing-attr + new-attr byte budgets.
- serialize::to_xml allocates with capacity = doc.raw().len().
- tokenizer::record_seen_attr builds the lazy HashSet via
  iter().map().collect() instead of with_capacity + manual loop.
- escape::decode_entities replaces .chars().next().expect("non-empty
  tail") with an unreachable! — same runtime behavior, expressed via
  the type system.

Diagnostic fix:
- ParseError::MalformedAttribute { kind: UnterminatedValue, .. } now
  reports the line at which input ran out, not the line of the
  attribute name. For attributes whose values span multiple lines and
  run off the end, the error now points at the actual EOF rather than
  far above. One snapshot updated.

Docs:
- Markdown::to_xml rustdoc now warns the output is re-serialized, not
  byte-preserved; SourcePosition values from the source don't apply.
- docs/dsl/selectors.md describes nested :not() as
  permitted-up-to-depth-64 (matches the parser); prior wording said
  rejected.
- crates/marxml/Cargo.toml gets [package.metadata.docs.rs] for future
  feature-gated items.
- New contributing/rust-best-practices.md codifies the rubric the
  review pass used: Rust idiom checklist, napi-rs v3 specifics, 2024
  edition deferrals, reviewer severity levels.

Validates: cargo fmt --all -- --check, cargo clippy --all-targets
--all-features --workspace -- -D warnings, cargo test --all-features
--workspace, pnpm test in bindings/node — all green on macOS.

Co-Authored-By: Claude <noreply@anthropic.com>
@zrosenbauer

Copy link
Copy Markdown
Member Author

Pushed 773cb8a on top — output of a multi-reviewer code audit (6 Claude agents — 3 Opus / 3 Sonnet — plus 1 Codex independent run) against Rust 1.95 / Rust API Guidelines / napi-rs v3, with every ERROR-class finding independently re-verified against the cited code before fix. One Codex finding (parser.rs:297 integer-overflow context loss) was invalidated during verification and dropped.

What landed in this commit

Public API (minor, new changeset review-fixes-2026-06.md):

  • ValidationError::InvalidAttr.reason: Stringkind: InvalidAttrKind. The new typed enum carries NotInEnum { allowed } and NoRegexMatch { pattern } as structured fields callers can match on. Display output is byte-identical (existing snapshots / log lines unchanged). Re-exported at the crate root. Only breaking for consumers that destructured reason.

Node binding:

  • IntoNapi extension trait centralizes crate-error → napi::Error mapping. Call sites now read .into_napi()? instead of repeating .map_err(|e| Error::new(Status::InvalidArg, e.to_string())). Orphan rule blocks a direct From impl; the trait stands in.
  • replace_in_content rustdoc fixed: the (?flags:…) claim was wrong about mechanism (the impl uses RegexBuilder setters). User-observable behavior unchanged.
  • toJson JSDoc no longer claims "no string round-trip" — the wrapper hides one inside marxml.mjs and honesty was worth a sentence.

Internal cleanup:

  • mutate::try_replace_content / try_replace_insplice_content_report / splice_regex_report. The try_* prefix is reserved for fallible ops; these return MutationReport. Public re-exports (replace_content_report, replace_in_report) unchanged.
  • Dropped the dead splice_regex one-line wrapper.

Hot-path microopts (no benchmark required):

  • rewrite_open_tag and to_xml now allocate with String::with_capacity(...) sized from known byte budgets (tag + attrs / raw.len()).
  • tokenizer::record_seen_attr builds the lazy HashSet via iter().map().collect().
  • escape::decode_entities replaces .chars().next().expect("non-empty tail") with let-else + unreachable!.

Diagnostic bug:

  • ParseError::MalformedAttribute { kind: UnterminatedValue, .. } now reports the line where input ran out, not the attribute-name line. Fixture snapshot updated (line 1 → line 2) — the new diagnostic is more accurate for multi-line attribute values.

Docs:

  • Markdown::to_xml rustdoc warns the output is re-serialized, not byte-preserved; stored SourcePosition values from the parsed document don't apply.
  • docs/dsl/selectors.md now describes nested :not() correctly (permitted up to depth 64; prior wording said "rejected" which contradicted the parser).
  • [package.metadata.docs.rs] added to the crate manifest for future feature gates.
  • New: contributing/rust-best-practices.md codifies the rubric reviewers used — Rust idiom checklist, napi-rs v3 specifics, 2024 edition deferrals, severity definitions. Companion to AGENTS.md's hard rules.

Validates clean

cargo fmt --all -- --check     ✓
cargo clippy --all-targets --all-features --workspace -- -D warnings  ✓
cargo test --all-features --workspace  ✓  (incl. 12 doc-tests)
pnpm test (bindings/node)      ✓  (33/33)

Deferred — Ask First items not in this commit

Tracked as follow-ups per the AGENTS.md "Ask First" rules:

  • Compile-once Selector / Schema napi classes (Node API addition).
  • Async parse / mutator variants for multi-MB inputs.
  • linux-arm64-musl napi target.
  • SelectorError::UnexpectedEnd { at: usize } field add.
  • SerializeOpts::self_close_empty() builder rename to remove field/method shadow.
  • is_name_start / is_name_char visibility decision.
  • ElementRef PartialEq / Eq derives + pointer-vs-value semantic call.
  • AttrConstraint accessor methods.
  • Edition 2024 migration + MSRV bump to 1.85.

Full per-module review reports live at /tmp/marxml-review/A-tokenizer.md through G-codex.md (not checked in — local artifacts of the audit pass).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant