feat: improve codebase#3

Merged
LeadcodeDev merged 38 commits into main from feat/improve-codebase
Jun 3, 2026

@LeadcodeDev
Owner

No description provided.

sqlx 0.8's PgTypeInfo::with_name does not accept schema-qualified names
like "agent.canal_type_enum"; emitting them causes runtime decode errors.
Always emit the unqualified type name and rely on the connection's
search_path to resolve non-public schemas.
35-task TDD plan covering 78 findings across security, codegen/typemap,
error handling, tests/CI, and SQL ↔ Rust conformity.
Introduces codegen::identifiers with quote_ident, quote_qualified, and
is_safe_ident helpers. Foundation for preventing SQL injection in
generated CRUD code by quoting table/column/schema names per dialect
(backticks for MySQL, double quotes for Postgres/SQLite).
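A minimal sketch of per-dialect quoting, for illustration only. The `Dialect` enum, the function signatures, and the escape-by-doubling rule are assumptions; the real helpers in codegen::identifiers may differ.

```rust
/// Hypothetical dialect enum; the crate's own database-kind type may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Dialect {
    Postgres,
    MySql,
    Sqlite,
}

/// Wrap `name` in the dialect's quote character. Any embedded quote
/// character is doubled so it cannot terminate the identifier early
/// (assumed escaping rule, standard for these three dialects).
pub fn quote_ident(dialect: Dialect, name: &str) -> String {
    let quote = match dialect {
        Dialect::MySql => '`',
        Dialect::Postgres | Dialect::Sqlite => '"',
    };
    let mut out = String::with_capacity(name.len() + 2);
    out.push(quote);
    for ch in name.chars() {
        if ch == quote {
            out.push(quote); // escape by doubling
        }
        out.push(ch);
    }
    out.push(quote);
    out
}

/// Quote schema and table separately and join them with a dot.
pub fn quote_qualified(dialect: Dialect, schema: Option<&str>, table: &str) -> String {
    match schema {
        Some(schema) => format!(
            "{}.{}",
            quote_ident(dialect, schema),
            quote_ident(dialect, table)
        ),
        None => quote_ident(dialect, table),
    }
}
```

Quoting each part separately matters: a dotted string quoted as a whole would name a single identifier containing a dot rather than a schema-qualified table.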
Every table, schema, and column name interpolated into generated SQL
strings now goes through identifiers::quote_ident / quote_qualified.
Prevents SQL injection when DB metadata contains quote characters or
reserved words, and produces correct SQL for identifiers that would
otherwise be ambiguous (e.g. columns named "select").

Updated 13 existing tests whose substring assertions encoded the prior
unquoted output. Fixed the junction_entity test fixture to split
schema/table properly instead of relying on a dotted table_name.
Each Rust type passed via --type-overrides is now parsed by
syn::parse_str::<syn::Type> before being injected into generated code.
Rejects empty keys/values, missing '=', and strings that aren't a
single valid Rust type. Closes the code-injection path where
"--type-overrides jsonb=Vec<u8>; fn pwned() {}" would have been
emitted verbatim into the output.
Connection failures previously bubbled up the raw sqlx::Error which can
include the full database URL (user:password@host) in its Display
implementation. Wrap the pool.connect() error in a new
Error::Connection variant that carries a redacted URL, and add a
redact_url helper that replaces the password with "****".
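One way such a redaction helper could look, using only the standard library. This is a sketch under assumptions: the signature is hypothetical, and it expects a password that is percent-encoded as URLs require (an unencoded '/' in the password would defeat it).

```rust
/// Replace the password in `scheme://user:password@host/...` with "****".
/// URLs without a scheme, userinfo, or password are returned unchanged.
pub fn redact_url(url: &str) -> String {
    // Everything after "scheme://" starts with the authority part.
    let Some(scheme_end) = url.find("://") else {
        return url.to_string();
    };
    let rest = &url[scheme_end + 3..];

    // The authority ends at the first '/', '?' or '#'.
    let authority_len = rest
        .find(|c: char| matches!(c, '/' | '?' | '#'))
        .unwrap_or(rest.len());
    let authority = &rest[..authority_len];

    // Userinfo, if present, precedes the last '@' of the authority.
    let Some(at) = authority.rfind('@') else {
        return url.to_string();
    };
    let userinfo = &authority[..at];

    // The password follows the first ':' of the userinfo.
    let Some(colon) = userinfo.find(':') else {
        return url.to_string();
    };
    let user = &userinfo[..colon];

    format!("{}://{}:****{}", &url[..scheme_end], user, &rest[at..])
}
```

With this sketch, `postgres://user:secret@localhost:5432/db` becomes `postgres://user:****@localhost:5432/db`, while `sqlite://data.db` passes through untouched.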
parse_and_format / parse_and_format_with_tab_spaces previously called
std::process::exit(1) when prettyplease failed to parse the generated
TokenStream. That kills the user's build with no recovery path and no
useful diagnostic if sqlx-gen is ever used as a library.

Now both helpers return Result<String, error::Error>; format_tokens,
format_tokens_with_imports, and codegen::generate propagate via ?. The
error message includes the failing token stream and a request to file
an issue, since this only fires on internal codegen bugs.

Test helpers across struct_gen, enum_gen, composite_gen, domain_gen,
crud_gen, codegen::mod, and e2e_sqlite were updated to .unwrap() the
Result — they assert on the happy path and want a clear panic if it
breaks.
Removed 7× .expect() on MySQL information_schema Vec<u8> → String
conversions, replaced with utf8_field helper that returns
Error::Config on invalid UTF-8.

Removed 5× .last_mut().unwrap() panic risks across postgres.rs and
mysql.rs (tables, views, enums, composites). Each now returns
Error::Config with an "internal sqlx-gen bug" message that points the
user at filing an issue rather than crashing the build.
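The two replacements described above can be sketched as follows. The `Error` enum here is a stand-in with a single variant, and `TableInfo` / `push_column` are hypothetical names used only to show the `last_mut()` pattern; the exact messages are assumptions.

```rust
/// Stand-in for the crate's error type; only the variant used here.
#[derive(Debug, PartialEq)]
pub enum Error {
    Config(String),
}

/// Convert raw information_schema bytes to a String, returning an error
/// on invalid UTF-8 instead of panicking via `.expect()`.
pub fn utf8_field(bytes: Vec<u8>, field: &str) -> Result<String, Error> {
    String::from_utf8(bytes)
        .map_err(|e| Error::Config(format!("column `{field}` returned invalid UTF-8: {e}")))
}

/// Hypothetical introspection record.
#[derive(Debug, Default)]
pub struct TableInfo {
    pub name: String,
    pub columns: Vec<String>,
}

/// Append a column to the most recently seen table. An empty list is an
/// internal invariant violation, reported as an error rather than a panic.
pub fn push_column(tables: &mut Vec<TableInfo>, column: String) -> Result<(), Error> {
    let table = tables.last_mut().ok_or_else(|| {
        Error::Config(
            "internal sqlx-gen bug: column row seen before any table; please file an issue"
                .to_string(),
        )
    })?;
    table.columns.push(column);
    Ok(())
}
```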
write_atomic streams into a sibling NamedTempFile then renames into
place, so a Ctrl-C or disk-full error never leaves a half-written .rs
file that would break the user's next build.
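The write-then-rename idea can be sketched with the standard library alone. Note the swap: the commit uses NamedTempFile, whereas this illustration builds the sibling temp name by hand from the process id, which is an assumption made to keep the example dependency-free.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Write `contents` to a sibling temp file, then rename it over `path`.
/// Readers see either the old file or the complete new one.
pub fn write_atomic(path: &Path, contents: &str) -> io::Result<()> {
    let file_name = path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?;

    // Sibling temp file: same directory, so the rename stays on one filesystem.
    let mut tmp_name = file_name.to_os_string();
    tmp_name.push(format!(".{}.tmp", std::process::id()));
    let tmp_path = path.with_file_name(tmp_name);

    let result = (|| -> io::Result<()> {
        let mut file = File::create(&tmp_path)?;
        file.write_all(contents.as_bytes())?;
        file.sync_all()?; // flush to disk before the rename makes it visible
        fs::rename(&tmp_path, path)
    })();

    // On any failure, best-effort cleanup of the partial temp file.
    if result.is_err() {
        let _ = fs::remove_file(&tmp_path);
    }
    result
}
```

Keeping the temp file in the target's own directory is the important design choice: a rename across filesystems is not atomic and can fail outright.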

validate_safe_filename rejects path separators, "..", absolute paths,
empty names, and non-.rs extensions before any write happens. Defends
against the rare case where introspected table names flow into the
output filename and could otherwise escape output_dir.
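A sketch of the filename checks listed above. The signature and error type are assumptions, and rejecting any name that contains ".." is a deliberately strict reading of the rule.

```rust
use std::path::Path;

/// Accept only a plain `something.rs` file name; everything that could
/// point outside the output directory is rejected before any write.
pub fn validate_safe_filename(name: &str) -> Result<(), String> {
    if name.is_empty() {
        return Err("file name is empty".to_string());
    }
    if name.contains('/') || name.contains('\\') {
        return Err(format!("file name {name:?} contains a path separator"));
    }
    if name.contains("..") {
        return Err(format!("file name {name:?} contains \"..\""));
    }
    let path = Path::new(name);
    if path.is_absolute() {
        return Err(format!("file name {name:?} is an absolute path"));
    }
    if path.extension().and_then(|ext| ext.to_str()) != Some("rs") {
        return Err(format!("file name {name:?} does not end in .rs"));
    }
    Ok(())
}
```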
Runs on every push to main and every PR. Three primary jobs:
- test: cargo test --all (unit + sqlite-based integration)
- fmt: rustfmt --check
- clippy: -D warnings on all targets

Two optional jobs spin up Postgres 16 and MySQL 8.0 services and run
the e2e_postgres / e2e_mysql test files (added in upcoming commits).
These jobs are marked continue-on-error until the e2e suites exist.
…e decimal

- MySQL `bit(1)` → bool (idiomatic boolean column); bit(N>1) stays Vec<u8>
- MySQL `boolean`/`bool` aliases → bool (previously fell through to String)
- Postgres `interval` → PgInterval with the corresponding import
  (was hitting the String fallback)
- SQLite `NUMERIC`/`DECIMAL` → Decimal instead of f64; matches the
  precision-safe behaviour already shipped for Postgres and MySQL
Before this commit, a Postgres column of type my_enum[] was mapped to
Vec<MyEnum> but the generated MyEnum had no PgHasArrayType impl. At
runtime sqlx then bailed with "unsupported type _my_enum of column #N"
because it could not resolve the array's element type info.

Now both enum_gen and composite_gen emit an `impl PgHasArrayType` whose
array_type_info() returns PgTypeInfo::with_name("_<name>"), which matches
how Postgres names array types. Gated on DatabaseKind::Postgres so MySQL
and SQLite output is unchanged.
Two SQL enum values like 'foo bar' and 'foo_bar' both collapse to the
Rust identifier FooBar via to_upper_camel_case, which previously
generated code that would not compile. check_variant_collisions runs
during codegen::generate and returns a clear Error::Config pointing at
the conflicting variants and the Rust identifier they share.
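The collision check can be illustrated as below. `to_upper_camel` is a simplified stand-in for the to_upper_camel_case conversion (it splits only on non-alphanumeric characters), and the signature and message format are assumptions.

```rust
use std::collections::BTreeMap;

/// Simplified upper-camel conversion: split on non-alphanumeric
/// characters, capitalise each word, lowercase the remainder.
fn to_upper_camel(value: &str) -> String {
    let mut out = String::new();
    for word in value.split(|c: char| !c.is_alphanumeric()) {
        let mut chars = word.chars();
        if let Some(first) = chars.next() {
            out.extend(first.to_uppercase());
            out.push_str(&chars.as_str().to_lowercase());
        }
    }
    out
}

/// Group SQL enum values by the Rust identifier they map to and fail
/// on the first identifier shared by more than one value.
pub fn check_variant_collisions(enum_name: &str, values: &[&str]) -> Result<(), String> {
    let mut by_ident: BTreeMap<String, Vec<&str>> = BTreeMap::new();
    for &value in values {
        by_ident.entry(to_upper_camel(value)).or_default().push(value);
    }
    for (ident, sources) in &by_ident {
        if sources.len() > 1 {
            return Err(format!(
                "enum `{enum_name}`: SQL values {sources:?} all map to Rust variant `{ident}`"
            ));
        }
    }
    Ok(())
}
```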
Columns named "user-id", "created at", "123foo" etc. previously
produced Rust code that wouldn't compile because format_ident! cannot
encode dashes/spaces/leading digits. sanitize_rust_ident:
- replaces every non-alphanumeric (and non-_) character with '_'
- prefixes a '_' if the result starts with a digit
- falls back to "_field" on an empty string

The original DB column name is preserved via the existing
#[sqlx(rename = "<original>")] rewrite, so reads and writes still hit
the right column.
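The three rules translate almost directly into code. This sketch assumes "alphanumeric" means ASCII alphanumeric and does not handle Rust keywords; the real sanitize_rust_ident may treat both differently.

```rust
/// Turn an arbitrary column name into something `format_ident!` accepts.
pub fn sanitize_rust_ident(name: &str) -> String {
    // Rule 1: every character outside [A-Za-z0-9_] becomes '_'.
    let mut out: String = name
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() || c == '_' { c } else { '_' })
        .collect();

    // Rule 3: an empty name falls back to a fixed placeholder.
    if out.is_empty() {
        return "_field".to_string();
    }

    // Rule 2: identifiers cannot start with a digit.
    if out.starts_with(|c: char| c.is_ascii_digit()) {
        out.insert(0, '_');
    }
    out
}
```

Under these assumptions "user-id" maps to `user_id`, "created at" to `created_at`, and "123foo" to `_123foo`.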
Tables in "public" (Postgres), "main" (SQLite), or "dbo" no longer get
their schema rendered into every generated SELECT/INSERT/UPDATE/DELETE.
The qualified form is still used for non-default schemas, where it is
required for unambiguous resolution.
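As a rough illustration of the rule, with hypothetical function names and without the identifier quoting the generator applies elsewhere:

```rust
/// Schemas treated as the dialect default, as listed in the commit.
pub fn is_default_schema(schema: &str) -> bool {
    matches!(schema, "public" | "main" | "dbo")
}

/// Render a table reference, qualifying it only for non-default schemas.
pub fn table_ref(schema: Option<&str>, table: &str) -> String {
    match schema {
        Some(schema) if !is_default_schema(schema) => format!("{schema}.{table}"),
        _ => table.to_string(),
    }
}
```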
The audit flagged inline ENUMs as potentially broken, but the existing
per-variant #[sqlx(rename)] emitted whenever the UpperCamelCase identifier
differs from the SQL value is exactly what sqlx::Type expects for text
encoding on MySQL/SQLite. These tests pin that behaviour for both
lowercase and case-sensitive variants so a future refactor can't
silently regress it.
`--domain-style alias` (default) keeps the existing `pub type Email = String;`
behaviour. `--domain-style newtype` instead emits

    #[derive(..., sqlx::Type)]
    #[sqlx(transparent)]
    pub struct Email(pub String);

so the user can attach validation, traits, or accessors to the
domain. Both styles share the same doc-comment and codegen plumbing
via the new DomainStyle enum and generate_with_domain_style entry
point. CLI defaults preserve current behaviour exactly.
SQLite has no native enum type, so users encode them with
  TEXT CHECK (status IN ('active', 'inactive'))

extract_check_enums parses sqlite_master.sql for each table, looks for
that pattern column by column, synthesises an EnumInfo, and rewrites
the column's udt_name to <table>_<col>_enum. From there the
existing enum/typemap pipeline takes over and emits a real Rust enum
that round-trips via per-variant #[sqlx(rename)].
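A much-simplified sketch of the pattern match, to make the idea concrete. It is not the parser from the commit: it only recognises the exact `CHECK (<col> IN (...))` spelling, assumes values contain no commas or parentheses, and both function names are hypothetical.

```rust
/// Extract the quoted values of `CHECK (<column> IN ('a', 'b'))` from a
/// CREATE TABLE statement. Returns None when the pattern is absent or
/// any listed value is not a single-quoted literal.
pub fn extract_check_values(create_sql: &str, column: &str) -> Option<Vec<String>> {
    // ASCII upper-casing keeps byte offsets identical to the original,
    // so positions found here can be used to slice `create_sql`.
    let haystack = create_sql.to_ascii_uppercase();
    let needle = format!("CHECK ({} IN (", column.to_ascii_uppercase());
    let start = haystack.find(&needle)? + needle.len();
    let end = start + create_sql[start..].find(')')?;

    create_sql[start..end]
        .split(',')
        .map(|value| value.trim())
        .map(|value| {
            value
                .strip_prefix('\'')?
                .strip_suffix('\'')
                .map(str::to_string)
        })
        .collect()
}

/// Name of the synthesised enum type, following <table>_<col>_enum.
pub fn synthetic_enum_name(table: &str, column: &str) -> String {
    format!("{table}_{column}_enum")
}
```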
contextualize_sqlx_error inspects the SQLSTATE on a sqlx::Error and
re-raises:
- 42501 / 28000  → PermissionDenied with a hint about the DB user's
  privileges on information_schema / pg_catalog / sqlite_master
- 42P01 / 3F000 / 42S02  → SchemaNotFound with a hint about --schemas

Other sqlx::Error values still fall through to the existing
Error::Database variant, so the public API and behaviour are unchanged
for unrelated failures.
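The SQLSTATE dispatch amounts to a small match. The sketch below works on the code string alone; the `ErrorKind` enum and function name are illustrative assumptions, and the real helper operates on a sqlx::Error and attaches hints.

```rust
/// Illustrative classification; the crate's real variants carry hints.
#[derive(Debug, PartialEq, Eq)]
pub enum ErrorKind {
    PermissionDenied,
    SchemaNotFound,
    Database,
}

/// Map a SQLSTATE code (if the driver reported one) to an error kind.
pub fn classify_sqlstate(code: Option<&str>) -> ErrorKind {
    match code {
        Some("42501" | "28000") => ErrorKind::PermissionDenied,
        Some("42P01" | "3F000" | "42S02") => ErrorKind::SchemaNotFound,
        // Everything else keeps the generic database error.
        _ => ErrorKind::Database,
    }
}
```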
LAST_INSERT_ID() only returns a meaningful value when the table has a
single AUTO_INCREMENT primary key. For composite PKs:
- include every PK column in InsertParams so the user can supply them
- run the INSERT with the bound values
- SELECT the freshly inserted row by binding the same PK values

build_insert_method_parsed and build_insert_many_transactionally_method
both branch on pk_fields.len(); single-PK MySQL flows continue to use
LAST_INSERT_ID exactly as before. Postgres / SQLite are unaffected
because their RETURNING * already handled this case.
compile_check.rs validates that codegen output is loadable in two
modes:

1. Fast path (always on): each GeneratedFile is parsed with
   syn::parse_file. Catches malformed attributes, unclosed braces,
   invalid identifiers, and anything else that breaks at the AST
   level. Runs across Postgres, MySQL, SQLite, and the newtype-domain
   variant.

2. Deep path (gated on SQLX_GEN_COMPILE_CHECK=1): scaffolds a
   temporary downstream crate, drops the generated code into
   src/lib.rs, and runs `cargo check` with the full sqlx dependency
   tree. This is the only check that confirms the emitted derives and
   #[sqlx(...)] attributes are actually accepted by sqlx itself.
Postgres' information_schema.columns reports the schema in which a
column's user-defined type lives (e.g. "auth" for an auth.role enum
column, "pg_catalog" for builtins). Capture it on every column so the
typemap and codegen layers can disambiguate two schemas declaring a
type with the same name.

- Adds udt_schema: Option<String> to ColumnInfo
- Postgres fetch_tables / fetch_views select COALESCE(udt_schema, '')
  and unpack to None when empty
- MySQL, SQLite, and synthetic test fixtures keep it None
- ColumnInfo derives Default so future test code can use struct update
  syntax
When the same SQL name (e.g. "role") exists in two non-default
schemas, sqlx-gen now prefixes the Rust identifier with the schema's
PascalCase form: auth.role → AuthRole, billing.role → BillingRole.
The bare PascalCase ("Role") is reserved for unique names and for the
default schema even when a collision exists.
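The naming rule can be sketched as follows. The signature is hypothetical (the real rust_type_name_for receives the introspected schema info rather than a slice of pairs), `pascal` is a simplified case conversion, and only "public" is treated as the default schema here.

```rust
/// Simplified PascalCase: split on non-alphanumerics, capitalise words.
fn pascal(value: &str) -> String {
    let mut out = String::new();
    for word in value.split(|c: char| !c.is_alphanumeric()) {
        let mut chars = word.chars();
        if let Some(first) = chars.next() {
            out.extend(first.to_uppercase());
            out.push_str(&chars.as_str().to_lowercase());
        }
    }
    out
}

/// `all_types` lists every (schema, type name) pair found by introspection.
/// A type is prefixed with its schema only when another schema declares
/// the same name and the type itself is not in the default schema.
pub fn rust_type_name_for(schema: &str, name: &str, all_types: &[(&str, &str)]) -> String {
    let collides = all_types
        .iter()
        .any(|&(other_schema, other_name)| other_name == name && other_schema != schema);
    if collides && schema != "public" {
        format!("{}{}", pascal(schema), pascal(name))
    } else {
        pascal(name)
    }
}
```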

Plumbing:
- codegen::rust_type_name_for + type_name_has_cross_schema_collision
  as the single source of truth, callable from typemap and from each
  *_gen module.
- typemap::postgres exposes map_type_qualified that takes the column's
  udt_schema (added in the previous commit) so cross-schema duplicate
  lookups land on the right (schema, name) pair.
- enum_gen::generate_enum_with_schema wraps the legacy entry point
  and propagates the SchemaInfo so the emitted Rust enum carries the
  prefixed name. composite_gen and domain_gen call rust_type_name_for
  directly since they already receive SchemaInfo.
- codegen::generate now calls generate_enum_with_schema.

The SQL #[sqlx(type_name = "...")] attribute is still emitted in its
unqualified form because sqlx 0.8 doesn't accept "schema.type"; users
remain responsible for setting search_path on the connection.
When an enum or composite lives in a schema other than public, sqlx 0.8
cannot resolve its unqualified type_name unless the connection's
search_path includes that schema. To make this discoverable:

- Emit a /// doc-comment on every non-default-schema enum and composite
  spelling out the requirement with a copy-paste-ready SET search_path
  snippet
- Add codegen::required_pg_search_path(&schema_info), which returns the
  sorted, deduplicated list of schemas needed
- Make the CLI log the exact SET search_path line after introspection
  when the result references any non-default schemas
- Document the whole flow (after_connect hook + collision prefixing)
  in a new "PostgreSQL — multi-schema setup" section in README.md
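A sketch of how the sorted, deduplicated schema list might be computed. The input shape (a slice of schema names, one per enum or composite) and the exact `SET search_path` wording are assumptions; the real function takes the schema info and the CLI's log line may be formatted differently.

```rust
use std::collections::BTreeSet;

/// Collect the non-default schemas that own at least one generated type.
/// BTreeSet gives sorting and deduplication in one step.
pub fn required_pg_search_path(type_schemas: &[&str]) -> Vec<String> {
    let unique: BTreeSet<&str> = type_schemas
        .iter()
        .copied()
        .filter(|schema| *schema != "public")
        .collect();
    unique.into_iter().map(str::to_string).collect()
}

/// Build a copy-paste-ready statement, or None when nothing is needed.
pub fn set_search_path_sql(schemas: &[String]) -> Option<String> {
    if schemas.is_empty() {
        return None;
    }
    Some(format!("SET search_path TO {}, public", schemas.join(", ")))
}
```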
…ype)

#[derive(sqlx::Type)] combined with #[sqlx(type_name = "x")] already
auto-generates `impl PgHasArrayType` pointing at `_x` in sqlx 0.8+.
The manual impl added by Task 27 collided with the derive output,
producing E0119 "conflicting implementations" in any downstream crate
that consumed the generated types.

Remove the manual block from enum_gen and composite_gen, replace the
"must emit" tests with "must NOT emit" regressions across all three
dialects, and rely on the sqlx derive for array support.
Every column, table, and schema reference was previously emitted with
unconditional dialect quotes. For lowercase ASCII names that aren't
reserved words this produced noisy SQL ("agent"."agent__connector",
"connector_id" = $1) without any added safety.

quote_ident now defers to is_safe_unquoted: an identifier is emitted
bare when it starts with a lowercase letter or underscore, contains
only ASCII lowercase / digits / underscores, and is not in a curated
~100-word SQL reserved list (sorted, binary-searched).

quote_ident_always remains for sites that genuinely need to force the
quotes. quote_qualified composes per-part.

This means agent.agent__connector instead of "agent"."agent__connector"
on the user's reported schema, while user-supplied DB names that
collide with SELECT / order / user etc. still get quoted defensively.
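The bare-versus-quoted decision can be sketched like this. The reserved list is a tiny illustrative subset of the roughly 100 words mentioned above, and the quoting shown is the double-quote form only; dialect handling is omitted.

```rust
/// Illustrative subset of reserved words. Must stay sorted for binary search.
const RESERVED: &[&str] = &["from", "group", "order", "select", "table", "user", "where"];

/// True when the identifier can be emitted without quotes.
pub fn is_safe_unquoted(ident: &str) -> bool {
    let mut chars = ident.chars();
    let Some(first) = chars.next() else {
        return false; // empty identifiers always need attention
    };
    if !(first.is_ascii_lowercase() || first == '_') {
        return false;
    }
    if !chars.all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_') {
        return false;
    }
    RESERVED.binary_search(&ident).is_err()
}

/// Emit the identifier bare when safe, double-quoted otherwise.
pub fn quote_ident(ident: &str) -> String {
    if is_safe_unquoted(ident) {
        ident.to_string()
    } else {
        format!("\"{}\"", ident.replace('"', "\"\""))
    }
}
```

Requiring lowercase is what makes the bare form safe on Postgres, which folds unquoted identifiers to lowercase; a mixed-case name must keep its quotes to resolve to the same object.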
The two crates used to declare their version independently (0.5.5 in
both, but with sqlx-gen-macros pinned at 0.5.4 inside sqlx-gen). A
single version bump had to be made in three places before they matched
again, and the cross-crate dependency made silent drift easy to ship.

- Root Cargo.toml grows [workspace.package] with version, edition,
  rust-version, license, repository, keywords, categories.
- Root Cargo.toml grows [workspace.dependencies] declaring every
  dependency once, including the internal sqlx-gen-macros (now
  always = the workspace version) and every external crate.
- Each member crate inherits with `*.workspace = true`. Per-crate
  Cargo.toml shrinks to per-crate fields only (name, description,
  features, bin).
- .gitignore now excludes /docs/superpowers/ so locally generated
  audit/plan files stay out of the repo.
@LeadcodeDev LeadcodeDev self-assigned this Jun 3, 2026
@LeadcodeDev LeadcodeDev added the enhancement New feature or request label Jun 3, 2026
@LeadcodeDev LeadcodeDev merged commit 76cdf2e into main Jun 3, 2026
5 checks passed
@LeadcodeDev LeadcodeDev deleted the feat/improve-codebase branch June 3, 2026 19:58