diff --git a/README.md b/README.md index 720b70d..7f06084 100644 --- a/README.md +++ b/README.md @@ -175,7 +175,7 @@ sqlrite> DELETE FROM users WHERE age < 30; | `CREATE TABLE` | `PRIMARY KEY`, `UNIQUE`, `NOT NULL`; `IF NOT EXISTS` (idempotent re-create); duplicate-column detection; types `INTEGER`/`INT`/`BIGINT`/`SMALLINT`, `TEXT`/`VARCHAR`, `REAL`/`FLOAT`/`DOUBLE`/`DECIMAL`, `BOOLEAN`. Auto-creates `sqlrite_autoindex__` for every PK + UNIQUE column | | `CREATE [UNIQUE] INDEX` | Single-column, named indexes; `IF NOT EXISTS`; persists as a dedicated cell-based B-Tree. INTEGER + TEXT columns only | | `INSERT INTO` | Explicit column list required; auto-ROWID for `INTEGER PRIMARY KEY`; multi-row `VALUES (…), (…)`; UNIQUE enforcement; clean type errors (no panics); NULL padding for omitted columns | -| `SELECT` | `*` or column list with optional `AS alias`; `WHERE`; `DISTINCT`; `GROUP BY col[, col …]`; aggregate projections `COUNT(*)` / `COUNT([DISTINCT] col)` / `SUM` / `AVG` / `MIN` / `MAX`; `[INNER\|LEFT OUTER\|RIGHT OUTER\|FULL OUTER\|CROSS] JOIN` with `ON ...` / `USING (...)` / `NATURAL` constraints, table aliases and qualified `t.col` references; single-column `ORDER BY [ASC\|DESC]` (also resolves alias and aggregate display names); `LIMIT n`. `WHERE col = literal` probes an index when one exists. Catalog introspection via `SELECT … FROM sqlrite_master` | +| `SELECT` | `*` or column list with optional `AS alias`; `WHERE`; `DISTINCT`; `GROUP BY col[, t.col …]` (qualified keys allowed); `HAVING`; aggregate projections `COUNT(*)` / `COUNT([DISTINCT] col)` / `SUM` / `AVG` / `MIN` / `MAX`; `[INNER\|LEFT OUTER\|RIGHT OUTER\|FULL OUTER\|CROSS] JOIN` with `ON ...` / `USING (...)` / `NATURAL` constraints, table aliases and qualified `t.col` references — aggregates / `GROUP BY` / `DISTINCT` / `HAVING` all compose over join results; single-column `ORDER BY [ASC\|DESC]` (also resolves alias and aggregate display names); `LIMIT n`. `WHERE col = literal` probes an index when one exists. Catalog introspection via `SELECT … FROM sqlrite_master` | | `UPDATE` | Multi-column `SET`; `WHERE`; UNIQUE + type enforcement; arithmetic in assignments (`SET age = age + 1`) | | `DELETE` | `WHERE` predicate or full-table delete | | `BEGIN` / `COMMIT` / `ROLLBACK` | Real transactions, snapshot-based; WAL-backed commit; single-level (no savepoints); auto-rollback if `COMMIT`'s disk write fails | @@ -193,7 +193,7 @@ Expressions in `WHERE` and `UPDATE`'s `SET` RHS: - String concat — `||` - Literals — integer + real numbers, `'single-quoted strings'`, `TRUE` / `FALSE`, `NULL`; parentheses for grouping -**Not yet supported** (common ones): subqueries, CTEs, `HAVING`, `LIKE … ESCAPE ''`, `IN (subquery)`, `DISTINCT` on `SUM`/`AVG`/`MIN`/`MAX`, GROUP BY on expressions, expressions in the projection list, `OFFSET`, multi-column `ORDER BY`, savepoints, comma joins (`FROM a, b`), aggregates / DISTINCT / GROUP BY *over* JOIN results. The [full list with context](docs/supported-sql.md#not-yet-supported) lives in the reference. +**Not yet supported** (common ones): subqueries, CTEs, `HAVING` without `GROUP BY`, `LIKE … ESCAPE ''`, `IN (subquery)`, `DISTINCT` on `SUM`/`AVG`/`MIN`/`MAX`, GROUP BY on expressions, expressions in the projection list, `OFFSET`, multi-column `ORDER BY`, savepoints, comma joins (`FROM a, b`). The [full list with context](docs/supported-sql.md#not-yet-supported) lives in the reference. #### Meta commands diff --git a/docs/architecture.md b/docs/architecture.md index 85771ef..8ad5048 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -139,7 +139,6 @@ Steps 1–7 are purely in-memory; step 8 is the only disk contact, and after the The roadmap has shipped far enough that the original "deliberately missing" list mostly turned into shipped features. What's still left: - **No query optimizer** beyond the bounded-heap top-k pass for KNN (Phase 7c) and the HNSW probe shortcut (7d.2). Equality-on-PK probes are direct; everything else is a table scan. Joins use plain nested-loop (O(N×M) per join level); hash / merge joins on equi-join shapes are a future increment. -- **Aggregates / GROUP BY / DISTINCT over joined results.** The single-table aggregator is wired against one rowid stream; the multi-table join executor produces joined rows but doesn't yet feed them through the aggregator. Surfaces as a clean `NotImplemented` at parse time. The single-table aggregation path (SQLR-3) is fully shipped. - **No network layer.** SQLRite is embedded-only. The closest thing is the [`sqlrite-mcp`](mcp.md) server, which is stdio (not network). A real wire protocol isn't on the roadmap. - **No streaming row cursor.** `Rows` is currently backed by an eager `Vec` (Phase 5a). The `Rows::next` API is shaped to support a real cursor — the swap is deferred to **5a.2**. diff --git a/docs/roadmap.md b/docs/roadmap.md index 1e92747..fe8bf49 100644 --- a/docs/roadmap.md +++ b/docs/roadmap.md @@ -561,7 +561,7 @@ The biggest single SQL-surface jump in the project's history. - Self-joins require an alias on at least one side. - `WHERE` runs after joins (the standard `LEFT JOIN ... WHERE right.col IS NULL` anti-join idiom works). -`ON`, `USING (...)`, `NATURAL`, and `CROSS JOIN` are all supported. Not yet supported: comma-separated FROMs (`FROM a, b`), aggregates / `GROUP BY` / `DISTINCT` *over* a join, `fts_match` / `bm25_score` inside a join expression. Algorithm: plain nested-loop, O(N×M) per level — hash / merge joins are a future optimization. +`ON`, `USING (...)`, `NATURAL`, and `CROSS JOIN` are all supported, and aggregates / `GROUP BY` / `DISTINCT` / `HAVING` compose over join results (SQLR-6). Not yet supported: comma-separated FROMs (`FROM a, b`), `fts_match` / `bm25_score` inside a join expression. Algorithm: plain nested-loop, O(N×M) per level — hash / merge joins are a future optimization. ### ✅ Phase 9g — Prepared statements + parameter binding *(v0.9.0, SQLR-23)* @@ -736,7 +736,7 @@ The remaining items — actually open, not retroactively rewritten: - Subqueries (scalar, `IN (SELECT ...)`, correlated) and CTEs (`WITH`, recursive) - ~~`HAVING` (post-aggregation filter)~~ ✅ Shipped (SQLR-52) — group-row filter after aggregation; references GROUP BY keys, aggregate aliases, and direct aggregate calls (hidden-slot computation for HAVING-only aggregates). `HAVING` without `GROUP BY` stays rejected in v0. - `CASE WHEN … THEN … END`, `BETWEEN`, `GLOB`, `REGEXP`, `LIKE … ESCAPE ''` -- Aggregates / `GROUP BY` / `DISTINCT` *over* joins (needs a single executor pass that knows about multiple input streams) +- ~~Aggregates / `GROUP BY` / `DISTINCT` *over* joins~~ ✅ Shipped (SQLR-6) — the joined row stream feeds the same scope-generic aggregation pipeline the single-table path uses; `GROUP BY` keys accept `t.col` qualifiers; `HAVING` and `SELECT DISTINCT` compose too. - Multi-column / expression `ORDER BY`, `OFFSET`, `NULLS FIRST/LAST` - `UNION` / `INTERSECT` / `EXCEPT`, `INSERT ... SELECT` - Composite + expression indexes diff --git a/docs/sql-engine.md b/docs/sql-engine.md index 6b30299..b2b6499 100644 --- a/docs/sql-engine.md +++ b/docs/sql-engine.md @@ -49,11 +49,11 @@ The `sqlparser` AST is designed to cover every SQL dialect, so its types are hug `UPDATE` and `DELETE` don't have a dedicated internal struct — the executor pattern-matches the sqlparser types directly because there's less transformation needed. -`SelectQuery::projection` is now `Projection::All | Projection::Items(Vec)`, where each item carries a `ProjectionKind::Column { qualifier, name }` (qualifier is `Some` for `t.col` shapes, used by JOIN execution to disambiguate) or `ProjectionKind::Aggregate(AggregateCall)` plus an optional `AS alias`. `AggregateCall` covers `COUNT(*)`, `COUNT([DISTINCT] col)`, `SUM` / `AVG` / `MIN` / `MAX` of a bare column. `group_by` is a `Vec` of bare column names (empty = no GROUP BY); the parser validates that every non-aggregate projection item appears in `GROUP BY`. +`SelectQuery::projection` is now `Projection::All | Projection::Items(Vec)`, where each item carries a `ProjectionKind::Column { qualifier, name }` (qualifier is `Some` for `t.col` shapes, used by JOIN execution to disambiguate) or `ProjectionKind::Aggregate(AggregateCall)` plus an optional `AS alias`. `AggregateCall` covers `COUNT(*)`, `COUNT([DISTINCT] col)`, `SUM` / `AVG` / `MIN` / `MAX` of a column reference (optionally qualified, `SUM(o.amount)`). `group_by` is a `Vec` of optionally-qualified column references (`GROUP BY dept`, `GROUP BY customers.name`; empty = no GROUP BY). The parser validates that every non-aggregate projection item appears in `GROUP BY` for single-table queries; joined queries defer that check to the executor, which resolves qualifiers against the in-scope table schemas (SQLR-6). `SelectQuery::joins` (SQLR-5) is a `Vec` evaluated left-to-right by `execute_select_rows_joined`. Each clause carries a `JoinType` (`Inner` / `LeftOuter` / `RightOuter` / `FullOuter`), the right-table name + optional alias, and a required `ON` expression. Empty = single-table SELECT, the existing fast path with HNSW / FTS / bounded-heap optimizations. -Each parser module still rejects features we don't implement with `SQLRiteError::NotImplemented` — comma joins (`FROM a, b`), aggregates / GROUP BY / DISTINCT over JOINs, `HAVING`, `DISTINCT ON (...)`, `GROUP BY` on expressions, `LIKE … ESCAPE ''`, `IN (subquery)`, `OFFSET`, multi-table DELETE, tuple assignment targets, etc. These errors carry the feature name in the message so the user knows what isn't there. (`JOIN ... USING`, `NATURAL JOIN`, and `CROSS JOIN` are now supported — see [`supported-sql.md`](supported-sql.md#join-semantics-sqlr-5).) +Each parser module still rejects features we don't implement with `SQLRiteError::NotImplemented` — comma joins (`FROM a, b`), `HAVING` without `GROUP BY`, `DISTINCT ON (...)`, `GROUP BY` on expressions, `LIKE … ESCAPE ''`, `IN (subquery)`, `OFFSET`, multi-table DELETE, tuple assignment targets, etc. These errors carry the feature name in the message so the user knows what isn't there. (`JOIN ... USING`, `NATURAL JOIN`, and `CROSS JOIN` are now supported — see [`supported-sql.md`](supported-sql.md#join-semantics-sqlr-5).) ## Statement dispatch @@ -176,6 +176,14 @@ contributes to its group), so the executor takes a separate path: against the *output* row by alias, bare column name, or aggregate display form), then LIMIT. +SQLR-6 made steps 2–6 scope-generic: the accumulator consumes +`RowScope`s instead of `(table, rowid)` pairs, so a joined SELECT feeds +its fully-joined, WHERE-filtered rows (each one a `JoinedScope`) +through the exact same pipeline. `GROUP BY` keys and aggregate args +carry an optional `t.` qualifier for disambiguation; NULL-padded +outer-join rows group under a `NULL` key and are skipped by +`COUNT(col)` like any other NULL. + Aggregate function names (`COUNT`/`SUM`/`AVG`/`MIN`/`MAX`) used in WHERE or any other scalar position get a friendly error redirecting the user to the projection list (`HAVING` is where post-aggregate filters go). diff --git a/docs/supported-sql.md b/docs/supported-sql.md index b659f68..60c0ef5 100644 --- a/docs/supported-sql.md +++ b/docs/supported-sql.md @@ -204,7 +204,7 @@ COUNT([DISTINCT] ) -- counts non-NULL values, option - **Projection**: `*` (all columns in declaration order), a bare column list, or an explicit list mixing bare columns and aggregate calls. Each item can carry an optional `AS alias` (the alias becomes the output column header and is recognized by `ORDER BY`). - **`WHERE`**: any [expression](#expressions). Evaluated per row; NULL-as-false in WHERE context (three-valued logic collapsed to two-valued for filtering). Includes **`IS NULL`** / **`IS NOT NULL`** for explicit null tests, **`LIKE` / `NOT LIKE` / `ILIKE`** for pattern matching, and **`IN (list) / NOT IN (list)`** for set-membership against literal lists. - **`DISTINCT`**: `SELECT DISTINCT` deduplicates result rows after projection (and after aggregation, when both apply). `NULL` values compare equal to other `NULL`s for dedupe, matching SQL's DISTINCT semantic. -- **`GROUP BY`**: one or more bare column names. Every non-aggregate item in the projection must appear in the `GROUP BY` list (the parser rejects the violation with a clear message). `GROUP BY ` without any aggregate behaves like an implicit `DISTINCT `. +- **`GROUP BY`**: one or more column names, optionally qualified (`GROUP BY customers.name`) — the qualifier disambiguates same-named columns across joined tables (SQLR-6). Every non-aggregate item in the projection must appear in the `GROUP BY` list (rejected with a clear message — by the parser for single-table queries, by the executor for joined ones, where resolving qualifiers needs the schemas). `GROUP BY ` without any aggregate behaves like an implicit `DISTINCT `. - **`HAVING`** (SQLR-52): post-aggregation filter over the grouped output. `WHERE` filters rows before grouping; `HAVING` filters groups after aggregation. Requires `GROUP BY` (see [HAVING semantics](#having-semantics-sqlr-52)). - **Aggregates** (SQLR-3): `COUNT(*)`, `COUNT(col)`, `COUNT(DISTINCT col)`, `SUM(col)`, `AVG(col)`, `MIN(col)`, `MAX(col)`. `SUM` over an integer column stays `INTEGER` until a `REAL` input arrives or the running sum overflows `i64` (one-time promotion to `REAL`). `AVG` always returns `REAL` (or `NULL` on empty / all-NULL groups). `MIN` / `MAX` skip NULLs and use the same total order as `ORDER BY`. Aggregates over an empty table or empty group return `0` for `COUNT(*)` / `COUNT(col)` and `NULL` for the rest. - **`ORDER BY`**: single sort key, `ASC` (default) or `DESC`. For non-aggregating queries the key is any expression — including function calls — so KNN queries like `ORDER BY vec_distance_l2(embedding, [...]) LIMIT k` work end-to-end *(Phase 7b)*. For aggregating queries the key resolves against the *output* row by name: a bare identifier matches an alias or a `GROUP BY` column, and a function call like `COUNT(*)` matches an aggregate projection by its canonical display form. Sort key types must match across rows. @@ -237,12 +237,12 @@ conditions, plus `CROSS JOIN`: - **Self-joins** require an alias on at least one side: `FROM nodes AS p INNER JOIN nodes AS c ON p.id = c.parent_id`. Without one, you get a `duplicate table reference` error so qualifiers stay unambiguous. - **`WHERE` runs after joins.** A `WHERE right.col IS NULL` filter on a `LEFT JOIN` correctly returns left rows with no match (the standard "anti-join via outer-join" idiom). - **`ORDER BY` and `LIMIT`** apply to the fully joined row stream. +- **Aggregates / `GROUP BY` / `DISTINCT` / `HAVING` over joins** (SQLR-6): the fully-joined, `WHERE`-filtered row stream feeds the same aggregation pipeline single-table queries use. `GROUP BY` keys may be qualified (`GROUP BY customers.name`) and must resolve unambiguously; NULL-padded outer-join rows group under a `NULL` key, and `COUNT(col)` skips their NULLs while `COUNT(*)` counts them. `SELECT DISTINCT` dedupes the projected join output (with `LIMIT` applied after the dedupe). - **Algorithm:** plain nested-loop join, O(N×M) per join level. Adequate for an embedded learning database; hash / merge joins on equi-join shapes are a future optimization. #### What's not supported in JOINs - Comma-separated FROM lists (`FROM a, b`) — use an explicit `JOIN` / `CROSS JOIN` instead. -- Aggregates / `GROUP BY` / `DISTINCT` *over* a join. The single-table aggregator is wired against one rowid stream; rewiring it for joined rows is a separate increment. Surfaces as a clean `NotImplemented` at parse time. - `fts_match` / `bm25_score` inside a JOIN expression. They need to look up an FTS index by column, which is single-table-bound today. Use them on a single-table SELECT first, or fold the FTS lookup into the FROM side. ### Index probing @@ -281,7 +281,6 @@ SELECT dept FROM emp GROUP BY dept HAVING COUNT(*) > 1 AND SUM(salary) > 100; ### What doesn't work - **Comma-separated FROM lists** (`FROM a, b`) — use an explicit `JOIN` / `CROSS JOIN`. `INNER` / `LEFT` / `RIGHT` / `FULL OUTER` / `CROSS` with `ON` / `USING` / `NATURAL` are all supported (see [JOIN semantics](#join-semantics-sqlr-5)) -- **Aggregates** / **`GROUP BY`** / **`DISTINCT`** over a JOIN — pipe through a subquery once subqueries land - **Subqueries**, CTEs (`WITH`), views - **`HAVING` without `GROUP BY`** — the degenerate single-group form is rejected; `HAVING` with `GROUP BY` works (see [HAVING semantics](#having-semantics-sqlr-52)) - **`DISTINCT`** on `SUM` / `AVG` / `MIN` / `MAX` (only `COUNT(DISTINCT col)` is supported) @@ -725,8 +724,7 @@ A REPL launched with `sqlrite --readonly foo.sqlrite` (or `sqlrite::open_databas For context when you hit `NotImplemented`. See [Roadmap](roadmap.md) for when these land: ### Joins & composition -- `INNER` / `LEFT` / `RIGHT` / `FULL OUTER` / `CROSS JOIN` with `ON` / `USING (...)` / `NATURAL` all work (SQLR-5). Comma-separated FROM joins (`FROM a, b`) don't — use an explicit `JOIN` / `CROSS JOIN` -- Aggregates / `GROUP BY` / `DISTINCT` *over* a JOIN — pipe through a subquery once subqueries land +- `INNER` / `LEFT` / `RIGHT` / `FULL OUTER` / `CROSS JOIN` with `ON` / `USING (...)` / `NATURAL` all work (SQLR-5), and aggregates / `GROUP BY` / `DISTINCT` / `HAVING` compose over join results (SQLR-6). Comma-separated FROM joins (`FROM a, b`) don't — use an explicit `JOIN` / `CROSS JOIN` - `fts_match` / `bm25_score` inside a JOIN expression — single-table-bound today - Subqueries (scalar, `IN (SELECT ...)`, correlated) - CTEs (`WITH`), recursive CTEs diff --git a/docs/usage.md b/docs/usage.md index b62e17a..351b370 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -74,7 +74,7 @@ Quick hits worth knowing when you're working at the REPL: - **Arithmetic stays honest.** Integer-only operations stay integer; any `REAL` operand promotes to `f64`; divide-by-zero is a typed runtime error, never a panic. - **NULL follows three-valued logic.** `NULL = NULL` is unknown (not true) — treated as false in `WHERE`. Use `IS NULL` / `IS NOT NULL` for explicit null tests, e.g. `SELECT id FROM t WHERE qty IS NULL;`. - **Identifiers are case-sensitive** (table / column names; no normalization), but keywords aren't. String literals preserve case. -- **Not yet supported**: joins, subqueries, `GROUP BY` / aggregates, `DISTINCT`, `LIKE` / `IN`, projection expressions, column aliases, `OFFSET`, multi-column `ORDER BY`, savepoints, `ALTER TABLE`, `DROP TABLE`, `DROP INDEX`. See the [full list in the reference](supported-sql.md#not-yet-supported). +- **Not yet supported**: subqueries, CTEs, views, comma joins (`FROM a, b`), projection expressions beyond aggregate calls, `OFFSET`, multi-column `ORDER BY`, savepoints. See the [full list in the reference](supported-sql.md#not-yet-supported). ## History diff --git a/src/sql/agg.rs b/src/sql/agg.rs index 37592e4..673c560 100644 --- a/src/sql/agg.rs +++ b/src/sql/agg.rs @@ -91,7 +91,7 @@ impl AggState { match call.func { AggregateFn::Count => match &call.arg { AggregateArg::Star => AggState::CountStar(0), - AggregateArg::Column(_) => AggState::Count { + AggregateArg::Column { .. } => AggState::Count { non_null: 0, distinct: if call.distinct { Some(HashSet::new()) @@ -445,7 +445,10 @@ mod tests { fn count_col_skips_nulls() { let call = AggregateCall { func: AggregateFn::Count, - arg: AggregateArg::Column("x".into()), + arg: AggregateArg::Column { + qualifier: None, + name: "x".into(), + }, distinct: false, }; let mut s = AggState::new(&call); @@ -459,7 +462,10 @@ mod tests { fn count_distinct_dedupes() { let call = AggregateCall { func: AggregateFn::Count, - arg: AggregateArg::Column("x".into()), + arg: AggregateArg::Column { + qualifier: None, + name: "x".into(), + }, distinct: true, }; let mut s = AggState::new(&call); @@ -474,7 +480,10 @@ mod tests { fn sum_int_stays_int_until_real() { let call = AggregateCall { func: AggregateFn::Sum, - arg: AggregateArg::Column("x".into()), + arg: AggregateArg::Column { + qualifier: None, + name: "x".into(), + }, distinct: false, }; let mut s = AggState::new(&call); @@ -493,7 +502,10 @@ mod tests { fn sum_all_null_is_null() { let call = AggregateCall { func: AggregateFn::Sum, - arg: AggregateArg::Column("x".into()), + arg: AggregateArg::Column { + qualifier: None, + name: "x".into(), + }, distinct: false, }; let mut s = AggState::new(&call); @@ -506,7 +518,10 @@ mod tests { fn avg_always_real() { let call = AggregateCall { func: AggregateFn::Avg, - arg: AggregateArg::Column("x".into()), + arg: AggregateArg::Column { + qualifier: None, + name: "x".into(), + }, distinct: false, }; let mut s = AggState::new(&call); @@ -522,7 +537,10 @@ mod tests { fn min_max_skip_nulls() { let mk = |f| AggregateCall { func: f, - arg: AggregateArg::Column("x".into()), + arg: AggregateArg::Column { + qualifier: None, + name: "x".into(), + }, distinct: false, }; let mut mn = AggState::new(&mk(AggregateFn::Min)); diff --git a/src/sql/executor.rs b/src/sql/executor.rs index 3d44b93..c772839 100644 --- a/src/sql/executor.rs +++ b/src/sql/executor.rs @@ -21,7 +21,7 @@ use crate::sql::db::table::{ use crate::sql::fts::{Bm25Params, PostingList}; use crate::sql::hnsw::{DistanceMetric, HnswIndex}; use crate::sql::parser::select::{ - AggregateArg, AggregateFn, JoinConstraintKind, JoinType, OrderByClause, Projection, + AggregateArg, AggregateFn, GroupByKey, JoinConstraintKind, JoinType, OrderByClause, Projection, ProjectionItem, ProjectionKind, SelectQuery, parse_aggregate_call, }; @@ -243,11 +243,11 @@ pub fn execute_select_rows(query: SelectQuery, db: &Database) -> Result Result 1`). - // We append those as *hidden* trailing projection slots so the - // existing `aggregate_rows` accumulator computes them alongside - // the visible ones, then strip them after filtering. Aggregate - // calls in the HAVING tree are lowered to identifiers naming - // their output slot (`COUNT(*)` → identifier "COUNT(*)") so the - // shared expression evaluator can resolve them through a - // `GroupRowScope` like any other column. - let mut all_items = proj_items.clone(); - let having_expr = match &query.having { - Some(h) => { - for g in &query.group_by { - if !all_items - .iter() - .any(|i| i.output_name().eq_ignore_ascii_case(g)) - { - all_items.push(ProjectionItem { - kind: ProjectionKind::Column { - qualifier: None, - name: g.clone(), - }, - alias: None, - }); - } - } - Some(lower_having_expr(h, &mut all_items)?) - } - None => None, - }; + let (all_items, having_expr) = lower_having_into_hidden_slots(&query, &proj_items)?; // Validate aggregate column args (visible + HAVING-hidden). for item in &all_items { if let ProjectionKind::Aggregate(call) = &item.kind - && let AggregateArg::Column(c) = &call.arg + && let AggregateArg::Column { name: c, .. } = &call.arg && !table.contains_column(c.clone()) { return Err(SQLRiteError::Internal(format!( @@ -326,32 +296,8 @@ pub fn execute_select_rows(query: SelectQuery, db: &Database) -> Result = proj_items.iter().map(|i| i.output_name()).collect(); - let mut rows = aggregate_rows(table, &matching, &query.group_by, &all_items)?; - - if let Some(h) = &having_expr { - let all_columns: Vec = all_items.iter().map(|i| i.output_name()).collect(); - rows = filter_groups_by_having(rows, h, &all_columns)?; - } - // Drop the hidden HAVING-only slots back to the user-visible width. - if all_items.len() > proj_items.len() { - for row in &mut rows { - row.truncate(proj_items.len()); - } - } - - if query.distinct { - rows = dedupe_rows(rows); - } - - if let Some(order) = &query.order_by { - sort_output_rows(&mut rows, &columns, &proj_items, order)?; - } - if let Some(k) = query.limit { - rows.truncate(k); - } - - return Ok(SelectResult { columns, rows }); + let scopes = matching.iter().map(|&r| SingleTableScope::new(table, r)); + return run_aggregation_pipeline(scopes, &query, &proj_items, &all_items, &having_expr); } // Non-aggregating path — same flow as before, with the extra @@ -584,12 +530,10 @@ fn col_eq(left_scope: &str, right_scope: &str, col: &str) -> Expr { // learning database" niche; a future phase could layer hash / merge // joins on equi-join shapes without changing the surface API. // -// Aggregates / GROUP BY / DISTINCT over joined results are rejected -// at parse time (see SelectQuery::new). They aren't impossible — -// the joined-row stream is just a different rowid source feeding -// the same aggregator — but we left the validator that ties bare -// columns to GROUP BY a single-table assumption, and reworking it -// is outside this phase. Surfaces as a clean NotImplemented today. +// SQLR-6 — aggregates / GROUP BY / DISTINCT compose with joins: the +// fully-joined row stream feeds the same scope-generic aggregation +// pipeline the single-table path uses (Stage 3.5 below), and DISTINCT +// dedupes the projected output rows. fn execute_select_rows_joined(query: SelectQuery, db: &Database) -> Result { // Resolve every participating table once and capture its scope // name (alias if supplied, else table name). Scope names are @@ -824,6 +768,59 @@ fn execute_select_rows_joined(query: SelectQuery, db: &Database) -> Result { + if let AggregateArg::Column { qualifier, name } = &call.arg { + resolve_scope_column(&joined_tables, qualifier.as_deref(), name)?; + } + } + ProjectionKind::Column { qualifier, name } => { + let pos = resolve_scope_column(&joined_tables, qualifier.as_deref(), name)?; + let in_group_by = query.group_by.iter().any(|g| { + g.name == *name + && resolve_scope_column(&joined_tables, g.qualifier.as_deref(), &g.name) + == Ok(pos) + }); + if !in_group_by { + return Err(SQLRiteError::Internal(format!( + "column '{name}' must appear in GROUP BY or be used in an \ + aggregate function" + ))); + } + } + } + } + + let scopes = filtered.iter().map(|row| JoinedScope { + tables: &joined_tables, + rowids: row, + }); + return run_aggregation_pipeline(scopes, &query, &proj_items, &all_items, &having_expr); + } + // Stage 4: ORDER BY across the joined scope. We pre-compute the // sort key per row (same approach as `sort_rowids`) so the // comparator runs on Values, not against the expression tree. @@ -850,8 +847,13 @@ fn execute_select_rows_joined(query: SelectQuery, db: &Database) -> Result Result { - // SelectQuery::new already rejects this combination, - // but defense in depth keeps the pattern match total. + // Aggregates are handled by the Stage 3.5 pipeline, + // which returns before reaching this projection — + // defense in depth keeps the pattern match total. return Err(SQLRiteError::Internal( - "aggregate functions over JOIN are not supported".to_string(), + "aggregate projection reached the non-aggregating join path".to_string(), )); } }; @@ -882,9 +885,64 @@ fn execute_select_rows_joined(query: SelectQuery, db: &Database) -> Result], + qualifier: Option<&str>, + name: &str, +) -> Result { + if let Some(q) = qualifier { + let pos = tables + .iter() + .position(|t| t.scope_name.eq_ignore_ascii_case(q)) + .ok_or_else(|| { + SQLRiteError::Internal(format!( + "unknown table qualifier '{q}' in column reference '{q}.{name}'" + )) + })?; + if !tables[pos].table.contains_column(name.to_string()) { + return Err(SQLRiteError::Internal(format!( + "column '{name}' does not exist on '{}'", + tables[pos].scope_name + ))); + } + return Ok(pos); + } + let mut hit: Option = None; + for (i, t) in tables.iter().enumerate() { + if t.table.contains_column(name.to_string()) { + if hit.is_some() { + return Err(SQLRiteError::Internal(format!( + "column reference '{name}' is ambiguous — qualify it as
.{name}" + ))); + } + hit = Some(i); + } + } + hit.ok_or_else(|| { + SQLRiteError::Internal(format!( + "unknown column '{name}' in joined SELECT (no in-scope table has it)" + )) + }) +} + /// Executes a SELECT and returns `(rendered_table, row_count)`. The /// REPL and Tauri app use this to keep the table-printing behaviour /// the engine has always shipped. Structured callers use @@ -2648,10 +2706,10 @@ fn eval_function(func: &sqlparser::ast::Function, scope: &dyn RowScope) -> Resul } // SQLR-3: catch aggregate names used in scalar position (e.g. // `WHERE COUNT(*) > 1`) with a clearer message than "unknown - // function". HAVING isn't supported yet, hence the explicit nudge. + // function". "count" | "sum" | "avg" | "min" | "max" => Err(SQLRiteError::NotImplemented(format!( "aggregate function '{name}' is not allowed in WHERE / projection-scalar position; \ - use it as a top-level projection item (HAVING is not yet supported)" + use it as a top-level projection item or in HAVING" ))), other => Err(SQLRiteError::NotImplemented(format!( "unknown function: {other}(...)" @@ -3274,16 +3332,102 @@ fn eval_in_list(scope: &dyn RowScope, lhs: &Expr, list: &[Expr], negated: bool) // SQLR-3 — Aggregation phase, DISTINCT, post-projection sort // ----------------------------------------------------------------- -/// Walk `matching` rowids, partition into groups (one synthetic group +/// SQLR-52 — HAVING lowering, shared by the single-table and joined +/// aggregation paths. The expression may reference aggregates and +/// GROUP BY keys that aren't in the SELECT output (SQLite allows +/// both: `SELECT dept FROM t GROUP BY dept HAVING COUNT(*) > 1`). +/// We append those as *hidden* trailing projection slots so the +/// `aggregate_rows` accumulator computes them alongside the visible +/// ones; the pipeline strips them after filtering. Aggregate calls in +/// the HAVING tree are lowered to identifiers naming their output slot +/// (`COUNT(*)` → identifier "COUNT(*)") so the shared expression +/// evaluator can resolve them through a `GroupRowScope` like any other +/// column. Returns the widened projection list plus the lowered +/// HAVING expression (`None` when the query has no HAVING). +fn lower_having_into_hidden_slots( + query: &SelectQuery, + proj_items: &[ProjectionItem], +) -> Result<(Vec, Option)> { + let mut all_items = proj_items.to_vec(); + let having_expr = match &query.having { + Some(h) => { + for g in &query.group_by { + if !all_items + .iter() + .any(|i| i.output_name().eq_ignore_ascii_case(&g.name)) + { + all_items.push(ProjectionItem { + kind: ProjectionKind::Column { + qualifier: g.qualifier.clone(), + name: g.name.clone(), + }, + alias: None, + }); + } + } + Some(lower_having_expr(h, &mut all_items)?) + } + None => None, + }; + Ok((all_items, having_expr)) +} + +/// The aggregation tail shared by the single-table and joined SELECT +/// paths: accumulate groups over the row scopes, apply HAVING, strip +/// hidden HAVING-only slots, then DISTINCT / ORDER BY / LIMIT on the +/// output rows. Callers validate column references against their own +/// scope (table schema vs. joined-table list) before invoking. +fn run_aggregation_pipeline( + scopes: impl IntoIterator, + query: &SelectQuery, + proj_items: &[ProjectionItem], + all_items: &[ProjectionItem], + having_expr: &Option, +) -> Result { + let columns: Vec = proj_items.iter().map(|i| i.output_name()).collect(); + let mut rows = aggregate_rows(scopes, &query.group_by, all_items)?; + + if let Some(h) = having_expr { + let all_columns: Vec = all_items.iter().map(|i| i.output_name()).collect(); + rows = filter_groups_by_having(rows, h, &all_columns)?; + } + // Drop the hidden HAVING-only slots back to the user-visible width. + if all_items.len() > proj_items.len() { + for row in &mut rows { + row.truncate(proj_items.len()); + } + } + + if query.distinct { + rows = dedupe_rows(rows); + } + + if let Some(order) = &query.order_by { + sort_output_rows(&mut rows, &columns, proj_items, order)?; + } + if let Some(k) = query.limit { + rows.truncate(k); + } + + Ok(SelectResult { columns, rows }) +} + +/// Walk the row scopes, partition into groups (one synthetic group /// when `group_by` is empty), update one `AggState` per aggregate /// projection slot per group, then materialize one output row per /// group in projection order. Group-key columns surface their original /// `Value` (captured the first time the group was seen); aggregate /// slots surface `AggState::finalize()`. -fn aggregate_rows( - table: &Table, - matching: &[i64], - group_by: &[String], +/// +/// SQLR-6 — generic over [`RowScope`] so the same accumulator serves +/// the single-table path (a [`SingleTableScope`] per matching rowid) +/// and the joined path (a [`JoinedScope`] per joined row, where +/// NULL-padded outer-join sides surface as `Value::Null` — grouped +/// together like any other NULL, and skipped by `COUNT(col)` per the +/// usual NULL-skipping aggregate semantics). +fn aggregate_rows( + scopes: impl IntoIterator, + group_by: &[GroupByKey], proj_items: &[ProjectionItem], ) -> Result>> { // Build the per-projection-slot accumulator template once. Each @@ -3306,11 +3450,11 @@ fn aggregate_rows( let mut group_states: Vec>> = Vec::new(); let mut group_key_values: Vec> = Vec::new(); - for &rowid in matching { + for scope in scopes { let mut key_values: Vec = Vec::with_capacity(group_by.len()); let mut key: Vec = Vec::with_capacity(group_by.len()); - for col in group_by { - let v = table.get_value(col, rowid).unwrap_or(Value::Null); + for g in group_by { + let v = scope.lookup(g.qualifier.as_deref(), &g.name)?; key.push(DistinctKey::from_value(&v)); key_values.push(v); } @@ -3328,7 +3472,9 @@ fn aggregate_rows( if let ProjectionKind::Aggregate(call) = &item.kind { let v = match &call.arg { AggregateArg::Star => Value::Null, - AggregateArg::Column(c) => table.get_value(c, rowid).unwrap_or(Value::Null), + AggregateArg::Column { qualifier, name } => { + scope.lookup(qualifier.as_deref(), name)? + } }; if let Some(state) = group_states[idx][slot].as_mut() { state.update(&v)?; @@ -3356,13 +3502,20 @@ fn aggregate_rows( let mut row: Vec = Vec::with_capacity(proj_items.len()); for (slot, item) in proj_items.iter().enumerate() { match &item.kind { - ProjectionKind::Column { name: c, .. } => { - // The validation in execute_select_rows guarantees - // bare-column projections are also in `group_by`. + ProjectionKind::Column { qualifier, name: c } => { + // Parser / executor validation ties bare-column + // projections to GROUP BY entries, but `SELECT *` + // expansions reach here unvalidated — surface a + // clean error rather than panicking. let pos = group_by .iter() - .position(|g| g == c) - .expect("validated to be in GROUP BY"); + .position(|g| g.matches_column(qualifier.as_deref(), c)) + .ok_or_else(|| { + SQLRiteError::Internal(format!( + "column '{c}' must appear in GROUP BY or be used in an \ + aggregate function" + )) + })?; row.push(group_key_values[group_idx][pos].clone()); } ProjectionKind::Aggregate(_) => { @@ -3601,9 +3754,13 @@ fn resolve_order_by_index( } // Function form: match by display name against any aggregate item // whose canonical display equals the user's call. Tolerate case - // differences in the function name. + // differences in the function name. SQLR-6 — a second pass with + // `t.` qualifiers stripped from both sides keeps qualifier + // spelling differences from blocking the match (`ORDER BY + // SUM(amount)` finds a `SELECT SUM(o.amount)` slot and vice + // versa), preserving the pre-qualifier behavior. if let Expr::Function(func) = expr { - let user_disp = format_function_display(func); + let user_disp = format_function_display(func, true); for (i, item) in proj_items.iter().enumerate() { if let ProjectionKind::Aggregate(call) = &item.kind && call.display_name().eq_ignore_ascii_case(&user_disp) @@ -3611,6 +3768,16 @@ fn resolve_order_by_index( return Ok(i); } } + let user_disp_unqualified = format_function_display(func, false); + for (i, item) in proj_items.iter().enumerate() { + if let ProjectionKind::Aggregate(call) = &item.kind + && call + .display_name_unqualified() + .eq_ignore_ascii_case(&user_disp_unqualified) + { + return Ok(i); + } + } return Err(SQLRiteError::Internal(format!( "ORDER BY references aggregate '{user_disp}' that isn't in the SELECT output" ))); @@ -3623,7 +3790,9 @@ fn resolve_order_by_index( /// Format a sqlparser function call into the same canonical form /// `AggregateCall::display_name()` uses, so ORDER BY on /// `COUNT(*)` / `SUM(salary)` matches its projection counterpart. -fn format_function_display(func: &sqlparser::ast::Function) -> String { +/// `qualified` keeps or strips the argument's `t.` qualifier, matching +/// `display_name()` / `display_name_unqualified()` respectively. +fn format_function_display(func: &sqlparser::ast::Function, qualified: bool) -> String { let name = match func.name.0.as_slice() { [ObjectNamePart::Identifier(ident)] => ident.value.to_uppercase(), _ => format!("{:?}", func.name).to_uppercase(), @@ -3638,7 +3807,15 @@ fn format_function_display(func: &sqlparser::ast::Function) -> String { FunctionArg::Unnamed(FunctionArgExpr::Wildcard) => "*".to_string(), FunctionArg::Unnamed(FunctionArgExpr::Expr(Expr::Identifier(i))) => i.value.clone(), FunctionArg::Unnamed(FunctionArgExpr::Expr(Expr::CompoundIdentifier(parts))) => { - parts.last().map(|p| p.value.clone()).unwrap_or_default() + if qualified { + parts + .iter() + .map(|p| p.value.clone()) + .collect::>() + .join(".") + } else { + parts.last().map(|p| p.value.clone()).unwrap_or_default() + } } _ => String::new(), }); @@ -4625,23 +4802,21 @@ mod tests { } #[test] - fn having_over_join_rejected_for_all_flavors() { - // Aggregates / GROUP BY over JOIN results are a separate increment - // (SQLR-6); until it lands, every JOIN flavor + GROUP BY + HAVING - // must surface the clean NotImplemented — no wrong results. + fn having_over_join_filters_groups_for_all_flavors() { + // SQLR-6 — GROUP BY + HAVING compose with every join flavor. + // Only Alice has more than one order; the dangling order (RIGHT / + // FULL) groups under a NULL name with count 1 and is filtered. for flavor in ["INNER", "LEFT OUTER", "RIGHT OUTER", "FULL OUTER"] { let sql = format!( "SELECT customers.name, COUNT(*) FROM customers \ {flavor} JOIN orders ON customers.id = orders.customer_id \ GROUP BY customers.name HAVING COUNT(*) > 1;" ); - let err = crate::sql::process_command(&sql, &mut seed_join_fixture()); - match err { - Err(SQLRiteError::NotImplemented(msg)) => { - assert!(msg.contains("JOIN"), "{flavor}: unexpected message: {msg}") - } - other => panic!("{flavor}: expected NotImplemented, got {other:?}"), - } + let db = seed_join_fixture(); + let r = run_rows(&db, &sql); + assert_eq!(r.rows.len(), 1, "{flavor}: only Alice has >1 order"); + assert_eq!(r.rows[0][0].to_display_string(), "Alice", "{flavor}"); + assert_eq!(expect_int(&r.rows[0][1]), 2, "{flavor}"); } } @@ -5151,16 +5326,239 @@ mod tests { assert!(res.is_err(), "USING (nope) must error — column absent"); } + // --------------------------------------------------------------------- + // SQLR-6 — aggregates / GROUP BY / DISTINCT over JOIN results + // --------------------------------------------------------------------- + + #[test] + fn group_by_with_aggregates_over_inner_join() { + let db = seed_join_fixture(); + let r = run_rows( + &db, + "SELECT customers.name, COUNT(*), SUM(orders.amount) FROM customers \ + INNER JOIN orders ON customers.id = orders.customer_id \ + GROUP BY customers.name ORDER BY customers.name;", + ); + assert_eq!(r.columns, vec!["name", "COUNT(*)", "SUM(orders.amount)"]); + assert_eq!(r.rows.len(), 2); + assert_eq!(r.rows[0][0].to_display_string(), "Alice"); + assert_eq!(expect_int(&r.rows[0][1]), 2); + assert_eq!(expect_int(&r.rows[0][2]), 300); + assert_eq!(r.rows[1][0].to_display_string(), "Bob"); + assert_eq!(expect_int(&r.rows[1][1]), 1); + assert_eq!(expect_int(&r.rows[1][2]), 50); + } + + #[test] + fn aggregates_over_join_without_group_by() { + let db = seed_join_fixture(); + let r = run_rows( + &db, + "SELECT COUNT(*), SUM(orders.amount) FROM customers \ + INNER JOIN orders ON customers.id = orders.customer_id;", + ); + assert_eq!(r.rows.len(), 1); + assert_eq!(expect_int(&r.rows[0][0]), 3); + assert_eq!(expect_int(&r.rows[0][1]), 350); + } + #[test] - fn aggregates_over_join_are_rejected() { + fn count_column_skips_outer_join_null_padding() { + // Carol has no orders: her LEFT-JOIN row is NULL-padded on the + // right. COUNT(*) counts the padded row; COUNT(orders.id) skips + // its NULL, per the usual NULL-skipping aggregate semantics. let db = seed_join_fixture(); + let r = run_rows( + &db, + "SELECT customers.name, COUNT(*), COUNT(orders.id) FROM customers \ + LEFT OUTER JOIN orders ON customers.id = orders.customer_id \ + GROUP BY customers.name ORDER BY customers.name;", + ); + assert_eq!(r.rows.len(), 3); + let carol = &r.rows[2]; + assert_eq!(carol[0].to_display_string(), "Carol"); + assert_eq!(expect_int(&carol[1]), 1, "COUNT(*) counts the padded row"); + assert_eq!(expect_int(&carol[2]), 0, "COUNT(col) skips the NULL"); + } + + #[test] + fn outer_join_null_keys_group_together() { + // FULL OUTER surfaces the dangling order (customer_id 4) with a + // NULL customers.name — it must form its own group, not vanish. + let db = seed_join_fixture(); + let r = run_rows( + &db, + "SELECT customers.name, COUNT(*) FROM customers \ + FULL OUTER JOIN orders ON customers.id = orders.customer_id \ + GROUP BY customers.name;", + ); + assert_eq!(r.rows.len(), 4, "Alice, Bob, Carol, NULL"); + let null_group = r + .rows + .iter() + .find(|row| row[0] == Value::Null) + .expect("dangling order groups under NULL"); + assert_eq!(expect_int(&null_group[1]), 1); + } + + #[test] + fn count_distinct_over_join() { + let db = seed_join_fixture(); + let r = run_rows( + &db, + "SELECT COUNT(DISTINCT customers.name) FROM customers \ + INNER JOIN orders ON customers.id = orders.customer_id;", + ); + assert_eq!(expect_int(&r.rows[0][0]), 2); + } + + #[test] + fn group_by_qualified_key_resolves_ambiguous_name() { + // `id` exists on both tables — the qualified GROUP BY key picks + // the customers side. + let db = seed_join_fixture(); + let r = run_rows( + &db, + "SELECT customers.id, COUNT(*) FROM customers \ + INNER JOIN orders ON customers.id = orders.customer_id \ + GROUP BY customers.id ORDER BY customers.id;", + ); + assert_eq!(r.rows.len(), 2); + assert_eq!(expect_int(&r.rows[0][0]), 1); + assert_eq!(expect_int(&r.rows[0][1]), 2); + } + + #[test] + fn group_by_ambiguous_unqualified_key_over_join_errors() { + let err = crate::sql::process_command( + "SELECT COUNT(*) FROM customers \ + INNER JOIN orders ON customers.id = orders.customer_id GROUP BY id;", + &mut seed_join_fixture(), + ); + match err { + Err(e) => assert!( + e.to_string().contains("ambiguous"), + "unexpected message: {e}" + ), + Ok(_) => panic!("ambiguous GROUP BY key must error"), + } + } + + #[test] + fn bare_column_not_in_group_by_over_join_errors() { + let err = crate::sql::process_command( + "SELECT orders.amount, COUNT(*) FROM customers \ + INNER JOIN orders ON customers.id = orders.customer_id \ + GROUP BY customers.name;", + &mut seed_join_fixture(), + ); + match err { + Err(e) => assert!( + e.to_string().contains("must appear in GROUP BY"), + "unexpected message: {e}" + ), + Ok(_) => panic!("bare column outside GROUP BY must error"), + } + } + + #[test] + fn aggregate_in_where_over_join_errors_cleanly() { + // Code-review gap from SQLR-5: aggregate misuse inside WHERE on + // a joined query must be a typed error, not wrong results. let err = crate::sql::process_command( "SELECT COUNT(*) FROM customers \ + INNER JOIN orders ON customers.id = orders.customer_id \ + WHERE COUNT(*) > 1;", + &mut seed_join_fixture(), + ); + match err { + Err(SQLRiteError::NotImplemented(msg)) => assert!( + msg.contains("not allowed in WHERE"), + "unexpected message: {msg}" + ), + other => panic!("expected NotImplemented, got {other:?}"), + } + } + + #[test] + fn order_by_aggregate_over_join() { + let db = seed_join_fixture(); + let r = run_rows( + &db, + "SELECT customers.name, SUM(orders.amount) FROM customers \ + INNER JOIN orders ON customers.id = orders.customer_id \ + GROUP BY customers.name ORDER BY SUM(orders.amount) DESC;", + ); + assert_eq!(r.rows[0][0].to_display_string(), "Alice"); + // Qualifier-stripped fallback: ORDER BY SUM(amount) finds the + // SUM(orders.amount) slot even though the spellings differ. + let r2 = run_rows( + &db, + "SELECT customers.name, SUM(orders.amount) FROM customers \ + INNER JOIN orders ON customers.id = orders.customer_id \ + GROUP BY customers.name ORDER BY SUM(amount) DESC;", + ); + assert_eq!(r2.rows[0][0].to_display_string(), "Alice"); + } + + #[test] + fn distinct_over_join_dedupes_output_rows() { + let db = seed_join_fixture(); + let r = run_rows( + &db, + "SELECT DISTINCT customers.name FROM customers \ INNER JOIN orders ON customers.id = orders.customer_id;", + ); + assert_eq!(r.rows.len(), 2); + let names: Vec = r + .rows + .iter() + .map(|row| row[0].to_display_string()) + .collect(); + assert_eq!(names, vec!["Alice".to_string(), "Bob".to_string()]); + } + + #[test] + fn distinct_over_join_defers_limit_past_dedupe() { + // Without deferral, LIMIT 2 would truncate the joined rows to + // Alice's two orders and dedupe to a single row. + let db = seed_join_fixture(); + let r = run_rows( + &db, + "SELECT DISTINCT customers.name FROM customers \ + INNER JOIN orders ON customers.id = orders.customer_id LIMIT 2;", + ); + assert_eq!(r.rows.len(), 2, "LIMIT applies after DISTINCT collapses"); + } + + #[test] + fn select_star_group_by_errors_instead_of_panicking() { + // Single-table regression: the parser's "must appear in GROUP BY" + // check skips `SELECT *`, so the executor used to hit an + // `expect()` panic when a non-grouped column reached projection. + let err = crate::sql::process_command( + "SELECT * FROM orders GROUP BY customer_id;", &mut seed_join_fixture(), ); - assert!(err.is_err(), "aggregates over JOIN are not yet supported"); - let _ = db; // keep compiler happy if unused + match err { + Err(e) => assert!( + e.to_string().contains("must appear in GROUP BY"), + "unexpected message: {e}" + ), + Ok(_) => panic!("SELECT * with GROUP BY must error, not panic"), + } + } + + #[test] + fn group_by_qualified_key_single_table_still_works() { + // Qualified GROUP BY keys are accepted on the single-table path + // too (qualifier ignored, same posture as projections). + let db = seed_employees(); + let r = run_rows( + &db, + "SELECT dept, COUNT(*) FROM emp GROUP BY emp.dept ORDER BY dept;", + ); + assert_eq!(r.rows.len(), 3, "eng / sales / ops"); } #[test] diff --git a/src/sql/parser/select.rs b/src/sql/parser/select.rs index deabf97..952f48f 100644 --- a/src/sql/parser/select.rs +++ b/src/sql/parser/select.rs @@ -39,11 +39,18 @@ impl AggregateFn { } } -/// What the aggregate is fed: `*` (only valid for COUNT) or a bare column. +/// What the aggregate is fed: `*` (only valid for COUNT) or a column +/// reference. SQLR-6 — the column carries its optional `t.` qualifier +/// so joined aggregation (`SUM(orders.amount)`) can disambiguate +/// same-named columns across in-scope tables; the single-table path +/// ignores it, same as projection qualifiers. #[derive(Debug, Clone, PartialEq, Eq)] pub enum AggregateArg { Star, - Column(String), + Column { + qualifier: Option, + name: String, + }, } /// A parsed aggregate call like `COUNT(*)`, `SUM(salary)`, `COUNT(DISTINCT dept)`. @@ -60,13 +67,29 @@ impl AggregateCall { /// aggregate output columns when the user didn't supply an alias. /// Mirrors the output-header convention. pub fn display_name(&self) -> String { + self.display_name_impl(true) + } + + /// Display form with the argument's `t.` qualifier stripped. Used + /// as a fallback when matching ORDER BY function calls against + /// projection slots, so `ORDER BY SUM(amount)` still finds a + /// `SELECT SUM(o.amount)` slot (and vice versa). + pub(crate) fn display_name_unqualified(&self) -> String { + self.display_name_impl(false) + } + + fn display_name_impl(&self, qualified: bool) -> String { let inner = match &self.arg { AggregateArg::Star => "*".to_string(), - AggregateArg::Column(c) => { + AggregateArg::Column { qualifier, name } => { + let col = match qualifier { + Some(q) if qualified => format!("{q}.{name}"), + _ => name.clone(), + }; if self.distinct { - format!("DISTINCT {c}") + format!("DISTINCT {col}") } else { - c.clone() + col } } }; @@ -125,6 +148,32 @@ pub enum Projection { Items(Vec), } +/// SQLR-6 — one GROUP BY key: an optionally-qualified column reference +/// (`dept` or `t.dept`). The qualifier matters for joined SELECTs, +/// where the same column name can exist on several in-scope tables; +/// the single-table path ignores it, mirroring projection qualifiers. +#[derive(Debug, Clone, PartialEq, Eq)] +pub struct GroupByKey { + pub qualifier: Option, + pub name: String, +} + +impl GroupByKey { + /// Does a (possibly qualified) column reference name this key? + /// Names must match exactly; qualifiers must match (ASCII + /// case-insensitively) only when both sides carry one — a bare + /// reference matches a qualified key and vice versa. Callers that + /// need strict table-resolution equality (the joined executor) + /// layer that check on top. + pub(crate) fn matches_column(&self, qualifier: Option<&str>, name: &str) -> bool { + self.name == name + && match (self.qualifier.as_deref(), qualifier) { + (Some(kq), Some(q)) => kq.eq_ignore_ascii_case(q), + _ => true, + } + } +} + /// A parsed `ORDER BY` clause: a single sort key (expression), ascending /// by default. Phase 7b widened this from "bare column name" to /// "arbitrary expression" so KNN queries of the form @@ -224,8 +273,9 @@ pub struct SelectQuery { pub limit: Option, /// `SELECT DISTINCT`. pub distinct: bool, - /// `GROUP BY a, b` — bare column names. Empty = no GROUP BY. - pub group_by: Vec, + /// `GROUP BY a, t.b` — optionally-qualified column references in + /// source order. Empty = no GROUP BY. + pub group_by: Vec, /// SQLR-52 — raw sqlparser HAVING expression, evaluated by the /// executor against each group's output row after aggregation. /// Parser-level invariant: `Some` implies `group_by` is non-empty @@ -277,28 +327,44 @@ impl SelectQuery { )); } }; - // SQLR-3: parse GROUP BY into a list of bare column names. + // SQLR-3: parse GROUP BY into a list of column references. // GroupByExpr::Expressions(v, _) with an empty v is the "no // GROUP BY" shape; non-empty means we've got grouping. Reject - // GROUP BY ALL and GROUP BY on non-bare expressions for v1. - let group_by_cols: Vec = match group_by { + // GROUP BY ALL and GROUP BY on non-column expressions for v1. + // SQLR-6 — keys keep their `t.` qualifier so joined grouping + // (`GROUP BY customers.name`) can disambiguate. + let group_by_cols: Vec = match group_by { sqlparser::ast::GroupByExpr::Expressions(exprs, _) => { let mut out = Vec::with_capacity(exprs.len()); for e in exprs { - let col = match e { - Expr::Identifier(ident) => ident.value.clone(), - Expr::CompoundIdentifier(parts) => { - parts.last().map(|p| p.value.clone()).ok_or_else(|| { - SQLRiteError::Internal("empty compound identifier".to_string()) - })? - } + let key = match e { + Expr::Identifier(ident) => GroupByKey { + qualifier: None, + name: ident.value.clone(), + }, + Expr::CompoundIdentifier(parts) => match parts.as_slice() { + [only] => GroupByKey { + qualifier: None, + name: only.value.clone(), + }, + [q, c] => GroupByKey { + qualifier: Some(q.value.clone()), + name: c.value.clone(), + }, + _ => { + return Err(SQLRiteError::NotImplemented(format!( + "GROUP BY identifier with {} parts is not supported", + parts.len() + ))); + } + }, other => { return Err(SQLRiteError::NotImplemented(format!( "GROUP BY only supports bare column references for now, got {other:?}" ))); } }; - out.push(col); + out.push(key); } out } @@ -330,12 +396,19 @@ impl SelectQuery { // SQLR-3 validation: when GROUP BY is present, every bare-column // entry in the projection must appear in the GROUP BY list. Bare // columns in the SELECT are otherwise undefined per group. - if !group_by_cols.is_empty() + // SQLR-6 — only the single-table case validates here; the joined + // case needs the table schemas to resolve qualifiers, so the + // joined executor performs the equivalent check against the + // full in-scope table list. + if joins.is_empty() + && !group_by_cols.is_empty() && let Projection::Items(items) = &projection { for item in items { - if let ProjectionKind::Column { name: c, .. } = &item.kind - && !group_by_cols.contains(c) + if let ProjectionKind::Column { qualifier, name: c } = &item.kind + && !group_by_cols + .iter() + .any(|g| g.matches_column(qualifier.as_deref(), c)) { return Err(SQLRiteError::Internal(format!( "column '{c}' must appear in GROUP BY or be used in an aggregate function" @@ -344,29 +417,6 @@ impl SelectQuery { } } - // SQLR-5 — aggregations across joined results aren't covered - // by the current single-table grouping pipeline. Reject GROUP - // BY / aggregates over a join up front so the user gets a clear - // message rather than wrong results. - if !joins.is_empty() { - let has_agg = matches!( - &projection, - Projection::Items(items) - if items.iter().any(|i| matches!(i.kind, ProjectionKind::Aggregate(_))) - ); - if has_agg || !group_by_cols.is_empty() { - return Err(SQLRiteError::NotImplemented( - "GROUP BY / aggregate functions over JOIN results are not supported yet" - .to_string(), - )); - } - if distinct_flag { - return Err(SQLRiteError::NotImplemented( - "SELECT DISTINCT over JOIN results is not supported yet".to_string(), - )); - } - } - Ok(SelectQuery { table_name, table_alias, @@ -656,14 +706,28 @@ pub(crate) fn parse_aggregate_call(func: &sqlparser::ast::Function) -> Result AggregateArg::Star, FunctionArg::Unnamed(FunctionArgExpr::Expr(Expr::Identifier(ident))) => { - AggregateArg::Column(ident.value.clone()) + AggregateArg::Column { + qualifier: None, + name: ident.value.clone(), + } } FunctionArg::Unnamed(FunctionArgExpr::Expr(Expr::CompoundIdentifier(parts))) => { - let c = parts - .last() - .map(|p| p.value.clone()) - .ok_or_else(|| SQLRiteError::Internal("empty compound identifier".to_string()))?; - AggregateArg::Column(c) + match parts.as_slice() { + [only] => AggregateArg::Column { + qualifier: None, + name: only.value.clone(), + }, + [q, c] => AggregateArg::Column { + qualifier: Some(q.value.clone()), + name: c.value.clone(), + }, + _ => { + return Err(SQLRiteError::NotImplemented(format!( + "{name}(...) — argument identifier with {} parts is not supported", + parts.len() + ))); + } + } } other => { return Err(SQLRiteError::NotImplemented(format!( diff --git a/web/src/app/docs/page.tsx b/web/src/app/docs/page.tsx index 29cc2dc..117181c 100644 --- a/web/src/app/docs/page.tsx +++ b/web/src/app/docs/page.tsx @@ -315,7 +315,9 @@ export default function DocsPage() { NATURAL column shows once in SELECT *). Comma-separated FROMs (FROM a, b) are not — use an explicit JOIN / CROSS JOIN. Aggregates /{" "} - GROUP BY over a join lands once subqueries do. + GROUP BY / DISTINCT /{" "} + HAVING compose over join results, with optionally + qualified keys (GROUP BY customers.name).

GROUP BY & aggregates

diff --git a/web/src/components/sql-ref.tsx b/web/src/components/sql-ref.tsx index 3bf7747..5000392 100644 --- a/web/src/components/sql-ref.tsx +++ b/web/src/components/sql-ref.tsx @@ -178,7 +178,6 @@ const NOT_YET = [ "multi-column ORDER BY", "UNION / INTERSECT / EXCEPT", "INSERT … SELECT", - "GROUP BY / DISTINCT over JOINs", "CREATE VIEW / TRIGGER", "FOREIGN KEY / CHECK", "savepoints",