diff --git a/README.md b/README.md
index 720b70d..7f06084 100644
--- a/README.md
+++ b/README.md
@@ -175,7 +175,7 @@ sqlrite> DELETE FROM users WHERE age < 30;
| `CREATE TABLE` | `PRIMARY KEY`, `UNIQUE`, `NOT NULL`; `IF NOT EXISTS` (idempotent re-create); duplicate-column detection; types `INTEGER`/`INT`/`BIGINT`/`SMALLINT`, `TEXT`/`VARCHAR`, `REAL`/`FLOAT`/`DOUBLE`/`DECIMAL`, `BOOLEAN`. Auto-creates `sqlrite_autoindex_
_` for every PK + UNIQUE column |
| `CREATE [UNIQUE] INDEX` | Single-column, named indexes; `IF NOT EXISTS`; persists as a dedicated cell-based B-Tree. INTEGER + TEXT columns only |
| `INSERT INTO` | Explicit column list required; auto-ROWID for `INTEGER PRIMARY KEY`; multi-row `VALUES (…), (…)`; UNIQUE enforcement; clean type errors (no panics); NULL padding for omitted columns |
-| `SELECT` | `*` or column list with optional `AS alias`; `WHERE`; `DISTINCT`; `GROUP BY col[, col …]`; aggregate projections `COUNT(*)` / `COUNT([DISTINCT] col)` / `SUM` / `AVG` / `MIN` / `MAX`; `[INNER\|LEFT OUTER\|RIGHT OUTER\|FULL OUTER\|CROSS] JOIN` with `ON ...` / `USING (...)` / `NATURAL` constraints, table aliases and qualified `t.col` references; single-column `ORDER BY [ASC\|DESC]` (also resolves alias and aggregate display names); `LIMIT n`. `WHERE col = literal` probes an index when one exists. Catalog introspection via `SELECT … FROM sqlrite_master` |
+| `SELECT` | `*` or column list with optional `AS alias`; `WHERE`; `DISTINCT`; `GROUP BY col[, t.col …]` (qualified keys allowed); `HAVING`; aggregate projections `COUNT(*)` / `COUNT([DISTINCT] col)` / `SUM` / `AVG` / `MIN` / `MAX`; `[INNER\|LEFT OUTER\|RIGHT OUTER\|FULL OUTER\|CROSS] JOIN` with `ON ...` / `USING (...)` / `NATURAL` constraints, table aliases and qualified `t.col` references — aggregates / `GROUP BY` / `DISTINCT` / `HAVING` all compose over join results; single-column `ORDER BY [ASC\|DESC]` (also resolves alias and aggregate display names); `LIMIT n`. `WHERE col = literal` probes an index when one exists. Catalog introspection via `SELECT … FROM sqlrite_master` |
| `UPDATE` | Multi-column `SET`; `WHERE`; UNIQUE + type enforcement; arithmetic in assignments (`SET age = age + 1`) |
| `DELETE` | `WHERE` predicate or full-table delete |
| `BEGIN` / `COMMIT` / `ROLLBACK` | Real transactions, snapshot-based; WAL-backed commit; single-level (no savepoints); auto-rollback if `COMMIT`'s disk write fails |
@@ -193,7 +193,7 @@ Expressions in `WHERE` and `UPDATE`'s `SET` RHS:
- String concat — `||`
- Literals — integer + real numbers, `'single-quoted strings'`, `TRUE` / `FALSE`, `NULL`; parentheses for grouping
-**Not yet supported** (common ones): subqueries, CTEs, `HAVING`, `LIKE … ESCAPE ''`, `IN (subquery)`, `DISTINCT` on `SUM`/`AVG`/`MIN`/`MAX`, GROUP BY on expressions, expressions in the projection list, `OFFSET`, multi-column `ORDER BY`, savepoints, comma joins (`FROM a, b`), aggregates / DISTINCT / GROUP BY *over* JOIN results. The [full list with context](docs/supported-sql.md#not-yet-supported) lives in the reference.
+**Not yet supported** (common ones): subqueries, CTEs, `HAVING` without `GROUP BY`, `LIKE … ESCAPE ''`, `IN (subquery)`, `DISTINCT` on `SUM`/`AVG`/`MIN`/`MAX`, GROUP BY on expressions, expressions in the projection list, `OFFSET`, multi-column `ORDER BY`, savepoints, comma joins (`FROM a, b`). The [full list with context](docs/supported-sql.md#not-yet-supported) lives in the reference.
#### Meta commands
diff --git a/docs/architecture.md b/docs/architecture.md
index 85771ef..8ad5048 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -139,7 +139,6 @@ Steps 1–7 are purely in-memory; step 8 is the only disk contact, and after the
The roadmap has shipped far enough that the original "deliberately missing" list mostly turned into shipped features. What's still left:
- **No query optimizer** beyond the bounded-heap top-k pass for KNN (Phase 7c) and the HNSW probe shortcut (7d.2). Equality-on-PK probes are direct; everything else is a table scan. Joins use plain nested-loop (O(N×M) per join level); hash / merge joins on equi-join shapes are a future increment.
-- **Aggregates / GROUP BY / DISTINCT over joined results.** The single-table aggregator is wired against one rowid stream; the multi-table join executor produces joined rows but doesn't yet feed them through the aggregator. Surfaces as a clean `NotImplemented` at parse time. The single-table aggregation path (SQLR-3) is fully shipped.
- **No network layer.** SQLRite is embedded-only. The closest thing is the [`sqlrite-mcp`](mcp.md) server, which is stdio (not network). A real wire protocol isn't on the roadmap.
- **No streaming row cursor.** `Rows` is currently backed by an eager `Vec` (Phase 5a). The `Rows::next` API is shaped to support a real cursor — the swap is deferred to **5a.2**.
diff --git a/docs/roadmap.md b/docs/roadmap.md
index 1e92747..fe8bf49 100644
--- a/docs/roadmap.md
+++ b/docs/roadmap.md
@@ -561,7 +561,7 @@ The biggest single SQL-surface jump in the project's history.
- Self-joins require an alias on at least one side.
- `WHERE` runs after joins (the standard `LEFT JOIN ... WHERE right.col IS NULL` anti-join idiom works).
-`ON`, `USING (...)`, `NATURAL`, and `CROSS JOIN` are all supported. Not yet supported: comma-separated FROMs (`FROM a, b`), aggregates / `GROUP BY` / `DISTINCT` *over* a join, `fts_match` / `bm25_score` inside a join expression. Algorithm: plain nested-loop, O(N×M) per level — hash / merge joins are a future optimization.
+`ON`, `USING (...)`, `NATURAL`, and `CROSS JOIN` are all supported, and aggregates / `GROUP BY` / `DISTINCT` / `HAVING` compose over join results (SQLR-6). Not yet supported: comma-separated FROMs (`FROM a, b`), `fts_match` / `bm25_score` inside a join expression. Algorithm: plain nested-loop, O(N×M) per level — hash / merge joins are a future optimization.
### ✅ Phase 9g — Prepared statements + parameter binding *(v0.9.0, SQLR-23)*
@@ -736,7 +736,7 @@ The remaining items — actually open, not retroactively rewritten:
- Subqueries (scalar, `IN (SELECT ...)`, correlated) and CTEs (`WITH`, recursive)
- ~~`HAVING` (post-aggregation filter)~~ ✅ Shipped (SQLR-52) — group-row filter after aggregation; references GROUP BY keys, aggregate aliases, and direct aggregate calls (hidden-slot computation for HAVING-only aggregates). `HAVING` without `GROUP BY` stays rejected in v0.
- `CASE WHEN … THEN … END`, `BETWEEN`, `GLOB`, `REGEXP`, `LIKE … ESCAPE ''`
-- Aggregates / `GROUP BY` / `DISTINCT` *over* joins (needs a single executor pass that knows about multiple input streams)
+- ~~Aggregates / `GROUP BY` / `DISTINCT` *over* joins~~ ✅ Shipped (SQLR-6) — the joined row stream feeds the same scope-generic aggregation pipeline the single-table path uses; `GROUP BY` keys accept `t.col` qualifiers; `HAVING` and `SELECT DISTINCT` compose too.
- Multi-column / expression `ORDER BY`, `OFFSET`, `NULLS FIRST/LAST`
- `UNION` / `INTERSECT` / `EXCEPT`, `INSERT ... SELECT`
- Composite + expression indexes
diff --git a/docs/sql-engine.md b/docs/sql-engine.md
index 6b30299..b2b6499 100644
--- a/docs/sql-engine.md
+++ b/docs/sql-engine.md
@@ -49,11 +49,11 @@ The `sqlparser` AST is designed to cover every SQL dialect, so its types are hug
`UPDATE` and `DELETE` don't have a dedicated internal struct — the executor pattern-matches the sqlparser types directly because there's less transformation needed.
-`SelectQuery::projection` is now `Projection::All | Projection::Items(Vec)`, where each item carries a `ProjectionKind::Column { qualifier, name }` (qualifier is `Some` for `t.col` shapes, used by JOIN execution to disambiguate) or `ProjectionKind::Aggregate(AggregateCall)` plus an optional `AS alias`. `AggregateCall` covers `COUNT(*)`, `COUNT([DISTINCT] col)`, `SUM` / `AVG` / `MIN` / `MAX` of a bare column. `group_by` is a `Vec` of bare column names (empty = no GROUP BY); the parser validates that every non-aggregate projection item appears in `GROUP BY`.
+`SelectQuery::projection` is now `Projection::All | Projection::Items(Vec)`, where each item carries a `ProjectionKind::Column { qualifier, name }` (qualifier is `Some` for `t.col` shapes, used by JOIN execution to disambiguate) or `ProjectionKind::Aggregate(AggregateCall)` plus an optional `AS alias`. `AggregateCall` covers `COUNT(*)`, `COUNT([DISTINCT] col)`, `SUM` / `AVG` / `MIN` / `MAX` of a column reference (optionally qualified, `SUM(o.amount)`). `group_by` is a `Vec` of optionally-qualified column references (`GROUP BY dept`, `GROUP BY customers.name`; empty = no GROUP BY). The parser validates that every non-aggregate projection item appears in `GROUP BY` for single-table queries; joined queries defer that check to the executor, which resolves qualifiers against the in-scope table schemas (SQLR-6).
`SelectQuery::joins` (SQLR-5) is a `Vec` evaluated left-to-right by `execute_select_rows_joined`. Each clause carries a `JoinType` (`Inner` / `LeftOuter` / `RightOuter` / `FullOuter`), the right-table name + optional alias, and a required `ON` expression. Empty = single-table SELECT, the existing fast path with HNSW / FTS / bounded-heap optimizations.
-Each parser module still rejects features we don't implement with `SQLRiteError::NotImplemented` — comma joins (`FROM a, b`), aggregates / GROUP BY / DISTINCT over JOINs, `HAVING`, `DISTINCT ON (...)`, `GROUP BY` on expressions, `LIKE … ESCAPE ''`, `IN (subquery)`, `OFFSET`, multi-table DELETE, tuple assignment targets, etc. These errors carry the feature name in the message so the user knows what isn't there. (`JOIN ... USING`, `NATURAL JOIN`, and `CROSS JOIN` are now supported — see [`supported-sql.md`](supported-sql.md#join-semantics-sqlr-5).)
+Each parser module still rejects features we don't implement with `SQLRiteError::NotImplemented` — comma joins (`FROM a, b`), `HAVING` without `GROUP BY`, `DISTINCT ON (...)`, `GROUP BY` on expressions, `LIKE … ESCAPE ''`, `IN (subquery)`, `OFFSET`, multi-table DELETE, tuple assignment targets, etc. These errors carry the feature name in the message so the user knows what isn't there. (`JOIN ... USING`, `NATURAL JOIN`, and `CROSS JOIN` are now supported — see [`supported-sql.md`](supported-sql.md#join-semantics-sqlr-5).)
## Statement dispatch
@@ -176,6 +176,14 @@ contributes to its group), so the executor takes a separate path:
against the *output* row by alias, bare column name, or aggregate
display form), then LIMIT.
+SQLR-6 made steps 2–6 scope-generic: the accumulator consumes
+`RowScope`s instead of `(table, rowid)` pairs, so a joined SELECT feeds
+its fully-joined, WHERE-filtered rows (each one a `JoinedScope`)
+through the exact same pipeline. `GROUP BY` keys and aggregate args
+carry an optional `t.` qualifier for disambiguation; NULL-padded
+outer-join rows group under a `NULL` key and are skipped by
+`COUNT(col)` like any other NULL.
+
Aggregate function names (`COUNT`/`SUM`/`AVG`/`MIN`/`MAX`) used in WHERE
or any other scalar position get a friendly error redirecting the user
to the projection list (`HAVING` is where post-aggregate filters go).
diff --git a/docs/supported-sql.md b/docs/supported-sql.md
index b659f68..60c0ef5 100644
--- a/docs/supported-sql.md
+++ b/docs/supported-sql.md
@@ -204,7 +204,7 @@ COUNT([DISTINCT] ) -- counts non-NULL values, option
- **Projection**: `*` (all columns in declaration order), a bare column list, or an explicit list mixing bare columns and aggregate calls. Each item can carry an optional `AS alias` (the alias becomes the output column header and is recognized by `ORDER BY`).
- **`WHERE`**: any [expression](#expressions). Evaluated per row; NULL-as-false in WHERE context (three-valued logic collapsed to two-valued for filtering). Includes **`IS NULL`** / **`IS NOT NULL`** for explicit null tests, **`LIKE` / `NOT LIKE` / `ILIKE`** for pattern matching, and **`IN (list) / NOT IN (list)`** for set-membership against literal lists.
- **`DISTINCT`**: `SELECT DISTINCT` deduplicates result rows after projection (and after aggregation, when both apply). `NULL` values compare equal to other `NULL`s for dedupe, matching SQL's DISTINCT semantic.
-- **`GROUP BY`**: one or more bare column names. Every non-aggregate item in the projection must appear in the `GROUP BY` list (the parser rejects the violation with a clear message). `GROUP BY ` without any aggregate behaves like an implicit `DISTINCT `.
+- **`GROUP BY`**: one or more column names, optionally qualified (`GROUP BY customers.name`) — the qualifier disambiguates same-named columns across joined tables (SQLR-6). Every non-aggregate item in the projection must appear in the `GROUP BY` list (rejected with a clear message — by the parser for single-table queries, by the executor for joined ones, where resolving qualifiers needs the schemas). `GROUP BY ` without any aggregate behaves like an implicit `DISTINCT `.
- **`HAVING`** (SQLR-52): post-aggregation filter over the grouped output. `WHERE` filters rows before grouping; `HAVING` filters groups after aggregation. Requires `GROUP BY` (see [HAVING semantics](#having-semantics-sqlr-52)).
- **Aggregates** (SQLR-3): `COUNT(*)`, `COUNT(col)`, `COUNT(DISTINCT col)`, `SUM(col)`, `AVG(col)`, `MIN(col)`, `MAX(col)`. `SUM` over an integer column stays `INTEGER` until a `REAL` input arrives or the running sum overflows `i64` (one-time promotion to `REAL`). `AVG` always returns `REAL` (or `NULL` on empty / all-NULL groups). `MIN` / `MAX` skip NULLs and use the same total order as `ORDER BY`. Aggregates over an empty table or empty group return `0` for `COUNT(*)` / `COUNT(col)` and `NULL` for the rest.
- **`ORDER BY`**: single sort key, `ASC` (default) or `DESC`. For non-aggregating queries the key is any expression — including function calls — so KNN queries like `ORDER BY vec_distance_l2(embedding, [...]) LIMIT k` work end-to-end *(Phase 7b)*. For aggregating queries the key resolves against the *output* row by name: a bare identifier matches an alias or a `GROUP BY` column, and a function call like `COUNT(*)` matches an aggregate projection by its canonical display form. Sort key types must match across rows.
@@ -237,12 +237,12 @@ conditions, plus `CROSS JOIN`:
- **Self-joins** require an alias on at least one side: `FROM nodes AS p INNER JOIN nodes AS c ON p.id = c.parent_id`. Without one, you get a `duplicate table reference` error so qualifiers stay unambiguous.
- **`WHERE` runs after joins.** A `WHERE right.col IS NULL` filter on a `LEFT JOIN` correctly returns left rows with no match (the standard "anti-join via outer-join" idiom).
- **`ORDER BY` and `LIMIT`** apply to the fully joined row stream.
+- **Aggregates / `GROUP BY` / `DISTINCT` / `HAVING` over joins** (SQLR-6): the fully-joined, `WHERE`-filtered row stream feeds the same aggregation pipeline single-table queries use. `GROUP BY` keys may be qualified (`GROUP BY customers.name`) and must resolve unambiguously; NULL-padded outer-join rows group under a `NULL` key, and `COUNT(col)` skips their NULLs while `COUNT(*)` counts them. `SELECT DISTINCT` dedupes the projected join output (with `LIMIT` applied after the dedupe).
- **Algorithm:** plain nested-loop join, O(N×M) per join level. Adequate for an embedded learning database; hash / merge joins on equi-join shapes are a future optimization.
#### What's not supported in JOINs
- Comma-separated FROM lists (`FROM a, b`) — use an explicit `JOIN` / `CROSS JOIN` instead.
-- Aggregates / `GROUP BY` / `DISTINCT` *over* a join. The single-table aggregator is wired against one rowid stream; rewiring it for joined rows is a separate increment. Surfaces as a clean `NotImplemented` at parse time.
- `fts_match` / `bm25_score` inside a JOIN expression. They need to look up an FTS index by column, which is single-table-bound today. Use them on a single-table SELECT first, or fold the FTS lookup into the FROM side.
### Index probing
@@ -281,7 +281,6 @@ SELECT dept FROM emp GROUP BY dept HAVING COUNT(*) > 1 AND SUM(salary) > 100;
### What doesn't work
- **Comma-separated FROM lists** (`FROM a, b`) — use an explicit `JOIN` / `CROSS JOIN`. `INNER` / `LEFT` / `RIGHT` / `FULL OUTER` / `CROSS` with `ON` / `USING` / `NATURAL` are all supported (see [JOIN semantics](#join-semantics-sqlr-5))
-- **Aggregates** / **`GROUP BY`** / **`DISTINCT`** over a JOIN — pipe through a subquery once subqueries land
- **Subqueries**, CTEs (`WITH`), views
- **`HAVING` without `GROUP BY`** — the degenerate single-group form is rejected; `HAVING` with `GROUP BY` works (see [HAVING semantics](#having-semantics-sqlr-52))
- **`DISTINCT`** on `SUM` / `AVG` / `MIN` / `MAX` (only `COUNT(DISTINCT col)` is supported)
@@ -725,8 +724,7 @@ A REPL launched with `sqlrite --readonly foo.sqlrite` (or `sqlrite::open_databas
For context when you hit `NotImplemented`. See [Roadmap](roadmap.md) for when these land:
### Joins & composition
-- `INNER` / `LEFT` / `RIGHT` / `FULL OUTER` / `CROSS JOIN` with `ON` / `USING (...)` / `NATURAL` all work (SQLR-5). Comma-separated FROM joins (`FROM a, b`) don't — use an explicit `JOIN` / `CROSS JOIN`
-- Aggregates / `GROUP BY` / `DISTINCT` *over* a JOIN — pipe through a subquery once subqueries land
+- `INNER` / `LEFT` / `RIGHT` / `FULL OUTER` / `CROSS JOIN` with `ON` / `USING (...)` / `NATURAL` all work (SQLR-5), and aggregates / `GROUP BY` / `DISTINCT` / `HAVING` compose over join results (SQLR-6). Comma-separated FROM joins (`FROM a, b`) don't — use an explicit `JOIN` / `CROSS JOIN`
- `fts_match` / `bm25_score` inside a JOIN expression — single-table-bound today
- Subqueries (scalar, `IN (SELECT ...)`, correlated)
- CTEs (`WITH`), recursive CTEs
diff --git a/docs/usage.md b/docs/usage.md
index b62e17a..351b370 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -74,7 +74,7 @@ Quick hits worth knowing when you're working at the REPL:
- **Arithmetic stays honest.** Integer-only operations stay integer; any `REAL` operand promotes to `f64`; divide-by-zero is a typed runtime error, never a panic.
- **NULL follows three-valued logic.** `NULL = NULL` is unknown (not true) — treated as false in `WHERE`. Use `IS NULL` / `IS NOT NULL` for explicit null tests, e.g. `SELECT id FROM t WHERE qty IS NULL;`.
- **Identifiers are case-sensitive** (table / column names; no normalization), but keywords aren't. String literals preserve case.
-- **Not yet supported**: joins, subqueries, `GROUP BY` / aggregates, `DISTINCT`, `LIKE` / `IN`, projection expressions, column aliases, `OFFSET`, multi-column `ORDER BY`, savepoints, `ALTER TABLE`, `DROP TABLE`, `DROP INDEX`. See the [full list in the reference](supported-sql.md#not-yet-supported).
+- **Not yet supported**: subqueries, CTEs, views, comma joins (`FROM a, b`), projection expressions beyond aggregate calls, `OFFSET`, multi-column `ORDER BY`, savepoints. See the [full list in the reference](supported-sql.md#not-yet-supported).
## History
diff --git a/src/sql/agg.rs b/src/sql/agg.rs
index 37592e4..673c560 100644
--- a/src/sql/agg.rs
+++ b/src/sql/agg.rs
@@ -91,7 +91,7 @@ impl AggState {
match call.func {
AggregateFn::Count => match &call.arg {
AggregateArg::Star => AggState::CountStar(0),
- AggregateArg::Column(_) => AggState::Count {
+ AggregateArg::Column { .. } => AggState::Count {
non_null: 0,
distinct: if call.distinct {
Some(HashSet::new())
@@ -445,7 +445,10 @@ mod tests {
fn count_col_skips_nulls() {
let call = AggregateCall {
func: AggregateFn::Count,
- arg: AggregateArg::Column("x".into()),
+ arg: AggregateArg::Column {
+ qualifier: None,
+ name: "x".into(),
+ },
distinct: false,
};
let mut s = AggState::new(&call);
@@ -459,7 +462,10 @@ mod tests {
fn count_distinct_dedupes() {
let call = AggregateCall {
func: AggregateFn::Count,
- arg: AggregateArg::Column("x".into()),
+ arg: AggregateArg::Column {
+ qualifier: None,
+ name: "x".into(),
+ },
distinct: true,
};
let mut s = AggState::new(&call);
@@ -474,7 +480,10 @@ mod tests {
fn sum_int_stays_int_until_real() {
let call = AggregateCall {
func: AggregateFn::Sum,
- arg: AggregateArg::Column("x".into()),
+ arg: AggregateArg::Column {
+ qualifier: None,
+ name: "x".into(),
+ },
distinct: false,
};
let mut s = AggState::new(&call);
@@ -493,7 +502,10 @@ mod tests {
fn sum_all_null_is_null() {
let call = AggregateCall {
func: AggregateFn::Sum,
- arg: AggregateArg::Column("x".into()),
+ arg: AggregateArg::Column {
+ qualifier: None,
+ name: "x".into(),
+ },
distinct: false,
};
let mut s = AggState::new(&call);
@@ -506,7 +518,10 @@ mod tests {
fn avg_always_real() {
let call = AggregateCall {
func: AggregateFn::Avg,
- arg: AggregateArg::Column("x".into()),
+ arg: AggregateArg::Column {
+ qualifier: None,
+ name: "x".into(),
+ },
distinct: false,
};
let mut s = AggState::new(&call);
@@ -522,7 +537,10 @@ mod tests {
fn min_max_skip_nulls() {
let mk = |f| AggregateCall {
func: f,
- arg: AggregateArg::Column("x".into()),
+ arg: AggregateArg::Column {
+ qualifier: None,
+ name: "x".into(),
+ },
distinct: false,
};
let mut mn = AggState::new(&mk(AggregateFn::Min));
diff --git a/src/sql/executor.rs b/src/sql/executor.rs
index 3d44b93..c772839 100644
--- a/src/sql/executor.rs
+++ b/src/sql/executor.rs
@@ -21,7 +21,7 @@ use crate::sql::db::table::{
use crate::sql::fts::{Bm25Params, PostingList};
use crate::sql::hnsw::{DistanceMetric, HnswIndex};
use crate::sql::parser::select::{
- AggregateArg, AggregateFn, JoinConstraintKind, JoinType, OrderByClause, Projection,
+ AggregateArg, AggregateFn, GroupByKey, JoinConstraintKind, JoinType, OrderByClause, Projection,
ProjectionItem, ProjectionKind, SelectQuery, parse_aggregate_call,
};
@@ -243,11 +243,11 @@ pub fn execute_select_rows(query: SelectQuery, db: &Database) -> Result Result 1`).
- // We append those as *hidden* trailing projection slots so the
- // existing `aggregate_rows` accumulator computes them alongside
- // the visible ones, then strip them after filtering. Aggregate
- // calls in the HAVING tree are lowered to identifiers naming
- // their output slot (`COUNT(*)` → identifier "COUNT(*)") so the
- // shared expression evaluator can resolve them through a
- // `GroupRowScope` like any other column.
- let mut all_items = proj_items.clone();
- let having_expr = match &query.having {
- Some(h) => {
- for g in &query.group_by {
- if !all_items
- .iter()
- .any(|i| i.output_name().eq_ignore_ascii_case(g))
- {
- all_items.push(ProjectionItem {
- kind: ProjectionKind::Column {
- qualifier: None,
- name: g.clone(),
- },
- alias: None,
- });
- }
- }
- Some(lower_having_expr(h, &mut all_items)?)
- }
- None => None,
- };
+ let (all_items, having_expr) = lower_having_into_hidden_slots(&query, &proj_items)?;
// Validate aggregate column args (visible + HAVING-hidden).
for item in &all_items {
if let ProjectionKind::Aggregate(call) = &item.kind
- && let AggregateArg::Column(c) = &call.arg
+ && let AggregateArg::Column { name: c, .. } = &call.arg
&& !table.contains_column(c.clone())
{
return Err(SQLRiteError::Internal(format!(
@@ -326,32 +296,8 @@ pub fn execute_select_rows(query: SelectQuery, db: &Database) -> Result = proj_items.iter().map(|i| i.output_name()).collect();
- let mut rows = aggregate_rows(table, &matching, &query.group_by, &all_items)?;
-
- if let Some(h) = &having_expr {
- let all_columns: Vec = all_items.iter().map(|i| i.output_name()).collect();
- rows = filter_groups_by_having(rows, h, &all_columns)?;
- }
- // Drop the hidden HAVING-only slots back to the user-visible width.
- if all_items.len() > proj_items.len() {
- for row in &mut rows {
- row.truncate(proj_items.len());
- }
- }
-
- if query.distinct {
- rows = dedupe_rows(rows);
- }
-
- if let Some(order) = &query.order_by {
- sort_output_rows(&mut rows, &columns, &proj_items, order)?;
- }
- if let Some(k) = query.limit {
- rows.truncate(k);
- }
-
- return Ok(SelectResult { columns, rows });
+ let scopes = matching.iter().map(|&r| SingleTableScope::new(table, r));
+ return run_aggregation_pipeline(scopes, &query, &proj_items, &all_items, &having_expr);
}
// Non-aggregating path — same flow as before, with the extra
@@ -584,12 +530,10 @@ fn col_eq(left_scope: &str, right_scope: &str, col: &str) -> Expr {
// learning database" niche; a future phase could layer hash / merge
// joins on equi-join shapes without changing the surface API.
//
-// Aggregates / GROUP BY / DISTINCT over joined results are rejected
-// at parse time (see SelectQuery::new). They aren't impossible —
-// the joined-row stream is just a different rowid source feeding
-// the same aggregator — but we left the validator that ties bare
-// columns to GROUP BY a single-table assumption, and reworking it
-// is outside this phase. Surfaces as a clean NotImplemented today.
+// SQLR-6 — aggregates / GROUP BY / DISTINCT compose with joins: the
+// fully-joined row stream feeds the same scope-generic aggregation
+// pipeline the single-table path uses (Stage 3.5 below), and DISTINCT
+// dedupes the projected output rows.
fn execute_select_rows_joined(query: SelectQuery, db: &Database) -> Result {
// Resolve every participating table once and capture its scope
// name (alias if supplied, else table name). Scope names are
@@ -824,6 +768,59 @@ fn execute_select_rows_joined(query: SelectQuery, db: &Database) -> Result {
+ if let AggregateArg::Column { qualifier, name } = &call.arg {
+ resolve_scope_column(&joined_tables, qualifier.as_deref(), name)?;
+ }
+ }
+ ProjectionKind::Column { qualifier, name } => {
+ let pos = resolve_scope_column(&joined_tables, qualifier.as_deref(), name)?;
+ let in_group_by = query.group_by.iter().any(|g| {
+ g.name == *name
+ && resolve_scope_column(&joined_tables, g.qualifier.as_deref(), &g.name)
+ == Ok(pos)
+ });
+ if !in_group_by {
+ return Err(SQLRiteError::Internal(format!(
+ "column '{name}' must appear in GROUP BY or be used in an \
+ aggregate function"
+ )));
+ }
+ }
+ }
+ }
+
+ let scopes = filtered.iter().map(|row| JoinedScope {
+ tables: &joined_tables,
+ rowids: row,
+ });
+ return run_aggregation_pipeline(scopes, &query, &proj_items, &all_items, &having_expr);
+ }
+
// Stage 4: ORDER BY across the joined scope. We pre-compute the
// sort key per row (same approach as `sort_rowids`) so the
// comparator runs on Values, not against the expression tree.
@@ -850,8 +847,13 @@ fn execute_select_rows_joined(query: SelectQuery, db: &Database) -> Result Result {
- // SelectQuery::new already rejects this combination,
- // but defense in depth keeps the pattern match total.
+ // Aggregates are handled by the Stage 3.5 pipeline,
+ // which returns before reaching this projection —
+ // defense in depth keeps the pattern match total.
return Err(SQLRiteError::Internal(
- "aggregate functions over JOIN are not supported".to_string(),
+ "aggregate projection reached the non-aggregating join path".to_string(),
));
}
};
@@ -882,9 +885,64 @@ fn execute_select_rows_joined(query: SelectQuery, db: &Database) -> Result],
+ qualifier: Option<&str>,
+ name: &str,
+) -> Result {
+ if let Some(q) = qualifier {
+ let pos = tables
+ .iter()
+ .position(|t| t.scope_name.eq_ignore_ascii_case(q))
+ .ok_or_else(|| {
+ SQLRiteError::Internal(format!(
+ "unknown table qualifier '{q}' in column reference '{q}.{name}'"
+ ))
+ })?;
+ if !tables[pos].table.contains_column(name.to_string()) {
+ return Err(SQLRiteError::Internal(format!(
+ "column '{name}' does not exist on '{}'",
+ tables[pos].scope_name
+ )));
+ }
+ return Ok(pos);
+ }
+ let mut hit: Option = None;
+ for (i, t) in tables.iter().enumerate() {
+ if t.table.contains_column(name.to_string()) {
+ if hit.is_some() {
+ return Err(SQLRiteError::Internal(format!(
+ "column reference '{name}' is ambiguous — qualify it as .{name}"
+ )));
+ }
+ hit = Some(i);
+ }
+ }
+ hit.ok_or_else(|| {
+ SQLRiteError::Internal(format!(
+ "unknown column '{name}' in joined SELECT (no in-scope table has it)"
+ ))
+ })
+}
+
/// Executes a SELECT and returns `(rendered_table, row_count)`. The
/// REPL and Tauri app use this to keep the table-printing behaviour
/// the engine has always shipped. Structured callers use
@@ -2648,10 +2706,10 @@ fn eval_function(func: &sqlparser::ast::Function, scope: &dyn RowScope) -> Resul
}
// SQLR-3: catch aggregate names used in scalar position (e.g.
// `WHERE COUNT(*) > 1`) with a clearer message than "unknown
- // function". HAVING isn't supported yet, hence the explicit nudge.
+ // function".
"count" | "sum" | "avg" | "min" | "max" => Err(SQLRiteError::NotImplemented(format!(
"aggregate function '{name}' is not allowed in WHERE / projection-scalar position; \
- use it as a top-level projection item (HAVING is not yet supported)"
+ use it as a top-level projection item or in HAVING"
))),
other => Err(SQLRiteError::NotImplemented(format!(
"unknown function: {other}(...)"
@@ -3274,16 +3332,102 @@ fn eval_in_list(scope: &dyn RowScope, lhs: &Expr, list: &[Expr], negated: bool)
// SQLR-3 — Aggregation phase, DISTINCT, post-projection sort
// -----------------------------------------------------------------
-/// Walk `matching` rowids, partition into groups (one synthetic group
+/// SQLR-52 — HAVING lowering, shared by the single-table and joined
+/// aggregation paths. The expression may reference aggregates and
+/// GROUP BY keys that aren't in the SELECT output (SQLite allows
+/// both: `SELECT dept FROM t GROUP BY dept HAVING COUNT(*) > 1`).
+/// We append those as *hidden* trailing projection slots so the
+/// `aggregate_rows` accumulator computes them alongside the visible
+/// ones; the pipeline strips them after filtering. Aggregate calls in
+/// the HAVING tree are lowered to identifiers naming their output slot
+/// (`COUNT(*)` → identifier "COUNT(*)") so the shared expression
+/// evaluator can resolve them through a `GroupRowScope` like any other
+/// column. Returns the widened projection list plus the lowered
+/// HAVING expression (`None` when the query has no HAVING).
+fn lower_having_into_hidden_slots(
+ query: &SelectQuery,
+ proj_items: &[ProjectionItem],
+) -> Result<(Vec, Option)> {
+ let mut all_items = proj_items.to_vec();
+ let having_expr = match &query.having {
+ Some(h) => {
+ for g in &query.group_by {
+ if !all_items
+ .iter()
+ .any(|i| i.output_name().eq_ignore_ascii_case(&g.name))
+ {
+ all_items.push(ProjectionItem {
+ kind: ProjectionKind::Column {
+ qualifier: g.qualifier.clone(),
+ name: g.name.clone(),
+ },
+ alias: None,
+ });
+ }
+ }
+ Some(lower_having_expr(h, &mut all_items)?)
+ }
+ None => None,
+ };
+ Ok((all_items, having_expr))
+}
+
+/// The aggregation tail shared by the single-table and joined SELECT
+/// paths: accumulate groups over the row scopes, apply HAVING, strip
+/// hidden HAVING-only slots, then DISTINCT / ORDER BY / LIMIT on the
+/// output rows. Callers validate column references against their own
+/// scope (table schema vs. joined-table list) before invoking.
+fn run_aggregation_pipeline(
+ scopes: impl IntoIterator- ,
+ query: &SelectQuery,
+ proj_items: &[ProjectionItem],
+ all_items: &[ProjectionItem],
+ having_expr: &Option,
+) -> Result {
+ let columns: Vec = proj_items.iter().map(|i| i.output_name()).collect();
+ let mut rows = aggregate_rows(scopes, &query.group_by, all_items)?;
+
+ if let Some(h) = having_expr {
+ let all_columns: Vec = all_items.iter().map(|i| i.output_name()).collect();
+ rows = filter_groups_by_having(rows, h, &all_columns)?;
+ }
+ // Drop the hidden HAVING-only slots back to the user-visible width.
+ if all_items.len() > proj_items.len() {
+ for row in &mut rows {
+ row.truncate(proj_items.len());
+ }
+ }
+
+ if query.distinct {
+ rows = dedupe_rows(rows);
+ }
+
+ if let Some(order) = &query.order_by {
+ sort_output_rows(&mut rows, &columns, proj_items, order)?;
+ }
+ if let Some(k) = query.limit {
+ rows.truncate(k);
+ }
+
+ Ok(SelectResult { columns, rows })
+}
+
+/// Walk the row scopes, partition into groups (one synthetic group
/// when `group_by` is empty), update one `AggState` per aggregate
/// projection slot per group, then materialize one output row per
/// group in projection order. Group-key columns surface their original
/// `Value` (captured the first time the group was seen); aggregate
/// slots surface `AggState::finalize()`.
-fn aggregate_rows(
- table: &Table,
- matching: &[i64],
- group_by: &[String],
+///
+/// SQLR-6 — generic over [`RowScope`] so the same accumulator serves
+/// the single-table path (a [`SingleTableScope`] per matching rowid)
+/// and the joined path (a [`JoinedScope`] per joined row, where
+/// NULL-padded outer-join sides surface as `Value::Null` — grouped
+/// together like any other NULL, and skipped by `COUNT(col)` per the
+/// usual NULL-skipping aggregate semantics).
+fn aggregate_rows(
+ scopes: impl IntoIterator
- ,
+ group_by: &[GroupByKey],
proj_items: &[ProjectionItem],
) -> Result>> {
// Build the per-projection-slot accumulator template once. Each
@@ -3306,11 +3450,11 @@ fn aggregate_rows(
let mut group_states: Vec>> = Vec::new();
let mut group_key_values: Vec> = Vec::new();
- for &rowid in matching {
+ for scope in scopes {
let mut key_values: Vec = Vec::with_capacity(group_by.len());
let mut key: Vec = Vec::with_capacity(group_by.len());
- for col in group_by {
- let v = table.get_value(col, rowid).unwrap_or(Value::Null);
+ for g in group_by {
+ let v = scope.lookup(g.qualifier.as_deref(), &g.name)?;
key.push(DistinctKey::from_value(&v));
key_values.push(v);
}
@@ -3328,7 +3472,9 @@ fn aggregate_rows(
if let ProjectionKind::Aggregate(call) = &item.kind {
let v = match &call.arg {
AggregateArg::Star => Value::Null,
- AggregateArg::Column(c) => table.get_value(c, rowid).unwrap_or(Value::Null),
+ AggregateArg::Column { qualifier, name } => {
+ scope.lookup(qualifier.as_deref(), name)?
+ }
};
if let Some(state) = group_states[idx][slot].as_mut() {
state.update(&v)?;
@@ -3356,13 +3502,20 @@ fn aggregate_rows(
let mut row: Vec = Vec::with_capacity(proj_items.len());
for (slot, item) in proj_items.iter().enumerate() {
match &item.kind {
- ProjectionKind::Column { name: c, .. } => {
- // The validation in execute_select_rows guarantees
- // bare-column projections are also in `group_by`.
+ ProjectionKind::Column { qualifier, name: c } => {
+ // Parser / executor validation ties bare-column
+ // projections to GROUP BY entries, but `SELECT *`
+ // expansions reach here unvalidated — surface a
+ // clean error rather than panicking.
let pos = group_by
.iter()
- .position(|g| g == c)
- .expect("validated to be in GROUP BY");
+ .position(|g| g.matches_column(qualifier.as_deref(), c))
+ .ok_or_else(|| {
+ SQLRiteError::Internal(format!(
+ "column '{c}' must appear in GROUP BY or be used in an \
+ aggregate function"
+ ))
+ })?;
row.push(group_key_values[group_idx][pos].clone());
}
ProjectionKind::Aggregate(_) => {
@@ -3601,9 +3754,13 @@ fn resolve_order_by_index(
}
// Function form: match by display name against any aggregate item
// whose canonical display equals the user's call. Tolerate case
- // differences in the function name.
+ // differences in the function name. SQLR-6 — a second pass with
+ // `t.` qualifiers stripped from both sides keeps qualifier
+ // spelling differences from blocking the match (`ORDER BY
+ // SUM(amount)` finds a `SELECT SUM(o.amount)` slot and vice
+ // versa), preserving the pre-qualifier behavior.
if let Expr::Function(func) = expr {
- let user_disp = format_function_display(func);
+ let user_disp = format_function_display(func, true);
for (i, item) in proj_items.iter().enumerate() {
if let ProjectionKind::Aggregate(call) = &item.kind
&& call.display_name().eq_ignore_ascii_case(&user_disp)
@@ -3611,6 +3768,16 @@ fn resolve_order_by_index(
return Ok(i);
}
}
+ let user_disp_unqualified = format_function_display(func, false);
+ for (i, item) in proj_items.iter().enumerate() {
+ if let ProjectionKind::Aggregate(call) = &item.kind
+ && call
+ .display_name_unqualified()
+ .eq_ignore_ascii_case(&user_disp_unqualified)
+ {
+ return Ok(i);
+ }
+ }
return Err(SQLRiteError::Internal(format!(
"ORDER BY references aggregate '{user_disp}' that isn't in the SELECT output"
)));
@@ -3623,7 +3790,9 @@ fn resolve_order_by_index(
/// Format a sqlparser function call into the same canonical form
/// `AggregateCall::display_name()` uses, so ORDER BY on
/// `COUNT(*)` / `SUM(salary)` matches its projection counterpart.
-fn format_function_display(func: &sqlparser::ast::Function) -> String {
+/// `qualified` keeps or strips the argument's `t.` qualifier, matching
+/// `display_name()` / `display_name_unqualified()` respectively.
+fn format_function_display(func: &sqlparser::ast::Function, qualified: bool) -> String {
let name = match func.name.0.as_slice() {
[ObjectNamePart::Identifier(ident)] => ident.value.to_uppercase(),
_ => format!("{:?}", func.name).to_uppercase(),
@@ -3638,7 +3807,15 @@ fn format_function_display(func: &sqlparser::ast::Function) -> String {
FunctionArg::Unnamed(FunctionArgExpr::Wildcard) => "*".to_string(),
FunctionArg::Unnamed(FunctionArgExpr::Expr(Expr::Identifier(i))) => i.value.clone(),
FunctionArg::Unnamed(FunctionArgExpr::Expr(Expr::CompoundIdentifier(parts))) => {
- parts.last().map(|p| p.value.clone()).unwrap_or_default()
+ if qualified {
+ parts
+ .iter()
+ .map(|p| p.value.clone())
+ .collect::>()
+ .join(".")
+ } else {
+ parts.last().map(|p| p.value.clone()).unwrap_or_default()
+ }
}
_ => String::new(),
});
@@ -4625,23 +4802,21 @@ mod tests {
}
#[test]
- fn having_over_join_rejected_for_all_flavors() {
- // Aggregates / GROUP BY over JOIN results are a separate increment
- // (SQLR-6); until it lands, every JOIN flavor + GROUP BY + HAVING
- // must surface the clean NotImplemented — no wrong results.
+ fn having_over_join_filters_groups_for_all_flavors() {
+ // SQLR-6 — GROUP BY + HAVING compose with every join flavor.
+ // Only Alice has more than one order; the dangling order (RIGHT /
+ // FULL) groups under a NULL name with count 1 and is filtered.
for flavor in ["INNER", "LEFT OUTER", "RIGHT OUTER", "FULL OUTER"] {
let sql = format!(
"SELECT customers.name, COUNT(*) FROM customers \
{flavor} JOIN orders ON customers.id = orders.customer_id \
GROUP BY customers.name HAVING COUNT(*) > 1;"
);
- let err = crate::sql::process_command(&sql, &mut seed_join_fixture());
- match err {
- Err(SQLRiteError::NotImplemented(msg)) => {
- assert!(msg.contains("JOIN"), "{flavor}: unexpected message: {msg}")
- }
- other => panic!("{flavor}: expected NotImplemented, got {other:?}"),
- }
+ let db = seed_join_fixture();
+ let r = run_rows(&db, &sql);
+ assert_eq!(r.rows.len(), 1, "{flavor}: only Alice has >1 order");
+ assert_eq!(r.rows[0][0].to_display_string(), "Alice", "{flavor}");
+ assert_eq!(expect_int(&r.rows[0][1]), 2, "{flavor}");
}
}
@@ -5151,16 +5326,239 @@ mod tests {
assert!(res.is_err(), "USING (nope) must error — column absent");
}
+ // ---------------------------------------------------------------------
+ // SQLR-6 — aggregates / GROUP BY / DISTINCT over JOIN results
+ // ---------------------------------------------------------------------
+
+ #[test]
+ fn group_by_with_aggregates_over_inner_join() {
+ let db = seed_join_fixture();
+ let r = run_rows(
+ &db,
+ "SELECT customers.name, COUNT(*), SUM(orders.amount) FROM customers \
+ INNER JOIN orders ON customers.id = orders.customer_id \
+ GROUP BY customers.name ORDER BY customers.name;",
+ );
+ assert_eq!(r.columns, vec!["name", "COUNT(*)", "SUM(orders.amount)"]);
+ assert_eq!(r.rows.len(), 2);
+ assert_eq!(r.rows[0][0].to_display_string(), "Alice");
+ assert_eq!(expect_int(&r.rows[0][1]), 2);
+ assert_eq!(expect_int(&r.rows[0][2]), 300);
+ assert_eq!(r.rows[1][0].to_display_string(), "Bob");
+ assert_eq!(expect_int(&r.rows[1][1]), 1);
+ assert_eq!(expect_int(&r.rows[1][2]), 50);
+ }
+
+ #[test]
+ fn aggregates_over_join_without_group_by() {
+ let db = seed_join_fixture();
+ let r = run_rows(
+ &db,
+ "SELECT COUNT(*), SUM(orders.amount) FROM customers \
+ INNER JOIN orders ON customers.id = orders.customer_id;",
+ );
+ assert_eq!(r.rows.len(), 1);
+ assert_eq!(expect_int(&r.rows[0][0]), 3);
+ assert_eq!(expect_int(&r.rows[0][1]), 350);
+ }
+
#[test]
- fn aggregates_over_join_are_rejected() {
+ fn count_column_skips_outer_join_null_padding() {
+ // Carol has no orders: her LEFT-JOIN row is NULL-padded on the
+ // right. COUNT(*) counts the padded row; COUNT(orders.id) skips
+ // its NULL, per the usual NULL-skipping aggregate semantics.
let db = seed_join_fixture();
+ let r = run_rows(
+ &db,
+ "SELECT customers.name, COUNT(*), COUNT(orders.id) FROM customers \
+ LEFT OUTER JOIN orders ON customers.id = orders.customer_id \
+ GROUP BY customers.name ORDER BY customers.name;",
+ );
+ assert_eq!(r.rows.len(), 3);
+ let carol = &r.rows[2];
+ assert_eq!(carol[0].to_display_string(), "Carol");
+ assert_eq!(expect_int(&carol[1]), 1, "COUNT(*) counts the padded row");
+ assert_eq!(expect_int(&carol[2]), 0, "COUNT(col) skips the NULL");
+ }
+
+ #[test]
+ fn outer_join_null_keys_group_together() {
+ // FULL OUTER surfaces the dangling order (customer_id 4) with a
+ // NULL customers.name — it must form its own group, not vanish.
+ let db = seed_join_fixture();
+ let r = run_rows(
+ &db,
+ "SELECT customers.name, COUNT(*) FROM customers \
+ FULL OUTER JOIN orders ON customers.id = orders.customer_id \
+ GROUP BY customers.name;",
+ );
+ assert_eq!(r.rows.len(), 4, "Alice, Bob, Carol, NULL");
+ let null_group = r
+ .rows
+ .iter()
+ .find(|row| row[0] == Value::Null)
+ .expect("dangling order groups under NULL");
+ assert_eq!(expect_int(&null_group[1]), 1);
+ }
+
+ #[test]
+ fn count_distinct_over_join() {
+ let db = seed_join_fixture();
+ let r = run_rows(
+ &db,
+ "SELECT COUNT(DISTINCT customers.name) FROM customers \
+ INNER JOIN orders ON customers.id = orders.customer_id;",
+ );
+ assert_eq!(expect_int(&r.rows[0][0]), 2);
+ }
+
+ #[test]
+ fn group_by_qualified_key_resolves_ambiguous_name() {
+ // `id` exists on both tables — the qualified GROUP BY key picks
+ // the customers side.
+ let db = seed_join_fixture();
+ let r = run_rows(
+ &db,
+ "SELECT customers.id, COUNT(*) FROM customers \
+ INNER JOIN orders ON customers.id = orders.customer_id \
+ GROUP BY customers.id ORDER BY customers.id;",
+ );
+ assert_eq!(r.rows.len(), 2);
+ assert_eq!(expect_int(&r.rows[0][0]), 1);
+ assert_eq!(expect_int(&r.rows[0][1]), 2);
+ }
+
+ #[test]
+ fn group_by_ambiguous_unqualified_key_over_join_errors() {
+ let err = crate::sql::process_command(
+ "SELECT COUNT(*) FROM customers \
+ INNER JOIN orders ON customers.id = orders.customer_id GROUP BY id;",
+ &mut seed_join_fixture(),
+ );
+ match err {
+ Err(e) => assert!(
+ e.to_string().contains("ambiguous"),
+ "unexpected message: {e}"
+ ),
+ Ok(_) => panic!("ambiguous GROUP BY key must error"),
+ }
+ }
+
+ #[test]
+ fn bare_column_not_in_group_by_over_join_errors() {
+ let err = crate::sql::process_command(
+ "SELECT orders.amount, COUNT(*) FROM customers \
+ INNER JOIN orders ON customers.id = orders.customer_id \
+ GROUP BY customers.name;",
+ &mut seed_join_fixture(),
+ );
+ match err {
+ Err(e) => assert!(
+ e.to_string().contains("must appear in GROUP BY"),
+ "unexpected message: {e}"
+ ),
+ Ok(_) => panic!("bare column outside GROUP BY must error"),
+ }
+ }
+
+ #[test]
+ fn aggregate_in_where_over_join_errors_cleanly() {
+ // Code-review gap from SQLR-5: aggregate misuse inside WHERE on
+ // a joined query must be a typed error, not wrong results.
let err = crate::sql::process_command(
"SELECT COUNT(*) FROM customers \
+ INNER JOIN orders ON customers.id = orders.customer_id \
+ WHERE COUNT(*) > 1;",
+ &mut seed_join_fixture(),
+ );
+ match err {
+ Err(SQLRiteError::NotImplemented(msg)) => assert!(
+ msg.contains("not allowed in WHERE"),
+ "unexpected message: {msg}"
+ ),
+ other => panic!("expected NotImplemented, got {other:?}"),
+ }
+ }
+
+ #[test]
+ fn order_by_aggregate_over_join() {
+ let db = seed_join_fixture();
+ let r = run_rows(
+ &db,
+ "SELECT customers.name, SUM(orders.amount) FROM customers \
+ INNER JOIN orders ON customers.id = orders.customer_id \
+ GROUP BY customers.name ORDER BY SUM(orders.amount) DESC;",
+ );
+ assert_eq!(r.rows[0][0].to_display_string(), "Alice");
+ // Qualifier-stripped fallback: ORDER BY SUM(amount) finds the
+ // SUM(orders.amount) slot even though the spellings differ.
+ let r2 = run_rows(
+ &db,
+ "SELECT customers.name, SUM(orders.amount) FROM customers \
+ INNER JOIN orders ON customers.id = orders.customer_id \
+ GROUP BY customers.name ORDER BY SUM(amount) DESC;",
+ );
+ assert_eq!(r2.rows[0][0].to_display_string(), "Alice");
+ }
+
+ #[test]
+ fn distinct_over_join_dedupes_output_rows() {
+ let db = seed_join_fixture();
+ let r = run_rows(
+ &db,
+ "SELECT DISTINCT customers.name FROM customers \
INNER JOIN orders ON customers.id = orders.customer_id;",
+ );
+ assert_eq!(r.rows.len(), 2);
+ let names: Vec = r
+ .rows
+ .iter()
+ .map(|row| row[0].to_display_string())
+ .collect();
+ assert_eq!(names, vec!["Alice".to_string(), "Bob".to_string()]);
+ }
+
+ #[test]
+ fn distinct_over_join_defers_limit_past_dedupe() {
+ // Without deferral, LIMIT 2 would truncate the joined rows to
+ // Alice's two orders and dedupe to a single row.
+ let db = seed_join_fixture();
+ let r = run_rows(
+ &db,
+ "SELECT DISTINCT customers.name FROM customers \
+ INNER JOIN orders ON customers.id = orders.customer_id LIMIT 2;",
+ );
+ assert_eq!(r.rows.len(), 2, "LIMIT applies after DISTINCT collapses");
+ }
+
+ #[test]
+ fn select_star_group_by_errors_instead_of_panicking() {
+ // Single-table regression: the parser's "must appear in GROUP BY"
+ // check skips `SELECT *`, so the executor used to hit an
+ // `expect()` panic when a non-grouped column reached projection.
+ let err = crate::sql::process_command(
+ "SELECT * FROM orders GROUP BY customer_id;",
&mut seed_join_fixture(),
);
- assert!(err.is_err(), "aggregates over JOIN are not yet supported");
- let _ = db; // keep compiler happy if unused
+ match err {
+ Err(e) => assert!(
+ e.to_string().contains("must appear in GROUP BY"),
+ "unexpected message: {e}"
+ ),
+ Ok(_) => panic!("SELECT * with GROUP BY must error, not panic"),
+ }
+ }
+
+ #[test]
+ fn group_by_qualified_key_single_table_still_works() {
+ // Qualified GROUP BY keys are accepted on the single-table path
+ // too (qualifier ignored, same posture as projections).
+ let db = seed_employees();
+ let r = run_rows(
+ &db,
+ "SELECT dept, COUNT(*) FROM emp GROUP BY emp.dept ORDER BY dept;",
+ );
+ assert_eq!(r.rows.len(), 3, "eng / sales / ops");
}
#[test]
diff --git a/src/sql/parser/select.rs b/src/sql/parser/select.rs
index deabf97..952f48f 100644
--- a/src/sql/parser/select.rs
+++ b/src/sql/parser/select.rs
@@ -39,11 +39,18 @@ impl AggregateFn {
}
}
-/// What the aggregate is fed: `*` (only valid for COUNT) or a bare column.
+/// What the aggregate is fed: `*` (only valid for COUNT) or a column
+/// reference. SQLR-6 — the column carries its optional `t.` qualifier
+/// so joined aggregation (`SUM(orders.amount)`) can disambiguate
+/// same-named columns across in-scope tables; the single-table path
+/// ignores it, same as projection qualifiers.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum AggregateArg {
Star,
- Column(String),
+ Column {
+ qualifier: Option,
+ name: String,
+ },
}
/// A parsed aggregate call like `COUNT(*)`, `SUM(salary)`, `COUNT(DISTINCT dept)`.
@@ -60,13 +67,29 @@ impl AggregateCall {
/// aggregate output columns when the user didn't supply an alias.
/// Mirrors the output-header convention.
pub fn display_name(&self) -> String {
+ self.display_name_impl(true)
+ }
+
+ /// Display form with the argument's `t.` qualifier stripped. Used
+ /// as a fallback when matching ORDER BY function calls against
+ /// projection slots, so `ORDER BY SUM(amount)` still finds a
+ /// `SELECT SUM(o.amount)` slot (and vice versa).
+ pub(crate) fn display_name_unqualified(&self) -> String {
+ self.display_name_impl(false)
+ }
+
+ fn display_name_impl(&self, qualified: bool) -> String {
let inner = match &self.arg {
AggregateArg::Star => "*".to_string(),
- AggregateArg::Column(c) => {
+ AggregateArg::Column { qualifier, name } => {
+ let col = match qualifier {
+ Some(q) if qualified => format!("{q}.{name}"),
+ _ => name.clone(),
+ };
if self.distinct {
- format!("DISTINCT {c}")
+ format!("DISTINCT {col}")
} else {
- c.clone()
+ col
}
}
};
@@ -125,6 +148,32 @@ pub enum Projection {
Items(Vec),
}
+/// SQLR-6 — one GROUP BY key: an optionally-qualified column reference
+/// (`dept` or `t.dept`). The qualifier matters for joined SELECTs,
+/// where the same column name can exist on several in-scope tables;
+/// the single-table path ignores it, mirroring projection qualifiers.
+#[derive(Debug, Clone, PartialEq, Eq)]
+pub struct GroupByKey {
+ pub qualifier: Option,
+ pub name: String,
+}
+
+impl GroupByKey {
+ /// Does a (possibly qualified) column reference name this key?
+ /// Names must match exactly; qualifiers must match (ASCII
+ /// case-insensitively) only when both sides carry one — a bare
+ /// reference matches a qualified key and vice versa. Callers that
+ /// need strict table-resolution equality (the joined executor)
+ /// layer that check on top.
+ pub(crate) fn matches_column(&self, qualifier: Option<&str>, name: &str) -> bool {
+ self.name == name
+ && match (self.qualifier.as_deref(), qualifier) {
+ (Some(kq), Some(q)) => kq.eq_ignore_ascii_case(q),
+ _ => true,
+ }
+ }
+}
+
/// A parsed `ORDER BY` clause: a single sort key (expression), ascending
/// by default. Phase 7b widened this from "bare column name" to
/// "arbitrary expression" so KNN queries of the form
@@ -224,8 +273,9 @@ pub struct SelectQuery {
pub limit: Option,
/// `SELECT DISTINCT`.
pub distinct: bool,
- /// `GROUP BY a, b` — bare column names. Empty = no GROUP BY.
- pub group_by: Vec,
+ /// `GROUP BY a, t.b` — optionally-qualified column references in
+ /// source order. Empty = no GROUP BY.
+ pub group_by: Vec,
/// SQLR-52 — raw sqlparser HAVING expression, evaluated by the
/// executor against each group's output row after aggregation.
/// Parser-level invariant: `Some` implies `group_by` is non-empty
@@ -277,28 +327,44 @@ impl SelectQuery {
));
}
};
- // SQLR-3: parse GROUP BY into a list of bare column names.
+ // SQLR-3: parse GROUP BY into a list of column references.
// GroupByExpr::Expressions(v, _) with an empty v is the "no
// GROUP BY" shape; non-empty means we've got grouping. Reject
- // GROUP BY ALL and GROUP BY on non-bare expressions for v1.
- let group_by_cols: Vec = match group_by {
+ // GROUP BY ALL and GROUP BY on non-column expressions for v1.
+ // SQLR-6 — keys keep their `t.` qualifier so joined grouping
+ // (`GROUP BY customers.name`) can disambiguate.
+ let group_by_cols: Vec = match group_by {
sqlparser::ast::GroupByExpr::Expressions(exprs, _) => {
let mut out = Vec::with_capacity(exprs.len());
for e in exprs {
- let col = match e {
- Expr::Identifier(ident) => ident.value.clone(),
- Expr::CompoundIdentifier(parts) => {
- parts.last().map(|p| p.value.clone()).ok_or_else(|| {
- SQLRiteError::Internal("empty compound identifier".to_string())
- })?
- }
+ let key = match e {
+ Expr::Identifier(ident) => GroupByKey {
+ qualifier: None,
+ name: ident.value.clone(),
+ },
+ Expr::CompoundIdentifier(parts) => match parts.as_slice() {
+ [only] => GroupByKey {
+ qualifier: None,
+ name: only.value.clone(),
+ },
+ [q, c] => GroupByKey {
+ qualifier: Some(q.value.clone()),
+ name: c.value.clone(),
+ },
+ _ => {
+ return Err(SQLRiteError::NotImplemented(format!(
+ "GROUP BY identifier with {} parts is not supported",
+ parts.len()
+ )));
+ }
+ },
other => {
return Err(SQLRiteError::NotImplemented(format!(
"GROUP BY only supports bare column references for now, got {other:?}"
)));
}
};
- out.push(col);
+ out.push(key);
}
out
}
@@ -330,12 +396,19 @@ impl SelectQuery {
// SQLR-3 validation: when GROUP BY is present, every bare-column
// entry in the projection must appear in the GROUP BY list. Bare
// columns in the SELECT are otherwise undefined per group.
- if !group_by_cols.is_empty()
+ // SQLR-6 — only the single-table case validates here; the joined
+ // case needs the table schemas to resolve qualifiers, so the
+ // joined executor performs the equivalent check against the
+ // full in-scope table list.
+ if joins.is_empty()
+ && !group_by_cols.is_empty()
&& let Projection::Items(items) = &projection
{
for item in items {
- if let ProjectionKind::Column { name: c, .. } = &item.kind
- && !group_by_cols.contains(c)
+ if let ProjectionKind::Column { qualifier, name: c } = &item.kind
+ && !group_by_cols
+ .iter()
+ .any(|g| g.matches_column(qualifier.as_deref(), c))
{
return Err(SQLRiteError::Internal(format!(
"column '{c}' must appear in GROUP BY or be used in an aggregate function"
@@ -344,29 +417,6 @@ impl SelectQuery {
}
}
- // SQLR-5 — aggregations across joined results aren't covered
- // by the current single-table grouping pipeline. Reject GROUP
- // BY / aggregates over a join up front so the user gets a clear
- // message rather than wrong results.
- if !joins.is_empty() {
- let has_agg = matches!(
- &projection,
- Projection::Items(items)
- if items.iter().any(|i| matches!(i.kind, ProjectionKind::Aggregate(_)))
- );
- if has_agg || !group_by_cols.is_empty() {
- return Err(SQLRiteError::NotImplemented(
- "GROUP BY / aggregate functions over JOIN results are not supported yet"
- .to_string(),
- ));
- }
- if distinct_flag {
- return Err(SQLRiteError::NotImplemented(
- "SELECT DISTINCT over JOIN results is not supported yet".to_string(),
- ));
- }
- }
-
Ok(SelectQuery {
table_name,
table_alias,
@@ -656,14 +706,28 @@ pub(crate) fn parse_aggregate_call(func: &sqlparser::ast::Function) -> Result AggregateArg::Star,
FunctionArg::Unnamed(FunctionArgExpr::Expr(Expr::Identifier(ident))) => {
- AggregateArg::Column(ident.value.clone())
+ AggregateArg::Column {
+ qualifier: None,
+ name: ident.value.clone(),
+ }
}
FunctionArg::Unnamed(FunctionArgExpr::Expr(Expr::CompoundIdentifier(parts))) => {
- let c = parts
- .last()
- .map(|p| p.value.clone())
- .ok_or_else(|| SQLRiteError::Internal("empty compound identifier".to_string()))?;
- AggregateArg::Column(c)
+ match parts.as_slice() {
+ [only] => AggregateArg::Column {
+ qualifier: None,
+ name: only.value.clone(),
+ },
+ [q, c] => AggregateArg::Column {
+ qualifier: Some(q.value.clone()),
+ name: c.value.clone(),
+ },
+ _ => {
+ return Err(SQLRiteError::NotImplemented(format!(
+ "{name}(...) — argument identifier with {} parts is not supported",
+ parts.len()
+ )));
+ }
+ }
}
other => {
return Err(SQLRiteError::NotImplemented(format!(
diff --git a/web/src/app/docs/page.tsx b/web/src/app/docs/page.tsx
index 29cc2dc..117181c 100644
--- a/web/src/app/docs/page.tsx
+++ b/web/src/app/docs/page.tsx
@@ -315,7 +315,9 @@ export default function DocsPage() {
NATURAL column shows once in SELECT *).
Comma-separated FROMs (FROM a, b) are not — use an
explicit JOIN / CROSS JOIN. Aggregates /{" "}
- GROUP BY over a join lands once subqueries do.
+ GROUP BY / DISTINCT /{" "}
+ HAVING compose over join results, with optionally
+ qualified keys (GROUP BY customers.name).
GROUP BY & aggregates
diff --git a/web/src/components/sql-ref.tsx b/web/src/components/sql-ref.tsx
index 3bf7747..5000392 100644
--- a/web/src/components/sql-ref.tsx
+++ b/web/src/components/sql-ref.tsx
@@ -178,7 +178,6 @@ const NOT_YET = [
"multi-column ORDER BY",
"UNION / INTERSECT / EXCEPT",
"INSERT … SELECT",
- "GROUP BY / DISTINCT over JOINs",
"CREATE VIEW / TRIGGER",
"FOREIGN KEY / CHECK",
"savepoints",