Merged
4 changes: 2 additions & 2 deletions README.md
@@ -175,7 +175,7 @@ sqlrite> DELETE FROM users WHERE age < 30;
| `CREATE TABLE` | `PRIMARY KEY`, `UNIQUE`, `NOT NULL`; `IF NOT EXISTS` (idempotent re-create); duplicate-column detection; types `INTEGER`/`INT`/`BIGINT`/`SMALLINT`, `TEXT`/`VARCHAR`, `REAL`/`FLOAT`/`DOUBLE`/`DECIMAL`, `BOOLEAN`. Auto-creates `sqlrite_autoindex_<table>_<col>` for every PK + UNIQUE column |
| `CREATE [UNIQUE] INDEX` | Single-column, named indexes; `IF NOT EXISTS`; persists as a dedicated cell-based B-Tree. INTEGER + TEXT columns only |
| `INSERT INTO` | Explicit column list required; auto-ROWID for `INTEGER PRIMARY KEY`; multi-row `VALUES (…), (…)`; UNIQUE enforcement; clean type errors (no panics); NULL padding for omitted columns |
| `SELECT` | `*` or column list with optional `AS alias`; `WHERE`; `DISTINCT`; `GROUP BY col[, col …]`; aggregate projections `COUNT(*)` / `COUNT([DISTINCT] col)` / `SUM` / `AVG` / `MIN` / `MAX`; `[INNER\|LEFT OUTER\|RIGHT OUTER\|FULL OUTER\|CROSS] JOIN` with `ON ...` / `USING (...)` / `NATURAL` constraints, table aliases and qualified `t.col` references; single-column `ORDER BY [ASC\|DESC]` (also resolves alias and aggregate display names); `LIMIT n`. `WHERE col = literal` probes an index when one exists. Catalog introspection via `SELECT … FROM sqlrite_master` |
| `SELECT` | `*` or column list with optional `AS alias`; `WHERE`; `DISTINCT`; `GROUP BY col[, t.col …]` (qualified keys allowed); `HAVING`; aggregate projections `COUNT(*)` / `COUNT([DISTINCT] col)` / `SUM` / `AVG` / `MIN` / `MAX`; `[INNER\|LEFT OUTER\|RIGHT OUTER\|FULL OUTER\|CROSS] JOIN` with `ON ...` / `USING (...)` / `NATURAL` constraints, table aliases and qualified `t.col` references — aggregates / `GROUP BY` / `DISTINCT` / `HAVING` all compose over join results; single-column `ORDER BY [ASC\|DESC]` (also resolves alias and aggregate display names); `LIMIT n`. `WHERE col = literal` probes an index when one exists. Catalog introspection via `SELECT … FROM sqlrite_master` |
| `UPDATE` | Multi-column `SET`; `WHERE`; UNIQUE + type enforcement; arithmetic in assignments (`SET age = age + 1`) |
| `DELETE` | `WHERE` predicate or full-table delete |
| `BEGIN` / `COMMIT` / `ROLLBACK` | Real transactions, snapshot-based; WAL-backed commit; single-level (no savepoints); auto-rollback if `COMMIT`'s disk write fails |
@@ -193,7 +193,7 @@ Expressions in `WHERE` and `UPDATE`'s `SET` RHS:
- String concat — `||`
- Literals — integer + real numbers, `'single-quoted strings'`, `TRUE` / `FALSE`, `NULL`; parentheses for grouping

**Not yet supported** (common ones): subqueries, CTEs, `HAVING`, `LIKE … ESCAPE '<char>'`, `IN (subquery)`, `DISTINCT` on `SUM`/`AVG`/`MIN`/`MAX`, GROUP BY on expressions, expressions in the projection list, `OFFSET`, multi-column `ORDER BY`, savepoints, comma joins (`FROM a, b`), aggregates / DISTINCT / GROUP BY *over* JOIN results. The [full list with context](docs/supported-sql.md#not-yet-supported) lives in the reference.
**Not yet supported** (common ones): subqueries, CTEs, `HAVING` without `GROUP BY`, `LIKE … ESCAPE '<char>'`, `IN (subquery)`, `DISTINCT` on `SUM`/`AVG`/`MIN`/`MAX`, GROUP BY on expressions, expressions in the projection list, `OFFSET`, multi-column `ORDER BY`, savepoints, comma joins (`FROM a, b`). The [full list with context](docs/supported-sql.md#not-yet-supported) lives in the reference.

#### Meta commands

1 change: 0 additions & 1 deletion docs/architecture.md
@@ -139,7 +139,6 @@ Steps 1–7 are purely in-memory; step 8 is the only disk contact, and after the
The roadmap has shipped far enough that the original "deliberately missing" list mostly turned into shipped features. What's still left:

- **No query optimizer** beyond the bounded-heap top-k pass for KNN (Phase 7c) and the HNSW probe shortcut (7d.2). Equality-on-PK probes are direct; everything else is a table scan. Joins use plain nested-loop (O(N×M) per join level); hash / merge joins on equi-join shapes are a future increment.
- **Aggregates / GROUP BY / DISTINCT over joined results.** The single-table aggregator is wired against one rowid stream; the multi-table join executor produces joined rows but doesn't yet feed them through the aggregator. Surfaces as a clean `NotImplemented` at parse time. The single-table aggregation path (SQLR-3) is fully shipped.
- **No network layer.** SQLRite is embedded-only. The closest thing is the [`sqlrite-mcp`](mcp.md) server, which is stdio (not network). A real wire protocol isn't on the roadmap.
- **No streaming row cursor.** `Rows` is currently backed by an eager `Vec` (Phase 5a). The `Rows::next` API is shaped to support a real cursor — the swap is deferred to **5a.2**.

4 changes: 2 additions & 2 deletions docs/roadmap.md
@@ -561,7 +561,7 @@ The biggest single SQL-surface jump in the project's history.
- Self-joins require an alias on at least one side.
- `WHERE` runs after joins (the standard `LEFT JOIN ... WHERE right.col IS NULL` anti-join idiom works).

`ON`, `USING (...)`, `NATURAL`, and `CROSS JOIN` are all supported. Not yet supported: comma-separated FROMs (`FROM a, b`), aggregates / `GROUP BY` / `DISTINCT` *over* a join, `fts_match` / `bm25_score` inside a join expression. Algorithm: plain nested-loop, O(N×M) per level — hash / merge joins are a future optimization.
`ON`, `USING (...)`, `NATURAL`, and `CROSS JOIN` are all supported, and aggregates / `GROUP BY` / `DISTINCT` / `HAVING` compose over join results (SQLR-6). Not yet supported: comma-separated FROMs (`FROM a, b`), `fts_match` / `bm25_score` inside a join expression. Algorithm: plain nested-loop, O(N×M) per level — hash / merge joins are a future optimization.

### ✅ Phase 9g — Prepared statements + parameter binding *(v0.9.0, SQLR-23)*

@@ -736,7 +736,7 @@ The remaining items — actually open, not retroactively rewritten:
- Subqueries (scalar, `IN (SELECT ...)`, correlated) and CTEs (`WITH`, recursive)
- ~~`HAVING` (post-aggregation filter)~~ ✅ Shipped (SQLR-52) — group-row filter after aggregation; references GROUP BY keys, aggregate aliases, and direct aggregate calls (hidden-slot computation for HAVING-only aggregates). `HAVING` without `GROUP BY` stays rejected in v0.
- `CASE WHEN … THEN … END`, `BETWEEN`, `GLOB`, `REGEXP`, `LIKE … ESCAPE '<char>'`
- Aggregates / `GROUP BY` / `DISTINCT` *over* joins (needs a single executor pass that knows about multiple input streams)
- ~~Aggregates / `GROUP BY` / `DISTINCT` *over* joins~~ ✅ Shipped (SQLR-6) — the joined row stream feeds the same scope-generic aggregation pipeline the single-table path uses; `GROUP BY` keys accept `t.col` qualifiers; `HAVING` and `SELECT DISTINCT` compose too.
- Multi-column / expression `ORDER BY`, `OFFSET`, `NULLS FIRST/LAST`
- `UNION` / `INTERSECT` / `EXCEPT`, `INSERT ... SELECT`
- Composite + expression indexes
12 changes: 10 additions & 2 deletions docs/sql-engine.md
@@ -49,11 +49,11 @@ The `sqlparser` AST is designed to cover every SQL dialect, so its types are hug

`UPDATE` and `DELETE` don't have a dedicated internal struct — the executor pattern-matches the sqlparser types directly because there's less transformation needed.

`SelectQuery::projection` is now `Projection::All | Projection::Items(Vec<ProjectionItem>)`, where each item carries a `ProjectionKind::Column { qualifier, name }` (qualifier is `Some` for `t.col` shapes, used by JOIN execution to disambiguate) or `ProjectionKind::Aggregate(AggregateCall)` plus an optional `AS alias`. `AggregateCall` covers `COUNT(*)`, `COUNT([DISTINCT] col)`, `SUM` / `AVG` / `MIN` / `MAX` of a bare column. `group_by` is a `Vec<String>` of bare column names (empty = no GROUP BY); the parser validates that every non-aggregate projection item appears in `GROUP BY`.
`SelectQuery::projection` is now `Projection::All | Projection::Items(Vec<ProjectionItem>)`, where each item carries a `ProjectionKind::Column { qualifier, name }` (qualifier is `Some` for `t.col` shapes, used by JOIN execution to disambiguate) or `ProjectionKind::Aggregate(AggregateCall)` plus an optional `AS alias`. `AggregateCall` covers `COUNT(*)`, `COUNT([DISTINCT] col)`, `SUM` / `AVG` / `MIN` / `MAX` of a column reference (optionally qualified, `SUM(o.amount)`). `group_by` is a `Vec<GroupByKey>` of optionally-qualified column references (`GROUP BY dept`, `GROUP BY customers.name`; empty = no GROUP BY). The parser validates that every non-aggregate projection item appears in `GROUP BY` for single-table queries; joined queries defer that check to the executor, which resolves qualifiers against the in-scope table schemas (SQLR-6).
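
As a reading aid for the `Vec<GroupByKey>` change, here is a minimal Rust sketch of how an optionally-qualified key might be resolved against the tables in scope. It is not the project's code: the field names, the `ResolveError` enum, the `resolve_key` function, and the `HashMap`-based scope are all assumptions made for illustration. Only the idea of a `GroupByKey` with an optional qualifier that must resolve unambiguously comes from the docs.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a parsed `GROUP BY` key: a column name with an
/// optional table qualifier (`dept` vs. `customers.name`).
#[derive(Debug, Clone, PartialEq)]
pub struct GroupByKey {
    pub qualifier: Option<String>,
    pub name: String,
}

/// Hypothetical error type; the real engine's errors are not shown in the docs.
#[derive(Debug, PartialEq)]
pub enum ResolveError {
    UnknownTable(String),
    UnknownColumn(String),
    Ambiguous(String),
}

/// Resolve a key to `(table, column)`.
/// `scope` maps each in-scope table name (or alias) to its column names.
pub fn resolve_key(
    key: &GroupByKey,
    scope: &HashMap<String, Vec<String>>,
) -> Result<(String, String), ResolveError> {
    match &key.qualifier {
        // Qualified: the named table must exist and own the column.
        Some(q) => {
            let cols = scope
                .get(q)
                .ok_or_else(|| ResolveError::UnknownTable(q.clone()))?;
            if cols.iter().any(|c| c == &key.name) {
                Ok((q.clone(), key.name.clone()))
            } else {
                Err(ResolveError::UnknownColumn(key.name.clone()))
            }
        }
        // Bare: exactly one in-scope table may own the column.
        None => {
            let mut owners: Vec<&String> = scope
                .iter()
                .filter(|(_, cols)| cols.iter().any(|c| c == &key.name))
                .map(|(table, _)| table)
                .collect();
            match owners.len() {
                0 => Err(ResolveError::UnknownColumn(key.name.clone())),
                1 => Ok((owners.remove(0).clone(), key.name.clone())),
                _ => Err(ResolveError::Ambiguous(key.name.clone())),
            }
        }
    }
}
```

Under these assumptions, a bare `id` shared by `customers` and `orders` is rejected as ambiguous, while `customers.name` resolves. This is why the check has to wait until the schemas are known for joined queries.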

`SelectQuery::joins` (SQLR-5) is a `Vec<JoinClause>` evaluated left-to-right by `execute_select_rows_joined`. Each clause carries a `JoinType` (`Inner` / `LeftOuter` / `RightOuter` / `FullOuter`), the right-table name + optional alias, and a required `ON` expression. Empty = single-table SELECT, the existing fast path with HNSW / FTS / bounded-heap optimizations.

Each parser module still rejects features we don't implement with `SQLRiteError::NotImplemented` — comma joins (`FROM a, b`), aggregates / GROUP BY / DISTINCT over JOINs, `HAVING`, `DISTINCT ON (...)`, `GROUP BY` on expressions, `LIKE … ESCAPE '<char>'`, `IN (subquery)`, `OFFSET`, multi-table DELETE, tuple assignment targets, etc. These errors carry the feature name in the message so the user knows what isn't there. (`JOIN ... USING`, `NATURAL JOIN`, and `CROSS JOIN` are now supported — see [`supported-sql.md`](supported-sql.md#join-semantics-sqlr-5).)
Each parser module still rejects features we don't implement with `SQLRiteError::NotImplemented` — comma joins (`FROM a, b`), `HAVING` without `GROUP BY`, `DISTINCT ON (...)`, `GROUP BY` on expressions, `LIKE … ESCAPE '<char>'`, `IN (subquery)`, `OFFSET`, multi-table DELETE, tuple assignment targets, etc. These errors carry the feature name in the message so the user knows what isn't there. (`JOIN ... USING`, `NATURAL JOIN`, and `CROSS JOIN` are now supported — see [`supported-sql.md`](supported-sql.md#join-semantics-sqlr-5).)

## Statement dispatch

@@ -176,6 +176,14 @@ contributes to its group), so the executor takes a separate path:
against the *output* row by alias, bare column name, or aggregate
display form), then LIMIT.

SQLR-6 made steps 2–6 scope-generic: the accumulator consumes
`RowScope`s instead of `(table, rowid)` pairs, so a joined SELECT feeds
its fully-joined, WHERE-filtered rows (each one a `JoinedScope`)
through the exact same pipeline. `GROUP BY` keys and aggregate args
carry an optional `t.` qualifier for disambiguation; NULL-padded
outer-join rows group under a `NULL` key and are skipped by
`COUNT(col)` like any other NULL.
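
The scope-generic idea in the paragraph above can be sketched roughly as follows. This is a simplified illustration, not the engine's actual types: the `RowScope` trait shape, `TableRow`, `JoinedRow`, the integer-only `Value`, and `count_by` are all hypothetical, and the sketch ignores ambiguity checks for unqualified names.

```rust
use std::collections::BTreeMap;

/// `None` models SQL NULL; integers keep the sketch small.
pub type Value = Option<i64>;

/// Hypothetical scope abstraction: resolves an optionally-qualified column.
pub trait RowScope {
    fn get(&self, qualifier: Option<&str>, column: &str) -> Value;
}

/// A row of one table (hypothetical shape).
pub struct TableRow {
    pub table: &'static str,
    pub cols: Vec<(&'static str, Value)>,
}

impl RowScope for TableRow {
    fn get(&self, qualifier: Option<&str>, column: &str) -> Value {
        // A qualifier naming a different table never matches this row.
        if matches!(qualifier, Some(q) if q != self.table) {
            return None;
        }
        self.cols
            .iter()
            .find(|(name, _)| *name == column)
            .and_then(|(_, v)| *v)
    }
}

/// A joined row: one `TableRow` per joined table. A NULL-padded outer-join
/// side is modeled as a row whose values are all `None`.
pub struct JoinedRow {
    pub sides: Vec<TableRow>,
}

impl RowScope for JoinedRow {
    fn get(&self, qualifier: Option<&str>, column: &str) -> Value {
        self.sides.iter().find_map(|side| side.get(qualifier, column))
    }
}

/// Per-group `(COUNT(*), COUNT(arg))` over any stream of scopes.
/// The same function serves single-table and joined rows.
pub fn count_by<S: RowScope>(
    rows: &[S],
    key: (Option<&str>, &str),
    arg: (Option<&str>, &str),
) -> BTreeMap<Value, (u64, u64)> {
    let mut groups = BTreeMap::new();
    for row in rows {
        let entry = groups.entry(row.get(key.0, key.1)).or_insert((0u64, 0u64));
        entry.0 += 1; // COUNT(*) counts every row in the group
        if row.get(arg.0, arg.1).is_some() {
            entry.1 += 1; // COUNT(col) skips NULLs
        }
    }
    groups
}
```

Because `count_by` only talks to the trait, a `LEFT JOIN` row whose right side is NULL-padded lands in the `None` group when grouped on a right-side column, and it contributes to `COUNT(*)` but not to `COUNT(col)`.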

Aggregate function names (`COUNT`/`SUM`/`AVG`/`MIN`/`MAX`) used in WHERE
or any other scalar position get a friendly error redirecting the user
to the projection list (`HAVING` is where post-aggregate filters go).
8 changes: 3 additions & 5 deletions docs/supported-sql.md
@@ -204,7 +204,7 @@ COUNT([DISTINCT] <column>) -- counts non-NULL values, option
- **Projection**: `*` (all columns in declaration order), a bare column list, or an explicit list mixing bare columns and aggregate calls. Each item can carry an optional `AS alias` (the alias becomes the output column header and is recognized by `ORDER BY`).
- **`WHERE`**: any [expression](#expressions). Evaluated per row; NULL-as-false in WHERE context (three-valued logic collapsed to two-valued for filtering). Includes **`IS NULL`** / **`IS NOT NULL`** for explicit null tests, **`LIKE` / `NOT LIKE` / `ILIKE`** for pattern matching, and **`IN (list) / NOT IN (list)`** for set-membership against literal lists.
- **`DISTINCT`**: `SELECT DISTINCT` deduplicates result rows after projection (and after aggregation, when both apply). `NULL` values compare equal to other `NULL`s for dedupe, matching SQL's DISTINCT semantic.
- **`GROUP BY`**: one or more bare column names. Every non-aggregate item in the projection must appear in the `GROUP BY` list (the parser rejects the violation with a clear message). `GROUP BY <col>` without any aggregate behaves like an implicit `DISTINCT <col>`.
- **`GROUP BY`**: one or more column names, optionally qualified (`GROUP BY customers.name`) — the qualifier disambiguates same-named columns across joined tables (SQLR-6). Every non-aggregate item in the projection must appear in the `GROUP BY` list (rejected with a clear message — by the parser for single-table queries, by the executor for joined ones, where resolving qualifiers needs the schemas). `GROUP BY <col>` without any aggregate behaves like an implicit `DISTINCT <col>`.
- **`HAVING`** (SQLR-52): post-aggregation filter over the grouped output. `WHERE` filters rows before grouping; `HAVING` filters groups after aggregation. Requires `GROUP BY` (see [HAVING semantics](#having-semantics-sqlr-52)).
- **Aggregates** (SQLR-3): `COUNT(*)`, `COUNT(col)`, `COUNT(DISTINCT col)`, `SUM(col)`, `AVG(col)`, `MIN(col)`, `MAX(col)`. `SUM` over an integer column stays `INTEGER` until a `REAL` input arrives or the running sum overflows `i64` (one-time promotion to `REAL`). `AVG` always returns `REAL` (or `NULL` on empty / all-NULL groups). `MIN` / `MAX` skip NULLs and use the same total order as `ORDER BY`. Aggregates over an empty table or empty group return `0` for `COUNT(*)` / `COUNT(col)` and `NULL` for the rest.
- **`ORDER BY`**: single sort key, `ASC` (default) or `DESC`. For non-aggregating queries the key is any expression — including function calls — so KNN queries like `ORDER BY vec_distance_l2(embedding, [...]) LIMIT k` work end-to-end *(Phase 7b)*. For aggregating queries the key resolves against the *output* row by name: a bare identifier matches an alias or a `GROUP BY` column, and a function call like `COUNT(*)` matches an aggregate projection by its canonical display form. Sort key types must match across rows.
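
The `SUM` promotion rule from the aggregates bullet above (stay `INTEGER` until a `REAL` input arrives or the running sum overflows `i64`) could look roughly like this. The `Num` and `SumState` types are hypothetical, and skipping NULL inputs is assumed from standard SQL behaviour rather than stated in the docs.

```rust
/// Hypothetical numeric input to SUM.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Num {
    Int(i64),
    Real(f64),
}

/// Hypothetical accumulator. `Empty` maps to the documented NULL result
/// for an empty or all-NULL group.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SumState {
    Empty,
    Int(i64),
    Real(f64),
}

impl SumState {
    /// Fold one input into the running sum. `None` models SQL NULL.
    pub fn add(self, input: Option<Num>) -> SumState {
        match (self, input) {
            // Assumption: NULL inputs are skipped, as in standard SQL.
            (state, None) => state,
            (SumState::Empty, Some(Num::Int(i))) => SumState::Int(i),
            (SumState::Empty, Some(Num::Real(r))) => SumState::Real(r),
            // Integer path: promote to REAL only when i64 would overflow.
            (SumState::Int(acc), Some(Num::Int(i))) => match acc.checked_add(i) {
                Some(sum) => SumState::Int(sum),
                None => SumState::Real(acc as f64 + i as f64),
            },
            // A REAL input promotes the sum; it never demotes afterwards.
            (SumState::Int(acc), Some(Num::Real(r))) => SumState::Real(acc as f64 + r),
            (SumState::Real(acc), Some(Num::Int(i))) => SumState::Real(acc + i as f64),
            (SumState::Real(acc), Some(Num::Real(r))) => SumState::Real(acc + r),
        }
    }
}
```

Using `checked_add` keeps the overflow case explicit, so the promotion happens once and the accumulator then stays on the `Real` path for the rest of the group.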
@@ -237,12 +237,12 @@ conditions, plus `CROSS JOIN`:
- **Self-joins** require an alias on at least one side: `FROM nodes AS p INNER JOIN nodes AS c ON p.id = c.parent_id`. Without one, you get a `duplicate table reference` error so qualifiers stay unambiguous.
- **`WHERE` runs after joins.** A `WHERE right.col IS NULL` filter on a `LEFT JOIN` correctly returns left rows with no match (the standard "anti-join via outer-join" idiom).
- **`ORDER BY` and `LIMIT`** apply to the fully joined row stream.
- **Aggregates / `GROUP BY` / `DISTINCT` / `HAVING` over joins** (SQLR-6): the fully-joined, `WHERE`-filtered row stream feeds the same aggregation pipeline single-table queries use. `GROUP BY` keys may be qualified (`GROUP BY customers.name`) and must resolve unambiguously; NULL-padded outer-join rows group under a `NULL` key, and `COUNT(col)` skips their NULLs while `COUNT(*)` counts them. `SELECT DISTINCT` dedupes the projected join output (with `LIMIT` applied after the dedupe).
- **Algorithm:** plain nested-loop join, O(N×M) per join level. Adequate for an embedded learning database; hash / merge joins on equi-join shapes are a future optimization.
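
To make the nested-loop algorithm and the NULL padding concrete, here is a small generic Rust sketch of a `LEFT OUTER JOIN` over in-memory slices. The function name `left_join` and its signature are illustrative assumptions and do not reflect the executor's real interface.

```rust
/// Nested-loop LEFT OUTER JOIN: for each left row, scan every right row.
/// Emits `(l, Some(r))` per match, or `(l, None)` as the NULL padding when
/// nothing on the right matches. Cost is O(N×M) comparisons.
pub fn left_join<'a, L, R>(
    left: &'a [L],
    right: &'a [R],
    on: impl Fn(&L, &R) -> bool,
) -> Vec<(&'a L, Option<&'a R>)> {
    let mut out = Vec::new();
    for l in left {
        let mut matched = false;
        for r in right {
            if on(l, r) {
                out.push((l, Some(r)));
                matched = true;
            }
        }
        if !matched {
            // No partner: keep the left row, pad the right side with NULL.
            out.push((l, None));
        }
    }
    out
}
```

Filtering the result for rows whose right side is `None` reproduces the `LEFT JOIN ... WHERE right.col IS NULL` anti-join idiom mentioned above, and the same padded rows are what a downstream `GROUP BY` would see.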

#### What's not supported in JOINs

- Comma-separated FROM lists (`FROM a, b`) — use an explicit `JOIN` / `CROSS JOIN` instead.
- Aggregates / `GROUP BY` / `DISTINCT` *over* a join. The single-table aggregator is wired against one rowid stream; rewiring it for joined rows is a separate increment. Surfaces as a clean `NotImplemented` at parse time.
- `fts_match` / `bm25_score` inside a JOIN expression. They need to look up an FTS index by column, which is single-table-bound today. Use them on a single-table SELECT first, or fold the FTS lookup into the FROM side.

### Index probing
@@ -281,7 +281,6 @@ SELECT dept FROM emp GROUP BY dept HAVING COUNT(*) > 1 AND SUM(salary) > 100;
### What doesn't work

- **Comma-separated FROM lists** (`FROM a, b`) — use an explicit `JOIN` / `CROSS JOIN`. `INNER` / `LEFT` / `RIGHT` / `FULL OUTER` / `CROSS` with `ON` / `USING` / `NATURAL` are all supported (see [JOIN semantics](#join-semantics-sqlr-5))
- **Aggregates** / **`GROUP BY`** / **`DISTINCT`** over a JOIN — pipe through a subquery once subqueries land
- **Subqueries**, CTEs (`WITH`), views
- **`HAVING` without `GROUP BY`** — the degenerate single-group form is rejected; `HAVING` with `GROUP BY` works (see [HAVING semantics](#having-semantics-sqlr-52))
- **`DISTINCT`** on `SUM` / `AVG` / `MIN` / `MAX` (only `COUNT(DISTINCT col)` is supported)
@@ -725,8 +724,7 @@ A REPL launched with `sqlrite --readonly foo.sqlrite` (or `sqlrite::open_databas
For context when you hit `NotImplemented`. See [Roadmap](roadmap.md) for when these land:

### Joins & composition
- `INNER` / `LEFT` / `RIGHT` / `FULL OUTER` / `CROSS JOIN` with `ON` / `USING (...)` / `NATURAL` all work (SQLR-5). Comma-separated FROM joins (`FROM a, b`) don't — use an explicit `JOIN` / `CROSS JOIN`
- Aggregates / `GROUP BY` / `DISTINCT` *over* a JOIN — pipe through a subquery once subqueries land
- `INNER` / `LEFT` / `RIGHT` / `FULL OUTER` / `CROSS JOIN` with `ON` / `USING (...)` / `NATURAL` all work (SQLR-5), and aggregates / `GROUP BY` / `DISTINCT` / `HAVING` compose over join results (SQLR-6). Comma-separated FROM joins (`FROM a, b`) don't — use an explicit `JOIN` / `CROSS JOIN`
- `fts_match` / `bm25_score` inside a JOIN expression — single-table-bound today
- Subqueries (scalar, `IN (SELECT ...)`, correlated)
- CTEs (`WITH`), recursive CTEs
2 changes: 1 addition & 1 deletion docs/usage.md
@@ -74,7 +74,7 @@ Quick hits worth knowing when you're working at the REPL:
- **Arithmetic stays honest.** Integer-only operations stay integer; any `REAL` operand promotes to `f64`; divide-by-zero is a typed runtime error, never a panic.
- **NULL follows three-valued logic.** `NULL = NULL` is unknown (not true) — treated as false in `WHERE`. Use `IS NULL` / `IS NOT NULL` for explicit null tests, e.g. `SELECT id FROM t WHERE qty IS NULL;`.
- **Identifiers are case-sensitive** (table / column names; no normalization), but keywords aren't. String literals preserve case.
- **Not yet supported**: joins, subqueries, `GROUP BY` / aggregates, `DISTINCT`, `LIKE` / `IN`, projection expressions, column aliases, `OFFSET`, multi-column `ORDER BY`, savepoints, `ALTER TABLE`, `DROP TABLE`, `DROP INDEX`. See the [full list in the reference](supported-sql.md#not-yet-supported).
- **Not yet supported**: subqueries, CTEs, views, comma joins (`FROM a, b`), projection expressions beyond aggregate calls, `OFFSET`, multi-column `ORDER BY`, savepoints. See the [full list in the reference](supported-sql.md#not-yet-supported).
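
The three-valued NULL rule in the list above can be illustrated with a tiny Rust sketch. The helper names (`sql_eq`, `sql_and`, `where_keeps`) are hypothetical, and `Option<bool>` with `None` for "unknown" is just one convenient encoding, not necessarily how the engine represents it.

```rust
/// SQL equality: any NULL operand makes the result unknown (`None`).
pub fn sql_eq(a: Option<i64>, b: Option<i64>) -> Option<bool> {
    match (a, b) {
        (Some(x), Some(y)) => Some(x == y),
        _ => None,
    }
}

/// SQL AND under three-valued logic: FALSE dominates, otherwise an
/// unknown operand makes the result unknown.
pub fn sql_and(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        _ => None,
    }
}

/// `IS NULL` is the explicit test and always yields a definite answer.
pub fn is_null(a: Option<i64>) -> bool {
    a.is_none()
}

/// `WHERE` keeps a row only when the predicate is definitely TRUE,
/// so unknown collapses to "filtered out".
pub fn where_keeps(predicate: Option<bool>) -> bool {
    predicate == Some(true)
}
```

With this encoding, `where_keeps(sql_eq(None, None))` is `false`, which matches the note that `NULL = NULL` does not select a row, while `is_null(None)` is `true`.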

## History
