Fix for Issue #232: generative + invariant-driven DSL testing (Phases 1-6) by crprashant · Pull Request #234 · microsoft/pg_durable

crprashant · 2026-06-14T04:31:55Z

Implements the full six-phase roadmap from #232 — moving DSL coverage from
hand-written E2E tests to generative + invariant-driven testing. Each phase
is independently shippable; all six are included here (Phase 1 as a documented
slice — see Deferred below).

What's in this PR

Phase	Deliverable	Status
6	`docs/dsl-semantics.md` — one-page operational-semantics contract for every combinator under nesting	✅
1	`df.assert_structural_invariants(instance_id, fail_on_violation)` post-run oracle (6 pure-state invariants) + `tests/e2e/sql/24_structural_invariants.sql`	✅ (slice)
2	`tests/e2e/generated/` combinator-nesting matrix generator — depth-2 exhaustive: 154 shapes (128 live / 26 quarantined) — plus CI wiring	✅
4	Metamorphic-relations registry (7 issue-#232 equivalences) in the same generator crate	✅
3	`proptest::Strategy` random-tree generator with shrinking (model-level, over the reference interpreter) + failure corpus + nightly job	✅
5	Synchronous tree-walking reference interpreter (pomset / causal-order oracle) + live causal-order assertions emitted into every clean generated test	✅

Highlights

Phase 1 is a sound post-run snapshot checker (reachable-completed,
join-all-completed, untaken-if-pending, race-one-completed, join-name-disjoint,
query-json-wellformed). New src/db.rs node loader; loop-scope relaxations keep
it false-positive-free. Packaged as extension 0.2.4 with upgrade script
sql/pg_durable--0.2.3--0.2.4.sql.
Phase 2 is a standalone std-only crate (no pgrx/duroxide dep) that enumerates
nestings → renders live SQL E2E tests with marker leaves → asserts the Phase 1
oracle + per-path counts. A classifier splits clean shapes (blocking) from the
known loop-bug zone (quarantined / xfail, non-blocking).
Phase 5 adds the causal-order dimension the count-based oracles can't see:
for every happens-before (≺) edge the reference interpreter predicts, the
generated SQL asserts the earlier marker's wall_clock precedes the later one;
concurrent siblings carry no edge (no flakiness). The order oracle is
fail-closed — a missing trace endpoint is a violation, never silently dropped.

Validation

cargo fmt / cargo clippy -D warnings clean; 196 extension unit tests +
the generator's own unit tests pass.
Live ./scripts/test-e2e-local.sh --include-generated: 169 passed / 0 failed
(128 generated + 7 metamorphic shapes exercise the new causal-order block with
zero false positives).
Each phase was put through a multi-specialist review; triage records live under
the testing dir.

Deferred (documented, per the issue's acceptance criteria)

The two count-based invariants that would directly catch #227
(single_execution_outside_loop) and #230 (loop_body_iteration_count_matches)
are deferred: df.nodes.status is current-state (in-place UPDATE), not an
append-only trace, so iteration/execution counts need a new df.node_events log
(or duroxide-history integration). The generated matrix (Phases 2–5) already
catches that same class differentially via the df_gen_trace marker table
today; the standalone oracle slice ships exactly the invariants that are
checkable from df.nodes alone.

New defect surfaced (bug-as-seed)

Per the issue's "treat every reported bug as a generator seed" recommendation,
the Phase 2 matrix surfaced a new runtime-boundary defect — a loop nested
inside a join/race branch (the inverse of #230; root cause: continue_as_new
is host-context-unaware) — filed as #233. A blast-radius experiment found it is
not a cluster DoS (parked hangs ~0 CPU, no pool/connection exhaustion); the
harms are permanently-stuck instances + silent over-execution.

Implements all six phases of #232; maintainers to decide on closing the umbrella
given the documented Phase 1 deferral.

…ft#232) Introduce a sound post-run snapshot oracle that validates structural invariants of a completed durable function instance directly from df.nodes: - src/invariants.rs: 6 invariants (every_reachable_node_completed, join_all_branches_completed, untaken_if_branch_pending, race_at_least_one_branch_completed, join_branch_result_name_disjoint, query_json_well_formed) with loop-scope relaxations to stay false-positive-free; 26 unit tests. - src/db.rs: canonical NodeSnapshot loader for df.nodes (shared projection). - df.assert_structural_invariants(instance_id, fail_on_violation) SQL function. Packaging/docs/tests: - Bump to 0.2.4; lib.rs func_sig + grant_usage re-emit; sql/pg_durable--0.2.3--0.2.4.sql upgrade script. - USER_GUIDE.md, docs/upgrade-testing.md, docs/dsl-semantics.md (operational-semantics contract). - tests/e2e/sql/24_structural_invariants.sql. Validated: fmt, clippy, 26 unit tests, e2e 24_, upgrade 35/35.

Standalone std-only generator crate (tests/e2e/generated/generator/) that deterministically enumerates DSL combinator nestings, renders each to a live SQL E2E test with marker leaves, and validates it against the Phase 1 structural-invariant oracle plus per-path execution counts. - depth-2 matrix: 154 shapes (128 live / 26 quarantined); 128 passed / 0 failed - classifier splits clean shapes (blocking, sql/) from known loop-bug shapes (quarantine/, xfail, non-blocking) via an empirically-anchored fail set - golden manifest.json + --check determinism guard; per-shape .sql is gitignored and regenerated on demand - CI: generated-matrix determinism+unit job; live matrix folded into the E2E step; nightly non-blocking quarantine run that flags unexpected passes - harness: --include-generated[-quarantine], unix socket in the data dir to keep socket paths short; Makefile generate-matrix target - 32 generator unit tests; paw-sot multi-specialist review applied Surfaced a new product defect (loop inside a JOIN/RACE branch) -> issue microsoft#233.

Phase 4 of the DSL automated-testing roadmap. Adds a metamorphic-relations registry to the standalone std-only generator crate: pairs of DSL programs the runtime must treat as equivalent, checked live. meta.rs introduces a labeled-leaf program model (Leaf/Seq/If/Join/Race/DoWhile/ LoopBreak) where the same logical leaf carries the same label in both programs of a pair. A tiny deterministic interpreter eval() computes the ground-truth observable = multiset {label -> completed-count}. The renderer reuses Phase 2's exact marker/loop/race/break DSL. registry() defines 7 equivalence relations (seq-assoc, if-true, if-false, join-comm, race-winner, do-while-once, loop-break-once). Each generated SQL test runs BOTH programs (tags meta-NNNN-a / meta-NNNN-b in the shared df_gen_trace) and asserts: both completed; both pass df.assert_structural_invariants; HEADLINE observable(A) == observable(B) via an EXCEPT-based multiset symmetric-diff; and a BACKSTOP per-label count plus a no-unexpected-label check that catches the symmetric-count-bug blind spot the Phase 1 pure-state invariants miss. A separate committed golden meta-manifest.json mirrors the Phase 2 manifest.json determinism contract; the generator's --check guard diffs both. Phase 2's manifest.json stays byte-identical. Adds a scoped tests/e2e/generated/.gitattributes pinning the goldens and generator sources to LF so the Linux-CI --check gate matches a Windows-authored commit byte-for-byte. Validation: fmt, build, clippy -D warnings, 42 generator unit tests, and --check all green. Live metamorphic E2E runs in CI (folded into the existing generated-matrix and E2E jobs).

Adds a recursive proptest Strategy<Meta> over the labeled-leaf program model, generating thousands of random combinator trees per run with proptest's shrinking — the issue's stated value-add over the Phase 2 enumerated matrix. Properties run in-process against the Phase 4 reference interpreter (eval/observable) and the SQL renderer (not live PG), because shrinking requires the property to execute in-process so proptest can drive reduction. This hardens the same eval the live Phase 4 metamorphic suite depends on, so it strengthens the live suite transitively. src/prop.rs (new): 12 shrinking-enabled properties + 4 helper tests - eval determinism - differential: eval vs an independent functional reference interpreter over the whole random tree space (anti-circular; a Phase 5 bridge) - metamorphic algebra laws on random subtrees (seq-assoc, join-comm, join-assoc, seq==join multiset, if-branch selection, race->winner) — generalizing Phase 4's 7 hand cases to thousands - loop multiplier scaling (DoWhile k / LoopBreak n) - total-count conservation over structural combinators - render determinism + structural well-formedness (balanced parens, even $mk$/$c$ dollar-quotes, no leaked df.start, every label present, race arity, loop has cond) proptest is a dev-dependency only, so the generation binary and both goldens (manifest.json, meta-manifest.json) stay dependency-free and byte-identical. Persistent failure corpus is proptest-native generator/proptest-regressions/ (committed when populated; LF-normalized via .gitattributes). CI: clippy gate on the generator; PROPTEST_CASES=256 deterministic replay in the generated-matrix job; nightly fresh-seed deep run (8192 cases, non-blocking) with corpus surfacing. Makefile proptest target (1024 cases). meta.rs: render_prog -> pub(crate) for the renderer property; strip_quoted test helper loop -> while let (clippy). Rendered output unchanged. Validated: fmt, clippy --all-targets -D warnings, 58 tests, --check byte-identical goldens, shrinking demonstrated.

…#232) Phase 5 adds the roadmap's strongest single oracle: a synchronous tree-walking reference interpreter over `Shape` that simulates each program into a pomset (events + happens-before edges). Unlike Phases 2 and 4 -- which differential-test only the COUNT dimension -- this adds the missing causal-ORDER dimension the live wall-clock trace must obey. - refinterp.rs: the interpreter (Seq/If/Loop/Join/Race/LoopBreak per docs/dsl-semantics.md) plus a `counts_match_render` model-level differential against render's closed-form counts, and 14 unit tests including a full depth-2 matrix x K in {1,2,3} count differential. - prop.rs: 6 proptest properties over random `Shape` trees -- the count differential, determinism, forward-edge acyclicity, dense iterations, and structural Seq/Join happens-before laws (asserting the exact edge sets, not just their sizes). - Docs: README Phase 5 section + dsl-semantics.md cross-reference. Test-only: the generation binary does not link the `#[cfg(test)]` interpreter, so manifest.json / meta-manifest.json stay byte-identical. Validated: fmt, clippy -D warnings, 78 tests, generator --check.

Close the Phase 5 loop to the live runtime: emit a causal-order oracle into each clean generated E2E test and pin the happens-before relation as a golden in manifest.json. - emit.rs: per live shape, render a set-based causal-order block — one VALUES row per happens-before (≺) edge from the reference interpreter, a double df_gen_trace self-join to resolve both endpoints, and a RAISE iff any edge ran with earlier.wall_clock >= later.wall_clock. Gated on emit_order = reason.is_none() && !ordered_pairs.is_empty(), with a conditional `ord_viol TEXT` declare. Concurrent (join/race) siblings carry no edge, so they are never compared and the block cannot flake. - main.rs: wire ordered_pairs into build_records via refinterp::interpret(shape, loop_iters).ordered_pairs(). - refinterp.rs: promote off #[cfg(test)] for the binary; only the count-projection helpers (path_counts, counts_match_render) stay test-only. - manifest.json: add an `order` golden (one [earlier, iter, later, iter] entry per ≺ edge) for all 88 edge-bearing shapes, live and quarantined (187896 -> 198914 B; intentionally breaks the prior byte-identical property). - README/dsl-semantics: document the staged scope, the order golden, and a "Clock assumptions (known limitation)" caveat (clock_timestamp is wall-clock, not monotonic; a backward NTP step inside a sub-second test window is the only theoretical false positive, and it fails loudly, never silently). Validated: fmt, clippy -D, generator tests, --check byte-identical (manifest 198914 B, meta-manifest 7730 B); live --include-generated run with the order blocks green. paw-sot review (9 specialists) triaged against source; all must-fixes refuted, net change is the doc-only clock caveat.

…-op (microsoft#232) Self-review follow-ups on the Phase 5 work: - emit.rs: the generated causal-order block now LEFT JOINs both happens-before endpoints to df_gen_trace and treats a missing endpoint (wall_clock IS NULL) as a violation, tagged "(missing endpoint)". Previously an INNER JOIN silently dropped any edge whose endpoint row was absent, so a dropped or duplicated marker iteration -- the very MAX(iteration)+1 numbering under test -- could pass unflagged. The oracle is now fail-closed and strictly stronger; concurrent siblings still carry no edge, so it cannot flake. Lockstep emit test updated. - invariants.rs: document that df.assert_structural_invariants is STRICT, so a NULL argument yields an empty set and never raises. - USER_GUIDE.md: note the same STRICT / non-NULL-boolean caveat in the API reference table. Validated: generator fmt + clippy + 82 unit tests + --check byte-identical; extension fmt + build + clippy + 196 unit tests; live e2e --include-generated 169 passed / 0 failed (128 generated + 7 metamorphic shapes exercise the new block with zero false positives).

…rade (microsoft#232) The pgspot SQL security gate flagged 26 findings on the re-emitted df.grant_usage function in the new upgrade script, even though the byte-identical function passes in the fresh-install SQL. Root cause: pgspot's is_secure_searchpath only trusts a schema named in a function's SET search_path when that schema is CREATEd earlier in the same script. The fresh install runs CREATE SCHEMA df, so its grant_usage re-emit is analyzed with a secure search_path; the upgrade fragment never created df, so the SET search_path = pg_catalog, df, pg_temp was deemed insecure and every unqualified body reference cascaded into findings. Prepend CREATE SCHEMA IF NOT EXISTS df; to the upgrade script. df always exists on any install being upgraded, so this is an idempotent runtime no-op that preserves catalog upgrade-equivalence (test-upgrade.sh Scenario A is a catalog-snapshot diff, unaffected). It makes pgspot recognize the search_path as secure, collapsing all 26 findings into a single PS010 that is already on the gate's allowlist. Gate now passes.

crprashant added 7 commits June 12, 2026 21:36

crprashant mentioned this pull request Jun 14, 2026

Roadmap: thorough automated testing for the durable-function DSL #232

Open

crprashant changed the title ~~Issue #232: generative + invariant-driven DSL testing (Phases 1-6)~~ Fix for Issue #232: generative + invariant-driven DSL testing (Phases 1-6) Jun 15, 2026

pinodeca linked an issue Jun 16, 2026 that may be closed by this pull request

Roadmap: thorough automated testing for the durable-function DSL #232

Open

pinodeca mentioned this pull request Jun 17, 2026

Release v0.2.3 #243

Closed

8 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix for Issue #232: generative + invariant-driven DSL testing (Phases 1-6)#234

Fix for Issue #232: generative + invariant-driven DSL testing (Phases 1-6)#234
crprashant wants to merge 8 commits into
microsoft:mainfrom
crprashant:crprashant/dsl-testing-roadmap

crprashant commented Jun 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

crprashant commented Jun 14, 2026

What's in this PR

Highlights

Validation

Deferred (documented, per the issue's acceptance criteria)

New defect surfaced (bug-as-seed)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant