Fix for Issue #232: generative + invariant-driven DSL testing (Phases 1-6)#234
Open
crprashant wants to merge 8 commits into
Open
Fix for Issue #232: generative + invariant-driven DSL testing (Phases 1-6)#234crprashant wants to merge 8 commits into
crprashant wants to merge 8 commits into
Conversation
…ft#232) Introduce a sound post-run snapshot oracle that validates structural invariants of a completed durable function instance directly from df.nodes: - src/invariants.rs: 6 invariants (every_reachable_node_completed, join_all_branches_completed, untaken_if_branch_pending, race_at_least_one_branch_completed, join_branch_result_name_disjoint, query_json_well_formed) with loop-scope relaxations to stay false-positive-free; 26 unit tests. - src/db.rs: canonical NodeSnapshot loader for df.nodes (shared projection). - df.assert_structural_invariants(instance_id, fail_on_violation) SQL function. Packaging/docs/tests: - Bump to 0.2.4; lib.rs func_sig + grant_usage re-emit; sql/pg_durable--0.2.3--0.2.4.sql upgrade script. - USER_GUIDE.md, docs/upgrade-testing.md, docs/dsl-semantics.md (operational-semantics contract). - tests/e2e/sql/24_structural_invariants.sql. Validated: fmt, clippy, 26 unit tests, e2e 24_, upgrade 35/35.
Standalone std-only generator crate (tests/e2e/generated/generator/) that deterministically enumerates DSL combinator nestings, renders each to a live SQL E2E test with marker leaves, and validates it against the Phase 1 structural-invariant oracle plus per-path execution counts. - depth-2 matrix: 154 shapes (128 live / 26 quarantined); 128 passed / 0 failed - classifier splits clean shapes (blocking, sql/) from known loop-bug shapes (quarantine/, xfail, non-blocking) via an empirically-anchored fail set - golden manifest.json + --check determinism guard; per-shape .sql is gitignored and regenerated on demand - CI: generated-matrix determinism+unit job; live matrix folded into the E2E step; nightly non-blocking quarantine run that flags unexpected passes - harness: --include-generated[-quarantine], unix socket in the data dir to keep socket paths short; Makefile generate-matrix target - 32 generator unit tests; paw-sot multi-specialist review applied Surfaced a new product defect (loop inside a JOIN/RACE branch) -> issue microsoft#233.
Phase 4 of the DSL automated-testing roadmap. Adds a metamorphic-relations
registry to the standalone std-only generator crate: pairs of DSL programs the
runtime must treat as equivalent, checked live.
meta.rs introduces a labeled-leaf program model (Leaf/Seq/If/Join/Race/DoWhile/
LoopBreak) where the same logical leaf carries the same label in both programs
of a pair. A tiny deterministic interpreter eval() computes the ground-truth
observable = multiset {label -> completed-count}. The renderer reuses Phase 2's
exact marker/loop/race/break DSL. registry() defines 7 equivalence relations
(seq-assoc, if-true, if-false, join-comm, race-winner, do-while-once,
loop-break-once).
Each generated SQL test runs BOTH programs (tags meta-NNNN-a / meta-NNNN-b in
the shared df_gen_trace) and asserts: both completed; both pass
df.assert_structural_invariants; HEADLINE observable(A) == observable(B) via an
EXCEPT-based multiset symmetric-diff; and a BACKSTOP per-label count plus a
no-unexpected-label check that catches the symmetric-count-bug blind spot the
Phase 1 pure-state invariants miss.
A separate committed golden meta-manifest.json mirrors the Phase 2 manifest.json
determinism contract; the generator's --check guard diffs both. Phase 2's
manifest.json stays byte-identical.
Adds a scoped tests/e2e/generated/.gitattributes pinning the goldens and
generator sources to LF so the Linux-CI --check gate matches a Windows-authored
commit byte-for-byte.
Validation: fmt, build, clippy -D warnings, 42 generator unit tests, and --check
all green. Live metamorphic E2E runs in CI (folded into the existing
generated-matrix and E2E jobs).
Adds a recursive proptest Strategy<Meta> over the labeled-leaf program model,
generating thousands of random combinator trees per run with proptest's
shrinking — the issue's stated value-add over the Phase 2 enumerated matrix.
Properties run in-process against the Phase 4 reference interpreter
(eval/observable) and the SQL renderer (not live PG), because shrinking
requires the property to execute in-process so proptest can drive reduction.
This hardens the same eval the live Phase 4 metamorphic suite depends on, so
it strengthens the live suite transitively.
src/prop.rs (new): 12 shrinking-enabled properties + 4 helper tests
- eval determinism
- differential: eval vs an independent functional reference interpreter
over the whole random tree space (anti-circular; a Phase 5 bridge)
- metamorphic algebra laws on random subtrees (seq-assoc, join-comm,
join-assoc, seq==join multiset, if-branch selection, race->winner) —
generalizing Phase 4's 7 hand cases to thousands
- loop multiplier scaling (DoWhile k / LoopBreak n)
- total-count conservation over structural combinators
- render determinism + structural well-formedness (balanced parens, even
$mk$/$c$ dollar-quotes, no leaked df.start, every label present,
race arity, loop has cond)
proptest is a dev-dependency only, so the generation binary and both goldens
(manifest.json, meta-manifest.json) stay dependency-free and byte-identical.
Persistent failure corpus is proptest-native generator/proptest-regressions/
(committed when populated; LF-normalized via .gitattributes).
CI: clippy gate on the generator; PROPTEST_CASES=256 deterministic replay in
the generated-matrix job; nightly fresh-seed deep run (8192 cases,
non-blocking) with corpus surfacing. Makefile proptest target (1024 cases).
meta.rs: render_prog -> pub(crate) for the renderer property; strip_quoted
test helper loop -> while let (clippy). Rendered output unchanged.
Validated: fmt, clippy --all-targets -D warnings, 58 tests, --check
byte-identical goldens, shrinking demonstrated.
…#232) Phase 5 adds the roadmap's strongest single oracle: a synchronous tree-walking reference interpreter over `Shape` that simulates each program into a pomset (events + happens-before edges). Unlike Phases 2 and 4 -- which differential-test only the COUNT dimension -- this adds the missing causal-ORDER dimension the live wall-clock trace must obey. - refinterp.rs: the interpreter (Seq/If/Loop/Join/Race/LoopBreak per docs/dsl-semantics.md) plus a `counts_match_render` model-level differential against render's closed-form counts, and 14 unit tests including a full depth-2 matrix x K in {1,2,3} count differential. - prop.rs: 6 proptest properties over random `Shape` trees -- the count differential, determinism, forward-edge acyclicity, dense iterations, and structural Seq/Join happens-before laws (asserting the exact edge sets, not just their sizes). - Docs: README Phase 5 section + dsl-semantics.md cross-reference. Test-only: the generation binary does not link the `#[cfg(test)]` interpreter, so manifest.json / meta-manifest.json stay byte-identical. Validated: fmt, clippy -D warnings, 78 tests, generator --check.
Close the Phase 5 loop to the live runtime: emit a causal-order oracle into each clean generated E2E test and pin the happens-before relation as a golden in manifest.json. - emit.rs: per live shape, render a set-based causal-order block — one VALUES row per happens-before (≺) edge from the reference interpreter, a double df_gen_trace self-join to resolve both endpoints, and a RAISE iff any edge ran with earlier.wall_clock >= later.wall_clock. Gated on emit_order = reason.is_none() && !ordered_pairs.is_empty(), with a conditional `ord_viol TEXT` declare. Concurrent (join/race) siblings carry no edge, so they are never compared and the block cannot flake. - main.rs: wire ordered_pairs into build_records via refinterp::interpret(shape, loop_iters).ordered_pairs(). - refinterp.rs: promote off #[cfg(test)] for the binary; only the count-projection helpers (path_counts, counts_match_render) stay test-only. - manifest.json: add an `order` golden (one [earlier, iter, later, iter] entry per ≺ edge) for all 88 edge-bearing shapes, live and quarantined (187896 -> 198914 B; intentionally breaks the prior byte-identical property). - README/dsl-semantics: document the staged scope, the order golden, and a "Clock assumptions (known limitation)" caveat (clock_timestamp is wall-clock, not monotonic; a backward NTP step inside a sub-second test window is the only theoretical false positive, and it fails loudly, never silently). Validated: fmt, clippy -D, generator tests, --check byte-identical (manifest 198914 B, meta-manifest 7730 B); live --include-generated run with the order blocks green. paw-sot review (9 specialists) triaged against source; all must-fixes refuted, net change is the doc-only clock caveat.
…-op (microsoft#232) Self-review follow-ups on the Phase 5 work: - emit.rs: the generated causal-order block now LEFT JOINs both happens-before endpoints to df_gen_trace and treats a missing endpoint (wall_clock IS NULL) as a violation, tagged "(missing endpoint)". Previously an INNER JOIN silently dropped any edge whose endpoint row was absent, so a dropped or duplicated marker iteration -- the very MAX(iteration)+1 numbering under test -- could pass unflagged. The oracle is now fail-closed and strictly stronger; concurrent siblings still carry no edge, so it cannot flake. Lockstep emit test updated. - invariants.rs: document that df.assert_structural_invariants is STRICT, so a NULL argument yields an empty set and never raises. - USER_GUIDE.md: note the same STRICT / non-NULL-boolean caveat in the API reference table. Validated: generator fmt + clippy + 82 unit tests + --check byte-identical; extension fmt + build + clippy + 196 unit tests; live e2e --include-generated 169 passed / 0 failed (128 generated + 7 metamorphic shapes exercise the new block with zero false positives).
…rade (microsoft#232) The pgspot SQL security gate flagged 26 findings on the re-emitted df.grant_usage function in the new upgrade script, even though the byte-identical function passes in the fresh-install SQL. Root cause: pgspot's is_secure_searchpath only trusts a schema named in a function's SET search_path when that schema is CREATEd earlier in the same script. The fresh install runs CREATE SCHEMA df, so its grant_usage re-emit is analyzed with a secure search_path; the upgrade fragment never created df, so the SET search_path = pg_catalog, df, pg_temp was deemed insecure and every unqualified body reference cascaded into findings. Prepend CREATE SCHEMA IF NOT EXISTS df; to the upgrade script. df always exists on any install being upgraded, so this is an idempotent runtime no-op that preserves catalog upgrade-equivalence (test-upgrade.sh Scenario A is a catalog-snapshot diff, unaffected). It makes pgspot recognize the search_path as secure, collapsing all 26 findings into a single PS010 that is already on the gate's allowlist. Gate now passes.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Implements the full six-phase roadmap from #232 — moving DSL coverage from
hand-written E2E tests to generative + invariant-driven testing. Each phase
is independently shippable; all six are included here (Phase 1 as a documented
slice — see Deferred below).
What's in this PR
docs/dsl-semantics.md— one-page operational-semantics contract for every combinator under nestingdf.assert_structural_invariants(instance_id, fail_on_violation)post-run oracle (6 pure-state invariants) +tests/e2e/sql/24_structural_invariants.sqltests/e2e/generated/combinator-nesting matrix generator — depth-2 exhaustive: 154 shapes (128 live / 26 quarantined) — plus CI wiringproptest::Strategyrandom-tree generator with shrinking (model-level, over the reference interpreter) + failure corpus + nightly jobHighlights
join-all-completed, untaken-if-pending, race-one-completed, join-name-disjoint,
query-json-wellformed). New
src/db.rsnode loader; loop-scope relaxations keepit false-positive-free. Packaged as extension 0.2.4 with upgrade script
sql/pg_durable--0.2.3--0.2.4.sql.nestings → renders live SQL E2E tests with marker leaves → asserts the Phase 1
oracle + per-path counts. A classifier splits clean shapes (blocking) from the
known loop-bug zone (quarantined / xfail, non-blocking).
for every happens-before (≺) edge the reference interpreter predicts, the
generated SQL asserts the earlier marker's
wall_clockprecedes the later one;concurrent siblings carry no edge (no flakiness). The order oracle is
fail-closed — a missing trace endpoint is a violation, never silently dropped.
Validation
cargo fmt/cargo clippy -D warningsclean; 196 extension unit tests +the generator's own unit tests pass.
./scripts/test-e2e-local.sh --include-generated: 169 passed / 0 failed(128 generated + 7 metamorphic shapes exercise the new causal-order block with
zero false positives).
the testing dir.
Deferred (documented, per the issue's acceptance criteria)
The two count-based invariants that would directly catch #227
(
single_execution_outside_loop) and #230 (loop_body_iteration_count_matches)are deferred:
df.nodes.statusis current-state (in-place UPDATE), not anappend-only trace, so iteration/execution counts need a new
df.node_eventslog(or duroxide-history integration). The generated matrix (Phases 2–5) already
catches that same class differentially via the
df_gen_tracemarker tabletoday; the standalone oracle slice ships exactly the invariants that are
checkable from
df.nodesalone.New defect surfaced (bug-as-seed)
Per the issue's "treat every reported bug as a generator seed" recommendation,
the Phase 2 matrix surfaced a new runtime-boundary defect — a
loopnestedinside a
join/racebranch (the inverse of #230; root cause:continue_as_newis host-context-unaware) — filed as #233. A blast-radius experiment found it is
not a cluster DoS (parked hangs ~0 CPU, no pool/connection exhaustion); the
harms are permanently-stuck instances + silent over-execution.
Implements all six phases of #232; maintainers to decide on closing the umbrella
given the documented Phase 1 deferral.