tests/interaction: era-axis machinery for the requirements manifest #2909
Adds the type machinery for parametrizing the interaction suite over (transport, spec_version) cells, with no behaviour change yet:
- SpecVersion / SPEC_VERSIONS / CONNECTABLE_TRANSPORTS / TRANSPORT_SPEC_VERSIONS constants; SPEC_REVISION now derived from SPEC_VERSIONS[-1]
- ArmExclusionReason literal + ArmExclusion / KnownFailure dataclasses
- Requirement gains note / added_in / removed_in / supersedes / superseded_by / arm_exclusions / known_failures (all defaulted, no manifest entry edited)
- Connect protocol + the three connect_* factories accept a protocol_version kwarg (currently ignored)

534 tests collected (unchanged); pyright/ruff clean.
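As a rough illustration of the new types, the sketch below shows frozen dataclasses whose `__post_init__` validates spec versions. It is not the PR's code: the contents of `KNOWN_PROTOCOL_VERSIONS`, the helper `_check_version`, the exact field layout, and the non-empty-window rule are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Literal, Optional

# Assumed values: the PR names these constants, but this page does not show their contents.
SpecVersion = Literal["2025-11-25", "2026-07-28"]
KNOWN_PROTOCOL_VERSIONS = ("2025-11-25", "2026-07-28")


def _check_version(field: str, value: Optional[str]) -> None:
    """Reject a spec version that is not a known protocol version (None is allowed)."""
    if value is not None and value not in KNOWN_PROTOCOL_VERSIONS:
        raise ValueError(f"{field}={value!r} is not a known protocol version")


@dataclass(frozen=True)
class ArmExclusion:
    """A cell that structurally cannot run; None on an axis acts as a wildcard."""

    reason: str
    transport: Optional[str] = None
    spec_version: Optional[str] = None

    def __post_init__(self) -> None:
        _check_version("spec_version", self.spec_version)


@dataclass(frozen=True)
class Requirement:
    """Only some of the new, defaulted fields; the real class has more."""

    note: Optional[str] = None
    added_in: Optional[str] = None
    removed_in: Optional[str] = None
    arm_exclusions: tuple[ArmExclusion, ...] = ()

    def __post_init__(self) -> None:
        _check_version("added_in", self.added_in)
        _check_version("removed_in", self.removed_in)
        # Assumed rule: the [added_in, removed_in) window must not be empty.
        if self.added_in and self.removed_in and self.added_in >= self.removed_in:
            raise ValueError("added_in must be earlier than removed_in")
```

Because every new field has a default, existing manifest entries would construct unchanged, which is consistent with the "no manifest entry edited" note above.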
…_ self-tests

compute_cells() expands a test's stacked @requirement marks into the (transport, spec_version) pytest.param list with intersection semantics: a cell is dropped if any requirement's [added_in, removed_in) window or arm_exclusions excludes it, and marked xfail-strict if any requirement's known_failures matches it. Requirement.transports is intentionally not consulted (descriptive metadata only). cell_id() keeps node-id suffixes as bare transport names while SPEC_VERSIONS has a single entry.

test_coverage.py gains pytest.raises self-tests for every new __post_init__ validation branch on ArmExclusion / KnownFailure / Requirement.

12 passed; pyright/ruff clean; conftest not yet wired.
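The intersection semantics can be sketched in a few lines. This is a simplified stand-in, not the PR's implementation: it returns plain `(transport, spec_version, xfail)` triples instead of `pytest.param` objects, `CellRule` is a hypothetical merged stand-in for `ArmExclusion` / `KnownFailure`, and it leaves out the `TRANSPORT_SPEC_VERSIONS` gating.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CellRule:
    """Hypothetical stand-in for ArmExclusion / KnownFailure; None is a wildcard."""

    reason: str
    transport: Optional[str] = None
    spec_version: Optional[str] = None

    def matches(self, transport: str, spec_version: str) -> bool:
        return (self.transport in (None, transport)
                and self.spec_version in (None, spec_version))


@dataclass(frozen=True)
class Requirement:
    added_in: Optional[str] = None
    removed_in: Optional[str] = None
    arm_exclusions: tuple[CellRule, ...] = ()
    known_failures: tuple[CellRule, ...] = ()


def _in_window(req: Requirement, version: str) -> bool:
    # Versions are ISO dates, so string comparison orders them chronologically.
    if req.added_in is not None and version < req.added_in:
        return False
    if req.removed_in is not None and version >= req.removed_in:
        return False
    return True


def compute_cells(
    requirements: Sequence[Requirement],
    transports: Sequence[str],
    spec_versions: Sequence[str],
) -> list[tuple[str, str, bool]]:
    """Keep a cell only if no requirement excludes it; flag it if any expects failure."""
    cells = []
    for transport in transports:
        for version in spec_versions:
            excluded = any(
                not _in_window(r, version)
                or any(e.matches(transport, version) for e in r.arm_exclusions)
                for r in requirements
            )
            if excluded:
                continue
            xfail = any(
                f.matches(transport, version)
                for r in requirements
                for f in r.known_failures
            )
            cells.append((transport, version, xfail))
    return cells
```

One requirement is enough to drop a cell, and one is enough to flag it as an expected failure, which is why stacking more marks on a test can only shrink the set of cells that run.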
pytest_generate_tests now reads each connect-taking test's stacked @requirement marks, looks them up in REQUIREMENTS, and parametrizes the connect fixture with compute_cells() output. The fixture body unpacks the (transport, spec_version) tuple and returns the bare factory.

411 connect-parametrized cells with byte-identical [transport] node ids; no behaviour change while SPEC_VERSIONS has a single entry.

test_coverage.py gains 10 unit tests for compute_cells/cell_id covering intersection semantics, wildcard arm-exclusion / known-failure matching, TRANSPORT_SPEC_VERSIONS gating, and the transports-field-is-metadata-only guarantee. _requirements.py is at 100% line+branch coverage from test_coverage.py alone.
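The node-id guarantee depends on how `cell_id()` behaves when the version axis has one entry. A minimal sketch follows; the signature and the multi-version id format (`transport-spec_version`) are assumptions, since this page only states the single-entry behaviour.

```python
from typing import Sequence


def cell_id(transport: str, spec_version: str, spec_versions: Sequence[str]) -> str:
    """Return the pytest id suffix for one (transport, spec_version) cell."""
    if len(spec_versions) <= 1:
        # Single-entry axis: keep the bare transport name so existing
        # node ids such as test_x[sse] stay byte-identical.
        return transport
    # Assumed format once a second spec version is active.
    return f"{transport}-{spec_version}"
```

Keeping the suffix bare on a single-entry axis means tooling keyed on node ids (deselect lists, flaky-test trackers) would not need updating until a second version is actually activated.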
test_coverage.py gains five manifest-level invariants (vacuously green until entries start using the new fields):
- SPEC_VERSIONS subset of KNOWN_PROTOCOL_VERSIONS and includes LATEST
- CONNECTABLE_TRANSPORTS == conftest._FACTORIES keys
- supersedes / superseded_by links are bidirectional and versioned
- every arm_exclusion / known_failure targets a reachable cell

README.md: the requirements-manifest field list documents the new fields, and the spec-revision section is rewritten as 'Spec versions and the era axis' covering compute_cells() intersection semantics, the transports-is-metadata-only rule, and the checklist for landing a new revision.

Full gate: 2299 passed, 100.00% coverage, pyright/ruff clean, 411 connect cells with unchanged ids.
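The bidirectional-link invariant could be checked along these lines. This sketch covers only the "bidirectional" half; it assumes each link is a single requirement ID (the real fields may hold several), and `Entry` and `broken_links` are hypothetical names. The "versioned" half is omitted because this page does not spell out what it requires.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Entry:
    """Hypothetical manifest entry reduced to its two link fields."""

    supersedes: Optional[str] = None
    superseded_by: Optional[str] = None


def broken_links(manifest: Mapping[str, Entry]) -> list[str]:
    """Describe every supersession link whose target does not point back."""
    problems = []
    for req_id, entry in manifest.items():
        if entry.supersedes is not None:
            old = manifest.get(entry.supersedes)
            if old is None or old.superseded_by != req_id:
                problems.append(f"{req_id} -> supersedes {entry.supersedes}")
        if entry.superseded_by is not None:
            new = manifest.get(entry.superseded_by)
            if new is None or new.supersedes != req_id:
                problems.append(f"{req_id} -> superseded_by {entry.superseded_by}")
    return problems
```

With no entry using either field, the function returns an empty list, which matches the "vacuously green" remark above.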
The 2025-era unofficial stateless mode (fresh transport per request, no session id, no standalone GET stream) is now a fourth connectable transport alongside in-memory, sse, and streamable-http. The factory is a partial of connect_over_streamable_http with stateless_http=True; the same shared Server instance backs every request, so no server-factory widening is needed.

41 requirements that structurally cannot run on the stateless arm are annotated with arm_exclusions scoped to this transport: 29 because the server cannot issue requests back to the client (sampling, elicitation, roots, server-to-client ping/cancel) and 12 because they need persisted session state or the standalone GET stream (client_params from initialize, cross-POST cancellation, unsolicited list-changed notifications).

The connect fixture now pre-binds protocol_version into the returned factory; the value is still the single SPEC_VERSIONS entry and the factories ignore it, so this is wiring-only.

504 connect cells (411 existing + 93 stateless), all green; 100% coverage; pyright/ruff clean.
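The two `functools.partial` steps described here can be pictured as follows. The body of `connect_over_streamable_http` is a hypothetical stand-in that only records its arguments (the real factory opens a connection), and `bind_cell` is an illustrative name for what the fixture is described as doing.

```python
from functools import partial
from typing import Any, Callable, Optional


def connect_over_streamable_http(
    server: Any,
    *,
    stateless_http: bool = False,
    protocol_version: Optional[str] = None,
) -> dict:
    """Hypothetical stand-in: records how it was called instead of connecting."""
    return {
        "server": server,
        "stateless_http": stateless_http,
        "protocol_version": protocol_version,
    }


# The stateless arm reuses the streamable-http factory with one flag pre-set.
connect_over_streamable_http_stateless = partial(
    connect_over_streamable_http, stateless_http=True
)


def bind_cell(factory: Callable[..., dict], spec_version: str) -> Callable[..., dict]:
    """Pre-bind the cell's spec version, as the connect fixture is described as doing."""
    return partial(factory, protocol_version=spec_version)
```

Since both steps only pre-fill keyword arguments, a test body would keep calling `connect(server)` exactly as before, which fits the "wiring-only" description.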
…own_failures

The reachability invariants now check transport only. spec_version is already type-checked against the two-value SpecVersion Literal and runtime-validated against KNOWN_PROTOCOL_VERSIONS in __post_init__, so the SPEC_VERSIONS membership check was redundant and blocked annotating entries with spec_version="2026-07-28" before that version is on the active matrix axis.
…is gone in 2026-07-28

85 requirements whose source spec section is deleted in the 2026-07-28 revision now carry removed_in="2026-07-28" plus a one-line note citing the SEP and whether there is a replacement:
- ping (SEP-2575, no replacement)
- logging/setLevel, notifications/roots/list_changed (SEP-2575)
- Mcp-Session-Id and protocol-level sessions: hosting:session:*, client-transport:http:session-*, flow:session:* (SEP-2567)
- standalone GET stream + Last-Event-ID resumability: hosting:resume:*, hosting:http:standalone-sse*, client-transport:http:reconnect-* (SEP-2575)
- resources/subscribe + unsubscribe (SEP-2575, replaced by subscriptions/listen)
- tasks/* (SEP-2663, moved to extension)
- -32042 URL-elicitation-required + elicitation/complete (SEP-2322 / spec PR #2891, replaced by MRTR input_required)

Inert today (SPEC_VERSIONS has only 2025-11-25); when 2026-07-28 is added to the active axis these entries automatically drop their 2026 cells. No superseded_by links yet -- the 2026 replacement entries are added with the implementation work.

test_coverage.py gains test_removed_entry_has_disposition: every removed_in entry must carry note or superseded_by.

504 connect cells unchanged; 100% coverage; pyright/ruff clean.
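The disposition rule enforced by test_removed_entry_has_disposition reduces to a small predicate. In the sketch below the `Entry` shape, the helper name `missing_disposition`, and the manifest IDs are all hypothetical; only the rule itself (a removed entry needs a note or a superseded_by link) comes from the commit message.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Entry:
    """Hypothetical manifest entry reduced to the fields the rule inspects."""

    removed_in: Optional[str] = None
    note: Optional[str] = None
    superseded_by: Optional[str] = None


def missing_disposition(manifest: Mapping[str, Entry]) -> list[str]:
    """Return IDs of removed entries that say neither why nor what replaces them."""
    return sorted(
        req_id
        for req_id, entry in manifest.items()
        if entry.removed_in is not None
        and not entry.note
        and entry.superseded_by is None
    )


def test_removed_entry_has_disposition() -> None:
    manifest = {
        "example:ping": Entry(removed_in="2026-07-28", note="SEP-2575, no replacement"),
        "example:still-current": Entry(),
    }
    assert missing_disposition(manifest) == []
```

Returning the offending IDs rather than a bare boolean would make a failing assertion name the entries that need a note.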
43 requirements whose spec section survives in 2026-07-28 but whose test bodies cannot run on the 2026 client path are annotated with arm_exclusions=(ArmExclusion(reason=..., spec_version="2026-07-28"),):
- server-initiated-request (27): sampling/elicitation/roots tests that drive ctx.sample/elicit/list_roots, which issue JSON-RPC server-to-client requests. In 2026 those become MRTR input_required payloads; re-admit when MRTR adapters land.
- requires-session (7): list-changed notifications, cross-POST cancellation, late-progress -- all need persisted per-session state the 2026 per-request model does not have.
- asserts-legacy-handshake (5): test bodies that read InitializeResult fields or assert _meta shape that 2026's per-request envelope changes.
- legacy-only-vocabulary (4): InitializeResult.capabilities flags and unsolicited logging that server/discover does not carry.

36 of the 43 extend an existing stateless arm_exclusions tuple; 7 are new single-member tuples. Inert today; when SPEC_VERSIONS gains 2026-07-28 these entries automatically drop their 2026 cells.

504 connect cells unchanged; 100% coverage; pyright/ruff clean.
89 entries with transports= but no note= now carry a one-line note explaining why the behaviour is transport-specific (the other 38 transport-restricted entries gained note= in the removed_in pass). Notes follow consistent per-prefix templates: OAuth is HTTP-only; HTTP status codes / headers / SSE stream lifecycle are HTTP-only; stderr/signals are stdio-only.

test_coverage.py gains test_transport_restriction_has_note: every entry with transports= must carry note=.

504 connect cells unchanged; 100% coverage; pyright/ruff clean.
Per-entry validation of all 128 removed_in / 2026 arm_exclusions annotations against the 2025-11-25 and 2026-07-28 (draft) spec text found three inaccuracies:
- resources:subscribe:capability-required: removed_in is correct but the note claimed a separate subscriptions capability exists; the draft schema retains resources.subscribe (reinterpreted as opt-in for the resourceSubscriptions filter on subscriptions/listen).
- logging:message:all-levels, logging:message:fields: drop the legacy-only-vocabulary 2026 arm_exclusion. The eight RFC 5424 levels and the notifications/message payload fields are unchanged in the draft (deprecated by SEP-2577 but present), and neither test depends on logging/setLevel.

41 spec_version=2026 arm_exclusions remain (was 43); 504 connect cells unchanged; 100% coverage.
…ERSIONS growth
SPEC_BASE_URL was derived from SPEC_VERSIONS[-1], so appending
2026-07-28 to the active axis would silently repoint all 276
source=f"{SPEC_BASE_URL}/..." URLs to the 2026 spec -- including the
85 removed_in entries whose pages no longer exist there. SPEC_BASE_URL
is now a pinned literal for 2025-11-25; SPEC_2026_BASE_URL is added for
new entries; SPEC_REVISION is dropped.
Six compute_cells/cell_id unit tests in test_coverage.py inherited the
default SPEC_VERSIONS and would break when it grows; they now pass
spec_versions=("2025-11-25",) explicitly. A new invariant pins both
base-URL constants.
Verified by temporarily flipping SPEC_VERSIONS to dual: test_coverage.py
is 30/30 green and SPEC_BASE_URL stays pinned.
504 connect cells unchanged; 100% coverage; pyright/ruff clean.
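The hazard fixed by this commit is easy to reproduce in miniature. The base URL below is a placeholder (the real spec URL is not shown on this page), and `derived_base_url` is a hypothetical reconstruction of the old derivation.

```python
from typing import Sequence

# Placeholder host: the real spec URL is not shown on this page.
_SPEC_ROOT = "https://spec.example.invalid"


def derived_base_url(spec_versions: Sequence[str]) -> str:
    """Old behaviour (reconstructed): follow whichever version is listed last."""
    return f"{_SPEC_ROOT}/{spec_versions[-1]}"


# New behaviour: one pinned literal per revision, independent of the active axis.
SPEC_BASE_URL = f"{_SPEC_ROOT}/2025-11-25"
SPEC_2026_BASE_URL = f"{_SPEC_ROOT}/2026-07-28"

before = derived_base_url(("2025-11-25",))
after = derived_base_url(("2025-11-25", "2026-07-28"))

# Appending a version silently repoints every URL built from the derived value...
assert before != after
# ...whereas the pinned constant is untouched by growth of the axis.
assert SPEC_BASE_URL == before
```

Pinning trades a little duplication for stability: each manifest entry's source link keeps pointing at the revision it was written against.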
…vering-set

The earlier annotation passes minimised by tagging only one requirement ID per multi-decorated test, relying on compute_cells() intersection to drop the cell. That left the manifest's per-requirement semantics incomplete: the ArmExclusionReason enum is documented as a re-admission checklist (grep the reason to find what to re-admit), which only works if every semantically-excluded requirement carries the exclusion.

15 entries now carry their own arm_exclusions instead of inheriting via a sibling on the same test:
- 12 server-initiated-request / requires-session entries gain both the streamable-http-stateless and spec_version=2026-07-28 exclusions (sampling:create:model-preferences/system-prompt, elicitation:form:basic/action:accept/schema:enum-variants, etc.)
- 3 capability:declared entries (resources/prompts/completion) gain the legacy-only-vocabulary 2026 exclusion, matching tools:capability:declared

lifecycle:initialize:capabilities:from-handlers and mcpserver:context:logging are deliberately not tagged (open classification question / era-agnostic respectively).

Also: the README transport-matrix section now mentions all four connectable transports and the stateless carve-out, matching the era-axis section.

504 connect cells unchanged; 53 stateless / 56 2026 arm_exclusions; 100% coverage.
…in README

The era-axis section claimed only arm_exclusions / added_in / removed_in filter the grid, but compute_cells() also drops cells for transports era-locked via TRANSPORT_SPEC_VERSIONS (currently sse to 2025-11-25). Add it to the filter list and to the new-spec-revision checklist.
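The era-lock filter mentioned here might look like the sketch below. The mapping shape (transport name to a tuple of allowed versions, with unlisted transports unrestricted) and the helper name `transport_supports` are assumptions; only the sse lock to 2025-11-25 comes from the commit message.

```python
from typing import Mapping, Sequence

# Assumed shape: transports missing from the map are allowed on every version.
TRANSPORT_SPEC_VERSIONS: Mapping[str, Sequence[str]] = {
    "sse": ("2025-11-25",),
}


def transport_supports(
    transport: str,
    spec_version: str,
    era_locks: Mapping[str, Sequence[str]] = TRANSPORT_SPEC_VERSIONS,
) -> bool:
    """True unless the transport is era-locked to versions that exclude this one."""
    allowed = era_locks.get(transport)
    return allowed is None or spec_version in allowed


# Cells that survive the era-lock filter on a hypothetical two-version axis.
grid = [
    (transport, version)
    for transport in ("in-memory", "sse")
    for version in ("2025-11-25", "2026-07-28")
    if transport_supports(transport, version)
]
```

Because the lock lives on the transport rather than on individual requirements, a filter of this kind would apply to every test regardless of its @requirement marks, which is presumably why it needed its own line in the README filter list.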
Both README gaps from my earlier review are now addressed (commit 140517f documents TRANSPORT_SPEC_VERSIONS as a grid filter and adds the checklist step), and I found no bugs in the matrix machinery itself — but this is a large, design-heavy test-infrastructure PR that's still marked work-in-progress, so it deserves a human look at the era-axis design before merge.
Overview
This PR adds the (transport, spec_version) era-axis machinery to the interaction test suite: a fourth streamable-http-stateless connect arm, compute_cells() driving per-test parametrization from stacked @requirement marks, new manifest fields (added_in/removed_in, supersedes/superseded_by, arm_exclusions, known_failures, note), a TRANSPORT_SPEC_VERSIONS era-lock map, extensive new consistency tests in test_coverage.py, and a rewritten README section. It is confined entirely to tests/interaction/ with no production code changes, and is intentionally behaviour-preserving today (single-entry SPEC_VERSIONS, byte-identical node ids).
Security risks
None. The change touches only test infrastructure and documentation; no auth, crypto, network, or production code paths are involved.
Level of scrutiny
Moderate. Although test-only, this is a substantial design decision — how the suite will model spec-version eras, supersession, and per-cell exclusions going forward — and the PR description explicitly says it is work-in-progress with more changes (lifting stateless into the matrix, threading protocol_version into Client, the 2026 annotation pass) still to land on the branch. That design choice and the WIP status are what warrant a human maintainer's eyes, not correctness concerns.
Other factors
The new machinery is well covered by unit tests for compute_cells, cell_id, and the manifest validators, and test_coverage.py enforces the manifest↔test contract structurally. Both documentation gaps I raised in earlier review rounds (the stale transport-matrix prose and the undocumented TRANSPORT_SPEC_VERSIONS filter) have been fixed in commits daf7108 and 140517f, so no feedback of mine remains outstanding. The bug hunting system found no issues in this round.
Adds the (transport, spec_version) matrix machinery to the interaction test suite's requirements manifest, with no behaviour change yet (SPEC_VERSIONS has a single entry; 411 connect-parametrized cells keep byte-identical node ids).

Work in progress: further changes (lifting stateless into the matrix, threading protocol_version into Client, the 2026 annotation pass) will land on this branch.