
tests/interaction: era-axis machinery for the requirements manifest #2909

Merged
maxisbey merged 13 commits into main from requirements-era-axis
Jun 19, 2026
@maxisbey maxisbey commented Jun 18, 2026

Contributor

Adds the (transport, spec_version) matrix machinery to the interaction test suite's requirements manifest, with no behaviour change yet (SPEC_VERSIONS has a single entry; 411 connect-parametrized cells keep byte-identical node ids).

Work in progress — further changes (lifting stateless into the matrix, threading protocol_version into Client, the 2026 annotation pass) will land on this branch.

maxisbey added 4 commits June 18, 2026 16:29
Adds the type machinery for parametrizing the interaction suite over
(transport, spec_version) cells, with no behaviour change yet:

- SpecVersion / SPEC_VERSIONS / CONNECTABLE_TRANSPORTS / TRANSPORT_SPEC_VERSIONS
  constants; SPEC_REVISION now derived from SPEC_VERSIONS[-1]
- ArmExclusionReason literal + ArmExclusion / KnownFailure dataclasses
- Requirement gains note / added_in / removed_in / supersedes /
  superseded_by / arm_exclusions / known_failures (all defaulted, no
  manifest entry edited)
- Connect protocol + the three connect_* factories accept a
  protocol_version kwarg (currently ignored)

534 tests collected (unchanged); pyright/ruff clean.
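
As a rough illustration of the shape this commit describes, the sketch below models the new types with all-defaulted fields, so existing manifest entries need no edits. The field types, the two reason values shown, and the validation rule in `__post_init__` are assumptions made for the example, not the definitions in `_requirements.py`; `KnownFailure` is omitted for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Literal, get_args

# Hypothetical values throughout; the real definitions may differ.
SpecVersion = Literal["2025-11-25", "2026-07-28"]
ArmExclusionReason = Literal["server-initiated-request", "requires-session"]

SPEC_VERSIONS: tuple[str, ...] = ("2025-11-25",)  # single active entry
SPEC_REVISION = SPEC_VERSIONS[-1]                 # derived, as in this commit


@dataclass(frozen=True)
class ArmExclusion:
    reason: str
    transport: str | None = None      # None acts as a wildcard
    spec_version: str | None = None   # None acts as a wildcard

    def __post_init__(self) -> None:
        if self.reason not in get_args(ArmExclusionReason):
            raise ValueError(f"unknown reason: {self.reason!r}")
        # Assumed rule: an exclusion that names nothing would exclude everything.
        if self.transport is None and self.spec_version is None:
            raise ValueError("name a transport or a spec_version")


@dataclass(frozen=True)
class Requirement:
    id: str
    note: str | None = None
    added_in: str | None = None
    removed_in: str | None = None
    arm_exclusions: tuple[ArmExclusion, ...] = ()
```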
…_ self-tests

compute_cells() expands a test's stacked @requirement marks into the
(transport, spec_version) pytest.param list with intersection semantics:
a cell is dropped if any requirement's [added_in, removed_in) window or
arm_exclusions excludes it, and marked xfail-strict if any requirement's
known_failures matches it. Requirement.transports is intentionally not
consulted (descriptive metadata only). cell_id() keeps node-id suffixes
as bare transport names while SPEC_VERSIONS has a single entry.
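
The intersection semantics described above can be sketched as follows. This is a simplified model under stated assumptions: a single `CellMatch` stand-in replaces the separate `ArmExclusion` / `KnownFailure` classes, the transport list is abbreviated, and cells are returned as plain `(transport, spec_version, xfail)` tuples, where the real function builds a `pytest.param` list.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product

SPEC_VERSIONS = ("2025-11-25", "2026-07-28")  # ISO dates sort chronologically
TRANSPORTS = ("in-memory", "sse", "streamable-http")


@dataclass(frozen=True)
class CellMatch:
    """Hypothetical matcher; a None field is a wildcard."""
    transport: str | None = None
    spec_version: str | None = None

    def matches(self, transport: str, spec_version: str) -> bool:
        return (self.transport in (None, transport)
                and self.spec_version in (None, spec_version))


@dataclass(frozen=True)
class Requirement:
    added_in: str | None = None
    removed_in: str | None = None
    arm_exclusions: tuple[CellMatch, ...] = ()
    known_failures: tuple[CellMatch, ...] = ()

    def in_window(self, spec_version: str) -> bool:
        # Half-open window: added_in <= spec_version < removed_in.
        if self.added_in is not None and spec_version < self.added_in:
            return False
        return self.removed_in is None or spec_version < self.removed_in


def compute_cells(reqs):
    """Keep a cell only if every requirement admits it."""
    cells = []
    for transport, version in product(TRANSPORTS, SPEC_VERSIONS):
        if not all(r.in_window(version) for r in reqs):
            continue
        if any(e.matches(transport, version)
               for r in reqs for e in r.arm_exclusions):
            continue
        xfail = any(k.matches(transport, version)
                    for r in reqs for k in r.known_failures)
        cells.append((transport, version, xfail))
    return cells
```

With one requirement removed in 2026-07-28 and a second excluding `sse`, only the 2025-11-25 cells for the other two transports survive, which is the "any requirement can drop the cell" behaviour the commit message describes.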

test_coverage.py gains pytest.raises self-tests for every new
__post_init__ validation branch on ArmExclusion / KnownFailure /
Requirement.

12 passed; pyright/ruff clean; conftest not yet wired.

pytest_generate_tests now reads each connect-taking test's stacked
@requirement marks, looks them up in REQUIREMENTS, and parametrizes the
connect fixture with compute_cells() output. The fixture body unpacks
the (transport, spec_version) tuple and returns the bare factory.

411 connect-parametrized cells with byte-identical [transport] node ids;
no behaviour change while SPEC_VERSIONS has a single entry.
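
A minimal sketch of the wiring described in this commit, assuming the requirement id is the first positional argument of the `@requirement` mark and using toy stand-ins for `REQUIREMENTS` and `compute_cells`. The hook uses only the standard `metafunc.fixturenames`, `metafunc.definition.iter_markers()` and `metafunc.parametrize()` calls; the multi-version id format in `cell_id` is a guess, since the source only states the single-version case.

```python
# conftest.py sketch; every name below is a simplified stand-in.
REQUIREMENTS = {"tools:call:basic": {"exclude": {"sse"}}}  # hypothetical entry
TRANSPORTS = ("in-memory", "sse", "streamable-http")
SPEC_VERSIONS = ("2025-11-25",)


def compute_cells(reqs):
    # Toy version: drop any transport that some requirement excludes.
    excluded = set().union(*(r["exclude"] for r in reqs))
    return [(t, v) for t in TRANSPORTS for v in SPEC_VERSIONS
            if t not in excluded]


def cell_id(cell):
    transport, version = cell
    # Bare transport name while only one spec version is active, which is
    # what keeps existing node ids byte-identical.
    if len(SPEC_VERSIONS) == 1:
        return transport
    return f"{transport}-{version}"  # assumed format


def pytest_generate_tests(metafunc):
    if "connect" not in metafunc.fixturenames:
        return
    marks = metafunc.definition.iter_markers("requirement")
    reqs = [REQUIREMENTS[m.args[0]] for m in marks]
    cells = compute_cells(reqs)
    # indirect=True routes each cell to the connect fixture as request.param.
    metafunc.parametrize("connect", cells, indirect=True,
                         ids=[cell_id(c) for c in cells])
```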

test_coverage.py gains 10 unit tests for compute_cells/cell_id covering
intersection semantics, wildcard arm-exclusion / known-failure matching,
TRANSPORT_SPEC_VERSIONS gating, and the transports-field-is-metadata-only
guarantee. _requirements.py is at 100% line+branch coverage from
test_coverage.py alone.

test_coverage.py gains five manifest-level invariants (vacuously green
until entries start using the new fields):
- SPEC_VERSIONS subset of KNOWN_PROTOCOL_VERSIONS and includes LATEST
- CONNECTABLE_TRANSPORTS == conftest._FACTORIES keys
- supersedes / superseded_by links are bidirectional and versioned
- every arm_exclusion / known_failure targets a reachable cell
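
To make the supersession invariant concrete, here is one way such a check could look. It is a sketch over plain dicts, and it assumes "versioned" means that the old entry's `removed_in` equals the new entry's `added_in`; the source does not spell that out, and the requirement ids are hypothetical.

```python
def supersession_errors(requirements):
    """Return human-readable problems; an empty list means links are sound."""
    errors = []
    for rid, req in requirements.items():
        successor_id = req.get("superseded_by")
        if successor_id is None:
            continue
        successor = requirements.get(successor_id, {})
        # Bidirectional: the successor must point back at this entry.
        if successor.get("supersedes") != rid:
            errors.append(f"{rid}: {successor_id} does not link back")
        # Assumed versioning rule: the hand-over happens at one revision.
        if req.get("removed_in") is None:
            errors.append(f"{rid}: superseded but has no removed_in")
        elif req["removed_in"] != successor.get("added_in"):
            errors.append(f"{rid}: removed_in differs from successor added_in")
    return errors


# Hypothetical ids, loosely modelled on the subscribe -> listen replacement.
manifest = {
    "resources:subscribe:basic": {
        "removed_in": "2026-07-28",
        "superseded_by": "subscriptions:listen:basic",
    },
    "subscriptions:listen:basic": {
        "added_in": "2026-07-28",
        "supersedes": "resources:subscribe:basic",
    },
}
problems = supersession_errors(manifest)
```

A manifest with no `superseded_by` links passes trivially, which matches the "vacuously green" remark above.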

README.md: the requirements-manifest field list documents the new
fields, and the spec-revision section is rewritten as 'Spec versions
and the era axis' covering compute_cells() intersection semantics, the
transports-is-metadata-only rule, and the checklist for landing a new
revision.

Full gate: 2299 passed, 100.00% coverage, pyright/ruff clean,
411 connect cells with unchanged ids.

@maxisbey maxisbey force-pushed the requirements-era-axis branch from 0c09063 to 0a5be73 Compare June 18, 2026 16:52
maxisbey added 5 commits June 19, 2026 09:26
The 2025-era unofficial stateless mode (fresh transport per request, no
session id, no standalone GET stream) is now a fourth connectable
transport alongside in-memory, sse, and streamable-http. The factory is
a partial of connect_over_streamable_http with stateless_http=True; the
same shared Server instance backs every request, so no server-factory
widening is needed.
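
The "partial of connect_over_streamable_http" construction can be sketched like this. The factory body is a stand-in that reports its settings instead of opening a connection, and the name `connect_over_streamable_http_stateless` is an assumption; only `connect_over_streamable_http` and the `stateless_http=True` keyword come from the commit message.

```python
from functools import partial


def connect_over_streamable_http(server, *, stateless_http=False,
                                 protocol_version=None):
    """Stand-in factory: returns its settings rather than a live client."""
    return {"server": server, "stateless_http": stateless_http}


# The fourth transport reuses the existing factory with one keyword pre-bound.
connect_over_streamable_http_stateless = partial(
    connect_over_streamable_http, stateless_http=True
)

shared_server = object()  # one Server instance backs both arms
stateful = connect_over_streamable_http(shared_server)
stateless = connect_over_streamable_http_stateless(shared_server)
```

Because the partial changes only a keyword argument, both arms accept the same server object, which is why no server-factory widening is needed.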

41 requirements that structurally cannot run on the stateless arm are
annotated with arm_exclusions scoped to this transport: 29 because the
server cannot issue requests back to the client (sampling, elicitation,
roots, server-to-client ping/cancel) and 12 because they need persisted
session state or the standalone GET stream (client_params from
initialize, cross-POST cancellation, unsolicited list-changed
notifications).

The connect fixture now pre-binds protocol_version into the returned
factory; the value is still the single SPEC_VERSIONS entry and the
factories ignore it, so this is wiring-only.
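
A small sketch of the pre-binding step, assuming the fixture receives a `(transport, spec_version)` tuple as `request.param`. The helper name `bind_cell` and the one-entry `_FACTORIES` map are illustrative; in the suite this logic would sit inside the `connect` fixture body.

```python
from functools import partial

SPEC_VERSIONS = ("2025-11-25",)


def connect_in_memory(server, *, protocol_version=None):
    # Accepts protocol_version but ignores it, as described above.
    return ("in-memory", server)


_FACTORIES = {"in-memory": connect_in_memory}


def bind_cell(cell):
    """Unpack a cell and return the factory with protocol_version pre-bound."""
    transport, spec_version = cell
    return partial(_FACTORIES[transport], protocol_version=spec_version)


connect = bind_cell(("in-memory", SPEC_VERSIONS[0]))
result = connect("server")
```

Tests keep calling `connect(server)` exactly as before; the version travels along silently until the factories start using it.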

504 connect cells (411 existing + 93 stateless), all green;
100% coverage; pyright/ruff clean.
…own_failures

The reachability invariants now check transport only. spec_version is
already type-checked against the two-value SpecVersion Literal and
runtime-validated against KNOWN_PROTOCOL_VERSIONS in __post_init__, so
the SPEC_VERSIONS membership check was redundant and blocked annotating
entries with spec_version="2026-07-28" before that version is on the
active matrix axis.
…is gone in 2026-07-28

85 requirements whose source spec section is deleted in the 2026-07-28
revision now carry removed_in="2026-07-28" plus a one-line note citing
the SEP and whether there is a replacement:

- ping (SEP-2575, no replacement)
- logging/setLevel, notifications/roots/list_changed (SEP-2575)
- Mcp-Session-Id and protocol-level sessions: hosting:session:*,
  client-transport:http:session-*, flow:session:* (SEP-2567)
- standalone GET stream + Last-Event-ID resumability: hosting:resume:*,
  hosting:http:standalone-sse*, client-transport:http:reconnect-* (SEP-2575)
- resources/subscribe + unsubscribe (SEP-2575, replaced by
  subscriptions/listen)
- tasks/* (SEP-2663, moved to extension)
- -32042 URL-elicitation-required + elicitation/complete
  (SEP-2322 / spec PR #2891, replaced by MRTR input_required)

Inert today (SPEC_VERSIONS has only 2025-11-25); when 2026-07-28 is
added to the active axis these entries automatically drop their 2026
cells. No superseded_by links yet -- the 2026 replacement entries are
added with the implementation work.

test_coverage.py gains test_removed_entry_has_disposition: every
removed_in entry must carry note or superseded_by.
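
The disposition rule can be illustrated with a dict-based stand-in for the manifest. The requirement ids and the exact note wording are hypothetical; the rule itself (every `removed_in` entry carries `note` or `superseded_by`) is the one stated above.

```python
# Hypothetical manifest slice; real entries are Requirement instances.
REQUIREMENTS = {
    "ping:basic": {
        "removed_in": "2026-07-28",
        "note": "Removed by SEP-2575; no replacement.",
    },
    "tools:call:basic": {},  # not removed, so no disposition is required
}


def missing_disposition(requirements):
    """Ids of removed entries that explain neither why nor what replaces them."""
    return sorted(
        rid for rid, req in requirements.items()
        if req.get("removed_in")
        and not (req.get("note") or req.get("superseded_by"))
    )


def test_removed_entry_has_disposition():
    assert missing_disposition(REQUIREMENTS) == []
```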

504 connect cells unchanged; 100% coverage; pyright/ruff clean.

43 requirements whose spec section survives in 2026-07-28 but whose
test bodies cannot run on the 2026 client path are annotated with
arm_exclusions=(ArmExclusion(reason=..., spec_version="2026-07-28"),):

- server-initiated-request (27): sampling/elicitation/roots tests that
  drive ctx.sample/elicit/list_roots, which issue JSON-RPC server-to-
  client requests. In 2026 those become MRTR input_required payloads;
  re-admit when MRTR adapters land.
- requires-session (7): list-changed notifications, cross-POST
  cancellation, late-progress -- all need persisted per-session state
  the 2026 per-request model does not have.
- asserts-legacy-handshake (5): test bodies that read InitializeResult
  fields or assert _meta shape that 2026's per-request envelope changes.
- legacy-only-vocabulary (4): InitializeResult.capabilities flags and
  unsolicited logging that server/discover does not carry.

36 of the 43 extend an existing stateless arm_exclusions tuple; 7 are
new single-member tuples. Inert today; when SPEC_VERSIONS gains
2026-07-28 these entries automatically drop their 2026 cells.
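
The sketch below shows why these annotations are inert today. It models one entry that carries both a stateless-transport exclusion and a 2026 exclusion as plain `(reason, transport, spec_version)` tuples, with `None` as a wildcard; that tuple layout and the abbreviated transport list are assumptions for the example.

```python
from itertools import product

TRANSPORTS = ("in-memory", "streamable-http", "streamable-http-stateless")

# (reason, transport, spec_version); None matches anything.
EXCLUSIONS = (
    ("server-initiated-request", "streamable-http-stateless", None),
    ("server-initiated-request", None, "2026-07-28"),
)


def surviving_cells(spec_versions):
    """Cells left after applying every exclusion on the entry."""
    return [
        (t, v) for t, v in product(TRANSPORTS, spec_versions)
        if not any(et in (None, t) and ev in (None, v)
                   for _reason, et, ev in EXCLUSIONS)
    ]


today = surviving_cells(("2025-11-25",))
dual = surviving_cells(("2025-11-25", "2026-07-28"))
```

Both calls yield the same two cells: growing the version axis adds three candidate cells for this entry, and the 2026 exclusion removes all three.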

504 connect cells unchanged; 100% coverage; pyright/ruff clean.

89 entries with transports= but no note= now carry a one-line note
explaining why the behaviour is transport-specific (the other 38
transport-restricted entries gained note= in the removed_in pass).
Notes follow consistent per-prefix templates: OAuth is HTTP-only;
HTTP status codes / headers / SSE stream lifecycle are HTTP-only;
stderr/signals are stdio-only.

test_coverage.py gains test_transport_restriction_has_note: every
entry with transports= must carry note=.

504 connect cells unchanged; 100% coverage; pyright/ruff clean.

@maxisbey maxisbey marked this pull request as ready for review June 19, 2026 10:19

Per-entry validation of all 128 removed_in / 2026 arm_exclusions
annotations against the 2025-11-25 and 2026-07-28 (draft) spec text
found three inaccuracies:

- resources:subscribe:capability-required: removed_in is correct but
  the note claimed a separate subscriptions capability exists; the
  draft schema retains resources.subscribe (reinterpreted as opt-in
  for the resourceSubscriptions filter on subscriptions/listen).
- logging:message:all-levels, logging:message:fields: drop the
  legacy-only-vocabulary 2026 arm_exclusion. The eight RFC 5424 levels
  and the notifications/message payload fields are unchanged in the
  draft (deprecated by SEP-2577 but present), and neither test depends
  on logging/setLevel.

41 spec_version=2026 arm_exclusions remain (was 43); 504 connect cells
unchanged; 100% coverage.

Comment thread tests/interaction/README.md
maxisbey added 3 commits June 19, 2026 11:02
…ERSIONS growth

SPEC_BASE_URL was derived from SPEC_VERSIONS[-1], so appending
2026-07-28 to the active axis would silently repoint all 276
source=f"{SPEC_BASE_URL}/..." URLs to the 2026 spec -- including the
85 removed_in entries whose pages no longer exist there. SPEC_BASE_URL
is now a pinned literal for 2025-11-25; SPEC_2026_BASE_URL is added for
new entries; SPEC_REVISION is dropped.
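
The failure mode this commit guards against is easy to reproduce in miniature. The host in `_BASE` is a placeholder, not the real spec URL, and `derived_base_url` is a hypothetical reconstruction of the old derived behaviour.

```python
_BASE = "https://spec.example/specification"  # placeholder host


def derived_base_url(spec_versions):
    """Old behaviour: follow whatever the last active version is."""
    return f"{_BASE}/{spec_versions[-1]}"


# New behaviour: one pinned literal per revision.
SPEC_BASE_URL = f"{_BASE}/2025-11-25"
SPEC_2026_BASE_URL = f"{_BASE}/2026-07-28"

before_growth = derived_base_url(("2025-11-25",))
after_growth = derived_base_url(("2025-11-25", "2026-07-28"))
```

`before_growth` equals the pinned `SPEC_BASE_URL`, while `after_growth` has moved to the 2026 base, so every existing `source=` link built from the derived constant would have been repointed without any manifest entry changing.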

Six compute_cells/cell_id unit tests in test_coverage.py inherited the
default SPEC_VERSIONS and would break when it grows; they now pass
spec_versions=("2025-11-25",) explicitly. A new invariant pins both
base-URL constants.

Verified by temporarily flipping SPEC_VERSIONS to dual: test_coverage.py
is 30/30 green and SPEC_BASE_URL stays pinned.

504 connect cells unchanged; 100% coverage; pyright/ruff clean.
…vering-set

The earlier annotation passes minimised by tagging only one requirement
ID per multi-decorated test, relying on compute_cells() intersection to
drop the cell. That left the manifest's per-requirement semantics
incomplete: the ArmExclusionReason enum is documented as a re-admission
checklist (grep the reason to find what to re-admit), which only works
if every semantically-excluded requirement carries the exclusion.
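
A toy version of the re-admission lookup shows the gap. The manifest is reduced to a mapping from requirement id to the set of exclusion reasons it carries; `sampling:create:basic` is a hypothetical sibling id, and which sibling originally held the tag is illustrative only.

```python
def tagged_with(requirements, reason):
    """The re-admission checklist: every entry carrying the given reason."""
    return sorted(rid for rid, reasons in requirements.items()
                  if reason in reasons)


# Two requirement ids stacked on one test. Tagging either one is enough for
# compute_cells() to drop the cell, but the lookup sees only tagged entries.
before = {
    "sampling:create:basic": {"server-initiated-request"},  # hypothetical id
    "sampling:create:system-prompt": set(),                 # inherited only
}
after = {
    "sampling:create:basic": {"server-initiated-request"},
    "sampling:create:system-prompt": {"server-initiated-request"},
}

checklist_before = tagged_with(before, "server-initiated-request")
checklist_after = tagged_with(after, "server-initiated-request")
```

The generated cells are identical in both states; only the checklist differs, which is why the commit changes annotations without changing the cell count.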

15 entries now carry their own arm_exclusions instead of inheriting via
a sibling on the same test:

- 12 server-initiated-request / requires-session entries gain both the
  streamable-http-stateless and spec_version=2026-07-28 exclusions
  (sampling:create:model-preferences/system-prompt, elicitation:form:
  basic/action:accept/schema:enum-variants, etc.)
- 3 capability:declared entries (resources/prompts/completion) gain the
  legacy-only-vocabulary 2026 exclusion, matching tools:capability:declared

lifecycle:initialize:capabilities:from-handlers and mcpserver:context:
logging are deliberately not tagged (open classification question /
era-agnostic respectively).

Also: the README transport-matrix section now mentions all four
connectable transports and the stateless carve-out, matching the
era-axis section.

504 connect cells unchanged; 53 stateless / 56 2026 arm_exclusions;
100% coverage.
…in README

The era-axis section claimed only arm_exclusions / added_in / removed_in
filter the grid, but compute_cells() also drops cells for transports
era-locked via TRANSPORT_SPEC_VERSIONS (currently sse to 2025-11-25).
Add it to the filter list and to the new-spec-revision checklist.
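
A minimal sketch of the era-lock filter, assuming `TRANSPORT_SPEC_VERSIONS` maps a transport name to the tuple of spec versions it may run under and that an absent key means "unrestricted". The map shape is a guess; the `sse` lock to 2025-11-25 is the one named above.

```python
TRANSPORT_SPEC_VERSIONS = {"sse": ("2025-11-25",)}  # assumed shape


def era_allowed(transport, spec_version):
    """True unless the transport is locked to other spec versions."""
    allowed = TRANSPORT_SPEC_VERSIONS.get(transport)
    return allowed is None or spec_version in allowed


grid = [
    (t, v)
    for t in ("sse", "streamable-http")
    for v in ("2025-11-25", "2026-07-28")
    if era_allowed(t, v)
]
```

The `("sse", "2026-07-28")` cell never appears, regardless of what any individual requirement declares, so this filter belongs in the README list alongside `arm_exclusions`, `added_in` and `removed_in`.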

@claude claude Bot left a comment

Contributor


Both README gaps from my earlier review are now addressed (commit 140517f documents TRANSPORT_SPEC_VERSIONS as a grid filter and adds the checklist step), and I found no bugs in the matrix machinery itself — but this is a large, design-heavy test-infrastructure PR that's still marked work-in-progress, so it deserves a human look at the era-axis design before merge.


Overview

This PR adds the (transport, spec_version) era-axis machinery to the interaction test suite: a fourth streamable-http-stateless connect arm, compute_cells() driving per-test parametrization from stacked @requirement marks, new manifest fields (added_in/removed_in, supersedes/superseded_by, arm_exclusions, known_failures, note), a TRANSPORT_SPEC_VERSIONS era-lock map, extensive new consistency tests in test_coverage.py, and a rewritten README section. It is confined entirely to tests/interaction/ with no production code changes, and is intentionally behaviour-preserving today (single-entry SPEC_VERSIONS, byte-identical node ids).

Security risks

None. The change touches only test infrastructure and documentation; no auth, crypto, network, or production code paths are involved.

Level of scrutiny

Moderate. Although test-only, this is a substantial design decision — how the suite will model spec-version eras, supersession, and per-cell exclusions going forward — and the PR description explicitly says it is work-in-progress with more changes (lifting stateless into the matrix, threading protocol_version into Client, the 2026 annotation pass) still to land on the branch. That design choice and the WIP status are what warrant a human maintainer's eyes, not correctness concerns.

Other factors

The new machinery is well covered by unit tests for compute_cells, cell_id, and the manifest validators, and test_coverage.py enforces the manifest↔test contract structurally. Both documentation gaps I raised in earlier review rounds (the stale transport-matrix prose and the undocumented TRANSPORT_SPEC_VERSIONS filter) have been fixed in commits daf7108 and 140517f, so no feedback of mine remains outstanding. The bug hunting system found no issues in this round.

@maxisbey maxisbey merged commit 364b762 into main Jun 19, 2026
35 checks passed
@maxisbey maxisbey deleted the requirements-era-axis branch June 19, 2026 11:59