Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
81 changes: 68 additions & 13 deletions tests/interaction/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -56,10 +56,12 @@ test body — each directory pins its flavour's true output exactly.

Transport-agnostic tests take the `connect` fixture instead of constructing `Client(server)`
directly, and therefore run once per transport: over the in-memory transport, over the server's
real streamable HTTP app driven in-process through the streaming bridge, and over the legacy SSE
transport the same way. A test connects with `async with connect(server, ...) as client:` and
asserts the same output on every leg, because the transport is not supposed to change observable
behaviour. Tests that are tied to one transport do not use the fixture: the wire-recording tests
real streamable HTTP app driven in-process through the streaming bridge (in both stateful and
stateless configurations), and over the legacy SSE transport the same way. A test connects with
`async with connect(server, ...) as client:` and asserts the same output on every leg, because the
transport is not supposed to change observable behaviour. Requirements that need a server-to-client
back-channel or persisted session state are carved out of the stateless arm via `arm_exclusions`.
Tests that are tied to one transport do not use the fixture: the wire-recording tests
(their seam is the in-memory stream pair), the bare-`ClientSession` lifecycle tests, the
real-clock timeout tests (the timeout machinery is transport-independent and must not race
transport latency), and everything under `transports/`, which pins behaviour only observable on
Expand Down Expand Up @@ -95,6 +97,14 @@ clients can share one session manager.
would require real-time waits the suite refuses.
- **`transports`** names the transports a behaviour applies to; omitted means transport-independent.
- **`issue`** carries the tracking link for a recorded gap once one is filed.
- **`note`** carries free-form context that does not fit `divergence` or `deferred`.
- **`added_in`** / **`removed_in`** bound the spec versions the behaviour exists in, as a half-open
`[added_in, removed_in)` window.
- **`supersedes`** / **`superseded_by`** link a retired entry to its replacement; the link is
bidirectional and both ends must be versioned.
- **`arm_exclusions`** carve specific `(transport, spec_version)` matrix cells out with a typed
`ArmExclusionReason`.
- **`known_failures`** mark specific `(transport, spec_version)` cells as strict xfail.

Tests link themselves to the manifest with a decorator:

Expand Down Expand Up @@ -126,15 +136,60 @@ This is also the triage key for any rewrite: a test that fails on the new code p
divergence note (the rewrite accidentally fixed a known gap — decide whether to keep the fix) or
it does not (the rewrite broke something that was correct — fix the rewrite).

### When a new spec revision is released

1. Update `SPEC_REVISION` and walk the new revision's changelog.
2. For each changed interaction, find its requirements (the IDs use the wire method strings the
changelog speaks in), re-audit the tests against the new text, and update `source` links and
assertions where behaviour legitimately changed.
3. New interactions get new requirements and new tests; removed interactions get their
requirements deleted along with their tests.
4. A behaviour that is correct under both revisions needs no change beyond the `source` link.
### Spec versions and the era axis

`SPEC_VERSIONS` in `_requirements.py` is the ordered tuple of protocol revisions the suite
exercises. `SPEC_BASE_URL` (and `SPEC_2026_BASE_URL`) are pinned literals — not derived from
`SPEC_VERSIONS` — so growing the active axis never repoints existing `source` links. The
`connect` fixture fans out over `CONNECTABLE_TRANSPORTS × SPEC_VERSIONS`, but the grid is
filtered per test:
`pytest_generate_tests` reads the test's stacked `@requirement` marks and calls `compute_cells()`,
which intersects the admissible cells across every cited requirement — a cell survives only if
**all** of the test's requirements admit it.

`streamable-http-stateless` is the fourth connectable transport: the 2025-era unofficial stateless
mode where each request opens a fresh transport, no session id is issued, and there is no standalone
GET stream. Requirements that need a server→client back-channel or persisted session state are
excluded from that arm via `arm_exclusions` (reasons `server-initiated-request` and
`requires-session`).

What admits or excludes a cell:

- **`added_in` / `removed_in`** gate which spec versions a requirement exists in, as a half-open
`[added_in, removed_in)` window. A test runs only on versions inside every cited requirement's
window.
- **`arm_exclusions`** carve specific `(transport, spec_version)` cells out with a typed
`ArmExclusionReason`. The reason vocabulary doubles as a re-admission checklist: when the gap
closes, grep for the reason string to find every cell to re-admit.
- **`known_failures`** keep a cell in the grid but mark it as a strict xfail — the test runs and
must fail; an unexpected pass fails the suite.
- **`TRANSPORT_SPEC_VERSIONS`** era-locks a transport to a subset of spec versions (currently only
`sse` is locked to `2025-11-25`). A `(transport, version)` cell is dropped if the version is not
in the transport's entry; transports absent from the map serve every spec version. This is the
mechanism for cutting an entire transport off from a new revision (or admitting it).
- **`transports`** is descriptive metadata for the non-`connect` transport-specific suites under
Comment thread
claude[bot] marked this conversation as resolved.
`transports/` and does **not** drive cell generation. Only `arm_exclusions`, `added_in`,
`removed_in`, and `TRANSPORT_SPEC_VERSIONS` filter the grid.
- **`supersedes` / `superseded_by`** link a retired entry to its replacement. `test_coverage.py`
enforces that links are bidirectional and versioned: the retired entry carries `removed_in`, the
replacement carries `added_in`.

Node IDs stay `[transport]` while `len(SPEC_VERSIONS) == 1`, so today's test IDs are
byte-identical to before the era axis existed. They become `[transport-version]` the moment a
second version is appended to `SPEC_VERSIONS`.

When a new spec revision lands:

1. Append the version string to `SPEC_VERSIONS` (and to the `SpecVersion` `Literal`).
2. Walk the new revision's changelog.
3. For each affected requirement: set `removed_in` on retired behaviour, add a new entry with
`added_in` for its replacement, and link the pair with `supersedes` / `superseded_by`.
Behaviour that survives unchanged needs nothing beyond a re-audit of its `source` URL.
4. For requirements that cannot run on the new era's path, add an `arm_exclusions` entry with the
appropriate `ArmExclusionReason`.
5. Review `TRANSPORT_SPEC_VERSIONS`: any era-locked transport will not produce cells on the new
version unless its entry is extended (or removed); add an entry for any transport the new
revision retires.

## Writing a test

Expand Down
20 changes: 16 additions & 4 deletions tests/interaction/_connect.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@

from collections.abc import AsyncIterator, Awaitable, Callable, Iterable
from contextlib import AbstractAsyncContextManager, asynccontextmanager
from functools import partial
from typing import Any, Protocol

import httpx
Expand Down Expand Up @@ -69,6 +70,7 @@ def __call__(
message_handler: MessageHandlerFnT | None = None,
client_info: Implementation | None = None,
elicitation_callback: ElicitationFnT | None = None,
protocol_version: str = LATEST_PROTOCOL_VERSION,
) -> AbstractAsyncContextManager[Client]: ...


Expand All @@ -83,6 +85,7 @@ async def connect_in_memory(
message_handler: MessageHandlerFnT | None = None,
client_info: Implementation | None = None,
elicitation_callback: ElicitationFnT | None = None,
protocol_version: str = LATEST_PROTOCOL_VERSION,
) -> AsyncIterator[Client]:
"""Yield a Client connected to the server over the in-memory transport."""
async with Client(
Expand Down Expand Up @@ -113,13 +116,15 @@ async def connect_over_streamable_http(
message_handler: MessageHandlerFnT | None = None,
client_info: Implementation | None = None,
elicitation_callback: ElicitationFnT | None = None,
protocol_version: str = LATEST_PROTOCOL_VERSION,
) -> AsyncIterator[Client]:
"""Yield a Client connected to the server's streamable HTTP app, entirely in process.

With the defaults this is the matrix leg (stateful sessions, SSE responses); the
transport-specific tests pass `stateless_http` or `json_response` to select the other
server modes, and the resumability tests pass an `event_store` (with `retry_interval=0` so
the client's reconnection wait is a no-op).
With the defaults this is the matrix leg (stateful sessions, SSE responses); the stateless
matrix arm binds `stateless_http=True` (see `connect_over_streamable_http_stateless`);
transport-specific tests pass `json_response` to select the other server mode, and the
resumability tests pass an `event_store` (with `retry_interval=0` so the client's
reconnection wait is a no-op).
"""
app = server.streamable_http_app(
stateless_http=stateless_http,
Expand All @@ -145,6 +150,12 @@ async def connect_over_streamable_http(
yield client


connect_over_streamable_http_stateless: Connect = partial(connect_over_streamable_http, stateless_http=True)
"""The streamable-http matrix arm with the server in stateless mode (fresh transport per request,
no session id, no standalone GET stream). The same shared Server instance backs every request --
stateless mode does not require a server factory."""


@asynccontextmanager
async def mounted_app(
server: Server | MCPServer,
Expand Down Expand Up @@ -326,6 +337,7 @@ async def connect_over_sse(
message_handler: MessageHandlerFnT | None = None,
client_info: Implementation | None = None,
elicitation_callback: ElicitationFnT | None = None,
protocol_version: str = LATEST_PROTOCOL_VERSION,
) -> AsyncIterator[Client]:
"""Yield a Client connected to the server's legacy SSE transport, entirely in process."""
app, _ = build_sse_app(server)
Expand Down
Loading
Loading