
Feat/tasks mrtr extension #262

Open
panyam wants to merge 7 commits into modelcontextprotocol:main from panyam:feat/tasks-mrtr-extension

@panyam panyam commented May 5, 2026

Summary

Adds server-conformance scenarios for SEP-2663 (Tasks Extension), with incidental coverage of SEP-2575 (per-request capability override) and SEP-2243 (Mcp-Method/Mcp-Name request headers) where they bind to tasks. Also adds one MRTR-adjacent check (mrtr-tasks-composition, currently SKIPPED) for the MRTR→Tasks promotion flow introduced in SEP-2663 commit 451f5e1. In total: 8 ClientScenario classes (~33 internal checks) for tasks, plus 1 class (7 SUCCESS + 1 SKIPPED) for the MRTR↔Tasks composition placeholder. All scenarios are tagged ['extension', DRAFT_PROTOCOL_VERSION] per #255 conventions and registered in pendingClientScenariosList so default everything-server runs stay green.

Motivation and Context

SEP-2663 (Tasks Extension), SEP-2575, and SEP-2243 are in active draft and currently have no conformance coverage in this repo. SDKs implementing them - including ones already shipping reference servers - have nothing to validate against, so wire-shape regressions and edge-case behavior (cancellation semantics, requestState handling, capability gating) slip through SDK-internal tests. This PR fills that gap. The new scenarios assert what the spec text says, not what any specific implementation does, so any SDK can run them.

The MRTR↔Tasks composition placeholder (mrtr-tasks-composition, SKIPPED) is a forward-looking marker for SEP-2663 commit 451f5e1, which made the "MRTR rounds then promote to a task" flow normative on the wire - see the open spec questions below for why it's deferred.

How Has This Been Tested?

Run end-to-end against a reference Go fixture from the in-flight panyam/mcpkit SDK:

TASKS_SERVER_URL=http://localhost:18092/mcp \
TASKS_SERVER_CMD="/path/to/tasks-fixture --serve --addr :18092" \
MRTR_SERVER_URL=http://localhost:18093/mcp \
MRTR_SERVER_CMD="/path/to/mrtr-fixture --serve --addr :18093" \
  npx vitest run src/scenarios/server/

Branch results:

  • Tasks: 8/8 scenarios (~33 internal checks)
  • MRTR: 1/1 scenario (7 SUCCESS + 1 SKIPPED - see Open Question 2 in Additional Context)

The runner is brand-neutral and language-agnostic - fixture wired via env vars, spawn via sh -c, readiness via TCP polling, no log-line scanning. Anyone's server in any language works. Reference fixtures:

npm test against the upstream everything-server continues to pass - the new scenarios live in pendingClientScenariosList so all-scenarios.test.ts skips them until everything-server grows extension support.

Breaking Changes

None. All new scenarios are additive and tagged as 'extension' + DRAFT_PROTOCOL_VERSION, so they're invisible to dated --spec-version runs and only appear under --suite extensions or --spec-version draft. Default CI runs against everything-server are unaffected (the new scenarios are filtered out via pendingClientScenariosList).

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

Relationship to PR #188 (SEP-2322 MRTR)

Complementary, not overlapping. SEP-2663 builds on SEP-2322's base types, so a few of the tasks scenarios touch the MRTR shape (inputRequests, requestState, resultType) in their tasks-on-the-wire form (status:"input_required" on tasks/get, tasks/update resume path, partial inputResponses fulfillment). The standalone-ephemeral-MRTR coverage stays in #188.

The branch also contains a src/scenarios/server/mrtr/ folder with ephemeral-flow scenarios mirroring some of #188's checks. Those exist because the tasks reference fixture exercises the full MRTR base, and running them locally caught a real bug. For this upstream merge:

Scope (8 ClientScenario classes, ~33 checks)

  • tasks-lifecycle - sync vs task dispatch, DetailedTask shape, tool errors vs protocol errors, cancel ack, cancel-on-terminal -32602
  • tasks-capability-negotiation - extension advertised under capabilities.extensions; tasks/* gated behind negotiation; SEP-2575 per-request opt-in
  • tasks-wire-fields - ttlSeconds / pollIntervalMilliseconds renames, no early TTL expiry, no related-task _meta on inlined result
  • tasks-request-state - optional emission, echo acceptance, stale-but-valid tolerance (tasks-surface form)
  • tasks-mrtr-input - inputRequests on tasks/get, tasks/update resume, partial-fulfillment with multi-input fixture
  • tasks-request-headers - SEP-2243 server tolerates routing headers; body authoritative when conflicting
  • tasks-dispatch-and-envelope - removed v1 methods (-32601), legacy task param ignored, resultType:"complete" on every non-task response, strong-consistency immediate tasks/get, unknown taskId -32602
  • tasks-status-notifications - optional INFO check (notifications are MAY per spec)
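
The error-code expectations in the list above (-32601 for removed v1 methods, -32602 for cancel-on-terminal and unknown taskId) could be checked roughly as follows. This is a minimal sketch, not the PR's code: `JsonRpcResponse`, `CheckStatus`, and `expectErrorCode` are hypothetical names, and only the two numeric codes are taken from the scope list (they are the standard JSON-RPC 2.0 "Method not found" and "Invalid params" codes).

```typescript
// Hypothetical sketch: none of these names come from the PR's helpers.
// Standard JSON-RPC 2.0 error codes referenced in the scope list.
const METHOD_NOT_FOUND = -32601;
const INVALID_PARAMS = -32602;

interface JsonRpcError {
  code: number;
  message: string;
}

interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: number | string | null;
  result?: unknown;
  error?: JsonRpcError;
}

type CheckStatus = "SUCCESS" | "FAILURE";

// Returns SUCCESS only when the response is a protocol error with the expected code.
// A response that carries a result instead of an error is a FAILURE.
function expectErrorCode(response: JsonRpcResponse, expected: number): CheckStatus {
  if (response.error === undefined) return "FAILURE";
  return response.error.code === expected ? "SUCCESS" : "FAILURE";
}

// Example: a server answering tasks/cancel for a task that is already terminal.
const cancelOnTerminal: JsonRpcResponse = {
  jsonrpc: "2.0",
  id: 7,
  error: { code: INVALID_PARAMS, message: "task is already in a terminal state" },
};

// Example: a server answering a removed v1 method.
const removedV1Method: JsonRpcResponse = {
  jsonrpc: "2.0",
  id: 8,
  error: { code: METHOD_NOT_FOUND, message: "method not found" },
};
```

Keeping the comparison on the numeric code rather than the message text means the check stays independent of any one server's wording.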

Plus mrtr-ephemeral-flow (1 class / 7 SUCCESS + 1 SKIPPED) under src/scenarios/server/mrtr/.

Design highlights

  • Platform/Language agnostic runner. Fixture configured via env vars TASKS_SERVER_URL / TASKS_SERVER_CMD (and
    MRTR_SERVER_URL / MRTR_SERVER_CMD). Spawn via sh -c, readiness via TCP polling, no log-line scanning. Suite is
    describe.skip'd when env vars are unset.
  • Raw-fetch escape hatch. SDK's typed schemas strip the SEP-2663 wire fields (resultType, taskId, inputRequests,
    requestState, inlined result/error). Helpers in src/scenarios/server/tasks/helpers.ts provide initRawSession +
    rawRequest/rawRequestFull so scenarios read those fields directly. When the SDK gains schemas for SEP-2663 shapes, the
    call sites switch back to client.request(..., AnyResult) and the helper shrinks. Similar in spirit to the raw-MCP
    additions in #188 (Conformance Tests for SEP-2322 MRTR) - could converge on a shared helper.
  • Registered in pendingClientScenariosList - all-scenarios.test.ts skips them since everything-server doesn't
    implement the extension yet. CLI lookup (getClientScenario(name)) still finds them.
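
The readiness-via-TCP-polling approach from the first bullet can be sketched as below. This is an illustration under assumptions, not the runner's actual code: `hostPortFromUrl`, `tryConnect`, and `pollUntilListening` are hypothetical names, and the timeout and interval defaults are arbitrary. It uses only Node's built-in `net` module and the global `URL` class.

```typescript
import * as net from "node:net";

// Hypothetical helper names; the PR's actual runner code is not shown in this conversation.

// Extract host and port from a server URL such as the value of TASKS_SERVER_URL.
function hostPortFromUrl(serverUrl: string): { host: string; port: number } {
  const url = new URL(serverUrl);
  const defaultPort = url.protocol === "https:" ? 443 : 80;
  return {
    host: url.hostname,
    port: url.port === "" ? defaultPort : Number(url.port),
  };
}

// Attempt one TCP connection; resolve true if it connects, false on error or timeout.
function tryConnect(host: string, port: number, timeoutMs: number): Promise<boolean> {
  return new Promise((resolve) => {
    const socket = net.createConnection({ host, port });
    const finish = (ok: boolean) => {
      socket.destroy();
      resolve(ok);
    };
    socket.setTimeout(timeoutMs);
    socket.once("connect", () => finish(true));
    socket.once("timeout", () => finish(false));
    socket.once("error", () => finish(false));
  });
}

// Poll until the port accepts connections or the deadline passes.
async function pollUntilListening(
  serverUrl: string,
  deadlineMs = 10_000,
  intervalMs = 100,
): Promise<boolean> {
  const { host, port } = hostPortFromUrl(serverUrl);
  const deadline = Date.now() + deadlineMs;
  while (Date.now() < deadline) {
    if (await tryConnect(host, port, intervalMs)) return true;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  return false;
}
```

Polling the socket rather than scanning the fixture's log output is what keeps this approach independent of the server's language and logging format.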

Open spec questions

  1. MRTR resultType discriminator value. SEP-2322's draft uses "input_required"; SEP-2663's draft uses "incomplete". Centralized as MRTR_INCOMPLETE_RESULT_TYPE for a one-line flip when SEP authors converge. Tracked at modelcontextprotocol/modelcontextprotocol PR 2663 comment 4381885336 and PR 2322 comment 4381884825.

  2. mrtr-tasks-composition. SEP-2663 commit 451f5e1 made the MRTR→Tasks promotion flow normative on the wire: a single tools/call MAY exchange one or more IncompleteResult rounds and then return CreateTaskResult on a subsequent round. Implementing this requires the server middleware to defer task creation until the handler signals async-promotion - the natural alternative (mint the task up-front the moment a tool advertises task support) doesn't fit, because by the time the handler's IsIncomplete signal is observable, the CreateTaskResult is already on the wire. This is a wire-contract requirement, not an SDK-specific implementation choice; existing SDKs across languages that took the up-front pattern will need refactoring before this check can pass anywhere. Combined with question 1 above, that's why the check is SKIPPED today.
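
The "one-line flip" described in question 1 might look something like this. It is a sketch under assumptions: only the constant name MRTR_INCOMPLETE_RESULT_TYPE comes from the text above, while `ResultEnvelope` and `isIncompleteResult` are hypothetical, and the chosen value (SEP-2663's draft spelling) is a guess at which side the suite currently takes.

```typescript
// Hypothetical sketch; only the constant name appears in the PR description.
// ASSUMPTION: the value follows SEP-2663's draft ("incomplete").
// SEP-2322's draft uses "input_required"; changing this one line switches every check.
const MRTR_INCOMPLETE_RESULT_TYPE: string = "incomplete";

// Loose view of a response body: only the discriminator is typed,
// every other draft field is left untouched.
interface ResultEnvelope {
  resultType?: string;
  [key: string]: unknown;
}

// True when a response is an MRTR "needs more input" round rather than a final result.
function isIncompleteResult(result: ResultEnvelope): boolean {
  return result.resultType === MRTR_INCOMPLETE_RESULT_TYPE;
}
```

Routing every comparison through one predicate means no scenario hard-codes either spelling, so the suite can follow whichever value the SEP authors settle on.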

Closes: #261

panyam added 4 commits May 5, 2026 14:14
Adds the first scenario for the SEP-2663 io.modelcontextprotocol/tasks
extension — a single TasksLifecycleScenario covering sync vs async
dispatch, DetailedTask shape on tasks/get, tool errors vs protocol
errors, and cancellation semantics. 8 ConformanceCheck records, all
passing against a SEP-2663-conformant Go fixture.

Why "tasks" (not "tasks-v2"): SEP-2663 IS the tasks surface once it
lands; the v2 suffix is only meaningful in implementations that
maintain a v1 surface alongside, which the conformance suite does not.

Layout:
- src/scenarios/server/tasks/lifecycle.ts — scenario class
- src/scenarios/server/tasks/helpers.ts — raw-fetch escape hatch
  (the SDK's typed schemas strip resultType/inputRequests/...)
- src/scenarios/server/tasks/lifecycle.test.ts — fork-local vitest
  runner. Two modes: spawn a fixture binary via MCPKIT_TASKS_BINARY,
  or point at an already-running server via MCPKIT_TASKS_SERVER_URL.
  Skips when neither is set so it doesn't break upstream CI runs that
  go through everything-server (which doesn't yet implement
  io.modelcontextprotocol/tasks).

Scenario is registered in pendingClientScenariosList so
all-scenarios.test.ts skips it; promote to active once the upstream
fixture grows extension support.

Tagged ['extension', DRAFT_PROTOCOL_VERSION] — selectable via
--suite extensions and --spec-version draft.

Builds out the rest of the tasks scenarios (atop the lifecycle canary)
and adds the SEP-2322 ephemeral MRTR scenario in a sibling folder.
Both target their own fixtures; both runners are brand-neutral and
language-agnostic (TASKS_SERVER_URL / TASKS_SERVER_CMD,
MRTR_SERVER_URL / MRTR_SERVER_CMD; readiness via TCP polling).

Tasks ClientScenario classes:
- TasksLifecycleScenario          (8 checks; v2-01..v2-08)
- TasksCapabilityNegotiationScenario (4 checks; v2-11/22/23/25, SEP-2575)
- TasksWireFieldsScenario         (3 checks; v2-12/13/21)
- TasksRequestStateScenario       (3 checks; v2-14/15/28)
- TasksMRTRInputScenario          (3 checks; v2-16/17/29 partial fulfillment)
- TasksRequestHeadersScenario     (3 checks; SEP-2243 request-header tolerance)
- TasksDispatchScenario           (8 checks; v2-09/10/19/20/26/27/30/31)
- TasksStatusNotificationsScenario (1 check; SEP-2663 §notifications, optional)

MRTR ClientScenario class:
- MrtrEphemeralFlowScenario       (7 checks + 1 SKIPPED; mrtr-01..07,
                                   mrtr-08 deferred for spec terminology +
                                   reference-impl reasons)

Both runners spawn the fixture via a shell command and detect readiness
by TCP-polling the URL's host/port — no log-line scanning, no
language-specific assumptions. The same env vars work for any server
implementation.

Scenarios are tagged ['extension', DRAFT_PROTOCOL_VERSION] and registered
in pendingClientScenariosList so all-scenarios.test.ts (which targets
the upstream everything-server) skips them until the fixture grows
SEP-2322 / SEP-2663 support.

Restructured around ClientScenario classes (one row per class with
check-list under it) rather than per-numbered-test slugs. Documents
fixture requirements, env vars, open spec questions, and the
wire-format diff for each suite.

Per AGENTS.md, severity follows spec keyword (MUST/MUST NOT → FAILURE,
SHOULD/SHOULD NOT → WARNING). The READMEs explain why some checks emit
INFO rather than FAILURE (optional emission paths per SEP-2322).
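
The severity rule above can be written as a small mapping. This is a hedged sketch: `SpecKeyword`, `Severity`, and `severityFor` are hypothetical names, and the MAY → INFO row is inferred from the tasks-status-notifications description (an optional INFO check for a MAY requirement), not quoted from AGENTS.md.

```typescript
// Hypothetical names; the mapping itself follows the commit message above.
type SpecKeyword = "MUST" | "MUST NOT" | "SHOULD" | "SHOULD NOT" | "MAY";
type Severity = "FAILURE" | "WARNING" | "INFO";

// Map the spec keyword behind a requirement to the severity of a failed check.
function severityFor(keyword: SpecKeyword): Severity {
  switch (keyword) {
    case "MUST":
    case "MUST NOT":
      return "FAILURE";
    case "SHOULD":
    case "SHOULD NOT":
      return "WARNING";
    case "MAY":
      // ASSUMPTION: optional behavior is reported as INFO rather than penalized.
      return "INFO";
  }
}
```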

panyam commented May 5, 2026

@LucaButBoring

panyam added a commit to panyam/mcpkit that referenced this pull request May 5, 2026
The bulk of the v2 tasks + MRTR conformance lives in the upstream-bound
fork now (panyam/mcpconformance, branch feat/tasks-mrtr-extension;
upstream Draft PR modelcontextprotocol/conformance#262). Updates the
README/WALKTHROUGH/walkthrough.go references in examples/tasks-v2 +
examples/mrtr to point at the fork, the migration guide
(docs/TASKS_V2_MIGRATION.md) likewise, and the matching Go test skip
(server/mrtr_test.go) to point at the new conformance scenario path.
No runtime changes.
panyam added a commit to panyam/mcpkit that referenced this pull request May 6, 2026
Compress CLAUDE.md's Conformance section to a one-liner roll-up + add
a Gotchas bullet for MCPCONFORMANCE_PATH (the env var the new
testconf-tasks-v2 / testconf-mrtr targets shell into). The detailed
fork-vs-local layout already lives in CAPABILITIES.md
mcp-tasks-v2-conformance.

Add a "Final disposition" footer to docs/SEP_2663_TASKS_CONFORMANCE_PLAN.md
recording the graduation upstream (panyam/mcpconformance fork branch
feat/tasks-mrtr-extension, Draft PR modelcontextprotocol/conformance#262)
and noting the mcpkit-local folders are now vitest sentinels reserved
for future mcpkit-stricter scenarios.

No memory pruning — the four feedback notes are still working guidance,
not duplicates of checked-in docs.
Two reviewer-driven additions:

1. SEP-2663 createdAt / lastUpdatedAt ISO-8601 assertion in
   `tasks-server-task-creation` (per Luca's PR modelcontextprotocol#262 review feedback).
   The check now flags servers that emit non-ISO timestamps (epoch
   seconds, RFC-2822, etc.) on TaskInfoV2 envelopes.

2. Factor cross-cutting test-harness helpers into _shared/:

   - `_shared/test-runner.ts` — `waitForServerReady` (renamed from
     `waitForTcpReady`; the call site cares about server readiness,
     not the TCP-poll mechanism). Imported by tasks/ and mrtr/
     all-scenarios.test.ts; replaces ~30 LOC of inline duplication
     in each.

   - `_shared/wire-format.ts` — `ISO_8601_PATTERN` constant +
     `isIso8601(s)` predicate. Documented rationale for choosing a
     regex over `Date.parse` (too permissive),
     `new Date(s).toISOString()` (too strict), or
     `Temporal.Instant.from` (Node 24+ experimental). Future
     wire-shape predicates (data URI, percent-encoded filename,
     etc.) can land here.
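
A regex-based predicate of the kind described in item 2 might look like the following. The names `ISO_8601_PATTERN` and `isIso8601` come from the commit message, but the pattern itself is an assumption: the actual regex in `_shared/wire-format.ts` is not shown here and may accept a different subset of ISO 8601.

```typescript
// ASSUMPTION: this pattern is illustrative, not the one shipped in _shared/wire-format.ts.
// It accepts a full date and time with optional fractional seconds and a required
// zone designator: either "Z" or a numeric offset such as "+02:00".
const ISO_8601_PATTERN =
  /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/;

// True when s has the shape of an ISO 8601 timestamp.
// This is a shape check only: it does not reject impossible dates such as month 13.
function isIso8601(s: string): boolean {
  return ISO_8601_PATTERN.test(s);
}

// Shapes a server might emit for createdAt / lastUpdatedAt.
const samples = {
  utc: "2026-05-05T14:14:00Z",
  offsetWithMillis: "2026-05-05T14:14:00.123+02:00",
  epochSeconds: "1746454440",
  rfc2822: "Tue, 05 May 2026 14:14:00 GMT",
};
```

A shape-only regex sits between the two alternatives the commit message rejects: it refuses the non-ISO strings that a lenient parser may accept, while still allowing offsets and second-level precision that an exact round-trip comparison against `toISOString()` output would reject.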

Cherry-pick footprint when graduating to upstream PR is the SEP
folder + the imported `_shared/` files. First PR through carries
them upstream; subsequent feat branches inherit via standard
upstream-sync flow.

All 9 scenario tests still pass against the Go reference fixtures.
panyam added a commit to panyam/mcpconformance that referenced this pull request May 6, 2026
Pure rename — the call site cares about server readiness, not the
TCP-poll implementation detail. Matches the rename now landed on
feat/tasks-mrtr-extension (PR modelcontextprotocol#262).
panyam added 2 commits May 6, 2026 14:22
… helpers

Drops initRawSession/rawRequest/rawRequestFull from tasks/helpers.ts in
favor of the SDK's Client + StreamableHTTPClientTransport, paired with
a Zod passthrough schema (AnyResult) that preserves SEP-2663 / SEP-2322
draft fields the SDK's typed schemas would strip.

headers.ts and notifications.ts keep a small inline fetch where the SDK
can't reach: per-request HTTP headers (SEP-2243) and SSE notification
observation. Both reuse the SDK session via transport.sessionId.

All SEP-2663 + MRTR ephemeral-flow scenarios pass against the Go fixture.
@panyam panyam marked this pull request as ready for review May 6, 2026 21:29


Development

Successfully merging this pull request may close these issues.

Add SEP-2663 Tasks Extension conformance scenarios
