Summary
Proposing a set of conformance scenarios for the MCP Tasks protocol (spec 2025-11-25). Tasks enables async tool execution with lifecycle tracking - tools/call returns a task, clients poll via tasks/get, fetch results via tasks/result, and cancel via tasks/cancel.
Context
I've been coordinating with @LucaButBoring on the Tasks spec and SEP-2557 direction. Would appreciate his review on the scenario design and check coverage to make sure we're testing the right semantics before I restructure into a PR.
What would be covered
I have 27 checks written and passing against the TS SDK (and an experimental Go implementation). They'd consolidate into ~5 scenarios following our "one scenario with many checks" principle:
Scenario: task-lifecycle (~10 checks)
- Sync tool call returns immediately (no task)
- Server returns CreateTaskResult with taskId, non-terminal status, createdAt
- tasks/get returns flat task info
- Task completes → status: completed
- tasks/result returns ToolResult with io.modelcontextprotocol/related-task in _meta
- tasks/get SHALL NOT include related-task _meta (taskId param is source of truth)
- Failing tool transitions to status: failed, tasks/result has isError: true
- tasks/cancel → status: cancelled, confirmed via tasks/get
- tasks/list returns array (capability-conditional)
- Concurrent creation produces unique taskIds
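A few of the lifecycle checks above could be expressed as small pure validators. This is only a sketch under assumptions: the `CheckResult` shape and the check ids are hypothetical, and "working" is assumed as the name of the initial non-terminal status (the other status names come from the list above).

```typescript
// Terminal statuses named in the checks above.
const TERMINAL_STATUSES: ReadonlySet<string> = new Set(["completed", "failed", "cancelled"]);
// "working" is an assumed name for the initial state; adjust if the spec differs.
const KNOWN_STATUSES: ReadonlySet<string> = new Set([
  "working",
  "input_required",
  "completed",
  "failed",
  "cancelled",
]);

// Hypothetical check record; the real harness type may differ.
interface CheckResult {
  id: string;
  ok: boolean;
  detail?: string;
}

// Validates the task object returned for a task-augmented tools/call.
function checkCreateTaskResult(task: Record<string, unknown>): CheckResult[] {
  const { taskId, status, createdAt } = task;
  const statusKnown = typeof status === "string" && KNOWN_STATUSES.has(status);
  return [
    { id: "task-id-present", ok: typeof taskId === "string" && taskId.length > 0 },
    {
      id: "task-status-non-terminal",
      ok: statusKnown && !TERMINAL_STATUSES.has(status as string),
      detail: `status=${String(status)}`,
    },
    {
      id: "task-created-at-parseable",
      ok: typeof createdAt === "string" && !Number.isNaN(Date.parse(createdAt)),
    },
  ];
}

// Concurrent creation check: every returned taskId must be distinct.
function taskIdsAreUnique(taskIds: string[]): boolean {
  return new Set(taskIds).size === taskIds.length;
}
```

Keeping these as pure functions over the response payload means they can be unit-tested without a running server.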
Scenario: task-errors (~5 checks)
- Required tool without task hint → -32601
- Forbidden tool with task hint → -32601
- tasks/get with bogus taskId → -32602
- tasks/cancel with bogus taskId → -32602
- Cancel already-terminal task → -32602
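The expected codes above lend themselves to a lookup table plus one comparison helper. A minimal sketch, assuming hypothetical check ids and a loosely typed JSON-RPC response; only the numeric codes are taken from the list above.

```typescript
// Expected JSON-RPC error codes, copied from the check list above.
// The keys are hypothetical check ids.
const EXPECTED_ERROR_CODES = {
  "required-tool-without-task-hint": -32601,
  "forbidden-tool-with-task-hint": -32601,
  "tasks-get-unknown-task-id": -32602,
  "tasks-cancel-unknown-task-id": -32602,
  "tasks-cancel-terminal-task": -32602,
} as const;

type ErrorCheckId = keyof typeof EXPECTED_ERROR_CODES;

interface ErrorCheckResult {
  id: ErrorCheckId;
  ok: boolean;
  detail: string;
}

// Compares the error code in a raw JSON-RPC response against the expected one.
function checkErrorCode(id: ErrorCheckId, response: unknown): ErrorCheckResult {
  const expected = EXPECTED_ERROR_CODES[id];
  const error = (response as { error?: { code?: unknown } | null } | null | undefined)?.error;
  if (error == null || typeof error.code !== "number") {
    return { id, ok: false, detail: `expected error ${expected}, got no JSON-RPC error` };
  }
  return { id, ok: error.code === expected, detail: `expected ${expected}, got ${error.code}` };
}
```

A response carrying -32603 would then fail with a detail string that names both codes, which makes SDK mismatches easy to spot in the report.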
Scenario: task-configuration (~4 checks)
- TTL present and positive in CreateTaskResult (client hint is advisory)
- pollInterval valid if present (optional, server-provided)
- Task must not expire before TTL elapses
- tools/list includes execution.taskSupport per tool
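The TTL and pollInterval checks could look roughly like the following. This is a sketch under assumptions: the field names `ttl` and `pollInterval` on the task object, TTL measured in milliseconds, and the `ConfigCheck` shape are all illustrative rather than confirmed here.

```typescript
// Hypothetical check record for this sketch.
interface ConfigCheck {
  id: string;
  ok: boolean;
  skipped?: boolean;
}

const isPositiveNumber = (v: unknown): boolean =>
  typeof v === "number" && Number.isFinite(v) && v > 0;

// ttl must be present and positive; pollInterval is only validated when present.
function checkTaskTiming(task: { ttl?: unknown; pollInterval?: unknown }): ConfigCheck[] {
  const { ttl, pollInterval } = task;
  return [
    { id: "ttl-present-and-positive", ok: isPositiveNumber(ttl) },
    pollInterval === undefined
      ? { id: "poll-interval-valid", ok: true, skipped: true }
      : { id: "poll-interval-valid", ok: isPositiveNumber(pollInterval) },
  ];
}

// True when a task was observed missing before createdAt + ttl,
// i.e. it expired too early. Assumes ttl is in milliseconds.
function expiredTooEarly(createdAt: string, ttlMs: number, observedMissingAtMs: number): boolean {
  return observedMissingAtMs < Date.parse(createdAt) + ttlMs;
}
```

Only the pre-TTL direction is asserted, since nothing can be concluded about when a server removes a task after its TTL has elapsed.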
Scenario: task-side-channel (~4 checks)
- Elicitation round-trip: tool → input_required → elicitation via tasks/result → completed
- Sampling round-trip: tool → input_required → sampling via tasks/result → completed
- Optional tool without hint runs synchronously (inline result, no task)
- External proxy tool completes via custom getTask/getResult handlers (TBD: this may be testing an internal implementation detail rather than protocol behavior, so I'll revisit it)
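For the two round-trip checks, one option is to record the statuses seen while polling and validate the sequence afterwards. A minimal sketch, assuming the harness collects statuses into an array; the function name and verdict shape are hypothetical.

```typescript
const TERMINAL: ReadonlySet<string> = new Set(["completed", "failed", "cancelled"]);

interface SequenceVerdict {
  ok: boolean;
  reason: string;
}

// Validates statuses observed while polling a side-channel (elicitation or sampling) task:
// the task must pass through input_required, end in completed, and never
// change status after reaching a terminal state.
function checkSideChannelSequence(observed: string[]): SequenceVerdict {
  const firstTerminal = observed.findIndex((s) => TERMINAL.has(s));
  if (firstTerminal !== -1) {
    const terminal = observed[firstTerminal];
    const changed = observed.slice(firstTerminal + 1).some((s) => s !== terminal);
    if (changed) {
      return { ok: false, reason: `status changed after terminal state at index ${firstTerminal}` };
    }
  }
  if (observed.indexOf("input_required") === -1) {
    return { ok: false, reason: "never entered input_required" };
  }
  if (observed[observed.length - 1] !== "completed") {
    return { ok: false, reason: "did not end in completed" };
  }
  return { ok: true, reason: "ok" };
}
```

Repeated identical statuses are tolerated, since consecutive polls will often return the same state.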
Scenario: task-notifications (~4 checks)
- Progress notifications well-formed if sent (optional)
- Status notifications match actual task state if sent (optional)
- Completed status notification references the correct taskId
What's intentionally NOT tested
- Authorization-context binding - identity isn't formally defined; no portable test without a real identity model
- TTL post-expiry - servers MAY expire lazily after TTL; only pre-TTL existence is testable
- Specific notification timing - notifications are optional per current spec
Prior work
- Passing against the TS SDK reference server and an experimental Go reference server (mcpkit)
- Assertions follow spec MUST/SHOULD/MAY carefully - learned from feedback on earlier drafts
- Error codes documented per spec (-32601 for hint mismatches, -32602 for invalid task ops); the TS SDK currently returns -32603 for these, which is incorrect
Next steps
Happy to restructure into the scenario/check architecture and open a PR. The test tools needed are straightforward:
- greet - sync-only (no execution field)
- slow_compute - optional task support, sleeps N seconds
- failing_job - required task support, always fails after 1s
- confirm_delete - required, elicitation via side-channel
- write_haiku - required, sampling via side-channel
These could be added to the existing everything-server or shipped as a tasks-specific test server (or possibly examples/clients/typescript/everything-client.ts; I'm not sure which is the right home).
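The five tools could be described declaratively and mapped to tools/list entries. This is a sketch only: the `TestToolSpec` and `ToolListEntry` types, the empty object `inputSchema`, and the helper name are assumptions for illustration; the tool names and task-support levels are taken from the list above.

```typescript
type TaskSupport = "required" | "optional" | "forbidden";

// Hypothetical fixture description for one test tool.
interface TestToolSpec {
  name: string;
  taskSupport?: TaskSupport; // omitted for sync-only tools (no execution field)
  behavior: string;
}

const TEST_TOOLS: readonly TestToolSpec[] = [
  { name: "greet", behavior: "sync-only" },
  { name: "slow_compute", taskSupport: "optional", behavior: "sleeps N seconds" },
  { name: "failing_job", taskSupport: "required", behavior: "always fails after 1s" },
  { name: "confirm_delete", taskSupport: "required", behavior: "elicitation via side-channel" },
  { name: "write_haiku", taskSupport: "required", behavior: "sampling via side-channel" },
];

// Assumed minimal shape of a tools/list entry for these fixtures.
interface ToolListEntry {
  name: string;
  description: string;
  inputSchema: { type: "object" };
  execution?: { taskSupport: TaskSupport };
}

// Builds the tools/list entry, leaving out `execution` when no task support is declared.
function toToolListEntry(spec: TestToolSpec): ToolListEntry {
  const entry: ToolListEntry = {
    name: spec.name,
    description: spec.behavior,
    inputSchema: { type: "object" },
  };
  if (spec.taskSupport !== undefined) {
    entry.execution = { taskSupport: spec.taskSupport };
  }
  return entry;
}
```

A table like this keeps the fixtures in one place regardless of which server ends up hosting them.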
Also:
- Each check will include specReferences pointing to the relevant spec sections (tools/call task hints, tasks/get, tasks/result, tasks/cancel).
- Will implement using the Scenario interface (start/stop/getChecks) and register in src/scenarios/index.ts
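As a rough picture of the target structure, a scenario might look like the skeleton below. Everything here is a hypothetical stand-in: the `Scenario`, `Check`, and `SpecReference` types, the `record` helper, and the status strings are assumptions, and the repository's actual definitions should be used instead.

```typescript
// Hypothetical stand-ins for the harness types; the real definitions may differ.
interface SpecReference {
  id: string; // e.g. the spec section a check is derived from
}

interface Check {
  id: string;
  status: "SUCCESS" | "FAILURE";
  specReferences: SpecReference[];
}

interface Scenario {
  name: string;
  start(): Promise<void>;
  stop(): Promise<void>;
  getChecks(): Check[];
}

// One scenario accumulates many checks, each tagged with its spec sections.
class TaskErrorsScenario implements Scenario {
  readonly name = "task-errors";
  private checks: Check[] = [];

  async start(): Promise<void> {
    this.checks = []; // reset state before a run
  }

  async stop(): Promise<void> {
    // nothing to tear down in this sketch
  }

  record(id: string, passed: boolean, specSections: string[]): void {
    this.checks.push({
      id,
      status: passed ? "SUCCESS" : "FAILURE",
      specReferences: specSections.map((section) => ({ id: section })),
    });
  }

  getChecks(): Check[] {
    return [...this.checks]; // copy so callers cannot mutate internal state
  }
}
```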
Looking ahead
A v2 revision of the Tasks protocol is in progress (draft stage, targeted for the June spec release). It simplifies the model significantly - inlining results into tasks/get, removing tasks/result and tasks/list, and moving to server-directed task creation. I have a v2 conformance suite drafted as well and plan to propose it once the spec stabilizes. Getting v1 coverage in place now will also make it easier to validate the v2 migration path.