Proposal: Tasks conformance scenarios (spec 2025-11-25) #244

@panyam

Summary

Proposing a set of conformance scenarios for the MCP Tasks protocol (spec 2025-11-25). Tasks enables async tool execution with lifecycle tracking - tools/call returns a task, clients poll via tasks/get, fetch results via tasks/result, and cancel via tasks/cancel.

Context

I've been coordinating with @LucaButBoring on the Tasks spec and SEP-2557 direction. Would appreciate his review on the scenario design and check coverage to make sure we're testing the right semantics before I restructure into a PR.

What would be covered

I have 27 checks written and passing against the TS SDK (and an experimental Go implementation). They'd consolidate into ~5 scenarios following our "one scenario with many checks" principle:

Scenario: task-lifecycle (~10 checks)

  • Sync tool call returns immediately (no task)
  • Server returns CreateTaskResult with taskId, non-terminal status, createdAt
  • tasks/get returns flat task info
  • Task completes → status: completed
  • tasks/result returns CallToolResult with io.modelcontextprotocol/related-task in _meta
  • tasks/get SHALL NOT include related-task _meta (taskId param is source of truth)
  • Failing tool transitions to status: failed, tasks/result has isError: true
  • tasks/cancel → status: cancelled, confirmed via tasks/get
  • tasks/list returns array (capability-conditional)
  • Concurrent creation produces unique taskIds
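As a rough sketch of how the CreateTaskResult and uniqueness checks above could be expressed (the `CheckResult` shape, the check ids, and the function names are hypothetical, not the conformance framework's actual types):

```typescript
// Hypothetical result shape for one conformance check; not the framework's real type.
interface CheckResult {
  id: string;
  passed: boolean;
  detail?: string;
}

// Statuses a freshly created task may legitimately report.
const NON_TERMINAL_STATUSES = ["working", "input_required"];

// Validates the fields the task-lifecycle scenario expects in a CreateTaskResult.
function checkCreateTaskResult(result: unknown): CheckResult[] {
  const task: Record<string, unknown> =
    (result as { task?: Record<string, unknown> } | null)?.task ?? {};
  const { taskId, status, createdAt } = task;
  return [
    {
      id: "task-create-has-task-id",
      passed: typeof taskId === "string" && taskId.length > 0,
    },
    {
      id: "task-create-non-terminal-status",
      passed: typeof status === "string" && NON_TERMINAL_STATUSES.indexOf(status) !== -1,
      detail: `status=${String(status)}`,
    },
    {
      id: "task-create-created-at",
      passed: typeof createdAt === "string" && !Number.isNaN(Date.parse(createdAt)),
    },
  ];
}

// Concurrent creation check: every returned taskId must be distinct.
function taskIdsAreUnique(taskIds: string[]): boolean {
  return new Set(taskIds).size === taskIds.length;
}
```

Each element of the returned array would map onto one check inside the task-lifecycle scenario, so a single malformed field fails only its own check.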

Scenario: task-errors (~5 checks)

  • Required tool without task hint → -32601
  • Forbidden tool with task hint → -32601
  • tasks/get with bogus taskId → -32602
  • tasks/cancel with bogus taskId → -32602
  • Cancel already-terminal task → -32602
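The error expectations above could be captured as a small lookup table plus a comparison helper. This is only a sketch: the case names and `checkErrorCode` are made up for illustration, and the codes are the ones listed in this scenario.

```typescript
// Expected JSON-RPC error codes, copied from the task-errors scenario list.
const EXPECTED_ERROR_CODES = {
  "required-tool-without-task-hint": -32601,
  "forbidden-tool-with-task-hint": -32601,
  "tasks-get-unknown-task-id": -32602,
  "tasks-cancel-unknown-task-id": -32602,
  "tasks-cancel-terminal-task": -32602,
} as const;

type ErrorCase = keyof typeof EXPECTED_ERROR_CODES;

// Minimal JSON-RPC 2.0 response envelope.
interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: number | string | null;
  result?: unknown;
  error?: { code: number; message: string; data?: unknown };
}

// Compares the error code in a response against the expectation for one case.
function checkErrorCode(
  errorCase: ErrorCase,
  response: JsonRpcResponse,
): { passed: boolean; detail: string } {
  const expected = EXPECTED_ERROR_CODES[errorCase];
  if (response.error === undefined) {
    return { passed: false, detail: `expected error ${expected}, got a result` };
  }
  const actual = response.error.code;
  return { passed: actual === expected, detail: `expected ${expected}, got ${actual}` };
}
```

A server answering tasks/get for an unknown taskId with -32603 would fail this check, with the `detail` string carrying both codes.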

Scenario: task-configuration (~4 checks)

  • TTL present and positive in CreateTaskResult (client hint is advisory)
  • pollInterval valid if present (optional, server-provided)
  • Task must not expire before TTL elapses
  • tools/list includes execution.taskSupport per tool
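A minimal sketch of the timing checks above, assuming `ttl` and `pollInterval` are expressed in milliseconds and `createdAt` is an ISO 8601 timestamp (both assumptions should be confirmed against the spec; the helper names are illustrative):

```typescript
// Timing-related fields of a task; units are assumed to be milliseconds.
interface TaskTiming {
  createdAt: string;
  ttl?: number | null;
  pollInterval?: number;
}

// "TTL present and positive": the value must be a finite number above zero.
function ttlIsValid(task: TaskTiming): boolean {
  return typeof task.ttl === "number" && isFinite(task.ttl) && task.ttl > 0;
}

// pollInterval is optional; when present it must be a finite positive number.
function pollIntervalIsValid(task: TaskTiming): boolean {
  if (task.pollInterval === undefined) return true;
  return isFinite(task.pollInterval) && task.pollInterval > 0;
}

// True while the TTL window is still open, i.e. while a tasks/get for this
// task is required to succeed. After the window the result is unspecified.
function mustStillExist(task: TaskTiming, nowMs: number): boolean {
  if (!ttlIsValid(task)) return false;
  return nowMs < Date.parse(task.createdAt) + (task.ttl as number);
}
```

The pre-TTL check would only issue tasks/get while `mustStillExist` returns true, which keeps it independent of how lazily a server expires tasks afterwards.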

Scenario: task-side-channel (~4 checks)

  • Elicitation round-trip: tool → input_required → elicitation via tasks/result → completed
  • Sampling round-trip: tool → input_required → sampling via tasks/result → completed
  • Optional tool without hint runs synchronously (inline result, no task)
  • External proxy tool completes via custom getTask/getResult handlers (still TBD: this may be testing an internal implementation detail, so I will revisit it)

Scenario: task-notifications (~4 checks)

  • Progress notifications well-formed if sent (optional)
  • Status notifications match actual task state if sent (optional)
  • Completed status notification references the correct taskId
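Because notifications are optional, one way to write the status check is to skip when nothing was received and fail only on a contradiction. The sketch below assumes a `notifications/tasks/status` method whose params carry `taskId` and `status`; that shape and the `Outcome` values are assumptions to verify against the spec and the framework.

```typescript
// Assumed notification shape; verify the method name and params against the spec.
interface TaskStatusNotification {
  method: "notifications/tasks/status";
  params: { taskId: string; status: string };
}

type Outcome = "SUCCESS" | "FAILURE" | "SKIPPED";

const TERMINAL_STATUSES = ["completed", "failed", "cancelled"];

// Compares received notifications with the final status observed via tasks/get.
// Only terminal notifications are compared, since a server may skip any of them.
function checkStatusNotifications(
  taskId: string,
  finalStatus: string,
  seen: TaskStatusNotification[],
): Outcome {
  const mine = seen.filter((n) => n.params.taskId === taskId);
  if (mine.length === 0) return "SKIPPED";
  const contradicts = mine.some(
    (n) => TERMINAL_STATUSES.indexOf(n.params.status) !== -1 && n.params.status !== finalStatus,
  );
  return contradicts ? "FAILURE" : "SUCCESS";
}
```

Filtering by `taskId` first also covers the "references the correct taskId" check: a completed notification addressed to some other task is simply not counted for this one.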

What's intentionally NOT tested

  • Authorization-context binding - identity isn't formally defined; no portable test without a real identity model
  • TTL post-expiry - servers MAY expire lazily after TTL; only pre-TTL existence is testable
  • Specific notification timing - notifications are optional per current spec

Prior work

  • Passing against the TS SDK reference server and an experimental Go reference server (mcpkit)
  • Assertions follow spec MUST/SHOULD/MAY carefully - learned from feedback on earlier drafts
  • Error codes documented per spec (-32601 for hint mismatches, -32602 for invalid task ops); the TS SDK currently returns -32603 for these, which is incorrect

Next steps

Happy to restructure into the scenario/check architecture and open a PR. The test tools needed are straightforward:

  • greet - sync-only (no execution field)
  • slow_compute - optional task support, sleeps N seconds
  • failing_job - required task support, always fails after 1s
  • confirm_delete - required, elicitation via side-channel
  • write_haiku - required, sampling via side-channel
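To make the tool matrix concrete, here is a sketch of how these five tools might be declared and what each call shape is expected to produce. `ToolDecl`, `testTools`, and `expectedOutcome` are illustrative names, and treating a missing `execution` field the same as "forbidden" is an assumption based on the error checks above.

```typescript
type TaskSupport = "forbidden" | "optional" | "required";

// Subset of a tools/list entry relevant to task support.
interface ToolDecl {
  name: string;
  execution?: { taskSupport?: TaskSupport };
}

const testTools: ToolDecl[] = [
  { name: "greet" }, // sync-only: no execution field
  { name: "slow_compute", execution: { taskSupport: "optional" } },
  { name: "failing_job", execution: { taskSupport: "required" } },
  { name: "confirm_delete", execution: { taskSupport: "required" } },
  { name: "write_haiku", execution: { taskSupport: "required" } },
];

const METHOD_NOT_FOUND = -32601;

type Expected = "sync" | "task" | "method-not-found";

// What a conforming server should do for a tools/call with or without a task hint.
function expectedOutcome(tool: ToolDecl, hasTaskHint: boolean): Expected {
  const support: TaskSupport = tool.execution?.taskSupport ?? "forbidden";
  if (support === "required") return hasTaskHint ? "task" : "method-not-found";
  if (support === "forbidden") return hasTaskHint ? "method-not-found" : "sync";
  return hasTaskHint ? "task" : "sync"; // optional
}
```

Iterating `testTools` with both values of `hasTaskHint` yields ten call shapes, which lines up with the hint-mismatch checks in task-errors and the synchronous-path checks elsewhere.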

These could be added to the existing everything-server or shipped as a tasks-specific test server (or possibly alongside examples/clients/typescript/everything-client.ts?).

Also:

  • Each check will include specReferences pointing to the relevant spec sections (tools/call task hints, tasks/get, tasks/result, tasks/cancel).
  • Will implement using the Scenario interface (start/stop/getChecks) and register in src/scenarios/index.ts
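For orientation, a scenario built on that interface might look roughly like the following. The interface and check shapes here are guesses based only on the method names mentioned above (start/stop/getChecks) and the specReferences field; the real definitions in the repo take precedence, and `record` is a hypothetical helper.

```typescript
// Hypothetical check record; the framework's real type may differ.
interface Check {
  id: string;
  status: "SUCCESS" | "FAILURE";
  specReferences: string[];
}

// Assumed shape of the Scenario interface (start/stop/getChecks).
interface Scenario {
  name: string;
  start(): Promise<void>;
  stop(): Promise<void>;
  getChecks(): Check[];
}

class TaskLifecycleScenario implements Scenario {
  name = "task-lifecycle";
  private checks: Check[] = [];

  async start(): Promise<void> {
    this.checks = []; // a real scenario would also bring up its test server here
  }

  async stop(): Promise<void> {
    // a real scenario would tear down its test server here
  }

  // Hypothetical helper: called by the scenario as each assertion is evaluated.
  record(id: string, passed: boolean, specReferences: string[]): void {
    this.checks.push({ id, status: passed ? "SUCCESS" : "FAILURE", specReferences });
  }

  getChecks(): Check[] {
    return this.checks.slice();
  }
}

// Registration sketch, mirroring what src/scenarios/index.ts might export.
const scenarios: Record<string, Scenario> = {
  "task-lifecycle": new TaskLifecycleScenario(),
};
```

Keeping all lifecycle assertions inside one class is what lets the ten checks share a single server start/stop, in line with the "one scenario with many checks" principle.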

Looking ahead

A v2 revision of the Tasks protocol is in progress (draft stage, targeted for the June spec release). It simplifies the model significantly - inlining results into tasks/get, removing tasks/result and tasks/list, and moving to server-directed task creation. I have a v2 conformance suite drafted as well and plan to propose it once the spec stabilizes. Getting v1 coverage in place now will also make it easier to validate the v2 migration path.
