StreamableHttp client: transparent session re-init (HTTP 404) orphans in-flight requests, hanging `call_tool` forever #912

### Summary

With the StreamableHttp **client** transport and `reinit_on_expired_session = true` (the default), a request that is **in flight when the session expires (HTTP 404)** can be permanently orphaned: its response is dropped, the pending responder is never completed *or errored*, and the caller's request future (e.g. `peer.call_tool(...)`) hangs **forever**. Because the typed peer methods use no timeout, there is no recovery.

This is a follow-up to #733, which added transparent re-initialization — the re-init path itself has a request-orphaning race.

- crate: `rmcp` **1.7.0**, features `client` + `transport-streamable-http-client`
- Observed downstream as hangs in an app unrelated to modelcontextprotocol; full downstream report: [0xPlaygrounds/rig#1914](https://github.com/0xPlaygrounds/rig/issues/1914).

### Symptom (from a user log)

```
INFO  rmcp::transport::streamable_http_client: session expired (HTTP 404), attempting transparent re-initialization
WARN  rmcp::transport::streamable_http_client: sse client event stream terminated with error: Err(TokioJoinError(JoinError::Cancelled(Id(15065))))
```
…after which the client is stuck. The `JoinError::Cancelled` WARN is **expected** (it's `streams.abort_all()` firing during re-init) — a symptom, not the cause.

### Root-cause analysis (rmcp 1.7.0)

A `tools/call` response is **not** the return value of the POST — the POST resolves on `202 Accepted`, and the actual `CallToolResult` arrives asynchronously as a `ServerMessage` on an SSE stream task, matched back to the caller by JSON-RPC id via `local_responder_pool`.
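To make the id-based matching concrete, here is a small std-only model. It is a sketch, not rmcp's code: `ResponderPool`, `register`, and `deliver` are hypothetical names, and a `std::sync::mpsc` channel stands in for the per-request responder.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical stand-in for `local_responder_pool`: JSON-RPC id -> responder.
#[derive(Default)]
pub struct ResponderPool {
    pending: HashMap<u64, Sender<String>>,
}

impl ResponderPool {
    /// Called when a request is sent; the caller keeps the receiver and waits on it.
    pub fn register(&mut self, id: u64) -> Receiver<String> {
        let (tx, rx) = channel();
        self.pending.insert(id, tx);
        rx
    }

    /// Called by the stream task when a message with this id arrives.
    /// Returns false if no request with that id is pending.
    pub fn deliver(&mut self, id: u64, payload: &str) -> bool {
        match self.pending.remove(&id) {
            Some(tx) => tx.send(payload.to_string()).is_ok(),
            None => false,
        }
    }

    /// Number of requests still waiting for a response.
    pub fn pending_len(&self) -> usize {
        self.pending.len()
    }
}
```

The point of the model: the caller's wait only ends when something calls `deliver` (or removes the entry) for its id. The POST completing does not end it.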

1. On `SessionExpired` with re-init enabled, the worker (`src/transport/streamable_http_client.rs`) logs re-init (`:670`), runs `perform_reinitialization` (`:672`), then calls **`streams.abort_all()` (`:684`)**, which aborts **every** SSE stream task — including the standalone GET stream carrying outstanding responses, and any in-flight POST-SSE response stream.
2. The aborted stream is dropped mid-poll. `SseAutoReconnectStream` (`src/transport/.../client_side_sse.rs`) only reconnects on `Some(Err(..))` or graceful `None`; a `JoinSet` abort is **neither**, so reconnect/`last_event_id` recovery never runs. The in-flight response is lost.
3. Re-init replays **only the single message that received the 404** (`:759`); any *other* concurrently-pending request — or one whose response was mid-delivery on an aborted stream — is never replayed.
4. Nothing ever completes/errors the orphaned entry in `local_responder_pool` (`src/service.rs:764`). Responders are only removed on id-matched response (`:1023`), id-matched error (`:1037`), transport-send error (`:855`), or an explicit cancellation notification (`:872`) — **none of which fire here**. The worker has no `RequestId`/responder concept, so it cannot fail the request on its side either.
5. The caller's future never resolves: `call_tool` (`src/service/client.rs:365`) → `send_request` (`src/service.rs:442`) → `send_request_with_option(req, PeerRequestOptions::no_options())` → `await_response` (`:321`), whose `else` branch is `self.rx.await` (`:345`) with **no timeout**.

Net: an in-flight request whose response stream is killed by `abort_all()` during re-init → **no response, no error, unbounded await.** `reinit_on_expired_session` defaults to `true` (`:1080`), so the path is active by default. This also explains why the hang is intermittent (reported as happening "occasionally"): it only triggers when a request is in flight at the moment of the 404 and re-init.
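The sequence can be reproduced in a toy, std-only model. Everything here is illustrative rather than rmcp's implementation: `Client`, `send`, `reinit`, `server_responds`, and `orphaned` are made-up names, streams are reduced to a list of request ids, and the wait is bounded only so the demo terminates.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, RecvTimeoutError, Sender};
use std::time::Duration;

/// Toy model of the client state; all names are illustrative.
pub struct Client {
    /// Pending responders keyed by request id (the service layer's pool).
    pending: HashMap<u64, Sender<String>>,
    /// Ids whose responses are expected on currently open SSE streams.
    streams: Vec<u64>,
}

impl Client {
    pub fn new() -> Self {
        Client { pending: HashMap::new(), streams: Vec::new() }
    }

    /// Send a request: register a responder and expect the response on a stream.
    pub fn send(&mut self, id: u64) -> Receiver<String> {
        let (tx, rx) = channel();
        self.pending.insert(id, tx);
        self.streams.push(id);
        rx
    }

    /// Models re-init as analysed above: abort every stream, replay only `replayed`.
    /// The pool is left untouched, which is the behaviour being illustrated.
    pub fn reinit(&mut self, replayed: u64) {
        self.streams.clear(); // abort_all(): in-flight responses are lost
        self.streams.push(replayed); // only the message that got the 404 is resent
    }

    /// The server answers everything still carried by a live stream.
    pub fn server_responds(&mut self) {
        for id in self.streams.drain(..) {
            if let Some(tx) = self.pending.remove(&id) {
                let _ = tx.send(format!("result-{}", id));
            }
        }
    }

    /// Ids that still have a responder but no stream that could complete it.
    pub fn orphaned(&self) -> Vec<u64> {
        let mut ids: Vec<u64> = self
            .pending
            .keys()
            .copied()
            .filter(|id| !self.streams.contains(id))
            .collect();
        ids.sort();
        ids
    }
}

/// Bounded wait for demonstration only; the real await has no such bound.
pub fn wait_bounded(rx: &Receiver<String>, limit: Duration) -> Result<String, RecvTimeoutError> {
    rx.recv_timeout(limit)
}
```

With requests 1 and 2 in flight and `reinit(2)`, request 2 completes while request 1 stays in `orphaned()`. Its receiver reports a timeout rather than a disconnect, because the responder is still held by the pool, so nothing ever signals the caller.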

### Suggested directions (deferring to maintainers on architecture)

1. **Don't silently orphan in-flight requests across re-init.** When `abort_all()` discards streams that may carry outstanding responses, the affected `local_responder_pool` entries should be failed with a retryable/transport error (so callers get `Err` instead of hanging) — or, better, all in-flight requests should be replayed after re-init, not just the one that 404'd.
2. **Prefer recovery over abort** for the standalone SSE stream: reconnect it under the new session id (with `last_event_id`) instead of aborting and losing it.
3. **Defensive timeouts for typed methods.** `Peer::send_request_with_option` already supports `PeerRequestOptions { timeout }` (`:457`), but every typed `peer_req` method (`call_tool`, etc.) hard-codes `no_options()`. Consider a configurable default request timeout, or timeout-aware typed variants, so a lost response can't wedge a caller indefinitely.
4. **Logging nit:** the `JoinError::Cancelled` WARN at `:824` is a by-design consequence of `abort_all()`; consider downgrading/clarifying so it doesn't read as a transport error.
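As a rough illustration of direction 1, the sketch below fails every pending responder that will not be replayed, so callers see an error instead of hanging. It is a std-only model under assumptions: `FailablePool`, `fail_all_except`, and `RequestError::SessionReinitialized` are hypothetical names, not existing rmcp items, and where such a hook would live in the worker/service split is left open.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical error delivered to callers whose response stream was discarded.
#[derive(Debug, PartialEq)]
pub enum RequestError {
    /// The session was re-initialized; the request may be retried.
    SessionReinitialized,
}

pub type Reply = Result<String, RequestError>;

/// Hypothetical responder pool that can fail its entries in bulk.
#[derive(Default)]
pub struct FailablePool {
    pending: HashMap<u64, Sender<Reply>>,
}

impl FailablePool {
    /// Register a request and hand the caller the receiving end.
    pub fn register(&mut self, id: u64) -> Receiver<Reply> {
        let (tx, rx) = channel();
        self.pending.insert(id, tx);
        rx
    }

    /// Fail every pending request except those that will be replayed.
    /// Returns the failed ids, sorted for stable output.
    pub fn fail_all_except(&mut self, replayed: &[u64]) -> Vec<u64> {
        let mut failed: Vec<u64> = self
            .pending
            .keys()
            .copied()
            .filter(|id| !replayed.contains(id))
            .collect();
        failed.sort();
        for id in &failed {
            if let Some(tx) = self.pending.remove(id) {
                // Ignore send errors: the caller may have stopped waiting.
                let _ = tx.send(Err(RequestError::SessionReinitialized));
            }
        }
        failed
    }
}
```

Replaying all in-flight requests would be the stronger fix. Failing them is the minimal change that removes the unbounded wait.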

### Workaround (today)

Callers can avoid the typed `call_tool` and issue the request with a timeout:

```rust
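use rmcp::service::PeerRequestOptions;
use std::time::Duration;

// `request` is the ClientRequest::CallToolRequest(..) that `call_tool` would have sent.
let options = PeerRequestOptions {
    timeout: Some(Duration::from_secs(60)),
    ..PeerRequestOptions::no_options()
};
// Mirrors what `send_request` does internally: send, then await the response,
// which is where the timeout is applied.
let response = peer.send_request_with_option(request, options).await?.await_response().await;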
```

This turns the hang into a `ServiceError::Timeout` (and sends a cancellation notification), but it doesn't recover the lost request — it just bounds the wait. Downstream, rig added a per-call timeout wrapper as mitigation ([0xPlaygrounds/rig#1921](https://github.com/0xPlaygrounds/rig/pull/1921)).
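For callers that cannot switch away from the typed methods, a generic per-call bound is the other option. The sketch below shows the idea with the standard library only. `call_with_timeout` and `CallError` are hypothetical names and this is not rig's actual wrapper; in async code the equivalent would be wrapping the future in `tokio::time::timeout`.

```rust
use std::sync::mpsc::{channel, RecvTimeoutError};
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq)]
pub enum CallError {
    /// No result arrived within the limit.
    Timeout,
    /// The worker went away without producing a result.
    Dropped,
}

/// Run `call` on a helper thread and wait at most `limit` for its result.
pub fn call_with_timeout<T, F>(call: F, limit: Duration) -> Result<T, CallError>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = channel();
    thread::spawn(move || {
        // If the caller already gave up, the send fails and the result is discarded.
        let _ = tx.send(call());
    });
    match rx.recv_timeout(limit) {
        Ok(value) => Ok(value),
        Err(RecvTimeoutError::Timeout) => Err(CallError::Timeout),
        Err(RecvTimeoutError::Disconnected) => Err(CallError::Dropped),
    }
}
```

As with the workaround above, this only bounds the wait: the underlying call is not cancelled and the lost response is not recovered.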

### Reproduction

A fully deterministic pure-rmcp repro is awkward because it needs a server that expires the session **while a `tools/call` response is pending**. The trigger conditions are:

- client with `reinit_on_expired_session = true` (default);
- server accepts a `tools/call` (202, response to be delivered over SSE), then expires/discards the session so a subsequent request/standalone GET returns 404 → re-init → `abort_all()` kills the SSE stream that would have delivered the pending response.
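The server-side behaviour these conditions require can be sketched as a small state machine. This is a hypothetical test double (`ExpiringServer` and its methods are invented for illustration), not the in-tree test server, and it models only status codes and bookkeeping, not HTTP or SSE.

```rust
/// Hypothetical test double for the trigger conditions; not the in-tree test server.
pub struct ExpiringServer {
    session: Option<u32>,
    next_session: u32,
    /// Requests accepted with 202 whose responses are still owed over SSE.
    pub pending: Vec<u64>,
    /// Requests whose responses can no longer be delivered (session discarded).
    pub lost: Vec<u64>,
}

impl ExpiringServer {
    pub fn new() -> Self {
        ExpiringServer {
            session: None,
            next_session: 1,
            pending: Vec::new(),
            lost: Vec::new(),
        }
    }

    /// `initialize`: open a fresh session and return its id.
    pub fn initialize(&mut self) -> u32 {
        let id = self.next_session;
        self.next_session += 1;
        self.session = Some(id);
        id
    }

    /// `tools/call`: 202 Accepted for the live session, 404 for any other.
    pub fn tools_call(&mut self, session: u32, request_id: u64) -> u16 {
        if self.session == Some(session) {
            self.pending.push(request_id);
            202
        } else {
            404
        }
    }

    /// Discard the session while responses are still owed.
    pub fn expire(&mut self) {
        self.session = None;
        self.lost.append(&mut self.pending);
    }
}
```

A failing integration test built on this shape would issue request 1, call `expire()`, issue request 2 to provoke the 404 and re-init, and then assert that the caller waiting on request 1 gets an error within a bounded time instead of hanging.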

Happy to attempt a minimal failing integration test against the in-tree streamable-http test server if that would help triage.
