feat(rime): add WebSocket streaming TTS support #5663
Open
mcullan wants to merge 5 commits into livekit:main
Adds opt-in WS streaming to the Rime TTS plugin via use_websocket=True. Pattern mirrors the Cartesia plugin: single-context JSON+base64 WS, ConnectionPool with mark_refreshed_on_get=True, blingfire sentence tokenizer, weakref.WeakSet for stream cleanup.
- New SynthesizeStream class with input/send/recv task split
- _connect_ws / _close_ws (eos shutdown, mirrors Deepgram)
- _model_params helper consolidates the arcana/mist option-walking shared between the WS query string and the HTTP body
- update_options invalidates the pool when the WS URL changes, computed via before/after _ws_url() diff
- Capabilities flips streaming and aligned_transcript on with the flag
- Routes to /ws3 only (mistv2 stays HTTP-only)
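For reviewers who want a feel for the eos shutdown mentioned in the list above, here is a minimal sketch, not the plugin's actual code. FakeWS is a hypothetical stand-in for the WebSocket object, close_ws is an illustrative name, and the {"operation": "eos"} payload shape is an assumption about the wire format.

```python
import asyncio
import json
import logging
from typing import List, Optional

logger = logging.getLogger("rime_ws_sketch")


class FakeWS:
    """Hypothetical in-memory WebSocket stand-in (not aiohttp's class)."""

    def __init__(self, ack_delay: Optional[float] = 0.0) -> None:
        self.sent: List[str] = []
        self.closed = False
        self._ack_delay = ack_delay  # None means the server never acks

    async def send_str(self, data: str) -> None:
        self.sent.append(data)

    async def receive(self) -> str:
        if self._ack_delay is None:
            await asyncio.Event().wait()  # block until cancelled
        await asyncio.sleep(self._ack_delay)
        return "ack"

    async def close(self) -> None:
        self.closed = True


async def close_ws(ws: FakeWS, ack_timeout: float = 1.0) -> None:
    """Send eos, wait briefly for the ack, and never raise during teardown."""
    try:
        # Assumed payload shape for the eos operation.
        await ws.send_str(json.dumps({"operation": "eos"}))
        await asyncio.wait_for(ws.receive(), timeout=ack_timeout)
    except Exception as exc:  # timeouts and send/recv failures alike
        # Log and swallow so teardown does not mask the original error.
        logger.warning("ignoring error during ws teardown: %r", exc)
    finally:
        await ws.close()
```

The socket is closed in the finally block, so a missing ack costs at most ack_timeout seconds and the connection is still released.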
012c6ec to 50518df
Summary
Adds opt-in WebSocket streaming to the Rime TTS plugin via a new use_websocket=True constructor argument. The existing HTTP synthesize path is unchanged and remains the default. When enabled, the plugin sets streaming=True and aligned_transcript=True during construction, opens a long-lived pooled WebSocket to Rime's /ws3 endpoint, and emits word-level timestamps via push_timed_transcript.
New constructor arguments
- use_websocket: bool = False. Opts into the streaming path. Off by default so existing consumers see no behavior change.
- ws_base_url: str = "wss://users-ws.rime.ai". Overridable for self-hosted deployments, parallel to the existing base_url.
- segment: NotGivenOr[str] = NOT_GIVEN. Passed to Rime as a connect-time query param. Defaults to "bySentence" (server-side sentence buffering, mirrors StreamAdapter semantics). Pass "immediate" if the consumer is already feeding sentence-tokenized text and wants to skip server-side buffering.
- tokenizer: NotGivenOr[tokenize.SentenceTokenizer] = NOT_GIVEN. Overridable client-side sentence tokenizer. Defaults to tokenize.blingfire.SentenceTokenizer(). Mirrors the hook Cartesia exposes.
Implementation
The streaming class is similar to the implementation in the Cartesia plugin: single-context JSON-envelope WebSocket, base64 PCM audio frames, weakref.WeakSet[SynthesizeStream] for cleanup, utils.ConnectionPool[aiohttp.ClientWebSocketResponse] with max_session_duration=300 and mark_refreshed_on_get=True. Word timestamps are pushed as TimedString.
Connection lifecycle:
- _connect_ws opens the pooled WebSocket using the URL built from current options. Connect-time errors propagate to the outer _run exception block, which classifies aiohttp.ClientResponseError (covering WSServerHandshakeError) as APIStatusError with the HTTP status code preserved.
- _close_ws follows the graceful-shutdown pattern in the Deepgram plugin: send the eos operation, wait one second for the server's ack, and suppress-and-log any send or recv errors during teardown so they don't mask the original cause that evicted the connection from the pool.
- update_options invalidates the pool when the WebSocket URL changes, computed via a before/after _ws_url() diff. This automatically handles model swaps, speaker swaps, and any per-model option that participates in the URL.
A small _model_params(opts) helper consolidates the per-model option walking shared between the WebSocket query string and the HTTP JSON body.
Routes through /ws3, which accepts every model the plugin supports (mistv2, mistv3, arcana). The older /ws2 endpoint is not wired in.
Validating
- update_options mid-session: model swap drops the existing pooled connection and reconnects with the new URL. Verified by observing two distinct _connect_ws calls and matching audio output.
- […] APIStatusError(status_code=401) with the server message preserved, rather than a generic APIConnectionError.
- tts.stream() followed by end_input() with no push_text() raises APIError immediately at the protocol layer rather than hanging on the receive timeout.
- […] max_session_duration window share the same WebSocket, with no new handshake.
- With use_websocket=False (default), synthesize() behavior is identical to before; _run payload assembly continues to use the same _model_params helper plus HTTP-only fields (samplingRate, reduceLatency for mistv2).
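To make the update_options behavior checked above easier to reason about, the before/after URL diff can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: Opts, FakePool, and SketchTTS are hypothetical, and the option and query parameter names (modelId, speaker, segment, temperature, speedAlpha) are placeholders for whatever the plugin actually sends.

```python
from dataclasses import dataclass, replace
from typing import Optional
from urllib.parse import urlencode


@dataclass(frozen=True)
class Opts:
    # Hypothetical option bag; field names are illustrative only.
    model: str = "arcana"
    speaker: str = "some_speaker"
    segment: str = "bySentence"
    temperature: Optional[float] = None  # treated here as arcana-only
    speed_alpha: Optional[float] = None  # treated here as mist-only


def model_params(opts: Opts) -> dict:
    """Per-model options, shared by the WS query string and the HTTP body."""
    params: dict = {}
    if opts.model == "arcana" and opts.temperature is not None:
        params["temperature"] = opts.temperature
    if opts.model.startswith("mist") and opts.speed_alpha is not None:
        params["speedAlpha"] = opts.speed_alpha
    return params


def ws_url(base: str, opts: Opts) -> str:
    """Build the connect URL; only options that apply end up in the query."""
    query = {"modelId": opts.model, "speaker": opts.speaker, "segment": opts.segment}
    query.update(model_params(opts))
    return f"{base}/ws3?{urlencode(query)}"


class FakePool:
    """Stand-in for a connection pool; only counts invalidations."""

    def __init__(self) -> None:
        self.invalidations = 0

    def invalidate(self) -> None:
        self.invalidations += 1


class SketchTTS:
    def __init__(self, opts: Opts, base: str = "wss://users-ws.rime.ai") -> None:
        self.opts = opts
        self.base = base
        self.pool = FakePool()

    def update_options(self, **changes) -> None:
        before = ws_url(self.base, self.opts)
        self.opts = replace(self.opts, **changes)
        # Reconnect only when the change is visible in the connect URL.
        if ws_url(self.base, self.opts) != before:
            self.pool.invalidate()
```

Comparing URLs rather than individual fields means an option that does not apply to the current model (speed_alpha while on arcana, in this sketch) leaves the pooled connection alone, while a later model swap that brings it into the URL triggers a reconnect.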