
feat(stt): forward session VAD events to STT plugins#5644

Open
sam-s10s wants to merge 1 commit into livekit:main from speechmatics:smx/vad-events

sam-s10s (Contributor) commented May 5, 2026

Currently, STT providers do not receive any external VAD events (e.g. from Silero VAD).

The proposal is to add an STT.on_vad_event() hook and have AudioRecognition forward VAD events to the active STT instance, enabling plugins to react to session-level VAD (e.g. finalize on END_OF_SPEECH for externally driven turn detection modes).

Example code:

class STT(stt.STT):

    ...

    def on_vad_event(self, ev: vad.VADEvent) -> None:
        """Auto-finalize when the session VAD reports end of speech.

        Only acts when running in EXTERNAL turn detection mode — other modes
        either delegate end-of-utterance handling to the Speechmatics service
        (ADAPTIVE, SMART_TURN, FIXED) or expect the caller to manage turns
        explicitly.
        """
        if ev.type != vad.VADEventType.END_OF_SPEECH:
            return
        if self._stt_options.turn_detection_mode != TurnDetectionMode.EXTERNAL:
            return
        self.finalize()

    ...
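
For context, the hook and the forwarding step could fit together roughly as follows. This is a minimal, self-contained sketch that uses stand-in types rather than the real livekit-agents classes: `BaseSTT`, `ExternalTurnSTT`, and `forward_vad_event` are hypothetical names, and the no-op default on the base class is an assumption about how the hook would behave for plugins that do not override it.

```python
import enum
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("vad-forwarding-sketch")


class VADEventType(enum.Enum):
    # Stand-in for vad.VADEventType; only the members used here.
    START_OF_SPEECH = "start_of_speech"
    END_OF_SPEECH = "end_of_speech"


@dataclass
class VADEvent:
    # Stand-in for vad.VADEvent.
    type: VADEventType


class BaseSTT:
    """Stand-in for stt.STT; the default hook does nothing (assumed)."""

    def on_vad_event(self, ev: VADEvent) -> None:
        return None


class ExternalTurnSTT(BaseSTT):
    """Hypothetical plugin that finalizes when speech ends."""

    def __init__(self) -> None:
        self.finalize_calls = 0

    def finalize(self) -> None:
        self.finalize_calls += 1

    def on_vad_event(self, ev: VADEvent) -> None:
        if ev.type != VADEventType.END_OF_SPEECH:
            return
        self.finalize()


def forward_vad_event(stt_inst: Optional[BaseSTT], ev: VADEvent) -> None:
    """Forward an event to the STT, logging plugin errors instead of raising."""
    if stt_inst is None:
        return
    try:
        stt_inst.on_vad_event(ev)
    except Exception:
        logger.exception("error forwarding VAD event to STT")
```

With this shape, plugins that ignore VAD events need no changes, and a failing hook is logged rather than interrupting the caller.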

devin-ai-integration Bot (Contributor) left a comment

Devin Review found 1 potential issue.


Comment on lines +885 to +889
if (stt_inst := self._session.stt) is not None:
    try:
        stt_inst.on_vad_event(ev)
    except Exception:
        logger.exception("error forwarding VAD event to STT")

🟡 VAD events forwarded to session-level STT instead of the active STT instance

The code at audio_recognition.py:885 uses self._session.stt to forward VAD events, but the active STT (the one actually processing audio) is resolved by agent_activity.py:3629-3630 as self._agent.stt if is_given(self._agent.stt) else self._session.stt. When a user configures the agent with its own STT via Agent(stt=my_stt), the active STT is the agent's instance, not the session's. In this case, self._session.stt may return a different STT instance (or None if only the agent has an STT), so the on_vad_event call either reaches the wrong instance or doesn't happen at all. This means any plugin that overrides on_vad_event (the stated purpose of this PR) won't receive events when STT is set at the agent level.
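
The mismatch can be reproduced in isolation. The sketch below is illustrative only: `FakeSTT` and `resolve_active_stt` are hypothetical stand-ins, and `None` is used in place of the library's "not given" sentinel, so it approximates the resolution rule quoted above rather than reproducing the actual code.

```python
from typing import Optional


class FakeSTT:
    """Stand-in for an stt.STT instance that records forwarded events."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.events: list = []

    def on_vad_event(self, ev: str) -> None:
        self.events.append(ev)


def resolve_active_stt(
    agent_stt: Optional[FakeSTT], session_stt: Optional[FakeSTT]
) -> Optional[FakeSTT]:
    # Mirrors the rule quoted above, with None standing in for "not given".
    return agent_stt if agent_stt is not None else session_stt


agent_stt = FakeSTT("agent")
session_stt = None  # STT configured only at the agent level

# Forwarding through the session-level reference reaches nothing.
if session_stt is not None:
    session_stt.on_vad_event("END_OF_SPEECH")
events_after_session_forward = list(agent_stt.events)

# Forwarding through the resolved instance reaches the agent's STT.
active = resolve_active_stt(agent_stt, session_stt)
if active is not None:
    active.on_vad_event("END_OF_SPEECH")
```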

Prompt for agents
The issue is in `_on_vad_event` in `audio_recognition.py`. The code forwards VAD events to `self._session.stt`, but the active STT may be the agent-level one (resolved via `agent_activity.stt` property at `agent_activity.py:3629-3630`). 

The `AudioRecognition` class currently only holds a reference to the `AgentSession` (via `self._session`), not the `AgentActivity` or `Agent`. To fix this, you could either:
1. Store a reference to the active STT instance (the `stt.STT` object, not just the `io.STTNode` callable) in `AudioRecognition` and update it when `update_stt` is called. For example, add an optional `stt_instance: stt.STT | None` parameter.
2. Have `AudioRecognition.__init__` or a new setter accept the active STT instance, and have `AgentActivity` pass `self.stt` (which correctly resolves agent vs session STT).
3. Access the active STT through the session's current activity, though this would add coupling.

The goal is to ensure `on_vad_event` is called on the same STT instance that the default `stt_node` uses (i.e., `activity.stt`).
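
Option 2 might look roughly like the following. This is a sketch under stated assumptions, not the real `AudioRecognition` API: `AudioRecognitionSketch`, `update_stt_instance`, and `RecordingSTT` are hypothetical names, and the owning activity is assumed to pass in whichever STT instance it has already resolved.

```python
from typing import Optional


class RecordingSTT:
    """Stand-in for an stt.STT instance that records forwarded events."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.events: list = []

    def on_vad_event(self, ev: str) -> None:
        self.events.append(ev)


class AudioRecognitionSketch:
    """Hypothetical stand-in; not the real AudioRecognition class."""

    def __init__(self, stt_instance: Optional[RecordingSTT] = None) -> None:
        # The caller passes the already-resolved (agent or session) STT.
        self._stt_instance = stt_instance

    def update_stt_instance(self, stt_instance: Optional[RecordingSTT]) -> None:
        # Hypothetical setter, called whenever the active STT changes.
        self._stt_instance = stt_instance

    def on_vad_event(self, ev: str) -> None:
        # Forward to the stored instance instead of looking it up on the session.
        if self._stt_instance is not None:
            self._stt_instance.on_vad_event(ev)


session_stt = RecordingSTT("session")
agent_stt = RecordingSTT("agent")

recognition = AudioRecognitionSketch(stt_instance=agent_stt)
recognition.on_vad_event("END_OF_SPEECH")

# Switching the active STT redirects later events.
recognition.update_stt_instance(session_stt)
recognition.on_vad_event("START_OF_SPEECH")
```

Passing the instance in keeps the recognition class unaware of how agent-level and session-level STT are prioritized.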

theomonnom (Member) commented May 7, 2026

Any reason why you need that? Is it required for some STT?
You can also create your own VAD instance inside your STT impl (like stt.StreamAdapter)
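
As a rough illustration of that alternative, an STT implementation could own its VAD and finalize on its own end-of-speech detection. The sketch below does not use the livekit-agents VAD or `stt.StreamAdapter` APIs: `EnergyVAD` is a toy amplitude-threshold detector and `SelfContainedSTT` is a hypothetical class, both assumed purely for illustration.

```python
from typing import Sequence


class EnergyVAD:
    """Toy stand-in for a VAD: a frame counts as speech if its mean
    absolute amplitude exceeds a threshold."""

    def __init__(self, threshold: float = 0.1) -> None:
        self._threshold = threshold
        self._speaking = False

    def push_frame(self, frame: Sequence[float]) -> bool:
        """Return True when this frame marks a speech-to-silence transition."""
        energy = sum(abs(s) for s in frame) / max(len(frame), 1)
        is_speech = energy > self._threshold
        ended = self._speaking and not is_speech
        self._speaking = is_speech
        return ended


class SelfContainedSTT:
    """Hypothetical STT that runs its own VAD instead of receiving events."""

    def __init__(self) -> None:
        self._vad = EnergyVAD()
        self.finalize_calls = 0

    def finalize(self) -> None:
        self.finalize_calls += 1

    def push_frame(self, frame: Sequence[float]) -> None:
        # Finalize as soon as the internal VAD reports that speech ended.
        if self._vad.push_frame(frame):
            self.finalize()
```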
