feat(stt): forward session VAD events to STT plugins by sam-s10s · Pull Request #5644 · livekit/agents

sam-s10s · 2026-05-05T11:54:40Z

Currently, STT providers do not receive any external VAD events (e.g. from Silero VAD).

The proposal is to add an STT.on_vad_event() hook and have AudioRecognition forward VAD events to the active STT instance, enabling plugins to react to session-level VAD (e.g. finalize on END_OF_SPEECH for externally driven turn detection modes).

Example code:

class STT(stt.STT):

    ...

    def on_vad_event(self, ev: vad.VADEvent) -> None:
        """Auto-finalize when the session VAD reports end of speech.

        Only acts when running in EXTERNAL turn detection mode — other modes
        either delegate end-of-utterance handling to the Speechmatics service
        (ADAPTIVE, SMART_TURN, FIXED) or expect the caller to manage turns
        explicitly.
        """
        if ev.type != vad.VADEventType.END_OF_SPEECH:
            return
        if self._stt_options.turn_detection_mode != TurnDetectionMode.EXTERNAL:
            return
        self.finalize()

    ...

Add STT.on_vad_event() hook and have AudioRecognition forward VAD events to the active STT instance, enabling plugins to react to session-level VAD (e.g. finalize on END_OF_SPEECH for externally driven turn detection modes).
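To make the shape of the proposal concrete, here is a minimal, self-contained sketch of a no-op base hook, a plugin override, and a guarded forwarding call. It does not import livekit: `BaseSTT`, `ExternalTurnSTT`, `VADEvent`, `VADEventType` and `forward_vad_event` are hypothetical stand-ins written for illustration, not the classes touched by this PR.

```python
import enum
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("vad-forwarding-sketch")


class VADEventType(enum.Enum):
    # Stand-in for vad.VADEventType; only the two values used here.
    START_OF_SPEECH = "start_of_speech"
    END_OF_SPEECH = "end_of_speech"


@dataclass
class VADEvent:
    # Stand-in for vad.VADEvent; the real event carries more fields.
    type: VADEventType


class BaseSTT:
    """Stand-in for stt.STT: the default hook ignores VAD events."""

    def on_vad_event(self, ev: VADEvent) -> None:
        pass


class ExternalTurnSTT(BaseSTT):
    """Hypothetical plugin that finalizes when speech ends."""

    def __init__(self) -> None:
        self.finalize_calls = 0

    def finalize(self) -> None:
        self.finalize_calls += 1

    def on_vad_event(self, ev: VADEvent) -> None:
        if ev.type is VADEventType.END_OF_SPEECH:
            self.finalize()


def forward_vad_event(stt_inst: Optional[BaseSTT], ev: VADEvent) -> None:
    """Forward one event; a failing plugin hook must not break the caller."""
    if stt_inst is None:
        return
    try:
        stt_inst.on_vad_event(ev)
    except Exception:
        logger.exception("error forwarding VAD event to STT")
```

Because the base hook does nothing, plugins that do not override it are unaffected by the forwarding, which is presumably why the hook is opt-in.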

devin-ai-integration

Devin Review found 1 potential issue.


devin-ai-integration · 2026-05-05T11:59:17Z

+        if (stt_inst := self._session.stt) is not None:
+            try:
+                stt_inst.on_vad_event(ev)
+            except Exception:
+                logger.exception("error forwarding VAD event to STT")


🟡 VAD events forwarded to session-level STT instead of the active STT instance

The code at audio_recognition.py:885 uses self._session.stt to forward VAD events, but the active STT (the one actually processing audio) is resolved by agent_activity.py:3629-3630 as self._agent.stt if is_given(self._agent.stt) else self._session.stt. When a user configures the agent with its own STT via Agent(stt=my_stt), the active STT is the agent's instance, not the session's. In this case, self._session.stt may return a different STT instance (or None if only the agent has an STT), so the on_vad_event call either reaches the wrong instance or doesn't happen at all. This means any plugin that overrides on_vad_event (the stated purpose of this PR) won't receive events when STT is set at the agent level.

Prompt for agents

The issue is in `_on_vad_event` in `audio_recognition.py`. The code forwards VAD events to `self._session.stt`, but the active STT may be the agent-level one (resolved via the `agent_activity.stt` property at `agent_activity.py:3629-3630`). The `AudioRecognition` class currently only holds a reference to the `AgentSession` (via `self._session`), not the `AgentActivity` or `Agent`. To fix this, you could either:

1. Store a reference to the active STT instance (the `stt.STT` object, not just the `io.STTNode` callable) in `AudioRecognition` and update it when `update_stt` is called. For example, add an optional `stt_instance: stt.STT | None` parameter.
2. Have `AudioRecognition.__init__` or a new setter accept the active STT instance, and have `AgentActivity` pass `self.stt` (which correctly resolves agent vs session STT).
3. Access the active STT through the session's current activity, though this would add coupling.

The goal is to ensure `on_vad_event` is called on the same STT instance that the default `stt_node` uses (i.e., `activity.stt`).
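The second option above could look roughly like the sketch below. It is an illustration under stated assumptions, not the repository's code: `NOT_GIVEN`, `is_given`, `RecordingSTT`, `resolve_active_stt` and `Recognition` are hypothetical stand-ins that only mirror the agent-over-session resolution rule quoted in the finding.

```python
from typing import List, Optional


class _NotGiven:
    """Sentinel type meaning 'the agent did not specify an STT'."""


NOT_GIVEN = _NotGiven()


def is_given(value: object) -> bool:
    return not isinstance(value, _NotGiven)


class RecordingSTT:
    """Hypothetical STT stand-in that records the VAD events it receives."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.events: List[str] = []

    def on_vad_event(self, ev: str) -> None:
        self.events.append(ev)


def resolve_active_stt(agent_stt, session_stt):
    # Same rule as the one quoted in the finding:
    # the agent-level STT wins whenever it is given.
    return agent_stt if is_given(agent_stt) else session_stt


class Recognition:
    """Stand-in for AudioRecognition holding the resolved STT instance."""

    def __init__(self, stt_instance: Optional[RecordingSTT] = None) -> None:
        self._stt_instance = stt_instance

    def update_stt_instance(self, stt_instance: Optional[RecordingSTT]) -> None:
        # Called by the owner whenever the active STT changes.
        self._stt_instance = stt_instance

    def on_vad_event(self, ev: str) -> None:
        if self._stt_instance is not None:
            self._stt_instance.on_vad_event(ev)
```

With this shape the owner (the activity, in the PR's terms) resolves the instance once and hands it over, so the recognition object never needs to know about agents or sessions.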


theomonnom · 2026-05-07T06:07:10Z

Any reason why you need that? Is it required for some STT?
You can also create your own VAD instance inside your STT impl (like stt.StreamAdapter)
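For comparison, the alternative suggested here (an STT that owns its VAD rather than receiving session events) might be sketched as follows. This is a toy illustration only: `EnergyVAD` is a made-up energy-threshold detector and `SelfContainedSTT` a hypothetical wrapper; neither reflects how `stt.StreamAdapter` is actually implemented.

```python
from typing import Optional, Sequence


class EnergyVAD:
    """Toy VAD: a frame counts as speech if its mean |amplitude| exceeds a threshold."""

    def __init__(self, threshold: float = 0.1) -> None:
        self._threshold = threshold
        self._speaking = False

    def process(self, frame: Sequence[float]) -> Optional[str]:
        energy = sum(abs(s) for s in frame) / max(len(frame), 1)
        is_speech = energy > self._threshold
        if is_speech and not self._speaking:
            self._speaking = True
            return "start_of_speech"
        if not is_speech and self._speaking:
            self._speaking = False
            return "end_of_speech"
        return None  # no state transition on this frame


class SelfContainedSTT:
    """Hypothetical STT that runs its own VAD over every pushed frame."""

    def __init__(self, vad: EnergyVAD) -> None:
        self._vad = vad
        self.finalize_calls = 0

    def finalize(self) -> None:
        self.finalize_calls += 1

    def push_frame(self, frame: Sequence[float]) -> None:
        # The STT reacts to its private VAD; no session events are needed.
        if self._vad.process(frame) == "end_of_speech":
            self.finalize()
```

The trade-off is that audio is analysed by two VADs (the session's and the plugin's), which may disagree on where speech ends.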

feat(stt): forward session VAD events to STT plugins

dc0a403


sam-s10s wants to merge 1 commit into livekit:main from speechmatics:smx/vad-events
