Follow-up to livekit/client-sdk-react-native#389, where @jankosecki traced this deadlock and offered to file it here. We hit the same bug in production with the default configuration, so we're filing it with our data. All file/line references below are against the 144.1.0 tag.
Environment
- @livekit/react-native-webrtc 144.1.0, @livekit/react-native 2.11.0
- React Native 0.85.3 (New Architecture), Expo SDK 56
- iOS, physical device, production build
- Default audio setup: stock registerGlobals(), no autoConfigureAudioSession changes, no manual audio session configuration
Mechanism
Three pieces interact:
- ios/RCTWebRTC/AudioDeviceModuleObserver.m blocks the native thread driving RTCAudioDeviceModule delegate callbacks with dispatch_semaphore_wait(..., DISPATCH_TIME_FOREVER) while waiting for a JS reply, at six sites: L61 (didCreateEngine), L82 (willEnableEngine), L107 (willStartEngine), L128 (didStopEngine), L149 (didDisableEngine), L161 (willReleaseEngine). The reply only arrives if the JS thread is free to run the event listener and call the corresponding audioDeviceModuleResolve* method (WebRTCModule+RTCAudioDeviceModule.m:232-258).
- The observer is installed unconditionally in WebRTCModule.m:98-99, so every app pays the blocking-wait cost even if it never registers JS-side audio engine hooks.
- Several peer-connection methods are blocking-synchronous JS→native calls that dispatch_sync onto the serial workerQueue, notably peerConnectionAddTransceiver (WebRTCModule+RTCPeerConnection.m:534, dispatch_sync at L539), which holds the JS thread until the queue block returns.
If an AVAudioEngine stop/restart (for example, a mic publish flipping the engine from playout-only to duplex) is in flight when JS enters peerConnectionAddTransceiver, three waits chain together: the workerQueue block stalls inside libwebrtc on state held by the restart, the restart's delegate callback waits forever for a JS reply, and JS is parked inside the synchronous bridge call waiting for that workerQueue block. The result is a circular wait (JS → workerQueue → libwebrtc/ADM state → delegate semaphore → JS) in which nothing can ever proceed.
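The cycle can be reproduced in miniature outside iOS. The sketch below is a hypothetical POSIX model, not react-native-webrtc code: unnamed POSIX semaphores stand in for dispatch_semaphore_t (they are not supported on Apple platforms, so this targets Linux), one thread plays the restart's delegate callback, another plays the JS thread inside a synchronous peerConnectionAddTransceiver, and the workerQueue plus libwebrtc/ADM-state hop is collapsed into a single adm_released semaphore. Every name in it is illustrative. The delegate wait is bounded here only so the program terminates; with an unbounded wait (the DISPATCH_TIME_FOREVER equivalent) both threads would block forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <time.h>

/* Hypothetical model; none of these names exist in react-native-webrtc. */
static sem_t js_reply;         /* JS -> delegate: "I handled your event" */
static sem_t adm_released;     /* restart finished; stands in for workerQueue + libwebrtc/ADM state */
static long delegate_wait_ms;  /* bounded only so the demo ends; the real wait is DISPATCH_TIME_FOREVER */
static int delegate_timed_out;

/* Waits on sem for at most ms milliseconds; returns 0 if signaled, -1 otherwise. */
static int timed_wait(sem_t *sem, long ms) {
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts); /* sem_timedwait takes an absolute CLOCK_REALTIME deadline */
    ts.tv_sec += ms / 1000;
    ts.tv_nsec += (ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {
        ts.tv_sec += 1;
        ts.tv_nsec -= 1000000000L;
    }
    int rc;
    while ((rc = sem_timedwait(sem, &ts)) == -1 && errno == EINTR) {
        /* retry if interrupted by a signal */
    }
    return rc;
}

/* ADM thread: a restart is in flight; its delegate callback has queued an
   event for JS and must get the reply before the restart can finish. */
static void *adm_restart(void *arg) {
    (void)arg;
    delegate_timed_out = (timed_wait(&js_reply, delegate_wait_ms) != 0);
    sem_post(&adm_released); /* only now is the ADM state released */
    return NULL;
}

/* JS thread: inside a synchronous addTransceiver-style call it cannot run the
   queued event listener until the worker block (which needs the ADM state) returns. */
static void *js_add_transceiver(void *arg) {
    (void)arg;
    while (sem_wait(&adm_released) == -1 && errno == EINTR) {
    }
    sem_post(&js_reply); /* the listener finally replies, too late */
    return NULL;
}

/* Runs one race and returns how many unconsumed replies are left on js_reply. */
static int run_race(long wait_ms) {
    assert(wait_ms > 0);
    delegate_wait_ms = wait_ms;
    delegate_timed_out = 0;
    sem_init(&js_reply, 0, 0);
    sem_init(&adm_released, 0, 0);

    pthread_t adm, js;
    pthread_create(&adm, NULL, adm_restart, NULL);
    pthread_create(&js, NULL, js_add_transceiver, NULL);
    pthread_join(adm, NULL);
    pthread_join(js, NULL);

    int stale = 0;
    while (sem_trywait(&js_reply) == 0) {
        stale++;
    }
    sem_destroy(&js_reply);
    sem_destroy(&adm_released);
    return stale;
}
```

Calling run_race(50) always ends with the delegate timing out, regardless of thread start order, because the JS thread cannot reply until the restart it is waiting on has finished. The JS reply still arrives afterwards and leaves one unconsumed signal on js_reply, which is the stale-signal case a bounded-wait fix has to account for.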
Reproduction shape
Publish a microphone track and then a camera track back-to-back right after Connected (a voice+video call against a LiveKit agent, in our case). The mic publish triggers the playout-only → duplex engine restart; the camera publish issues the synchronous addTransceiver. The race window is a few milliseconds, so it's intermittent — most calls are fine, then one freezes. livekit/client-sdk-react-native#389 has the mirror-image trace from the subscribe side (remote audio+video tracks on join).
Production occurrence
Timeline of the frozen call, from our backend records (times in UTC):
| Time | Event |
|---|---|
| 01:46:33.116 | Last JS-side analytics write (JS thread provably alive) |
| 01:46:34.642 | Agent subscribed our microphone track (mic publish completed) |
| n/a | Camera track never published |
| n/a | room.disconnect() never ran; user force-killed the app |
In a healthy call by the same user two minutes earlier, the mic→camera publish gap was 35 ms.
Impact
Permanent app-wide UI freeze: the JS thread never returns, every React Native touchable is dead, and no crash report is generated since it's a deadlock rather than a crash. The user's only recourse is force-killing the app, and because room.disconnect() can never run, the session also lingers server-side.
Proposed fix
Bound the six waits (e.g. dispatch_time(DISPATCH_TIME_NOW, 2 * NSEC_PER_SEC)), drain any stale signal left on the long-lived semaphore by a previously timed-out round before sending the event, and on timeout emit an os_log error and return the default 0 so the engine operation degrades gracefully instead of freezing the app. We're running exactly this as a local patch in production. Happy to open a PR if the approach is acceptable.
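A portable sketch of the bounded wait described above, again modeled with POSIX semaphores rather than GCD so it can run off-device. The assumed correspondence: dispatch_semaphore_wait with dispatch_time(DISPATCH_TIME_NOW, 2 * NSEC_PER_SEC) maps to sem_timedwait with an absolute deadline, and fprintf(stderr, ...) stands in for os_log. js_round_trip, wait_for_js_reply, reply_immediately and never_reply are hypothetical names, not symbols from AudioDeviceModuleObserver.m or from our patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical per-callback state: a long-lived semaphore reused across
   rounds (as in the observer) plus the value the JS reply delivers. */
typedef struct {
    sem_t reply;
    int result;
} js_round_trip;

/* Hypothetical hook that emits the delegate event to JS. */
typedef void (*send_event_fn)(js_round_trip *rt);

/* One delegate round trip with a bounded wait. Returns the JS result, or the
   default 0 if JS does not reply within timeout_ms. */
static int wait_for_js_reply(js_round_trip *rt, send_event_fn send_event, long timeout_ms) {
    assert(timeout_ms > 0);

    /* Drain signals left by earlier rounds that timed out and were answered late. */
    while (sem_trywait(&rt->reply) == 0) {
    }

    rt->result = 0;
    send_event(rt);

    /* Absolute deadline, the analogue of dispatch_time(DISPATCH_TIME_NOW, timeout). */
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    int rc;
    while ((rc = sem_timedwait(&rt->reply, &deadline)) == -1 && errno == EINTR) {
    }
    if (rc != 0) {
        /* Timed out (or failed): log, as the patch does with os_log, and degrade to 0. */
        fprintf(stderr, "no JS reply within %ld ms; returning default 0\n", timeout_ms);
        return 0;
    }
    return rt->result;
}

/* Demo senders: a JS side that answers at once, and one that is blocked. */
static void reply_immediately(js_round_trip *rt) {
    rt->result = 42;
    sem_post(&rt->reply);
}

static void never_reply(js_round_trip *rt) {
    (void)rt;
}
```

With a JS side that answers, the helper returns its value; with a blocked JS side it returns 0 after the timeout instead of hanging. If a late reply from a timed-out round is still pending when the next round starts, the drain step discards it, so the next wait consumes only that round's own signal rather than returning early on the stale one. The drain cannot catch a late reply that lands after it runs; it narrows that window rather than closing it.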