fix(client): send migrating_from after repeated rejoin failures#2287
fix(client): send migrating_from after repeated rejoin failures#2287jdimovska wants to merge 4 commits into
Conversation
|
📝 WalkthroughWalkthrough
ChangesSFU Migration Hints on REJOIN Retry
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
packages/client/src/Call.ts (2)
1870-1891:⚠️ Potential issue | 🟠 Major | ⚡ Quick winUse the captured strategy when deciding whether a failed migration must rejoin.
doJoin()resetsthis.reconnectStrategyafter a successful SFU join. IfreconnectMigrate()fails after that point,wasMigratingcan readUNSPECIFIEDand skip the intended REJOIN fallback.Proposed fix
- const wasMigrating = - this.reconnectStrategy === WebsocketReconnectStrategy.MIGRATE; + const wasMigrating = + attemptedStrategy === WebsocketReconnectStrategy.MIGRATE;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/client/src/Call.ts` around lines 1870 - 1891, The wasMigrating check in the reconnect logic reads the current this.reconnectStrategy, but doJoin() can reset this value after a successful SFU join. If reconnectMigrate() fails after doJoin() has already executed, wasMigrating will read the new strategy instead of the original MIGRATE state, causing the intended REJOIN fallback to be skipped. Capture the this.reconnectStrategy value at the start of the reconnection attempt before doJoin() can modify it, and then use this captured strategy value in the wasMigrating condition instead of reading this.reconnectStrategy directly.
1826-1836:⚠️ Potential issue | 🟠 Major | ⚡ Quick winTreat unrecoverable
SfuJoinErrors as terminal during reconnect.
join()already aborts onSfuJoinError.unrecoverable, but this reconnect catch only stops on unrecoverable coordinator errors. An unrecoverable SFU join failure can now be counted as retryable and keep cycling through migration hints until retry budgets expire.Proposed fix
- if (error instanceof ErrorFromResponse && error.unrecoverable) { + if ( + (error instanceof ErrorFromResponse && error.unrecoverable) || + (error instanceof SfuJoinError && error.unrecoverable) + ) { this.logger.warn( - `[Reconnect] Can't reconnect due to coordinator unrecoverable error`, + `[Reconnect] Can't reconnect due to unrecoverable join error`, error, ); await markAsReconnectingFailed(); return;Also applies to: 1848-1855
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/client/src/Call.ts` around lines 1826 - 1836, The reconnect logic is not treating unrecoverable SfuJoinError exceptions as terminal failures during REJOIN attempts. Modify the switchSfu condition assignment to additionally check that the SfuJoinError is not unrecoverable using SfuJoinError.unrecoverable. Ensure that unrecoverable SFU join failures are prevented from being counted as retryable by adding a negation check alongside the existing SfuJoinError.isJoinErrorCode condition, so these failures stop retry cycling immediately rather than continuing to attempt SFU migration.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@packages/client/src/Call.ts`:
- Around line 1870-1891: The wasMigrating check in the reconnect logic reads the
current this.reconnectStrategy, but doJoin() can reset this value after a
successful SFU join. If reconnectMigrate() fails after doJoin() has already
executed, wasMigrating will read the new strategy instead of the original
MIGRATE state, causing the intended REJOIN fallback to be skipped. Capture the
this.reconnectStrategy value at the start of the reconnection attempt before
doJoin() can modify it, and then use this captured strategy value in the
wasMigrating condition instead of reading this.reconnectStrategy directly.
- Around line 1826-1836: The reconnect logic is not treating unrecoverable
SfuJoinError exceptions as terminal failures during REJOIN attempts. Modify the
switchSfu condition assignment to additionally check that the SfuJoinError is
not unrecoverable using SfuJoinError.unrecoverable. Ensure that unrecoverable
SFU join failures are prevented from being counted as retryable by adding a
negation check alongside the existing SfuJoinError.isJoinErrorCode condition, so
these failures stop retry cycling immediately rather than continuing to attempt
SFU migration.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: aaf2a576-6ea5-4126-ab11-bb35191bdaad
📒 Files selected for processing (1)
packages/client/src/Call.ts
💡 Overview
This PR fixes an issue where a reconnecting client could get stuck rejoining the same unhealthy SFU, because the coordinator's
/joincan hand back the same SFU that failed. Failed rejoins are counted per SFU during a reconnection, and once an SFU fails enough times we setmigrating_from/migrating_from_listso the coordinator gives us a different SFU (the hint applies only to that one request, so a following FAST reconnect isn't affected).📝 Implementation notes
🎫 Ticket: https://linear.app/stream/issue/REACT-1013/enhance-reconnect-flow-with-migrating-from-option
📑 Docs: https://github.com/GetStream/docs-content/pull/
Summary by CodeRabbit
Release Notes