fix(client): send migrating_from after repeated rejoin failures by jdimovska · Pull Request #2287 · GetStream/stream-video-js

jdimovska · 2026-06-16T22:08:17Z

💡 Overview

This PR fixes an issue where a reconnecting client could get stuck rejoining the same unhealthy SFU, because the coordinator's /join can hand back the same SFU that failed. Failed rejoins are counted per SFU during a reconnection, and once an SFU fails enough times we set migrating_from / migrating_from_list so the coordinator gives us a different SFU (the hint applies only to that one request, so a following FAST reconnect isn't affected).

📝 Implementation notes

🎫 Ticket: https://linear.app/stream/issue/REACT-1013/enhance-reconnect-flow-with-migrating-from-option

📑 Docs: https://github.com/GetStream/docs-content/pull/

Summary by CodeRabbit

Release Notes

Bug Fixes
- Improved call reconnection reliability by refining retry behavior and tracking repeated SFU join failures to make smarter rejoin decisions.
- Enhanced reconnection handling to reliably include migration hints during rejoin retries, improving recovery after SFU capacity issues while keeping call join parameters unchanged.
- Strengthened reconnection test coverage to validate the updated rejoin and SFU exclusion behavior across multiple retries.

changeset-bot · 2026-06-16T22:08:20Z

⚠️ No Changeset found

Latest commit: 730e758

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

coderabbitai · 2026-06-16T22:08:26Z

📝 Walkthrough

Walkthrough

Call.reconnect() gains a per-invocation sfuRejoinFailures map that tracks REJOIN join failures by SFU edge name. On each REJOIN attempt, confirmed-bad SFUs (≥ 2 failures) are temporarily injected as migrating_from/migrating_from_list into joinCallData, then always removed in a finally block. Errors increment the per-SFU count, with SfuJoinError codes that should trigger switching forcing the count to at least 2. The source of migratingFromSfuId in doJoin is switched from data.migrating_from to previousSfuClient.edgeName. Tests cover repeated rejoin failures, SFU_FULL errors, and the two-tries-before-exclusion sequencing.

Changes

SFU Migration Hints on REJOIN Retry

Layer / File(s)	Summary
Per-SFU failure tracking and migration hint injection `packages/client/src/Call.ts`	Initializes `sfuRejoinFailures` map per `reconnect()` call; snapshots `reconnectStrategy` into `attemptedStrategy`/`currentStrategy` each loop iteration; temporarily sets `migrating_from`/`migrating_from_list` on `joinCallData` in the `REJOIN` branch for SFUs with ≥ 2 failures, clearing them in a `finally` block; increments the per-SFU failure count on error with `SfuJoinError`-aware threshold forcing; sources `migratingFromSfuId` from `previousSfuClient.edgeName` instead of `data.migrating_from` during `doJoin` reconnectDetails preparation.
Reconnect test helper update and migration hint tests `packages/client/src/rtc/__tests__/Call.reconnect.test.ts`	Adds `reportingEnabled` option to `makeCall` and imports `SfuJoinError`; introduces three new tests asserting `migrating_from`/`migrating_from_list` propagation to `doJoinRequest` after repeated REJOIN failures, after `SfuJoinError` `SFU_FULL`, and across the two-tries-before-exclusion SFU sequencing, each confirming `joinCallData` stays `{ create: true }`.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

oliverlaz

🐇 Hop hop, the SFU's full today,
Two strikes and you're out, bunny says nay!
migrating_from flies through the air,
finally cleans up without a care.
The tests all pass — a joyful display! 🎉

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main change: sending migrating_from hints after repeated rejoin failures to avoid getting stuck on unhealthy SFUs.
Description check	✅ Passed	The description includes all required template sections with clear explanation of the problem, solution, and ticket reference.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch migrate-during-reconnect

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

packages/client/src/Call.ts (2)

1870-1891: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use the captured strategy when deciding whether a failed migration must rejoin.

doJoin() resets this.reconnectStrategy after a successful SFU join. If reconnectMigrate() fails after that point, wasMigrating can read UNSPECIFIED and skip the intended REJOIN fallback.

Proposed fix

-          const wasMigrating =
-            this.reconnectStrategy === WebsocketReconnectStrategy.MIGRATE;
+          const wasMigrating =
+            attemptedStrategy === WebsocketReconnectStrategy.MIGRATE;

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/client/src/Call.ts` around lines 1870 - 1891, The wasMigrating check
in the reconnect logic reads the current this.reconnectStrategy, but doJoin()
can reset this value after a successful SFU join. If reconnectMigrate() fails
after doJoin() has already executed, wasMigrating will read the new strategy
instead of the original MIGRATE state, causing the intended REJOIN fallback to
be skipped. Capture the this.reconnectStrategy value at the start of the
reconnection attempt before doJoin() can modify it, and then use this captured
strategy value in the wasMigrating condition instead of reading
this.reconnectStrategy directly.

1826-1836: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Treat unrecoverable SfuJoinErrors as terminal during reconnect.

join() already aborts on SfuJoinError.unrecoverable, but this reconnect catch only stops on unrecoverable coordinator errors. An unrecoverable SFU join failure can now be counted as retryable and keep cycling through migration hints until retry budgets expire.

Proposed fix

-          if (error instanceof ErrorFromResponse && error.unrecoverable) {
+          if (
+            (error instanceof ErrorFromResponse && error.unrecoverable) ||
+            (error instanceof SfuJoinError && error.unrecoverable)
+          ) {
             this.logger.warn(
-              `[Reconnect] Can't reconnect due to coordinator unrecoverable error`,
+              `[Reconnect] Can't reconnect due to unrecoverable join error`,
               error,
             );
             await markAsReconnectingFailed();
             return;

Also applies to: 1848-1855

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/client/src/Call.ts` around lines 1826 - 1836, The reconnect logic is
not treating unrecoverable SfuJoinError exceptions as terminal failures during
REJOIN attempts. Modify the switchSfu condition assignment to additionally check
that the SfuJoinError is not unrecoverable using SfuJoinError.unrecoverable.
Ensure that unrecoverable SFU join failures are prevented from being counted as
retryable by adding a negation check alongside the existing
SfuJoinError.isJoinErrorCode condition, so these failures stop retry cycling
immediately rather than continuing to attempt SFU migration.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@packages/client/src/Call.ts`:
- Around line 1870-1891: The wasMigrating check in the reconnect logic reads the
current this.reconnectStrategy, but doJoin() can reset this value after a
successful SFU join. If reconnectMigrate() fails after doJoin() has already
executed, wasMigrating will read the new strategy instead of the original
MIGRATE state, causing the intended REJOIN fallback to be skipped. Capture the
this.reconnectStrategy value at the start of the reconnection attempt before
doJoin() can modify it, and then use this captured strategy value in the
wasMigrating condition instead of reading this.reconnectStrategy directly.
- Around line 1826-1836: The reconnect logic is not treating unrecoverable
SfuJoinError exceptions as terminal failures during REJOIN attempts. Modify the
switchSfu condition assignment to additionally check that the SfuJoinError is
not unrecoverable using SfuJoinError.unrecoverable. Ensure that unrecoverable
SFU join failures are prevented from being counted as retryable by adding a
negation check alongside the existing SfuJoinError.isJoinErrorCode condition, so
these failures stop retry cycling immediately rather than continuing to attempt
SFU migration.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: aaf2a576-6ea5-4126-ab11-bb35191bdaad

📥 Commits

Reviewing files that changed from the base of the PR and between 67c49bb and 730e758.

📒 Files selected for processing (1)

packages/client/src/Call.ts

fix(client): send migrating_from after repeated rejoin failures

0bc138a

jdimovska added 2 commits June 17, 2026 11:03

fix(client): send migrating_from after repeated rejoin failures

cdb1ea6

fix(client): send migrating_from after repeated rejoin failures

67c49bb

jdimovska marked this pull request as ready for review June 17, 2026 09:20

fix(client): send migrating_from after repeated rejoin failures

730e758

jdimovska requested a review from oliverlaz June 17, 2026 10:03

coderabbitai Bot reviewed Jun 17, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(client): send migrating_from after repeated rejoin failures#2287

fix(client): send migrating_from after repeated rejoin failures#2287
jdimovska wants to merge 4 commits into
mainfrom
migrate-during-reconnect

jdimovska commented Jun 16, 2026 •

edited

Loading

Uh oh!

changeset-bot Bot commented Jun 16, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Jun 16, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jdimovska commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

💡 Overview

📝 Implementation notes

Summary by CodeRabbit

Release Notes

Uh oh!

changeset-bot Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚠️ No Changeset found

Uh oh!

coderabbitai Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

jdimovska commented Jun 16, 2026 •

edited

Loading

changeset-bot Bot commented Jun 16, 2026 •

edited

Loading

coderabbitai Bot commented Jun 16, 2026 •

edited

Loading