
feat(processing): add distributed session coordination (closes #216) #303

Merged
Xhristin3 merged 1 commit into XStreamRollz:main from GWorld57:feat/216-distributed-session-coordination
Jun 20, 2026


@GWorld57

Contributor

Implements #216.

When horizontally scaling the worker, two pods polling the API in parallel would otherwise register StreamSessions for the same streamId and double-publish every event. This change wires a LockManager in front of SessionRegistry so each stream is claimed atomically before a session is created.
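The claim-before-create flow described above could look roughly like the following. This is a minimal sketch, not the PR's actual code: `TryAcquire`, `RouteOutcome`, and `RegistrySketch` are hypothetical names, and the callback stands in for whatever call SessionRegistry makes on the LockManager.

```typescript
// Hypothetical stand-in for the LockManager call; resolves true when this
// worker now owns the stream.
type TryAcquire = (streamId: string) => Promise<boolean>;

type RouteOutcome = "created" | "existing" | "skipped";

class RegistrySketch {
  private readonly sessions = new Map<string, { streamId: string }>();
  private readonly tryAcquire: TryAcquire;

  constructor(tryAcquire: TryAcquire) {
    this.tryAcquire = tryAcquire;
  }

  async route(streamId: string): Promise<RouteOutcome> {
    if (this.sessions.has(streamId)) return "existing";

    // Claim the stream first; only the winner may create a session.
    const claimed = await this.tryAcquire(streamId);
    if (!claimed) return "skipped";

    // Re-check after the await in case a concurrent route() call in this
    // process created the session while the claim was in flight.
    if (this.sessions.has(streamId)) return "existing";

    this.sessions.set(streamId, { streamId });
    return "created";
  }
}
```

With two registries sharing one lock store, the first `route("s1")` would return `"created"` on one pod and `"skipped"` on the other, which is the double-publish case this change is meant to prevent.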

Two backends ship:

  • MemoryLockManager (default, per-process) — no infra cost for tests + single-replica deployments.
  • PostgresLockManager (LOCK_BACKEND=postgres) — fronts a small stream_locks table that the worker bootstraps on startup. Acquisition is a single atomic UPSERT; renewal is a guarded UPDATE; release/eviction handle TTL expiry for crashed workers without a separate Redis broker.
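To make the shared contract concrete, here is a minimal sketch of what a per-process backend might look like. The PR names LockManager but does not show its methods, so the `acquire`/`renew`/`release` signatures, the `LockManagerSketch` and `MemoryLockSketch` names, and the choice to treat a re-acquire by the same owner as a success are all assumptions.

```typescript
// Hypothetical interface: method names and signatures are assumptions.
interface LockManagerSketch {
  acquire(streamId: string, ownerId: string, ttlMs: number): Promise<boolean>;
  renew(streamId: string, ownerId: string, ttlMs: number): Promise<boolean>;
  release(streamId: string, ownerId: string): Promise<void>;
}

interface LockEntry {
  ownerId: string;
  expiresAtMs: number;
}

class MemoryLockSketch implements LockManagerSketch {
  private readonly locks = new Map<string, LockEntry>();
  private readonly now: () => number;

  // The clock is injectable so tests can advance time deterministically.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  async acquire(streamId: string, ownerId: string, ttlMs: number): Promise<boolean> {
    const nowMs = this.now();
    const held = this.locks.get(streamId);
    // A live lock owned by another worker blocks the claim; an expired lock
    // (for example, one left by a crashed worker) can be taken over.
    if (held !== undefined && held.expiresAtMs > nowMs && held.ownerId !== ownerId) {
      return false;
    }
    this.locks.set(streamId, { ownerId, expiresAtMs: nowMs + ttlMs });
    return true;
  }

  async renew(streamId: string, ownerId: string, ttlMs: number): Promise<boolean> {
    const nowMs = this.now();
    const held = this.locks.get(streamId);
    // Renewal is guarded: only the current owner of a live lock may extend it.
    if (held === undefined || held.ownerId !== ownerId || held.expiresAtMs <= nowMs) {
      return false;
    }
    held.expiresAtMs = nowMs + ttlMs;
    return true;
  }

  async release(streamId: string, ownerId: string): Promise<void> {
    const held = this.locks.get(streamId);
    // Only the owner may release; a stale release from another worker is ignored.
    if (held !== undefined && held.ownerId === ownerId) {
      this.locks.delete(streamId);
    }
  }
}
```

Injecting the clock keeps TTL expiry testable without real sleeps, which fits the stated goal of using the memory backend for tests.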

Notes

  • LOCK_BACKEND, DATABASE_URL, and LOCK_TTL_MS added to .env validation and .env.example.
  • pg and @types/pg added as worker deps.
  • SessionRegistry.route() is now async; worker.ts pollOnce() wraps the call in try/catch + continue so a transient coordinator failure cannot kill the worker.
  • worker.ts pollOnce() deduplicates identical routing errors within a 30s window (LRU-capped at 100 keys by oldest-lastLoggedAtMs) so a sustained DB outage does not flood stderr.
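The error handling and log deduplication in the last two notes might be sketched as follows. This is an illustration under stated assumptions, not the code in worker.ts: `ErrorLogDeduper` and `pollOnceSketch` are hypothetical names, and keying the dedup map by error message is a guess at what "identical routing errors" means.

```typescript
const DEDUP_WINDOW_MS = 30 * 1000;
const MAX_DEDUP_KEYS = 100;

class ErrorLogDeduper {
  private readonly lastLoggedAtMs = new Map<string, number>();
  private readonly windowMs: number;
  private readonly maxKeys: number;

  constructor(windowMs: number = DEDUP_WINDOW_MS, maxKeys: number = MAX_DEDUP_KEYS) {
    this.windowMs = windowMs;
    this.maxKeys = maxKeys;
  }

  // Returns true when the caller should log this key now.
  shouldLog(key: string, nowMs: number): boolean {
    const last = this.lastLoggedAtMs.get(key);
    if (last !== undefined && nowMs - last < this.windowMs) return false;
    this.lastLoggedAtMs.set(key, nowMs);
    if (this.lastLoggedAtMs.size > this.maxKeys) this.evictOldest();
    return true;
  }

  // Drops the key with the smallest lastLoggedAtMs to keep the map bounded.
  private evictOldest(): void {
    let oldestKey: string | undefined;
    let oldestMs = Infinity;
    this.lastLoggedAtMs.forEach((ms, key) => {
      if (ms < oldestMs) {
        oldestMs = ms;
        oldestKey = key;
      }
    });
    if (oldestKey !== undefined) this.lastLoggedAtMs.delete(oldestKey);
  }
}

// Hypothetical poll loop: one failing stream must not stop the others.
async function pollOnceSketch(
  streamIds: string[],
  route: (streamId: string) => Promise<void>,
  deduper: ErrorLogDeduper,
  log: (message: string) => void,
  now: () => number = () => Date.now(),
): Promise<number> {
  let routed = 0;
  for (const streamId of streamIds) {
    try {
      await route(streamId);
      routed += 1;
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      if (deduper.shouldLog(message, now())) log("routing failed: " + message);
      continue; // Move on to the next stream instead of aborting the poll.
    }
  }
  return routed;
}
```

Capping the map matters because error messages can embed stream IDs or other variable text, so the key space is not bounded by itself.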

Verification

  • tsc --noEmit clean.
  • jest --runInBand: 11 suites, 88 tests, all green.

Closes #216


Wire a LockManager in front of SessionRegistry so two worker pods
never process the same stream concurrently. Two backends ship:

- MemoryLockManager (default, per-process) — keeps unit + integration
  tests deterministic without a DB.
- PostgresLockManager — fronts a stream_locks table with a single
  atomic UPSERT for acquisition, a guarded UPDATE for heartbeat
  renewals, and deletes on release.
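The three statements described above might look something like the following. This is a sketch under assumptions: the column names (`stream_id`, `owner_id`, `expires_at`), the `Queryable` interface, and the `PostgresLockSketch` class are hypothetical, since the PR text does not show the schema or the SQL. A `Pool` from the pg package is expected to fit the `Queryable` shape, but that is not verified here.

```typescript
// Minimal query surface so the sketch can be exercised without a database.
interface Queryable {
  query(text: string, values: unknown[]): Promise<{ rowCount: number | null }>;
}

// Assumed schema: stream_locks(stream_id TEXT PRIMARY KEY, owner_id TEXT,
// expires_at TIMESTAMPTZ).

// Atomic UPSERT: insert the lock, or take it over only if it has expired
// (or already belongs to this owner). A blocked claim affects zero rows.
const ACQUIRE_SQL = `
  INSERT INTO stream_locks (stream_id, owner_id, expires_at)
  VALUES ($1, $2, now() + ($3::double precision * interval '1 millisecond'))
  ON CONFLICT (stream_id) DO UPDATE
    SET owner_id = EXCLUDED.owner_id, expires_at = EXCLUDED.expires_at
    WHERE stream_locks.expires_at <= now()
       OR stream_locks.owner_id = EXCLUDED.owner_id
  RETURNING stream_id`;

// Guarded UPDATE: only the current owner of a live lock can extend it.
const RENEW_SQL = `
  UPDATE stream_locks
  SET expires_at = now() + ($3::double precision * interval '1 millisecond')
  WHERE stream_id = $1 AND owner_id = $2 AND expires_at > now()`;

// Release deletes the row, but only when this owner still holds it.
const RELEASE_SQL = `
  DELETE FROM stream_locks WHERE stream_id = $1 AND owner_id = $2`;

class PostgresLockSketch {
  private readonly db: Queryable;

  constructor(db: Queryable) {
    this.db = db;
  }

  async acquire(streamId: string, ownerId: string, ttlMs: number): Promise<boolean> {
    const res = await this.db.query(ACQUIRE_SQL, [streamId, ownerId, ttlMs]);
    return (res.rowCount ?? 0) > 0;
  }

  async renew(streamId: string, ownerId: string, ttlMs: number): Promise<boolean> {
    const res = await this.db.query(RENEW_SQL, [streamId, ownerId, ttlMs]);
    return (res.rowCount ?? 0) > 0;
  }

  async release(streamId: string, ownerId: string): Promise<void> {
    await this.db.query(RELEASE_SQL, [streamId, ownerId]);
  }
}
```

Using the database clock (`now()`) rather than each worker's local clock avoids depending on clock agreement between pods; whether the merged code does the same is not stated in the PR.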

Locks are acquired before the placeholder session is exposed in
the local map; each lock is heartbeated at TTL/3 ms; a lost
renewal fails the session so the stream can be picked up by
another worker once the TTL elapses.
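A heartbeat along these lines could be sketched as below. `startHeartbeat` and its parameters are hypothetical names, and treating a renewal that throws the same way as a rejected renewal is an assumption; the commit message only says that a lost renewal fails the session.

```typescript
// Starts renewing at TTL/3 and returns a function that stops the heartbeat.
function startHeartbeat(
  renew: () => Promise<boolean>,
  ttlMs: number,
  onLost: (reason: string) => void,
): () => void {
  // Renewing at a third of the TTL leaves room for a missed beat before expiry.
  const intervalMs = Math.max(1, Math.floor(ttlMs / 3));
  let stopped = false;

  const timer = setInterval(() => {
    void tick();
  }, intervalMs);

  function stop(): void {
    if (stopped) return;
    stopped = true;
    clearInterval(timer);
  }

  async function tick(): Promise<void> {
    if (stopped) return;
    let ok = false;
    let reason = "renewal rejected: lock no longer owned";
    try {
      ok = await renew();
    } catch (err) {
      // Assumption: a renewal that throws is handled like a lost lock.
      reason = "renewal failed: " + (err instanceof Error ? err.message : String(err));
    }
    if (!ok && !stopped) {
      // Stop heartbeating and let the caller fail the session; another
      // worker can claim the stream once the TTL elapses.
      stop();
      onLost(reason);
    }
  }

  return stop;
}
```

The caller would pass a closure over the lock manager's renew call as `renew` and fail the session inside `onLost`. The sketch does not guard against a slow renewal overlapping the next tick, which a production version would likely need.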

Closes XStreamRollz#216

@Xhristin3 left a comment

Contributor


Big change, but the abstraction (memory vs postgres backends behind one LockManager interface) is clean, and the new tests cover both. The pnpm-lock bump matches the new pg dependency. Approving and merging.

@Xhristin3 merged commit 82505e7 into XStreamRollz:main on Jun 20, 2026
2 checks passed

Linked issue: feat: Add distributed session coordination for horizontal worker scaling
