Skip to content

fix: recoverable client on connection failure#252

Merged
pinodeca merged 1 commit into
microsoft:mainfrom
iemejia:fix/client-recovery
Jun 18, 2026
Merged

fix: recoverable client on connection failure#252
pinodeca merged 1 commit into
microsoft:mainfrom
iemejia:fix/client-recovery

Conversation

@iemejia

@iemejia iemejia commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Split from #221 to ease review.

Replace OnceLock<Client> with thread_local RefCell<Option<Client>> that auto-resets on connection errors (broken pipe, pool timeout, connection refused, etc.). Subsequent calls transparently re-create the client pool instead of failing permanently for the session's lifetime.

Add is_connection_error() heuristic with unit tests and pg_test integration coverage.

Files: src/client.rs, src/lib.rs
Tests: 6 unit tests + 1 pg_test

@pinodeca pinodeca left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed with Opus 4.8

Suggestions (non-blocking)

  1. Unnecessary String allocations. The new .to_string() conversions (inst_id, fn_name, inp, evt_name, etc.) aren't needed. rt.block_on(...) runs synchronously within the function scope, so the closure can capture the original &str arguments directly. This adds an allocation per call on the hot path of df.start()/df.signal()/df.cancel(). Minor, but easy to drop.

  2. is_connection_error is broad. Matching bare substrings "connection" and "closed" can produce false positives (e.g. a business error mentioning "connection" or "closed"). The consequence is benign — just an extra pool re-creation — but worth a one-line comment noting that false positives are acceptable-by-design (cheap recovery, never incorrect). Anchoring on phrases like "connection refused"/"connection reset"/"connection closed" would tighten it if you'd prefer.

  3. Reentrancy caveat. The immutable borrow is held across f's entire block_on. Today none of the three callers re-enter with_duroxide_client, so it's safe, but a future caller that did would panic. A short comment on with_duroxide_client warning against reentrant calls would protect maintainers.

  4. Removed set_var unsafety note. The original comment warning that std::env::set_var becomes unsafe in the 2024 edition was dropped. The crate is edition = "2021" so this is fine today, but that note was a useful landmine marker for a future edition bump — consider keeping it.

  5. Test redundancy. The pg_test (test_is_connection_error_detects_failures) largely duplicates the #[cfg(test)] unit tests. Not a problem — just flagging that the is_connection_error_for_test wrapper exists solely to expose the fn through the pg_test harness.

Verdict

Approve with minor nits. None of the above is blocking; #1 (drop the allocations) and a comment for #2/#3 would be the highest-value follow-ups.

Replace OnceLock<Client> with thread_local RefCell<Option<Client>> that
auto-resets on connection errors (broken pipe, pool timeout, connection
refused, etc.). Subsequent calls transparently re-create the client pool
instead of failing permanently for the session's lifetime.

Add is_connection_error() heuristic with unit tests and pg_test
integration coverage.
@iemejia iemejia force-pushed the fix/client-recovery branch from 57400ac to ee89db7 Compare June 18, 2026 16:19
@pinodeca pinodeca merged commit d8396ad into microsoft:main Jun 18, 2026
5 checks passed
@iemejia iemejia deleted the fix/client-recovery branch June 18, 2026 19:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants