Skip to content

fix(auth): add HTTP timeouts to bridge-token and Client endpoints#558

Merged
LinoGiger merged 2 commits into
mainfrom
poseidon/add-credential-http-timeouts
Jun 2, 2026
Merged

fix(auth): add HTTP timeouts to bridge-token and Client endpoints#558
LinoGiger merged 2 commits into
mainfrom
poseidon/add-credential-http-timeouts

Conversation

@RapidPoseidon
Copy link
Copy Markdown
Contributor

Summary

_get_bridge_tokens, _poll_read_key, and _create_client in CredentialManager all hit identity endpoints with requests.* and no timeout=. A slow/hung identity server or a deliberate slowloris holds the calling thread forever.

poll_timeout caps the total polling loop only — a hang on any individual request still blocks forever. The bridge and create-client calls had no cap whatsoever.

Fix

Single _HTTP_TIMEOUT = (10.0, 30.0) class constant (connect, read), threaded through all three call sites.

Test plan

  • uv run pyright src/rapidata/rapidata_client → 0 errors
  • All requests.* call sites in the file now have timeout=.

🔗 Session: https://session-bc38cc85.poseidon.rapidata.internal/

`_get_bridge_tokens`, `_poll_read_key`, and `_create_client` all called
`requests.*` without a `timeout=` argument. A slow or hung identity
server — or a slowloris — would pin the calling thread indefinitely.
The existing `poll_timeout` only caps the total polling loop, not
individual requests, and the bridge/create-client calls had no cap at
all.

Add a shared `(connect=10s, read=30s)` pair and thread it through all
three call sites.

Session: https://session-bc38cc85.poseidon.rapidata.internal/

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: lino <lino@rapidata.ai>
@claude
Copy link
Copy Markdown

claude Bot commented Apr 24, 2026

Code Review

Overview

This PR correctly identifies and fixes a real reliability issue: all three requests.* call sites in CredentialManager lacked HTTP timeouts, meaning a slow or hung identity server could pin the calling thread indefinitely. The fix is minimal and well-scoped.


Strengths

  • Correct approach: Using a (connect, read) tuple is the right pattern for requests — it independently caps the TCP handshake and the response read, which is strictly better than a single scalar timeout.
  • Single constant: Centralising the value as _HTTP_TIMEOUT means it can be adjusted in one place. Avoids scattered magic numbers.
  • All call sites covered: The diff touches all three requests.* calls in the file. No omissions.
  • Reasonable values: 10 s connect / 30 s read is appropriate for interactive auth flows.

Issues

1. Multi-line comment violates project conventions (minor)

CLAUDE.md says: "Never write multi-paragraph docstrings or multi-line comment blocks — one short line max."

The 4-line block on _HTTP_TIMEOUT should be trimmed. The why is still worth keeping, but in one line:

# (connect, read) pair; outer poll_timeout only bounds the loop, not individual requests.
_HTTP_TIMEOUT: Tuple[float, float] = (10.0, 30.0)

2. Polling loop can exceed poll_timeout by up to N × 40 s (worth noting)

In _poll_read_key, the guard is:

while time.time() - start_time < self.poll_timeout:

But each iteration can block for up to connect_timeout + read_timeout = 40 s before the loop condition is re-evaluated. If the server consistently responds slowly, the actual wall-clock time becomes poll_timeout + (number_of_slow_requests × 40 s). With the default poll_timeout=300 s this is bounded — far better than the previous infinite hang — but callers assuming a hard cap of poll_timeout seconds will be surprised.

A low-cost improvement would be to pass the remaining budget as the read timeout:

remaining = self.poll_timeout - (time.time() - start_time)
if remaining <= 0:
    break
read_timeout = min(self._HTTP_TIMEOUT[1], remaining)
response = requests.get(..., timeout=(self._HTTP_TIMEOUT[0], read_timeout))

This is not a blocker, but worth a follow-up issue or a note in the docstring.

3. A single timeout on _get_bridge_tokens / _create_client aborts the entire auth flow (existing behaviour, worth noting)

requests.Timeout is caught by the except requests.RequestException blocks and returns None, which propagates as an auth failure. A transient 10 s connect stall on a cold server would fail the whole login. The current PR does not make this worse — it was already the case that any RequestException would abort — but adding a one-retry on Timeout specifically would improve resilience. Not required for this PR.

4. No automated tests

The test plan is manual (pyright + eyeball). A unit test using unittest.mock.patch on requests.post / requests.get to assert that timeout= is always forwarded would lock in this fix and prevent regression. Given the small scope of the change, even a single parametrised test would be sufficient.


Summary

The fix is correct and the values are sensible. The two actionable items before merge are:

  1. Trim the comment to one line (required per CLAUDE.md).
  2. Consider bounding the polling-loop read timeout to the remaining budget (optional but recommended).

Everything else is pre-existing behaviour or a follow-up. Good catch on the missing timeouts.

@LinoGiger LinoGiger merged commit f632e6a into main Jun 2, 2026
2 checks passed
@LinoGiger LinoGiger deleted the poseidon/add-credential-http-timeouts branch June 2, 2026 09:37
@claude
Copy link
Copy Markdown

claude Bot commented Jun 2, 2026

Code Review for PR 558 - fix(auth): add HTTP timeouts to bridge-token and Client endpoints

@claude
Copy link
Copy Markdown

claude Bot commented Jun 2, 2026

Code Review — fix(auth): add HTTP timeouts to bridge-token and Client endpoints

Overview

This PR adds a (connect=10s, read=30s) timeout to all three bare requests.* calls in CredentialManager that previously had no timeout protection. The fix is correct and addresses a genuine reliability/security risk.


What is good

  • Correct primitive: Using a (connect, read) tuple rather than a single scalar is the right choice — it distinguishes network connection delays from server response delays.
  • Single source of truth: The class constant _HTTP_TIMEOUT means the value only needs changing in one place.
  • Complete coverage: All three call sites (_get_bridge_tokens, _poll_read_key, _create_client) are updated.
  • The class-level comment explaining the poll_timeout vs per-request timeout invariant is genuinely non-obvious — worth keeping.

Issues

Bug: a single request timeout terminates the entire polling loop

In _poll_read_key, requests.ReadTimeout is a subclass of requests.RequestException, so the existing handler catches it and immediately returns None, aborting all remaining retries. Before this PR a hung server would block forever; after this PR a single slow response during polling kills the whole auth flow. The fix should distinguish transient timeouts from hard errors:

except requests.Timeout:
    logger.warning('Poll request timed out, retrying...')
    time.sleep(self.poll_interval)
    continue
except requests.RequestException as e:
    logger.error('Error polling read key: %s', e)
    return None

Read timeout of 30s is generous for a polling endpoint

The /identity/bridge-token GET is a lightweight status check that should return 200 or 202 immediately. A 30s read timeout means a hung server during polling holds the thread for up to 30s per iteration. Consider a tighter read timeout for the polling path (e.g. 5-10s) and reserve the longer value for _create_client which may do heavier server-side work. You could also use two constants: _POLL_TIMEOUT and _CLIENT_TIMEOUT.

Minor: URL construction moved inside try unnecessarily

String formatting cannot raise requests.RequestException, so moving bridge_endpoint inside the try block in _get_bridge_tokens adds noise without benefit. Not a bug, just a clarity note.


Convention notes

  • The multi-line class-level comment technically conflicts with CLAUDE.md's 'no multi-line comment blocks' guidance, but the WHY here is genuinely non-obvious. I would keep it as-is.
  • The file uses 'from typing import Tuple' without 'from future import annotations'. The PR does not introduce this, but a follow-up cleanup would align with CLAUDE.md convention.

Test coverage

No tests are added. The polling exception handler bug is exactly what a test would catch — a mock that raises requests.Timeout on the first poll and succeeds on the second would confirm retry behavior. Worth adding given this is the auth critical path.


Summary

The core fix is correct and needed. The main actionable concern is the polling exception handler: a single request timeout now terminates auth entirely rather than retrying within the remaining poll_timeout window, which could make the auth flow less reliable in practice than it was before this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants