Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
16 commits
Select commit Hold shift + click to select a range
72b6f6b
fix(rs-dapi-client): rotate instead of ban on ResourceExhausted rate-…
lklimek Jun 22, 2026
fadedd4
Merge branch 'v3.1-dev' into fix/rs-dapi-client-rate-limit-rotate
lklimek Jun 22, 2026
396865f
fix(rs-dapi-client): exclude throttled node + backoff/jitter on rate-…
lklimek Jun 22, 2026
b8a8c32
fix(rs-dapi-client): clamp rate-limit backoff shift + symmetric invar…
lklimek Jun 22, 2026
8a41c92
fix(rs-dapi-client): replace rotate-on-rate-limit with Envoy-driven b…
lklimek Jun 23, 2026
c905b16
refactor(rs-dapi-client): drive rate-limit ban from Envoy reset heade…
lklimek Jun 23, 2026
dade457
test(rs-dapi-client): apply QA-001..005 doc-accuracy and test-honesty…
lklimek Jun 23, 2026
2c76a14
Merge branch 'v3.1-dev' into fix/rs-dapi-client-rate-limit-rotate
lklimek Jun 23, 2026
47f4160
fix(rs-dapi-client): apply PR-3951 review fixes — ban_for max-semanti…
lklimek Jun 24, 2026
be6ae84
test(rs-dapi-client): restore genuine window-expiry coverage in ban_f…
lklimek Jun 24, 2026
41cc076
docs(rs-dapi-client): QA-006/007/008 — ban_with_reason scope note + n…
lklimek Jun 24, 2026
ddfe16c
docs(rs-dapi-client): tighten ban_for/ban_with_reason scope docs; har…
lklimek Jun 24, 2026
caa025e
feat(dashmate): add platform.gateway.rateLimiter.responseHeaders.enab…
lklimek Jun 24, 2026
c49adb2
fix(dashmate): key responseHeaders migration at 4.0.0 not released rc.2
lklimek Jun 24, 2026
458a28e
test(rs-dapi-client): prove ban_for via DapiClient::execute end-to-en…
lklimek Jun 24, 2026
b23008e
fix(dashmate): key responseHeaders migration at next release 4.0.0-rc…
lklimek Jun 24, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions packages/dashmate/configs/defaults/getBaseConfigFactory.js
Original file line number Diff line number Diff line change
Expand Up @@ -241,6 +241,9 @@ export default function getBaseConfigFactory() {
blacklist: [],
whitelist: [],
enabled: true,
responseHeaders: {
enabled: true,
},
},
ssl: {
enabled: false,
Expand Down
19 changes: 19 additions & 0 deletions packages/dashmate/configs/getConfigFileMigrationsFactory.js
Original file line number Diff line number Diff line change
Expand Up @@ -1520,6 +1520,25 @@ export default function getConfigFileMigrationsFactory(homeDir, defaultConfigs)

return configFile;
},
'4.0.0-rc.3': (configFile) => {
Object.entries(configFile.configs)
.forEach(([, options]) => {
// Add responseHeaders toggle to rate limiter (default true so existing
// deployments keep emitting RateLimit-* headers; rs-dapi-client depends
// on RateLimit-Reset to apply precise ban windows instead of the
// exponential health-ban ladder).
// Keyed at the next release (4.0.0-rc.3), not the already-released
// rc.2: the runner skips fromVersion===toVersion, so a key equal to
// an operator's current version never fires. Backfill runs once the
// package bumps to rc.3 (mirrors the 3.1.0 migration added at 3.1.0-dev.1).
if (options.platform?.gateway?.rateLimiter
&& typeof options.platform.gateway.rateLimiter.responseHeaders === 'undefined') {
options.platform.gateway.rateLimiter.responseHeaders = base.get('platform.gateway.rateLimiter.responseHeaders');
}
});

return configFile;
},
Comment on lines +1523 to +1541

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Blocking: Existing 4.0.0-rc.2 configs break: required responseHeaders is queued behind a 4.0.0-rc.3 migration but package.json is still rc.2

This commit re-keys the responseHeaders backfill from 4.0.0 to 4.0.0-rc.3, but packages/dashmate/package.json still reports 4.0.0-rc.2. ConfigFileJsonRepository.read() calls migrateConfigFile(configFileData, configFileData.configFormatVersion, version) with version read from package.json; migrateConfigFileFactory early-returns at fromVersion === toVersion (packages/dashmate/src/config/configFile/migrateConfigFileFactory.js:12-14). So for an operator already on rc.2, fromVersion === toVersion === 4.0.0-rc.2 and the new migration never runs.

Meanwhile this PR makes responseHeaders a required field of the rate-limiter schema (packages/dashmate/src/config/configJsonSchema.js:705,708), and the per-config schema is enforced by new Config(name, opts, skipValidation) inside ConfigFileJsonRepository.read(). The result: any existing rc.2 deployment that pulls this PR before package.json bumps to rc.3 fails AJV validation on read with a missing responseHeaders error, even though the migration that would have backfilled it is sitting one version ahead.

The comment correctly documents the design intent ("runs once the package bumps to rc.3"), so the change is internally consistent — but the coupling to the rc.3 version bump must land in the same PR. Either bump packages/dashmate/package.json to 4.0.0-rc.3 in this PR, or temporarily drop responseHeaders from the schema required list so existing rc.2 configs still validate until the migration can run.

source: ['claude', 'codex']

};
}

Expand Down
5 changes: 5 additions & 0 deletions packages/dashmate/docker-compose.rate_limiter.yml
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,11 @@ services:
- GRPC_MAX_CONNECTION_AGE=1h
- GRPC_MAX_CONNECTION_AGE_GRACE=10m
- GRPC_PORT=8081
# Emit RateLimit-Limit / RateLimit-Remaining / RateLimit-Reset response
# headers so rs-dapi-client can read the exact reset window and ban the
# node for that duration instead of the exponential health-ban ladder.
# Controlled by platform.gateway.rateLimiter.responseHeaders.enabled.
- LIMIT_RESPONSE_HEADERS_ENABLED=${PLATFORM_GATEWAY_RATE_LIMITER_RESPONSE_HEADERS_ENABLED:?err}
expose:
- 8081
profiles:
Expand Down
1 change: 1 addition & 0 deletions packages/dashmate/docs/config/gateway.md
Original file line number Diff line number Diff line change
Expand Up @@ -159,6 +159,7 @@ The rate limiter protects the Platform from excessive requests:
| `platform.gateway.rateLimiter.unit` | Time unit for rate limiting | `minute` | `hour` |
| `platform.gateway.rateLimiter.whitelist` | IPs exempt from rate limiting | `[]` | `["192.168.1.1"]` |
| `platform.gateway.rateLimiter.blacklist` | IPs blocked from all requests | `[]` | `["10.0.0.1"]` |
| `platform.gateway.rateLimiter.responseHeaders.enabled` | Emit `RateLimit-Limit`, `RateLimit-Remaining`, and `RateLimit-Reset` response headers. `rs-dapi-client` reads the Reset header to apply a precise ban window instead of the exponential health-ban ladder. Disable only for privacy reasons. | `true` | `false` |

Available time units:
- `second`: Per-second rate limiting
Expand Down
16 changes: 15 additions & 1 deletion packages/dashmate/src/config/configJsonSchema.js
Original file line number Diff line number Diff line change
Expand Up @@ -690,8 +690,22 @@ export default {
enabled: {
type: 'boolean',
},
responseHeaders: {
type: 'object',
description: 'Control emission of RateLimit-* response headers (RateLimit-Limit, '
+ 'RateLimit-Remaining, RateLimit-Reset). When enabled, rs-dapi-client reads '
+ 'the Reset header to ban the node for the server-advertised window instead '
+ 'of the exponential health-ban ladder. Disable only for privacy reasons.',
properties: {
enabled: {
type: 'boolean',
},
},
additionalProperties: false,
required: ['enabled'],
},
},
required: ['docker', 'enabled', 'unit', 'requestsPerUnit', 'blacklist', 'whitelist', 'metrics'],
required: ['docker', 'enabled', 'unit', 'requestsPerUnit', 'blacklist', 'whitelist', 'metrics', 'responseHeaders'],
Comment thread
Claudius-Maginificent marked this conversation as resolved.
additionalProperties: false,
},
ssl: {
Expand Down
250 changes: 244 additions & 6 deletions packages/rs-dapi-client/src/address_list.rs
Original file line number Diff line number Diff line change
Expand Up @@ -90,10 +90,18 @@ impl AddressStatus {

/// Ban the [Address] and record the `reason` for the ban.
///
/// Applies the same exponential backoff as [`AddressStatus::ban`];
/// the only difference is that `ban_reason` is stored so callers
/// (diagnostics, the iOS explorer) can surface why a node was
/// banned.
/// Applies exponential backoff: the ban window is `base × e^ban_count`
/// (where `ban_count` is the value *before* this call), and `banned_until`
/// is always re-based to `now + window` unconditionally, regardless of any
/// existing active ban. Concretely, a health failure on a node that already
/// holds a longer rate-limit window (set via [`AddressStatus::ban_for`]) will
/// re-base `banned_until` to the exponential value, which may be shorter.
/// This is intentional: the exponential health-ban ladder owns the window for
/// genuinely-unhealthy nodes; the no-shorten guarantee is deliberately scoped
/// to `ban_for → ban_for` sequences only.
///
/// `ban_count` is incremented and `ban_reason` is updated unconditionally.
/// The counter resets to 0 on [`AddressStatus::unban`].
pub fn ban_with_reason(&mut self, base_ban_period: &Duration, reason: Option<String>) {
let coefficient = (self.ban_count as f64).exp();
let ban_period = Duration::from_secs_f64(base_ban_period.as_secs_f64() * coefficient);
Expand All @@ -103,6 +111,38 @@ impl AddressStatus {
self.ban_reason = reason;
}

/// Ban the address for an exact `period` (server-advertised), bypassing the
/// exponential ladder used by [`AddressStatus::ban_with_reason`].
///
/// The ban window is flat (not exponential). `banned_until` is advanced to
/// `now + period` only when that timestamp is **later** than the current
/// `banned_until`, so a short-reset call never shortens a longer active ban
/// (health ban or a prior longer rate-limit ban). `ban_reason` is updated
/// only when the window is extended. `ban_count` is raised to
/// `max(ban_count, 1)` unconditionally so that `is_banned()` and
/// `ban_info()` correctly report the node as banned. Side-effect: a
/// previously-clean node (ban_count 0) enters the ladder at floor 1,
/// meaning its *next* genuine health failure via
/// [`AddressStatus::ban_with_reason`] uses `60 s × e¹ ≈ 163 s` rather
/// than the first-rung `60 s × e⁰ = 60 s`. The counter resets to 0 on
/// [`AddressStatus::unban`].
///
/// Note: the no-shorten guard applies only to `ban_for → ban_for` call
/// sequences. [`AddressStatus::ban_with_reason`] re-bases `banned_until`
/// unconditionally — see its docs for the intentional cross-method semantics.
pub fn ban_for(&mut self, period: Duration, reason: Option<String>) {
let advertised_until = chrono::Utc::now() + period;
if self
.banned_until
.map(|current| current < advertised_until)
.unwrap_or(true)
{
self.banned_until = Some(advertised_until);
self.ban_reason = reason;
}
self.ban_count = self.ban_count.max(1);
}

/// Check if [Address] is banned.
pub fn is_banned(&self) -> bool {
self.ban_count > 0
Expand Down Expand Up @@ -182,6 +222,23 @@ impl AddressList {
true
}

/// Ban the address for an exact `period` (server-advertised); delegates to
/// [`AddressStatus::ban_for`] — see that method for the full contract
/// including the `ban_count` floor and ladder side-effect.
///
/// Returns `false` if the address is not in the list.
pub fn ban_for(&self, address: &Address, period: Duration, reason: Option<String>) -> bool {
let mut guard = self.addresses.write().unwrap();

let Some(status) = guard.get_mut(address) else {
return false;
};

status.ban_for(period, reason);

true
}

/// Clears address' ban record
/// Returns false if the address is not in the list.
pub fn unban(&self, address: &Address) -> bool {
Expand Down Expand Up @@ -237,12 +294,16 @@ impl AddressList {
self.add(Address::try_from(uri).expect("valid uri"))
}

/// Randomly select a not banned address.
/// Randomly select a not-banned address.
///
/// An address is considered live when it has never been banned or when its
/// ban period has already expired.
pub fn get_live_address(&self) -> Option<Address> {
// TODO(low): module-wide `.read()/.write().unwrap()` panics on a
// poisoned lock; adopt poison-tolerant locking consistently (SEC-003).
let guard = self.addresses.read().unwrap();

let mut rng = SmallRng::from_entropy();

let now = chrono::Utc::now();

guard
Expand Down Expand Up @@ -755,4 +816,181 @@ mod tests {
let list = AddressList::new();
assert!(list.ban_info().is_empty());
}

#[test]
fn test_address_status_ban_for_sets_exact_window_and_min_ban_count() {
let mut status = AddressStatus::default();
assert_eq!(status.ban_count, 0);
assert!(status.banned_until.is_none());

let before = chrono::Utc::now();
status.ban_for(Duration::from_secs(45), Some("rate limited".into()));
let after = chrono::Utc::now();

// ban_count must be at least 1 so is_banned() / ban_info().banned are consistent.
assert_eq!(status.ban_count, 1, "ban_for sets ban_count to max(0,1)=1");

// banned_until should be roughly now + 45 s.
let until = status.banned_until.expect("banned_until must be set");
let lower = (until - before).num_milliseconds() as f64 / 1000.0;
let upper = (until - after).num_milliseconds() as f64 / 1000.0;
assert!(
lower >= 44.9,
"banned_until lower bound too short: {lower}s"
);
assert!(upper <= 45.1, "banned_until upper bound too long: {upper}s");
assert_eq!(status.ban_reason.as_deref(), Some("rate limited"));
}

/// `ban_for` on a fresh node (ban_count = 0) raises ban_count to 1 (the
/// ladder floor). That means the *next* genuine health ban will escalate
/// from position 1 (~163 s) instead of position 0 (~60 s). This pins the
/// documented side-effect so regressions are caught.
#[test]
fn test_ban_for_raises_fresh_node_to_ladder_floor() {
let mut status = AddressStatus::default();
assert_eq!(status.ban_count, 0, "starts clean");

// Rate-limit ban on a never-before-banned node.
status.ban_for(Duration::from_secs(10), Some("rl".into()));
assert_eq!(
status.ban_count, 1,
"ban_for must raise ban_count 0 → 1 (ladder floor)"
);

// Subsequent genuine health failure must escalate from the floor (1),
// yielding ~60 s × e^1 ≈ 163 s, NOT the first-rung ~60 s × e^0 = 60 s.
let base = Duration::from_secs(60);
let before = chrono::Utc::now();
status.ban_with_reason(&base, None); // ban_count 1 → 2; window = 60s × e^1
let after = chrono::Utc::now();
assert_eq!(status.ban_count, 2);

let until = status.banned_until.expect("banned_until set");
let lo = (until - before).num_milliseconds() as f64 / 1000.0;
let hi = (until - after).num_milliseconds() as f64 / 1000.0;
let expected = 60.0_f64 * std::f64::consts::E; // ≈ 163 s
assert!(
lo >= expected - 0.5,
"window lower {lo:.1}s < expected {expected:.1}s (should escalate from floor 1)"
);
assert!(
hi <= expected + 0.5,
"window upper {hi:.1}s > expected {expected:.1}s"
);
}

#[test]
fn test_address_status_ban_for_does_not_inflate_existing_ban_count() {
// A node already health-banned (ban_count = 3) gets rate-limited.
// ban_count must stay at 3, not grow to 4.
let mut status = AddressStatus::default();
let base = Duration::from_secs(60);
status.ban_with_reason(&base, None); // → 1
status.ban_with_reason(&base, None); // → 2
status.ban_with_reason(&base, None); // → 3
status.ban_for(Duration::from_secs(30), Some("rl".into()));
assert_eq!(
status.ban_count, 3,
"ban_for must not inflate ban_count above its existing value"
);
}

#[test]
fn test_address_list_ban_for_returns_false_for_unknown() {
let list = AddressList::new();
let addr: Address = "http://127.0.0.1:3000".parse().unwrap();
assert!(!list.ban_for(&addr, Duration::from_secs(5), None));
}

#[test]
fn test_address_list_ban_for_bans_known_address() {
let mut list = AddressList::new();
let addr: Address = "http://127.0.0.1:3000".parse().unwrap();
list.add(addr.clone());

assert!(list.ban_for(&addr, Duration::from_secs(60), Some("rl".into())));
// The address must now be hidden from get_live_address.
assert!(list.get_live_address().is_none());
// ban_count is 1 (ban_for sets max(0,1)).
let info = list.ban_info();
assert_eq!(info.len(), 1);
assert!(info[0].banned);
assert_eq!(info[0].ban_count, 1);
}

/// After `ban_for`'s window expires the address re-enters rotation via
/// `get_live_address`. We verify both directions: the node is hidden during
/// an active window, and becomes live once the window has passed.
///
/// Window-expiry reinstatement is orthogonal to `unban()`: `get_live_address`
/// reinstates a node purely on `banned_until < now` regardless of `ban_count`,
/// so after expiry the node is live again while `is_banned()` (ban_count > 0)
/// is still true. This is a different path from `unban()`, which also zeroes
/// `ban_count`.
#[test]
fn test_ban_for_address_re_enters_rotation_after_window_expires() {
let mut list = AddressList::new();
let addr: Address = "http://127.0.0.1:3000".parse().unwrap();
list.add(addr.clone());

// Active 300-second window → node hidden.
assert!(list.ban_for(&addr, Duration::from_secs(300), Some("rl".into())));
assert!(
list.get_live_address().is_none(),
"node must be hidden during active ban window"
);

// Simulate window expiry by back-dating banned_until — do NOT touch ban_count.
{
let mut guard = list.addresses.write().unwrap();
let status = guard.get_mut(&addr).expect("addr must be in list");
status.banned_until = Some(chrono::Utc::now() - Duration::from_secs(1));
}

// After window expiry the node must re-enter rotation …
assert!(
list.get_live_address().is_some(),
"address must re-enter rotation after ban window expires"
);
// … but ban_count is still > 0, so is_banned() remains true.
// This distinguishes window-expiry from an explicit unban().
assert!(
list.is_banned(&addr),
"is_banned() must still be true after window expiry (ban_count not reset)"
);
}

/// Invariant 1 at the ladder source: the exponential ban window is
/// `base × e^ban_count`, `ban_count` incrementing on each ban. This pins the
/// exact formula independently of the `update_address_ban_status` entrypoint.
#[test]
fn test_ban_ladder_windows_match_exponential_formula() {
let mut status = AddressStatus::default();
let base_secs = 60.0_f64;
let base = Duration::from_secs(60);

for n in 0..3usize {
// coefficient uses ban_count BEFORE this ban (== n here).
let before = chrono::Utc::now();
status.ban(&base);
let after = chrono::Utc::now();

assert_eq!(status.ban_count, n + 1, "ban_count must increment");
let period = base_secs * (n as f64).exp();
let banned_until = status.banned_until.expect("banned_until is set");
let lower = (banned_until - before).num_milliseconds() as f64 / 1000.0;
let upper = (banned_until - after).num_milliseconds() as f64 / 1000.0;
assert!(
lower >= period - 0.05,
"ban #{} window lower bound {lower}s < expected {period}s",
n + 1
);
assert!(
upper <= period + 0.05,
"ban #{} window upper bound {upper}s > expected {period}s",
n + 1
);
}
}
}
Loading
Loading