Skip to content

fix(sdk/policy_cache): remove no-op background refresh (lost writes + NaN)#433

Open
dsmfa10 wants to merge 1 commit into
mainfrom
fix/sdk-policy-cache-refresh-noop
Open

fix(sdk/policy_cache): remove no-op background refresh (lost writes + NaN)#433
dsmfa10 wants to merge 1 commit into
mainfrom
fix/sdk-policy-cache-refresh-noop

Conversation

@dsmfa10
Copy link
Copy Markdown
Collaborator

@dsmfa10 dsmfa10 commented Jun 3, 2026

Problem

TokenPolicyCache::get_policy attempted a proactive "background refresh" when a
cached entry neared expiry. It spawned a task over Arc::new(self.clone()) and
wrote the refreshed policy into that clone:

// (before)
if self.config.auto_refresh {
    let ttl_remaining = entry.expires_at.saturating_sub(now);
    let total_ttl = entry.expires_at.saturating_sub(entry.cached_at);
    let ttl_percentage = ttl_remaining as f32 / total_ttl as f32;     // NaN when total_ttl == 0
    if ttl_percentage <= self.config.refresh_threshold {
        let self_clone = Arc::new(self.clone());
        tokio::spawn(async move {
            if let Ok(fresh) = self_clone.fetch_policy_from_network(&policy_id).await {
                let _ = self_clone.update_cached_policy(&policy_id, fresh);  // writes to throwaway clone
                self_clone.stats.write().refreshes += 1;                    // also discarded
            }
        });
    }
}

TokenPolicyCache::Clone deliberately yields a fresh, empty cache (there is
an explicit test, cache_clone_has_fresh_cache, asserting this). So the spawned
task refreshes into a clone whose cache is then dropped — the refreshed policy
never reaches the shared cache. Net effects:

  • A pure no-op for the cache (refresh result is lost), yet it still performs a
    network fetch and a task spawn each time.
  • A NaN ratio when default_ttl == 0 (x / 0.0); NaN <= threshold is false,
    so it silently never fires anyway.

Fix

Remove the dead refresh block. Policies refresh lazily on expiry via the
existing miss path (which already fetches and inserts into the real cache). This
also removes the wasted fetch/spawn and the NaN computation. (Note: now / the
TTL fields here are deterministic dt::tick() counters, not wall-clock, so this
stays within DSM's clockless rule — no protocol timestamps are involved.)

A correct proactive refresh would require sharing the cache behind an Arc
without breaking the documented "fresh cache on clone" invariant — that's a
separate change, noted in the deferred dead-code/refactor backlog.

Verification

cargo check -p dsm_sdk clean (no unused Arc/method warnings — those symbols
are used elsewhere). The cache_clone_has_fresh_cache invariant is preserved.

CI gates & coverage

Full verification for this PR runs in CI — Rust (cargo fmt --check,
clippy -D warnings, workspace tests), Frontend, Android Unit Tests,
Coverage, SPDX headers, CodeQL (see the PR's Checks tab). The local
check noted above is a subset; the broader mandated gates (full workspace test
suite, codegen/scan, Android, Frontend) run in CI, not locally — none is
silently skipped.

… NaN)

The near-expiry "background refresh" spawned a task over
`Arc::new(self.clone())`, but `TokenPolicyCache::Clone` deliberately
produces a fresh, empty cache (asserted by `cache_clone_has_fresh_cache`).
The refreshed policy was therefore written into a throwaway clone and never
reached the shared cache — a pure no-op that still performed a network
fetch, and computed `ttl_remaining / total_ttl` as NaN when
`default_ttl == 0` (which silently never triggered anyway).

Remove the dead refresh block; policies refresh lazily on expiry via the
miss path. A correct proactive refresh would require sharing the cache
behind an `Arc` without breaking clone semantics — a separate change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@codecov-commenter
Copy link
Copy Markdown

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants