Skip to content

feat(pii): add redaction timing metrics across sidecar and persist path#5264

Merged
TheodoreSpeaks merged 1 commit into
stagingfrom
feat/pii-timing
Jun 29, 2026
Merged

feat(pii): add redaction timing metrics across sidecar and persist path#5264
TheodoreSpeaks merged 1 commit into
stagingfrom
feat/pii-timing

Conversation

@TheodoreSpeaks

Copy link
Copy Markdown
Collaborator

Summary

  • Presidio sidecar (apps/pii/server.py) now logs per-request duration for /analyze and /anonymize — previously it only emitted uvicorn 200 OK access lines with no timing, so the redact path had no observable latency anywhere
  • mask-batch endpoint log line now carries durationMs (per-chunk wall time incl. the 8-way concurrent analyze+anonymize fan-out)
  • redactPIIFromExecution emits a per-execution PII redaction completed log: stringCount, totalBytes, durationMs, and a scrubbed flag distinguishing real masking from the fail-safe path

Type of Change

  • New feature (observability / metrics)

Testing

Tested manually: python3 -m py_compile apps/pii/server.py passes; bun run lint and bun run check:api-validation:strict both pass. No behavioral change to redaction — timing instrumentation only.

Queryable once deployed: filter the /ecs/sim-*/presidio log group for duration_ms (model time), and the app log group for PII redaction completed (per-execution total).

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

- Log per-request duration in the Presidio sidecar (/analyze, /anonymize)
- Add durationMs to the mask-batch endpoint log line
- Emit per-execution PII redaction timing (stringCount, totalBytes, durationMs, scrubbed)
@vercel

vercel Bot commented Jun 29, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Jun 29, 2026 5:57pm

Request Review

@TheodoreSpeaks

Copy link
Copy Markdown
Collaborator Author

@greptile review

@cursor

cursor Bot commented Jun 29, 2026

Copy link
Copy Markdown

PR Summary

Low Risk
Logging and timing instrumentation only; no changes to masking, scrubbing, or API contracts beyond richer log fields.

Overview
Adds observability only along the PII redaction stack so latency can be traced from execution persist through the app to the Presidio sidecar.

The Presidio service (apps/pii/server.py) now logs per-request timing on /analyze and /anonymize (language/chars/entities or spans plus duration_ms) via a sim.pii logger wired into the container log stream.

The internal mask-batch route records wall-clock durationMs on its existing success log (full batch including concurrent Presidio work). redactPIIFromExecution emits PII redaction completed with stringCount, totalBytes, durationMs, and a new scrubbed flag when the fail-safe path runs (byte ceiling or masking error) instead of real Presidio masking.

Redaction behavior and fail-safe semantics are unchanged.

Reviewed by Cursor Bugbot for commit 4b2f1f2. Bugbot is set up for automated code reviews on this repo. Configure here.

@TheodoreSpeaks TheodoreSpeaks merged commit 23ec96b into staging Jun 29, 2026
16 checks passed
@TheodoreSpeaks TheodoreSpeaks deleted the feat/pii-timing branch June 29, 2026 18:04
@greptile-apps

greptile-apps Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds timing logs for PII redaction. The main changes are:

  • Sidecar /analyze and /anonymize duration logs.
  • mask-batch success logs with durationMs.
  • Persist-path redaction completion logs with byte counts, string counts, duration, and scrub status.

Confidence Score: 4/5

The changed flow looks mergeable after tightening the new diagnostics.

  • The runtime additions are compatible with the expected Python and Bun environments.
  • Several reachable no-op and failure paths skip the new timing events.
  • The issues affect observability rather than redaction output.

apps/sim/lib/logs/execution/pii-redaction.ts, apps/sim/app/api/guardrails/mask-batch/route.ts, apps/pii/server.py

Important Files Changed

Filename Overview
apps/pii/server.py Adds sidecar timing logs for successful analyze and anonymize requests, but failed engine calls skip the new timing output.
apps/sim/app/api/guardrails/mask-batch/route.ts Adds route-specific success duration logging for mask batches, while the failure log still lacks the same duration field.
apps/sim/lib/logs/execution/pii-redaction.ts Adds completion logging for redaction work, but payloads with no collected strings return before the new log is emitted.

Comments Outside Diff (2)

  1. apps/sim/app/api/guardrails/mask-batch/route.ts, line 40-43 (link)

    P2 Failure Durations Lose Endpoint Context

    When maskPIIBatch throws because the Presidio sidecar is unavailable, times out, or returns a bad response, this catch path logs error and count without the new durationMs. The generic route wrapper still has an outer duration, but the endpoint-specific PII batch masking failed event cannot be queried for failure latency.

  2. apps/pii/server.py, line 198-211 (link)

    P2 Anonymize Failures Skip Timing

    If anonymizer.anonymize(...) raises for malformed spans or a Presidio runtime error, the endpoint exits before logging duration_ms. Failed /anonymize requests are then missing from the new sidecar timing metric, so the metric only covers successful anonymization work.

Reviews (1): Last reviewed commit: "feat(pii): add redaction timing metrics ..." | Re-trigger Greptile

@@ -151,12 +152,14 @@ export async function redactPIIFromExecution(
if (collected.length === 0) return payload

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 No-String Executions Skip Metrics

When an execution payload has no collected strings, this early return skips the new PII redaction completed log. Those valid no-op executions become indistinguishable from cases where the redaction path did not run or logging failed, so per-execution dashboards undercount the persist path.

Suggested change
if (collected.length === 0) return payload
if (collected.length === 0) {
logger.info('PII redaction completed', {
stringCount: 0,
totalBytes,
durationMs: Math.round(performance.now() - startedAt),
scrubbed: false,
})
return payload
}

Comment thread apps/pii/server.py
Comment on lines 172 to 176
results = analyzer.analyze(
text=req.text,
language=req.language,
entities=req.entities or None,
score_threshold=req.score_threshold,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Analyze Failures Skip Timing

If analyzer.analyze(...) raises for a request, FastAPI returns an error before the new duration_ms log is emitted. Failed or slow /analyze calls then disappear from the sidecar timing stream, which weakens the latency signal during Presidio failures.

@greptile-apps

greptile-apps Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds timing metrics for PII redaction paths. The main changes are:

  • Presidio sidecar duration logs for /analyze and /anonymize.
  • durationMs on successful mask-batch route logs.
  • Per-execution redaction completion logs with string count, byte count, duration, and scrub state.

Confidence Score: 4/5

The redaction behavior looks unchanged, but the new production metrics can be missing on the paths they are meant to illuminate.

  • Sidecar timing logs can be dropped by the default uvicorn logging setup.
  • Successful production redactions can suppress the new completion metric.
  • Mask-batch failures still lack the new duration field.

apps/pii/server.py, apps/sim/lib/logs/execution/pii-redaction.ts, apps/sim/app/api/guardrails/mask-batch/route.ts

Important Files Changed

Filename Overview
apps/pii/server.py Adds sidecar endpoint timing logs, but the new logger may not emit under the default uvicorn logging setup.
apps/sim/app/api/guardrails/mask-batch/route.ts Adds success-path batch duration logging, while timeout and error paths still omit the route-level duration.
apps/sim/lib/logs/execution/pii-redaction.ts Adds completion logging for redaction work, but successful production runs may suppress the new info-level metric.

Comments Outside Diff (1)

  1. apps/sim/app/api/guardrails/mask-batch/route.ts, line 40-43 (link)

    P2 Timeout Path Loses Duration

    When maskPIIBatch fails because Presidio hangs until its 45s timeout or returns a non-2xx response, this catch block logs only the error and count. The new durationMs field appears only on successful batches, so the failure path cannot distinguish an instant sidecar outage from a slow timeout in the route's own log.

Reviews (2): Last reviewed commit: "feat(pii): add redaction timing metrics ..." | Re-trigger Greptile

Comment thread apps/pii/server.py
anonymizer = AnonymizerEngine()

# Propagates to uvicorn's root handler, so timing lands in the container log stream.
logger = logging.getLogger("sim.pii")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Timing Logger Is Unhandled

Under the sidecar's default uvicorn server:app startup, uvicorn configures its own named loggers rather than a root handler for sim.pii. The new /analyze and /anonymize logger.info calls can be dropped silently, so the timing logs described by the PR never reach the container log stream.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +184 to +189
logger.info('PII redaction completed', {
stringCount: collected.length,
totalBytes,
durationMs: Math.round(performance.now() - startedAt),
scrubbed,
})

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Production Suppresses Completion Logs

The app logger's production default suppresses info messages unless the deployment lowers LOG_LEVEL. In that common production state, successful redactions never emit PII redaction completed, so the per-execution timing metric added here is not queryable after deployment.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant