
feat(data-retention): granular PII redaction stages (input + block outputs) #5272

Open
TheodoreSpeaks wants to merge 6 commits into staging from feat/pii-granular-redaction



@TheodoreSpeaks

Collaborator

Summary

  • Add two execution-altering PII redaction stages alongside the existing log redaction: redact the workflow input before execution, and mask every block output in-flight before the next block reads it
  • Per-stage policy (entity types + language) for each of Logs / Workflow input / Block outputs, resolved per workspace with most-specific-wins precedence; existing logs-only rules remain fully backward compatible
  • In-flight stages fail-fast (abort the run) on a Presidio error instead of scrubbing or leaking; the logs stage keeps scrub-to-marker
  • Reuse the shared HTTP → Presidio path; block-output redaction runs before payload compaction so offloaded large values are still masked
  • Settings UI: chip-tabs across the three stages, language-first picker with the entity grid filtered to that language's recognizers, and a confirmation before removing a workspace override
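The per-stage, most-specific-wins resolution with legacy back-compat summarized above can be sketched as follows. This is a minimal sketch, not the PR's schema: the rule shapes (`LegacyRule`, `PerStageRule`), the stage keys, the `"en"` fallback language, and `resolveEffectivePolicy` are hypothetical. It assumes rules arrive ordered from least to most specific (for example, an organization default followed by a workspace override) and that precedence is decided independently per stage.

```typescript
// Hypothetical rule shapes for illustration; not the PR's actual schema.
type Stage = "logs" | "input" | "blockOutputs";

interface StagePolicy {
  entityTypes: string[];
  language: string;
}

// Legacy rules were flat and only ever applied to logs.
interface LegacyRule {
  entityTypes: string[];
  language?: string;
}

type PerStageRule = Partial<Record<Stage, StagePolicy>>;
type StoredRule = LegacyRule | PerStageRule;
type EffectivePolicy = Record<Stage, StagePolicy | null>;

const STAGES: Stage[] = ["logs", "input", "blockOutputs"];

function isLegacy(rule: StoredRule): rule is LegacyRule {
  return Array.isArray((rule as LegacyRule).entityTypes);
}

// Map a legacy flat rule onto the logs stage; per-stage rules pass through.
function normalize(rule: StoredRule): PerStageRule {
  if (isLegacy(rule)) {
    // Assumed fallback language for legacy rules that never stored one.
    return { logs: { entityTypes: rule.entityTypes, language: rule.language ?? "en" } };
  }
  return rule;
}

// `rules` must be ordered least specific first; a later rule that defines a
// stage replaces whatever an earlier rule set for that same stage.
function resolveEffectivePolicy(rules: StoredRule[]): EffectivePolicy {
  const effective: EffectivePolicy = { logs: null, input: null, blockOutputs: null };
  for (const rule of rules) {
    const perStage = normalize(rule);
    for (const stage of STAGES) {
      const policy = perStage[stage];
      if (policy) effective[stage] = policy;
    }
  }
  return effective;
}
```

Deciding precedence per stage means a workspace override that only configures block outputs still inherits a broader logs policy; whether the real resolver merges per stage or replaces whole rules is not stated in the PR.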

Type of Change

  • New feature

Testing

Tested manually. Added unit tests for resolver back-compat, redactObjectStrings and its failure modes, and the contract schema. bun run lint, check:api-validation:strict, and check:migrations origin/staging all pass.

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel

vercel Bot commented Jun 29, 2026


The latest updates on your projects.

1 Skipped Deployment
Project: docs · Deployment: Skipped · Actions: Skipped · Updated (UTC): Jun 30, 2026 1:08am


@cursor

cursor Bot commented Jun 29, 2026


PR Summary

High Risk
Changes execution-time data (inputs, block outputs, streams, memory) and log persistence with fail-fast vs scrub semantics; misconfiguration or Presidio outages can abort runs or alter workflow results.

Overview
Adds three independently configurable PII redaction stages (Logs, Workflow input, Block outputs), each with its own entity types and language, while legacy flat rules still map to logs-only.

Runtime: Workflow input is masked before execution; block outputs are masked in-flight (including before compaction and on restored snapshot state), with onFailure: 'throw' so runs abort instead of leaking or scrubbing computed data. Logs keep scrub-on-failure. Streaming blocks buffer without forwarding raw chunks when block-output redaction is on; agent memory writes are masked too; child workflows inherit the policy.
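The fail-fast vs. scrub split described above can be sketched as a small object walker. This is a minimal sketch, not the PR's code: `maskObjectStrings` only loosely mirrors the `redactObjectStrings` helper mentioned in the PR, and the `MaskBatch` callback (standing in for the shared HTTP → Presidio call), the `[REDACTED]` marker, and `PiiRedactionError` are hypothetical names. It collects every string leaf, masks them in one call, and writes the results back in the same traversal order; with `"throw"` a masking failure aborts, and with `"scrub"` every string is replaced by the marker so nothing raw survives.

```typescript
// Hypothetical stand-in for the HTTP -> Presidio masking call.
type MaskBatch = (texts: string[]) => Promise<string[]>;
type OnFailure = "throw" | "scrub";

class PiiRedactionError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "PiiRedactionError";
  }
}

const SCRUB_MARKER = "[REDACTED]"; // hypothetical marker text

// Collect every string leaf in depth-first order.
function collectStrings(value: unknown, out: string[] = []): string[] {
  if (typeof value === "string") out.push(value);
  else if (Array.isArray(value)) for (const v of value) collectStrings(v, out);
  else if (typeof value === "object" && value !== null)
    for (const v of Object.values(value)) collectStrings(v, out);
  return out;
}

// Rebuild the value, substituting string leaves in the same traversal order.
function replaceStrings(value: unknown, next: () => string): unknown {
  if (typeof value === "string") return next();
  if (Array.isArray(value)) return value.map((v) => replaceStrings(v, next));
  if (typeof value === "object" && value !== null) {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value)) out[k] = replaceStrings(v, next);
    return out;
  }
  return value; // numbers, booleans, null, undefined pass through
}

async function maskObjectStrings<T>(value: T, mask: MaskBatch, onFailure: OnFailure): Promise<T> {
  const texts = collectStrings(value);
  if (texts.length === 0) return value;
  let masked: string[];
  try {
    masked = await mask(texts);
    // A short or long response is treated like any other masking failure.
    if (masked.length !== texts.length) throw new Error("length mismatch");
  } catch (err) {
    // In-flight stages: abort the run rather than leak or alter data silently.
    if (onFailure === "throw") throw new PiiRedactionError(`PII redaction failed: ${String(err)}`);
    // Logs stage: scrub every string to the marker.
    masked = texts.map(() => SCRUB_MARKER);
  }
  let i = 0;
  return replaceStrings(value, () => masked[i++]) as T;
}
```

Collecting and writing back in one depth-first order is what lets a single batched call cover a whole nested output. The sketch assumes plain JSON-like values; class instances such as `Date` would be flattened to plain objects.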

Presidio / masking: New /analyze_batch and /anonymize_batch endpoints; the app batches via shared byte/count budgets instead of per-string calls.
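Packing strings under shared byte and count budgets, rather than issuing one request per string, might look like the following sketch. `packBatches`, `maskInBatches`, `BatchBudget`, and the `send` callback are hypothetical; `send` stands in for a request to a batch endpoint such as `/analyze_batch` or `/anonymize_batch`, whose request shape is not shown in the PR description. Sizes are measured in UTF-8 bytes, and a single string larger than the byte budget gets a batch of its own rather than being dropped.

```typescript
interface BatchBudget {
  maxBytes: number; // hypothetical cap on UTF-8 payload bytes per request
  maxCount: number; // hypothetical cap on strings per request
}

const encoder = new TextEncoder();

// Greedily pack strings into batches without exceeding either budget.
function packBatches(texts: string[], budget: BatchBudget): string[][] {
  const batches: string[][] = [];
  let current: string[] = [];
  let currentBytes = 0;
  for (const text of texts) {
    const size = encoder.encode(text).byteLength;
    const overBytes = currentBytes + size > budget.maxBytes;
    const overCount = current.length >= budget.maxCount;
    // Flush only a non-empty batch, so an oversized string still goes out alone.
    if (current.length > 0 && (overBytes || overCount)) {
      batches.push(current);
      current = [];
      currentBytes = 0;
    }
    current.push(text);
    currentBytes += size;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

// Send batches sequentially and concatenate results in the original order.
async function maskInBatches(
  texts: string[],
  budget: BatchBudget,
  send: (batch: string[]) => Promise<string[]>,
): Promise<string[]> {
  const results: string[] = [];
  for (const batch of packBatches(texts, budget)) {
    results.push(...(await send(batch)));
  }
  return results;
}
```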

Policy & API: The schema and resolver return a per-stage effective policy; log persistence applies only the .logs policy and drops the feature-flag gate, so stored rules always drive masking (fail-safe).

UI: Data retention settings use stage tabs, language-filtered entity grids, and a confirm modal when removing workspace overrides.
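Filtering the entity grid to the recognizers available for the chosen language could be as simple as this sketch. `RecognizerInfo`, `entitiesForLanguage`, and `pruneSelection` are hypothetical, and the recognizer list below is illustrative rather than read from a real Presidio configuration. Pruning the current selection when the language changes is an assumed design choice, so that a saved policy never lists an entity the selected language cannot detect.

```typescript
// Hypothetical recognizer metadata; a real UI would load this from config.
interface RecognizerInfo {
  entity: string;
  languages: string[];
}

// Entities offered in the grid for one language, de-duplicated and sorted.
function entitiesForLanguage(recognizers: RecognizerInfo[], language: string): string[] {
  const available = new Set<string>();
  for (const r of recognizers) {
    if (r.languages.includes(language)) available.add(r.entity);
  }
  return Array.from(available).sort();
}

// Keep only previously selected entities the new language still supports.
function pruneSelection(selected: string[], available: string[]): string[] {
  const allowed = new Set(available);
  return selected.filter((e) => allowed.has(e));
}

// Illustrative data only.
const recognizers: RecognizerInfo[] = [
  { entity: "EMAIL_ADDRESS", languages: ["en", "es"] },
  { entity: "US_SSN", languages: ["en"] },
  { entity: "ES_NIF", languages: ["es"] },
];
const spanish = entitiesForLanguage(recognizers, "es"); // ["EMAIL_ADDRESS", "ES_NIF"]
const kept = pruneSelection(["US_SSN", "EMAIL_ADDRESS"], spanish); // ["EMAIL_ADDRESS"]
```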

Reviewed by Cursor Bugbot for commit eb6b25a. Bugbot is set up for automated code reviews on this repo.

Comment thread apps/sim/executor/execution/block-executor.ts
@greptile-apps

greptile-apps Bot commented Jun 29, 2026

Contributor

Greptile Summary

This PR adds granular PII redaction controls across workflow execution and logging. The main changes are:

  • Per-stage policies for workflow input, block outputs, and logs.
  • Stored-rule resolution for execution-time redaction.
  • Batched Presidio masking for object strings and log payloads.
  • Settings UI updates for stage-specific language and entity selection.
  • Backward-compatible handling for legacy logs-only rules.

Confidence Score: 5/5

This looks safe to merge.

  • No blocking issues found in the changed code.

Important Files Changed

  • apps/sim/lib/workflows/executor/execution-core.ts: Resolves stored PII rules before execution and applies input and restored block-state masking.
  • apps/sim/lib/logs/execution/logger.ts: Applies log redaction from stored rules without relying on the feature flag at persist time.
  • apps/sim/executor/execution/block-executor.ts: Masks block outputs before compaction and buffers streaming output when block-output redaction is enabled.
  • apps/sim/executor/handlers/agent/memory.ts: Masks agent memory content before persistence when block-output redaction is enabled.
  • apps/sim/lib/billing/retention.ts: Resolves legacy and per-stage PII rules into effective input, block-output, and log policies.
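Buffering a stream when block-output redaction is on can be sketched with an async generator. `redactStream` and the `mask` callback are hypothetical names, not the PR's API. When redaction is enabled, no chunk is forwarded until the whole output has been collected and masked once; besides keeping raw chunks from leaking, masking the full text avoids missing an entity that happens to be split across two chunks. When redaction is off, chunks pass through unchanged.

```typescript
// Hypothetical single-string masker standing in for the Presidio call.
type MaskText = (text: string) => Promise<string>;

async function* redactStream(
  chunks: AsyncIterable<string>,
  redactionEnabled: boolean,
  mask: MaskText,
): AsyncGenerator<string> {
  if (!redactionEnabled) {
    // No redaction: forward chunks as they arrive.
    yield* chunks;
    return;
  }
  // Redaction on: buffer everything; nothing raw is forwarded.
  let buffered = "";
  for await (const chunk of chunks) buffered += chunk;
  // Mask the complete text once, then emit a single masked chunk.
  yield await mask(buffered);
}
```

The trade-off is latency: with block-output redaction enabled, a streaming block's consumer sees nothing until the block finishes.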

Reviews (5): Last reviewed commit: "fix(data-retention): mask agent/Pi memor..."

Comment thread apps/sim/lib/workflows/executor/execution-core.ts Outdated
Comment thread apps/sim/executor/execution/block-executor.ts
Comment thread apps/sim/lib/workflows/executor/execution-core.ts
@TheodoreSpeaks

Collaborator Author

@greptile review

Comment thread apps/sim/lib/workflows/executor/execution-core.ts Outdated
@TheodoreSpeaks

Collaborator Author

@greptile review

@cursor cursor Bot left a comment



Cursor Bugbot has reviewed your changes and found 1 potential issue.


❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit bb3a84b.

Comment thread apps/sim/executor/execution/block-executor.ts Outdated
Comment thread apps/sim/lib/workflows/executor/execution-core.ts Outdated
@TheodoreSpeaks

Collaborator Author

@greptile review

@TheodoreSpeaks

Collaborator Author

@greptile review


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant