AI Sensitive Data Scanner (Batch)#475
Open
LukasHirt wants to merge 29 commits into
…e `packages/web-app-ai-sensitive-data-scanner/` with `package.json`, `vite.config.ts`, `tsconfig.json`, `src/index.ts` stub, `l10n/translations.json`, and `l10n/.tx/config` Signed-off-by: Lukas Hirt <info@hirt.cz>
Fix two cascading e2e failures caused by oCIS state pollution:
1. oc-modal-background blocks afterEach cleanup: dispatchModal creates a full-screen backdrop with pointer-events that intercepts every click, preventing deleteAllFromPersonal() from reaching the app-switcher button. Set pointer-events: none on the backdrop in ScanResultsModal.onMounted so the modal stays visible while clicks pass through to the nav.
2. Leftover test-document.txt from prior gate runs: when cleanup fails after test 3, the file lingers in oCIS, causing uploadFile() to hang on the "File already exists" conflict dialog in the next run (tests 1 and 2). Add a Playwright globalSetup that deletes the known test fixture files via WebDAV (/remote.php/dav/files/admin/) before the suite runs.
Signed-off-by: Lukas Hirt <info@hirt.cz>
…`src/composables/useLlm.ts` (copied from `web-app-ai-doc-summary`) and `src/utils/file-support.ts` Signed-off-by: Lukas Hirt <info@hirt.cz>
…seScan.ts`: text/PDF file fetching, sequential LLM calls with structured-output + plain-text fallback, same-origin endpoint validation, and per-file result state Signed-off-by: Lukas Hirt <info@hirt.cz>
…nt: complete `src/index.ts` to register the `ActionExtension` on `global.files.batch-actions` with `isVisible` guard and `dispatchModal` handler Signed-off-by: Lukas Hirt <info@hirt.cz>
…sultsModal.vue`: scanning progress, per-file findings tables (structured) and narrative fallback, unconfigured-LLM state, using ODS components Signed-off-by: Lukas Hirt <info@hirt.cz>
…nit/components/ScanResultModal.spec.ts` and add the E2E scaffold in `tests/e2e/` Signed-off-by: Lukas Hirt <info@hirt.cz>
….md if present) for the extension Signed-off-by: Lukas Hirt <info@hirt.cz>
… CI matrix, and oCIS apps config Signed-off-by: Lukas Hirt <info@hirt.cz>
✅ Snyk checks have passed. No issues have been found so far.
Problem
Teams routinely share folders containing files with accidentally embedded PII,
credentials, or confidential text. Manual inspection before sharing is
impossible at scale.
Solution
Users select files and click "Scan for sensitive data" in the batch actions
bar. The extension fetches text from supported files (txt, md, pdf) and sends
each to the LLM; a report modal lists per-file findings with redacted excerpts.
With structured-output models, findings are categorized (PII / credentials /
confidential); with basic text models, a plain per-file narrative is returned.
Without a configured LLM, the action opens an informational modal about the
missing setup.
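The two result tiers described above could be separated roughly as follows. This is a minimal sketch, not the code in this PR: the `Finding` and `ScanOutcome` shapes, the `findings` key, and the `parseLlmResponse` name are assumptions made for illustration, since the description does not show the actual response schema.

```typescript
// Hypothetical types: the PR description names the three categories,
// but the exact result shape is an assumption.
type Category = 'pii' | 'credentials' | 'confidential'

interface Finding {
  category: Category
  excerpt: string // redacted excerpt as returned by the model
}

type ScanOutcome =
  | { kind: 'structured'; findings: Finding[] }
  | { kind: 'narrative'; narrative: string }

const CATEGORIES: readonly string[] = ['pii', 'credentials', 'confidential']

// Runtime guard: accepts only objects with a known category and a string excerpt.
function isFinding(value: unknown): value is Finding {
  if (typeof value !== 'object' || value === null) return false
  const v = value as Record<string, unknown>
  return (
    typeof v.category === 'string' &&
    CATEGORIES.indexOf(v.category) !== -1 &&
    typeof v.excerpt === 'string'
  )
}

// Try the structured tier first; anything that is not valid JSON in the
// assumed shape is kept verbatim as a narrative.
function parseLlmResponse(raw: string): ScanOutcome {
  try {
    const parsed: unknown = JSON.parse(raw)
    const list = (parsed as { findings?: unknown } | null)?.findings
    if (Array.isArray(list) && list.every(isFinding)) {
      return { kind: 'structured', findings: list }
    }
  } catch {
    // Not JSON: fall through to the narrative tier.
  }
  return { kind: 'narrative', narrative: raw }
}
```

In this sketch, JSON that parses but does not match the assumed shape is also treated as narrative, so the modal never renders a half-validated findings table.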
Extension points
global.files.batch-actions
Why ship this now
Compliance and data-governance requirements are rising for on-prem oCIS
customers; this gives them an instant pre-share check without leaving the files UI.
What was built
`web-app-ai-sensitive-data-scanner` is an oCIS Web extension that registers a single batch action on `global.files.batch-actions`. When users select one or more files and trigger "Scan for sensitive data," the extension fetches the text content of each supported file (CSV, Markdown, PDF, plain text), sends it to the configured LLM endpoint sequentially, and displays per-file findings in a results modal. PDF content is extracted via pdfjs-dist's fake-worker pattern, capped at 12,000 characters, consistent with the approach used in other AI extensions in this repo.
The entry point (`src/index.ts`) registers the action via `defineWebApplication`, delegating file-type gating to `src/utils/file-support.ts` (`isSupportedFile`, defaulting to csv, md, pdf, txt). Scanning logic lives in `src/composables/useScanner.ts`: it builds a `FileScanResult` per resource with progressive state transitions (pending → scanning → done | error | skipped), validates the LLM endpoint origin against `window.location.origin` before attaching the Bearer token, and processes files one at a time with `await` rather than `Promise.all` to avoid rate-limit collisions. `ScanResultsModal.vue` drives both the unconfigured-LLM path (shows a setup prompt and suppresses the scan) and the live path, rendering structured findings with category icons (pii, credentials, confidential) or a plain `pre-wrap` narrative block when the LLM returns non-JSON text.
Two deliberate degradation tiers are supported: when the model returns valid JSON, findings are surfaced as categorized entries with redacted excerpts; when it returns prose, the raw response is stored as a `narrative` field and rendered verbatim. The same-origin check is a hard gate: cross-origin endpoints produce a per-file error without sending credentials. The batch action registers exclusively on `global.files.batch-actions`; dual-registration with `global.files.context-actions` was explicitly rejected during planning.
Unit tests cover all rendering states of `ScanResultsModal.vue` (unconfigured, global in-progress, per-file pending/scanning/skipped/error, narrative fallback, structured findings, and re-scan button visibility). An E2E scaffold (`acceptance.spec.ts`, `ScannerPage.ts`, `playwright.config.ts`, `global-setup.ts`) is committed, but the acceptance tests themselves are out of scope for this PR: they require a live oCIS instance with an LLM sidecar and are not exercised in CI.
Gate
Effort: M · 🤖 Generated by extctl
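For reviewers, the same-origin gate and the one-at-a-time processing described under "What was built" can be sketched as below. This is an illustration under stated assumptions, not the contents of the scanning composable: `isSameOrigin`, `scanSequentially`, the injected `callLlm` function, and the simplified `FileScanResult` shape are hypothetical names, and the real code compares against `window.location.origin` rather than taking the page origin as a parameter.

```typescript
type ScanState = 'pending' | 'scanning' | 'done' | 'error' | 'skipped'

// Simplified, hypothetical result shape for illustration only.
interface FileScanResult {
  name: string
  state: ScanState
  error?: string
}

// Resolve the endpoint against the page origin and compare origins.
// Relative endpoints resolve to the page origin and therefore pass.
function isSameOrigin(endpoint: string, pageOrigin: string): boolean {
  try {
    return new URL(endpoint, pageOrigin).origin === new URL(pageOrigin).origin
  } catch {
    return false // unparsable URL: treat as not same-origin
  }
}

// Process files strictly one after another (await inside the loop, no
// Promise.all). callLlm is an injected, hypothetical request function.
async function scanSequentially(
  names: string[],
  endpoint: string,
  pageOrigin: string,
  token: string,
  callLlm: (name: string, headers: Record<string, string>) => Promise<void>
): Promise<FileScanResult[]> {
  const results = names.map((name): FileScanResult => ({ name, state: 'pending' }))
  for (const result of results) {
    if (!isSameOrigin(endpoint, pageOrigin)) {
      // Hard gate: record a per-file error and never build the auth header.
      result.state = 'error'
      result.error = 'LLM endpoint is not same-origin; credentials were not sent'
      continue
    }
    result.state = 'scanning'
    try {
      await callLlm(result.name, { Authorization: `Bearer ${token}` })
      result.state = 'done'
    } catch (e) {
      result.state = 'error'
      result.error = e instanceof Error ? e.message : String(e)
    }
  }
  return results
}
```

Awaiting inside the loop keeps at most one request in flight, which is the property the description relies on to avoid rate-limit collisions; the cost is that total scan time grows linearly with the number of selected files.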