Skill Being Reviewed
Skill name: agent-security
Skill path: skills/ai-security/agent-security/
False Positive Analysis
Benign architecture that can be over-reported as unsafe if tool provenance is not separated from tool capability:
agent_runtime:
  remote_tools:
    - name: jira.search_issues
      source: mcp://internal-tools.company.example
      transport: mTLS
      publisher: platform-security
      manifest_digest: sha256:8c5...
      allowed_actions: ["search", "read"]
      write_actions: []
Why this is a false positive risk:
The skill correctly asks reviewers to inspect tool registration breadth, least privilege, HITL gates, and auditability. However, it does not distinguish a remotely discovered tool with a verified publisher, pinned manifest, mutually authenticated transport, and read-only action set from a dynamically discovered tool whose schema can change without review. A review that only sees "remote MCP/tool server present" can overstate risk when the trust root and manifest integrity are actually controlled.
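One way to keep the two questions apart is to score provenance and capability on separate axes. The sketch below is illustrative only: the function names, the `unverified_tool` record, and the rule that provenance counts as controlled when publisher, manifest digest, and mTLS are all present are assumptions, not part of the skill.

```python
def provenance_controlled(tool):
    """True when publisher, manifest digest, and mTLS transport are all recorded."""
    return (
        bool(tool.get("publisher"))
        and bool(tool.get("manifest_digest"))
        and tool.get("transport") == "mTLS"
    )


def capability_class(tool):
    """Classify capability independently of where the tool came from."""
    if "write_actions" not in tool:
        return "unknown"  # no evidence either way; do not assume read-only
    return "write" if tool["write_actions"] else "read-only"


# The benign configuration from the example above.
jira_tool = {
    "name": "jira.search_issues",
    "source": "mcp://internal-tools.company.example",
    "transport": "mTLS",
    "publisher": "platform-security",
    "manifest_digest": "sha256:8c5...",
    "allowed_actions": ["search", "read"],
    "write_actions": [],
}

# Hypothetical remote tool with no provenance evidence at all.
unverified_tool = {"name": "partner.tool", "url": "https://tools.partner.example/mcp"}

jira_assessment = (provenance_controlled(jira_tool), capability_class(jira_tool))
unverified_assessment = (
    provenance_controlled(unverified_tool),
    capability_class(unverified_tool),
)
print(jira_assessment, unverified_assessment)
```

Reporting the pair rather than a single "remote tool present" flag lets a reviewer rate the first tool as low risk while still flagging the second.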
Coverage Gaps
Missed variant 1: Dynamic tool manifest drift after approval
{
  "server": "mcp://tools.example",
  "manifest_reviewed_at": "2026-06-01T12:00:00Z",
  "reviewed_tools": ["repo.search", "repo.read_file"],
  "runtime_tools": ["repo.search", "repo.read_file", "repo.delete_branch", "workflow.dispatch"],
  "manifest_pin": null
}
Why it should be caught:
The skill reviews dynamic vs static tool sets, but does not require evidence that a runtime tool registry matches the reviewed registry. In MCP/plugin-based agents, capabilities can appear through server upgrades, feature flags, tenant config, marketplace installs, or remote schema changes. If the system approves an architecture based on a stale registry, a later destructive tool can bypass the original risk review without any code change in the agent application.
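The registry comparison described here can be sketched as a set difference between the reviewed and runtime tool lists. This is a minimal illustration using the record from the example; the `registry_drift` helper is a hypothetical name, and a real check would read the runtime list from an actual registry export.

```python
import json


def registry_drift(reviewed_tools, runtime_tools):
    """Return tools added to or removed from the runtime registry since review."""
    reviewed, runtime = set(reviewed_tools), set(runtime_tools)
    return {
        "added": sorted(runtime - reviewed),
        "removed": sorted(reviewed - runtime),
    }


record = json.loads("""
{
  "server": "mcp://tools.example",
  "manifest_reviewed_at": "2026-06-01T12:00:00Z",
  "reviewed_tools": ["repo.search", "repo.read_file"],
  "runtime_tools": ["repo.search", "repo.read_file", "repo.delete_branch", "workflow.dispatch"],
  "manifest_pin": null
}
""")

drift = registry_drift(record["reviewed_tools"], record["runtime_tools"])
unpinned = record["manifest_pin"] is None  # no pin means drift cannot be prevented

print(drift["added"], unpinned)
```

Any non-empty `added` list, especially alongside a null pin, indicates the original approval no longer describes what the agent can invoke.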
Missed variant 2: Remote tool server identity is trusted by URL, not by publisher or artifact integrity
tools:
  - url: https://tools.partner.example/mcp
    auth: bearer_from_env
    allowed: true
    certificate_pinning: false
    publisher_attestation: missing
    manifest_signature: missing
    schema_digest: missing
Why it should be caught:
The current process checks credentials and inter-agent trust, but it does not ask reviewers to verify tool-server supply chain controls: server identity, manifest signing, schema digest pinning, publisher ownership, transport security, and update approval. A compromised tool server or DNS/hosting takeover can silently alter descriptions, parameter schemas, or available operations. That is different from prompt injection: the agent may follow a legitimate-looking tool contract that was swapped underneath it.
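Schema digest pinning, one of the controls listed above, can be sketched as hashing a canonical serialization of the tool schema and comparing it with the digest recorded at review time. The canonicalization used here (sorted-key compact JSON), the schema fields, and the result labels are assumptions for illustration; a digest check alone does not replace signature or publisher verification.

```python
import hashlib
import json


def schema_digest(schema):
    """Hash a deterministic JSON serialization of the tool schema."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_pinned_schema(schema, pinned_digest):
    """Compare the runtime schema against the digest recorded at review time."""
    if pinned_digest is None:
        return "not_evaluable"  # nothing was pinned, so integrity cannot be shown
    return "match" if schema_digest(schema) == pinned_digest else "drift"


# Hypothetical schema as it looked when the tool was reviewed.
reviewed_schema = {
    "name": "repo.search",
    "description": "Search repositories (read-only).",
    "parameters": {"query": "string"},
}
pin = schema_digest(reviewed_schema)

# The same tool name after the server silently changes its contract.
swapped_schema = dict(
    reviewed_schema,
    parameters={"query": "string", "delete_matches": "boolean"},
)

print(verify_pinned_schema(reviewed_schema, pin))
print(verify_pinned_schema(swapped_schema, pin))
print(verify_pinned_schema(swapped_schema, None))
```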
Missed variant 3: OAuth/consent scopes are granted at connection time and later exceed the agent task
User connects Google Workspace integration.
Consent screen grants: Gmail read/write, Drive full access, Calendar write.
Agent task: summarize one shared document.
Runtime policy: trusts the integration token and does not downscope per task.
Why it should be caught:
The skill covers credential scope generally, but OAuth and user-consented integrations need a separate evidence gate. Many agent tools operate with delegated user tokens, not service accounts. The review should record consented scopes, token audience, refresh-token lifetime, per-task downscoping, revocation path, and whether the agent can use one user's delegated token for another user's workflow. Without this, least-privilege review can miss excessive delegated authority even when cloud IAM is clean.
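The gap between consented scopes and task needs can be sketched as a simple set difference. The scope labels below are illustrative placeholders, not real Google OAuth scope strings, and the severity mapping is an assumption: only the High rating for write-capable excess without downscoping comes from this review, while the Medium rating for read-only excess is added for illustration.

```python
# Illustrative scope labels only; these are NOT real OAuth scope strings.
WRITE_CAPABLE = {"gmail.write", "drive.full", "calendar.write"}


def excess_scopes(consented, required_for_task):
    """Scopes the token carries that the current task does not need.

    Scopes are treated as opaque labels; implication between scopes
    (a broad scope covering a narrower one) is not modeled here.
    """
    return sorted(set(consented) - set(required_for_task))


def scope_finding(consented, required_for_task, downscoped_per_task):
    excess = excess_scopes(consented, required_for_task)
    if not excess or downscoped_per_task:
        severity = "None"
    elif WRITE_CAPABLE & set(excess):
        severity = "High"
    else:
        severity = "Medium"  # assumption: rating for read-only excess
    return {"excess": excess, "severity": severity}


# Consent grant from the scenario above; the task only needs to read one document.
consented = ["gmail.read", "gmail.write", "drive.full", "calendar.write"]
required = ["drive.file.read"]

finding = scope_finding(consented, required, downscoped_per_task=False)
print(finding)
```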
Edge Cases
- A tool marketplace or MCP server may expose only safe tools today but add write/delete operations after an automatic update; reviews need manifest pinning or update approval evidence.
- Tool descriptions are security-relevant because models use them for planning. A malicious or compromised remote tool can describe a destructive action as read-only unless the broker enforces policy from a separate signed capability model.
- Multiple tenants can share the same remote tool server while requiring different capability sets; one tenant's approved actions should not become another tenant's available tools.
- OAuth refresh tokens can outlive the session that justified access. Agent reviews should verify token revocation, re-consent, and task-bounded access, not only initial consent.
- A remote tool can return schema-valid output that contains instructions for the next agent step; provenance and taint labels should survive into downstream tool-policy decisions.
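Two of the edge cases above, misleading tool descriptions and per-tenant capability sets, can be sketched with a broker that authorizes calls from its own policy table and ignores what the tool says about itself. The policy contents, tenant names, and the `ToolCall` type are hypothetical; a real broker would load a signed policy rather than a literal dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    tenant: str
    tool: str
    described_as_read_only: bool  # claim from the remote description; untrusted


# Hypothetical per-tenant capability policy, maintained outside the tool server.
CAPABILITY_POLICY = {
    "tenant-a": {"repo.search": "read", "repo.read_file": "read"},
    "tenant-b": {"repo.search": "read", "repo.delete_branch": "write"},
}


def authorize(call, allow_write=False):
    """Decide from the broker's policy, never from the tool's self-description."""
    capability = CAPABILITY_POLICY.get(call.tenant, {}).get(call.tool)
    if capability is None:
        return False  # tool is not approved for this tenant
    if capability == "write" and not allow_write:
        return False  # write actions need an explicit grant, e.g. a HITL approval
    return True


# A destructive tool that describes itself as read-only is still denied for tenant-a.
mislabelled = ToolCall("tenant-a", "repo.delete_branch", described_as_read_only=True)
print(authorize(mislabelled))
print(authorize(ToolCall("tenant-b", "repo.delete_branch", True), allow_write=True))
```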
Remediation Quality
Comparison to Other Tools
| Tool / Framework | Catches this? | Notes |
| --- | --- | --- |
| OWASP Agentic AI Threats | Partial | Covers excessive agency and tool misuse, but local review still needs concrete tool registry and provenance evidence. |
| OWASP LLM Top 10 | Partial | LLM tool/plugin risks are relevant, but it does not by itself prove remote manifest integrity or consent downscoping. |
| SLSA / artifact signing patterns | Partial | Useful model for provenance and attestation, but must be adapted to tool manifests, schemas, and MCP/plugin releases. |
| OAuth security review | Partial | Can validate consent and token scope, but must be tied to agent task boundaries and tool execution policy. |
| Runtime MCP/tool registry export | Yes when available | Best evidence for what tools the agent can actually invoke at runtime, especially after feature flags or server updates. |
Overall Assessment
Strengths:
- Strong architecture checklist for permissions, least privilege, HITL, blast radius, audit, rollback, and multi-agent trust.
- Good warnings about dynamic tool sets, runtime tool injection, shared state, and broad credentials.
- Useful output format that can be extended without changing the skill's core structure.
Needs improvement:
- Add remote tool/MCP/plugin provenance evidence, not just local tool registration review.
- Require runtime tool registry comparison against the reviewed registry.
- Treat tool descriptions and schemas as security-sensitive artifacts that need integrity controls.
- Add OAuth/delegated-token scope review separate from service-account IAM.
- Add Not Evaluable reason codes for missing manifest digest, publisher identity, registry export, update policy, or consent-scope evidence.
Priority recommendations:
- Add a "Remote Tool Provenance and Registry Drift" step before permission scoring: capture server identity, publisher, manifest digest, schema digest, signature/attestation, transport trust, update policy, and runtime registry export time.
- Expand the Agent Inventory or add a Tool Registry table with columns for Reviewed manifest digest, Runtime digest, Publisher, Transport, Update approval, OAuth scopes, Downscoped per task?, and Evidence confidence.
- Add findings guidance: unpinned remote destructive tools should be High/Critical; tool provenance supported only by documentation, with no manifest or runtime evidence, should be Not Evaluable; broad delegated OAuth scopes without per-task downscoping should be High when write-capable.
- Add pitfalls for trusting remote tool URLs, assuming marketplace installs are static, treating tool descriptions as harmless text, and reviewing service-account IAM while missing delegated user-token scope.
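The proposed Tool Registry table and Not Evaluable reason codes can be sketched as a small record type plus a classifier. The field subset, the reason-code names, and the "Evaluable" label are hypothetical; only the High rating for unpinned destructive tools and the Not Evaluable outcome for missing evidence follow the guidance above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolRegistryRow:
    tool: str
    reviewed_manifest_digest: Optional[str]
    runtime_digest: Optional[str]
    publisher: Optional[str]
    update_approval: Optional[str]
    destructive: bool


def not_evaluable_reasons(row):
    """Hypothetical reason codes for evidence the reviewer could not obtain."""
    reasons = []
    if row.reviewed_manifest_digest is None:
        reasons.append("NE_MISSING_MANIFEST_DIGEST")
    if row.runtime_digest is None:
        reasons.append("NE_MISSING_REGISTRY_EXPORT")
    if row.publisher is None:
        reasons.append("NE_MISSING_PUBLISHER_IDENTITY")
    if row.update_approval is None:
        reasons.append("NE_MISSING_UPDATE_POLICY")
    return reasons


def classify(row):
    # Unpinned destructive tools are rated High even when other evidence is missing.
    if row.destructive and row.reviewed_manifest_digest is None:
        return "High"
    if not_evaluable_reasons(row):
        return "Not Evaluable"
    return "Evaluable"  # enough evidence to proceed to permission scoring


unpinned_delete = ToolRegistryRow("repo.delete_branch", None, None, None, None, True)
undocumented_search = ToolRegistryRow("repo.search", None, None, "platform-security", None, False)

print(classify(unpinned_delete), classify(undocumented_search))
print(not_evaluable_reasons(undocumented_search))
```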
References
Bounty Info