diff --git a/openspec/changes/fix-skill-spec-compliance/.gitignore b/openspec/changes/fix-skill-spec-compliance/.gitignore new file mode 100644 index 0000000..bfec4e2 --- /dev/null +++ b/openspec/changes/fix-skill-spec-compliance/.gitignore @@ -0,0 +1 @@ +.openspec diff --git a/openspec/changes/fix-skill-spec-compliance/design.md b/openspec/changes/fix-skill-spec-compliance/design.md new file mode 100644 index 0000000..cb5be5e --- /dev/null +++ b/openspec/changes/fix-skill-spec-compliance/design.md @@ -0,0 +1,66 @@ +## Context + +Current skill discovery (`src/registry/discoverer.js:23-24`) looks for `skill.yaml`/`skill.json` files — a custom format not recognized by any Agent Skills compatible agent (Claude Code, VS Code, Cursor, Gemini CLI, OpenAI Codex, etc.). The standard format is `SKILL.md` with YAML frontmatter. This incompatibility blocks ecosystem skill interoperability. Additionally, activation is clunky: the model must call `skills_list`/`skill_view` LangChain tools rather than progressively loading SKILL.md via natural file-read. The sandbox runner uses `fork()` which only works for Node.js scripts, preventing execution of `.py`, `.sh`, `.rb` scripts bundled in skills. + +## Goals / Non-Goals + +**Goals:** +- Parse `SKILL.md` YAML frontmatter as the primary metadata source +- Enforce spec field constraints (name, description, compatibility) +- Implement progressive disclosure: catalog at startup, SKILL.md body on activation via model's file-read +- Replace `fork` with generic `spawn` to support non-Node scripts (.py, .sh, .rb, .ts) +- Add cross-client directory scanning (`.agents/skills/`, project-level, user-level) +- Drop `inputSchema`/`outputSchema` as they are not part of the standard +- Maintain backward compatibility with existing sandbox capabilities (path resolution, URL filtering, env injection, timeout handling) + +**Non-Goals:** +- Adding script dependency management (e.g., `uv run`, `npx`) — the agent invokes scripts, not the harness +- Adding eval/integration test framework for skills +- Changing the registry class interface or existing unit test contracts +- Implementing a skills marketplace or remote registry +- Handling skill versioning/updates (agents handle this in their own ecosystems) + +## Decisions + +### Decision 1: SKILL.md over skill.yaml +**Choice**: Parse `SKILL.md` frontmatter exclusively (remove skill.yaml/skill.json support). +**Rationale**: The entire Agent Skills ecosystem uses `SKILL.md`. Keeping a parallel format adds maintenance burden with zero interoperability benefit. +**Alternatives considered**: Keep both formats and deprecate skill.yaml. Rejected: adds complexity, delays adoption, dual-format validation is error-prone. + +### Decision 2: Model-driven activation over LangChain tools +**Choice**: Replace `skills_list`/`skill_view` tools with a system-prompt skill catalog. The model reads SKILL.md via its built-in file-read tool. +**Rationale**: Progressive disclosure is the spec's core principle. Tool-based activation forces two turns (list → view) and leaks the activation mechanism to the model. +**Alternatives considered**: Keep tools but add catalog as supplement. Rejected: tools are the wrong abstraction for skill activation — the agent knows its own file-read tools best. + +### Decision 3: spawn over fork +**Choice**: Use `node:child_process.spawn` with interpreter detection by extension/shebang. +**Rationale**: `fork` only supports Node.js modules. The spec supports Python, Bash, Deno, Go, Ruby scripts bundled in skills. +**Alternatives considered**: Keep fork for .js and spawn for others. Rejected: adds branching complexity and memory management differences. Uniform spawn is simpler. + +### Decision 4: Lenient validation on load +**Choice**: Warn on constraint violations (name too long, non-matching directory) but still load the skill. Skip only if YAML is completely unparseable or description is empty. +**Rationale**: The spec explicitly says "Lenient validation" for cross-client compatibility — skills authored for other clients may have minor violations. +**Alternatives considered**: Strict (skip any violation). Rejected: blocks interoperability with third-party skills. + +### Decision 5: `.agents/skills/` as primary scan path +**Choice**: Add `.agents/skills/` to the default scan scopes alongside `skills/`. +**Rationale**: `.agents/skills/` is the cross-client interoperability convention. `skills/` is kept as project-local. +**Alternatives considered**: Replace `skills/` with `.agents/skills/`. Rejected: existing projects use `skills/`; breaking that would be confusing. + +## Risks / Trade-offs + +| Risk | Mitigation | +|------|-----------| +| Existing TUI skills panel may reference `skills_list`/`skill_view` tools | Update TUI to load catalog from registry directly — or keep a stripped-down `skills_list` tool that returns the catalog (non-LangChain wrapper) | +| `spawn` has different memory/resource semantics than `fork` | Add explicit `--max-old-space-size` for `.js` scripts; document memory limits in config | +| Progressive disclosure shifts activation logic into the system prompt | System prompt already handles skill discovery; this is a net simplification | +| Cross-client scanning may pick up unexpected skills from user profile | Trust gating: project-level skills require explicit trust flag; user-level always scanned | +| Removing `inputSchema`/`outputSchema` breaks backward compatibility | No skills in the repo currently define these schemas; safe removal with zero migration risk | + +## Migration Plan + +1. Implement all changes on this branch +2. Run full test suite — confirm 100% coverage +3. If any existing skills directory with `skill.yaml` files exists, convert to `SKILL.md` format +4. Update config.yaml `sandbox.paths` default to include `.agents/skills/` +5. Deploy and verify TUI skills panel updates diff --git a/openspec/changes/fix-skill-spec-compliance/proposal.md b/openspec/changes/fix-skill-spec-compliance/proposal.md new file mode 100644 index 0000000..d32c9f4 --- /dev/null +++ b/openspec/changes/fix-skill-spec-compliance/proposal.md @@ -0,0 +1,48 @@ +## Why + +The skill registry implementation uses a custom `skill.yaml`/`skill.json` metadata format that is incompatible with the Agent Skills specification defined at agentskills.io. This prevents discovery of skills from the broader ecosystem (Claude Code, VS Code, Cursor, Gemini CLI, etc.) and eliminates progressive disclosure — the agent must call explicit `skills_list`/`skill_view` tools instead of naturally loading skills based on user intent. + +## What Changes + +- **Rewrite skill discovery** to parse `SKILL.md` YAML frontmatter (the standard format) instead of `skill.yaml`/`skill.json` +- **Drop `inputSchema`/`outputSchema`** — these are not part of the Agent Skills spec and add unneeded complexity +- **Enforce spec frontmatter constraints**: `name` (1-64 chars, lowercase alphanumeric + hyphens, must match parent directory), `description` (1-1024 chars), `license`, `compatibility` (max 500 chars), `metadata` (string→string map), `allowed-tools` (space-separated pre-approved tool list) +- **Implement progressive disclosure**: catalog of `name`+`description` at startup, full `SKILL.md` body loaded by model's file-read tool only when activated +- **Replace model-driven skill tools** (`skills_list`, `skill_view` LangChain tools) with system-prompt skill catalog injection — the model decides relevance and reads SKILL.md via its built-in file-read capability +- **Replace `fork`** with generic `spawn` in sandbox runner to support `.py`, `.sh`, `.rb`, `.ts` and other non-Node.js scripts +- **Add cross-client directory scanning**: scan `.agents/skills/` alongside `skills/`, support project- and user-level scopes, handle name collisions (project overrides user), add trust gating for project-level skills +- **Add lenient YAML parsing fallback** for malformed YAML with unquoted colons +- **Remove `memoryLimit` hardcoding** — respect `ExecutionContextSchema` value for Node.js scripts +- **Update `openspec/specs/skills-registry/spec.md`** to reference `SKILL.md` format instead of `skill.yaml`/`skill.json` + +## Capabilities + +### Modified Capabilities +- `skills-registry`: Major change — switches from `skill.yaml` format to `SKILL.md` frontmatter, replaces tool-based activation with progressive disclosure, replaces `fork` with `spawn`, adds cross-client scanning + +### New Capabilities +- `skill-discovery`: (replaces the discovery portion of skills-registry; this is the actual spec we modify at `specs/skills-registry/spec.md`) + +### Capabilities with Delta Spec Files +- `skills-registry` — the existing spec file at `openspec/specs/skills-registry/spec.md` is modified with a delta spec + +## Impact + +**Modified files:** +- `src/registry/discoverer.js` — complete rewrite of frontmatter parsing +- `src/registry/validator.js` — new constraint enforcement + lenient YAML fallback +- `src/registry/types.js` — remove `inputSchema`/`outputSchema`, add standard fields +- `src/registry/registry.js` — minimal changes (new discoverer output shape) +- `src/sandbox/runner.js` — replace `fork` with `spawn` +- `src/tools/skills.js` — drop schema references from skill view output +- `src/tools/index.js` — remove `skills_list`/`skill_view` from tool permissions map +- `openspec/specs/skills-registry/spec.md` — update to SKILL.md format + +**Removed capabilities:** +- `skills_list` and `skill_view` LangChain tools (replaced by progressive disclosure) + +**Affected subsystems:** +- Agent activation flow (system prompt catalog → model-driven file read) +- Scheduled skill execution (sandbox runner must handle non-Node scripts) +- TUI skills panel (may still use skill_view or display catalog) +- config/sandbox.paths default scope (expands to include `.agents/skills/`) diff --git a/openspec/changes/fix-skill-spec-compliance/specs/skills-registry/spec.md b/openspec/changes/fix-skill-spec-compliance/specs/skills-registry/spec.md new file mode 100644 index 0000000..489a766 --- /dev/null +++ b/openspec/changes/fix-skill-spec-compliance/specs/skills-registry/spec.md @@ -0,0 +1,170 @@ +## REMOVED Requirements + +### Requirement: Skill Discovery +The system SHALL automatically discover all skills in the `skills/` directory by scanning for subdirectories and loading their metadata from a `skill.yaml` or `skill.json` file within each skill root. +**Reason**: Replaced by SKILL.md frontmatter parsing per Agent Specs specification. +**Migration**: Create a `SKILL.md` file in each skill directory with YAML frontmatter containing `name` and `description`. + +#### Scenario: System discovers a new skill directory (OLD) +- **WHEN** a new directory is added to `skills/` containing a valid `skill.yaml` file +- **THEN** the registry registers the skill and makes it available for invocation on the next discovery cycle + +#### Scenario: System ignores invalid skill directories (OLD) +- **WHEN** a directory in `skills/` lacks a `skill.yaml` or `skill.json` file +- **THEN** the system skips the directory and logs a warning at the debug level + +### Requirement: Schema Validation +The system SHALL validate each skill's input schema against a zod-like schema definition before activation and reject skills with invalid or missing schemas. +**Reason**: `inputSchema`/`outputSchema` are not part of the Agent Skills specification. +**Migration**: No migration needed — these fields are removed from the skill metadata. + +#### Scenario: Skill with valid schema is activated (OLD) +- **WHEN** a skill defines a valid `inputSchema` in its metadata +- **THEN** the system validates the schema at registry load time and activates the skill + +#### Scenario: Skill with invalid schema is rejected (OLD) +- **WHEN** a skill's `inputSchema` fails validation +- **THEN** the system skips activation of the skill and logs the validation error + +### Requirement: Input/Output Contracts +Every registered skill SHALL expose a defined input schema, a defined output schema, and a clear execution context specifying resource access boundaries. +**Reason**: `inputSchema`/`outputSchema` are not part of the Agent Skills specification and add unneeded complexity. +**Migration**: Skill inputs are now passed through the skill instructions directly. Output is structured data via the script's stdout. + +#### Scenario: Tool invocation passes validated input to skill (OLD) +- **WHEN** the harness invokes a skill +- **THEN** the input is validated against the skill's input schema before execution + +#### Scenario: Skill output conforms to declared output schema (OLD) +- **WHEN** a skill completes execution +- **THEN** the output is validated against the skill's output schema and an error is raised if it does not match + +## ADDED Requirements + +### Requirement: SKILL.md Frontmatter Discovery +The system SHALL automatically discover all skills by scanning for subdirectories containing a `SKILL.md` file. Metadata is extracted from the YAML frontmatter between `---` delimiters at the top of the file. +Required frontmatter fields: `name` (mandatory) and `description` (mandatory). +Optional fields: `license`, `compatibility` (max 500 chars), `metadata` (string-to-string map), `allowed-tools`, `disabled`. + +#### Scenario: System discovers a SKILL.md with valid frontmatter +- **WHEN** a subdirectory of the skills scan path contains a `SKILL.md` with valid YAML frontmatter containing name and description +- **THEN** the registry registers the skill with the extracted metadata and makes it available for activation + +#### Scenario: System discovers a SKILL.md without description +- **WHEN** a `SKILL.md` exists but its frontmatter lacks a non-empty description field +- **THEN** the system skips the skill and logs a warning + +#### Scenario: System discovers a SKILL.md without YAML frontmatter +- **WHEN** a `SKILL.md` exists but has no YAML frontmatter between `---` delimiters +- **THEN** the system skips the skill and logs a warning + +### Requirement: Name Validation +The system SHALL validate skill names against Agent Skills specification constraints: 1-64 characters, only lowercase ASCII alphanumeric characters and hyphens, no consecutive hyphens, must not start or end with a hyphen, must match the parent directory name. + +#### Scenario: Valid name is accepted +- **WHEN** a skill name contains only lowercase alphanumeric characters and hyphens, is 1-64 characters, does not start or end with a hyphen, has no consecutive hyphens, and matches its parent directory name +- **THEN** the system accepts the name without warnings + +#### Scenario: Name exceeds 64 characters +- **WHEN** a skill name exceeds 64 characters +- **THEN** the system logs a warning but still loads the skill (lenient validation) + +#### Scenario: Name contains uppercase characters +- **WHEN** a skill name contains uppercase letters +- **THEN** the system logs a warning but still loads the skill (lenient validation) + +#### Scenario: Name does not match parent directory +- **WHEN** the frontmatter `name` field does not match the parent directory name +- **THEN** the system logs a warning but still loads the skill (lenient validation) + +#### Scenario: Name starts with hyphen +- **WHEN** a skill name starts with a hyphen +- **THEN** the system logs a warning but still loads the skill (lenient validation) + +### Requirement: Description Validation +The system SHALL validate that descriptions are between 1 and 1024 characters and are non-empty. Empty or missing descriptions cause the skill to be skipped entirely. + +#### Scenario: Valid description is accepted +- **WHEN** a description is present, non-empty, and 1-1024 characters +- **THEN** the system accepts the description without warnings + +#### Scenario: Empty description is rejected +- **WHEN** a skill frontmatter has an empty or missing description field +- **THEN** the system skips the skill and logs an error + +### Requirement: Progressive Disclosure +The system SHALL implement progressive disclosure for skill activation in three tiers: (1) load only `name` and `description` at startup into a catalog, (2) inject the catalog into the system prompt so the model can decide relevance, (3) the model loads the full `SKILL.md` body via its file-read tool when a task matches a skill's description. + +#### Scenario: Catalog loaded at startup +- **WHEN** the system starts and discovers skills +- **THEN** only `name`, `description`, and `location` are loaded into the catalog — the full `SKILL.md` body is not loaded + +#### Scenario: Model reads SKILL.md on activation +- **WHEN** the model determines a task matches a skill's description from the catalog +- **THEN** the model uses its file-read tool to load the `SKILL.md` at the catalog's location + +#### Scenario: Multiple skills loaded simultaneously +- **WHEN** a task requires multiple skills +- **THEN** the model loads only the relevant SKILL.md files — unused skills remain at catalog level + +### Requirement: Cross-Client Directory Scanning +The system SHALL scan multiple directory scopes for skills: project-level `skills/`, project-level `.agents/skills/`, user-level `~/.agents/skills/`, and any configured paths. When the same skill name is found in multiple scopes, project-level skills override user-level skills. + +#### Scenario: Skills discovered in project-level skills directory +- **WHEN** a skill exists in `/skills//SKILL.md` +- **THEN** the system discovers and registers the skill + +#### Scenario: Skills discovered in .agents/skills directory +- **WHEN** a skill exists in `/.agents/skills//SKILL.md` or `~/.agents/skills//SKILL.md` +- **THEN** the system discovers and registers the skill + +#### Scenario: Project-level skill overrides user-level +- **WHEN** a skill named `code-review` exists in both `/skills/` and `~/.agents/skills/` +- **THEN** the project-level skill takes precedence and the user-level skill is shadowed with a logged warning + +### Requirement: SKILL.md Script Execution +The system SHALL execute `SKILL.md` scripts using `child_process.spawn` (not `fork`) to support non-Node.js scripts. The interpreter is detected by file extension and shebang line. Scripts in `scripts/` directories are resolved relative to the skill's base directory. + +#### Scenario: Node.js script execution +- **WHEN** a skill references a `.js` script in its `scripts/` directory +- **THEN** the system executes it with Node.js, passing `--max-old-space-size` based on the configuration memory limit + +#### Scenario: Python script execution +- **WHEN** a skill references a `.py` script in its `scripts/` directory +- **THEN** the system executes it with `python3` + +#### Scenario: Bash script execution +- **WHEN** a skill references a `.sh` script in its `scripts/` directory +- **THEN** the system executes it with `bash` + +#### Scenario: Script resolved relative to skill directory +- **WHEN** a skill's SKILL.md references `scripts/extract.py` +- **THEN** the system resolves the path as `/scripts/extract.py` + +### Requirement: Lenient YAML Parsing +The system SHALL parse YAML frontmatter from `SKILL.md` with lenient error handling. If initial YAML parsing fails, the system retries by wrapping values containing unquoted colons in double quotes. Skills with completely unparseable YAML are skipped with a logged error. + +#### Scenario: Normal YAML parses correctly +- **WHEN** a SKILL.md frontmatter contains valid YAML +- **THEN** the system parses the frontmatter successfully + +#### Scenario: YAML with unquoted colons retries +- **WHEN** a SKILL.md frontmatter contains unquoted colons like `description: Use when: the user asks about PDFs` +- **THEN** the system retries parsing with quoted values and loads the skill + +#### Scenario: Completely unparseable YAML is skipped +- **WHEN** a SKILL.md frontmatter cannot be parsed even after lenient retry +- **THEN** the system skips the skill and logs an error + +## MODIFIED Requirements + +### Requirement: Permission Scoping +Each registered skill SHALL declare the permission scopes it requires, and the harness SHALL enforce these scopes during execution. Permission scopes now map to sandbox resource rules resolved through the capability enforcement system. The `filesystem:exec` permission scope grants read and execute access (previously only execute). + +#### Scenario: Skill declares required permissions +- **WHEN** a skill's `SKILL.md` frontmatter specifies a `permissions` array +- **THEN** the system grants those scopes to the sandbox during execution + +#### Scenario: Skill runs without declared permissions +- **WHEN** a skill does not declare permissions in its frontmatter +- **THEN** the system grants only the default read-only filesystem and env:read scopes diff --git a/openspec/changes/fix-skill-spec-compliance/tasks.md b/openspec/changes/fix-skill-spec-compliance/tasks.md new file mode 100644 index 0000000..2a968f6 --- /dev/null +++ b/openspec/changes/fix-skill-spec-compliance/tasks.md @@ -0,0 +1,87 @@ +## 1. Update types.js — drop inputSchema/outputSchema, add standard fields + +- [ ] 1.1 Remove SkillInputSchema and SkillMetadataSchema inputSchema/outputSchema fields +- [ ] 1.2 Add license, compatibility, metadata, allowed-tools, disabled fields to SkillMetadataSchema +- [ ] 1.3 Update PermissionSchema if permission values need adjustments +- [ ] 1.4 Export only new schema types from types.js (remove SkillInputSchema) + +## 2. Rewrite discoverer.js — parse SKILL.md frontmatter + +- [ ] 2.1 Add extractFrontmatter function that parses YAML between `---` delimiters +- [ ] 2.2 Add lenientYamlParse fallback that retries with quoted colons +- [ ] 2.3 Update discoverSkills to scan for SKILL.md files instead of skill.yaml/skill.json +- [ ] 2.4 Add cross-client directory scope constants: project skills/, .agents/skills/, user ~/.agents/skills/ +- [ ] 2.5 Update discoverSkills to scan multiple scopes (project-level + user-level) +- [ ] 2.6 Add name collision detection — project overrides user, log warning on shadow +- [ ] 2.7 Add trust check flag: only load project-level skills if trust config is set +- [ ] 2.8 Add .gitignore / dotfile filtering — skip node_modules, .git, hidden dirs + +## 3. Rewrite validator.js — enforce spec constraints with lenient validation + +- [ ] 3.1 Add validateSkillName function with spec constraints (chars, length, directory match) +- [ ] 3.2 Add validateSkillDescription function with 1-1024 char constraint +- [ ] 3.3 Add validateOptionalFields for compatibility (max 500 chars), metadata (string map) +- [ ] 3.4 Update validateSkillSchema to use new validators with lenient approach (warn, don't skip — except empty desc or unparseable YAML) +- [ ] 3.5 Remove inputSchema/outputSchema validation logic + +## 4. Update registry.js — wire to new discoverer output + +- [ ] 4.1 Update discover method to use new frontmatter-extracted metadata shape +- [ ] 4.2 Store full SKILL.md body path in skill record for progressive disclosure +- [ ] 4.3 Add getCatalog method that returns {name, description, location} for all skills (tier 1 of progressive disclosure) +- [ ] 4.4 Add getSkillBody method that reads full SKILL.md body at activation time (tier 2) + +## 5. Update tools/skills.js — remove schema references + +- [ ] 5.1 Remove inputSchema and outputSchema from skill_viewImpl output object +- [ ] 5.2 Add compatibility and metadata fields to skill_view output +- [ ] 5.3 Update tools/skills.js description to reflect new SKILL.md format +- [ ] 5.4 Keep skill_view tool but re-purpose it as a legacy access path (still valid for manual inspection) + +## 6. Update tools/index.js — remove skills_list from tool permissions + +- [ ] 6.1 Remove skills_list from TOOL_PERMISSIONS map +- [ ] 6.2 Remove skills_list from TOOL_FACTORIES map +- [ ] 6.3 Remove skill_view from TOOL_PERMISSIONS or keep it (decision: keep for manual TUI use) + +## 7. Replace fork with spawn in sandbox/runner.js + +- [ ] 7.1 Create detectInterpreter function that maps file extension to interpreter command +- [ ] 7.2 Create detectShebang function that reads first line of script +- [ ] 7.3 Update runSandbox to use child_process.spawn instead of fork +- [ ] 7.4 Pass --max-old-space-size to Node.js scripts based on memoryLimit config +- [ ] 7.5 For non-Node.js scripts, set appropriate resource limits or skip memory flag +- [ ] 7.6 Ensure stdout/stderr collection still works with spawn's Stream data event + +## 8. Update tools/skills.js progressive disclosure support + +- [ ] 8.1 Update skills_list tool to call registry.getCatalog() instead of registry.list() +- [ ] 8.2 Add systemPrompt section generation function that formats catalog for model injection +- [ ] 8.3 Export generateSkillCatalogPrompt utility for use in system prompt construction + +## 9. Update config defaults + +- [ ] 9.1 Add trustProjectSkills config default to sandbox config (boolean, default true) +- [ ] 9.2 Add scanPaths config default that includes `skills/` and `.agents/skills/` patterns +- [ ] 9.3 Update ConfigSchema if new config options are needed + +## 10. Update schema validation and tests + +- [ ] 10.1 Update openspec/specs/skills-registry/spec.md to use SKILL.md format +- [ ] 10.2 Add unit tests for extractFrontmatter in discoverer.js +- [ ] 10.3 Add unit tests for validateSkillName constraints +- [ ] 10.4 Add unit tests for validateSkillDescription constraints +- [ ] 10.5 Add unit tests for lenientYamlParse fallback +- [ ] 10.6 Add unit tests for detectInterpreter by extension and shebang +- [ ] 10.7 Add unit tests for cross-client dir collision handling +- [ ] 10.8 Verify all existing unit tests pass (registry.test.js, sandbox.test.js) +- [ ] 10.9 Run full test suite with coverage to ensure 100% + +## 11. Integration verification + +- [ ] 11.1 Create a test SKILL.md with full frontmatter and validate end-to-end discovery +- [ ] 11.2 Verify catalog generation outputs correct name/description/location +- [ ] 11.3 Verify that spawning a .py script from a skill directory works end-to-end +- [ ] 11.4 Verify system prompt includes skill catalog correctly +- [ ] 11.5 Run npm run lint and npm run fix to ensure no linting issues +- [ ] 11.6 Run npm run coverage to regenerate coverage.txt and verify 100%