Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions openspec/changes/fix-skill-spec-compliance/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
.openspec
66 changes: 66 additions & 0 deletions openspec/changes/fix-skill-spec-compliance/design.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
## Context

Current skill discovery (`src/registry/discoverer.js:23-24`) looks for `skill.yaml`/`skill.json` files — a custom format not recognized by any Agent Skills compatible agent (Claude Code, VS Code, Cursor, Gemini CLI, OpenAI Codex, etc.). The standard format is `SKILL.md` with YAML frontmatter. This incompatibility blocks ecosystem skill interoperability. Additionally, activation is clunky: the model must call `skills_list`/`skill_view` LangChain tools rather than progressively loading SKILL.md via natural file-read. The sandbox runner uses `fork()` which only works for Node.js scripts, preventing execution of `.py`, `.sh`, `.rb` scripts bundled in skills.

## Goals / Non-Goals

**Goals:**
- Parse `SKILL.md` YAML frontmatter as the primary metadata source
- Enforce spec field constraints (name, description, compatibility)
- Implement progressive disclosure: catalog at startup, SKILL.md body on activation via model's file-read
- Replace `fork` with generic `spawn` to support non-Node scripts (.py, .sh, .rb, .ts)
- Add cross-client directory scanning (`.agents/skills/`, project-level, user-level)
- Drop `inputSchema`/`outputSchema` as they are not part of the standard
- Maintain backward compatibility with existing sandbox capabilities (path resolution, URL filtering, env injection, timeout handling)

**Non-Goals:**
- Adding script dependency management (e.g., `uv run`, `npx`) — the agent invokes scripts, not the harness
- Adding eval/integration test framework for skills
- Changing the registry class interface or existing unit test contracts
- Implementing a skills marketplace or remote registry
- Handling skill versioning/updates (agents handle this in their own ecosystems)

## Decisions

### Decision 1: SKILL.md over skill.yaml
**Choice**: Parse `SKILL.md` frontmatter exclusively (remove skill.yaml/skill.json support).
**Rationale**: The entire Agent Skills ecosystem uses `SKILL.md`. Keeping a parallel format adds maintenance burden with zero interoperability benefit.
**Alternatives considered**: Keep both formats and deprecate skill.yaml. Rejected: adds complexity, delays adoption, dual-format validation is error-prone.

### Decision 2: Model-driven activation over LangChain tools
**Choice**: Replace `skills_list`/`skill_view` tools with a system-prompt skill catalog. The model reads SKILL.md via its built-in file-read tool.
**Rationale**: Progressive disclosure is the spec's core principle. Tool-based activation forces two turns (list → view) and leaks the activation mechanism to the model.
**Alternatives considered**: Keep tools but add catalog as supplement. Rejected: tools are the wrong abstraction for skill activation — the agent knows its own file-read tools best.

### Decision 3: spawn over fork
**Choice**: Use `node:child_process.spawn` with interpreter detection by extension/shebang.
**Rationale**: `fork` only supports Node.js modules. The spec supports Python, Bash, Deno, Go, Ruby scripts bundled in skills.
**Alternatives considered**: Keep fork for .js and spawn for others. Rejected: adds branching complexity and memory management differences. Uniform spawn is simpler.

### Decision 4: Lenient validation on load
**Choice**: Warn on constraint violations (name too long, non-matching directory) but still load the skill. Skip only if YAML is completely unparseable or description is empty.
**Rationale**: The spec explicitly says "Lenient validation" for cross-client compatibility — skills authored for other clients may have minor violations.
**Alternatives considered**: Strict (skip any violation). Rejected: blocks interoperability with third-party skills.

### Decision 5: `.agents/skills/` as primary scan path
**Choice**: Add `.agents/skills/` to the default scan scopes alongside `skills/`.
**Rationale**: `.agents/skills/` is the cross-client interoperability convention. `skills/` is kept as project-local.
**Alternatives considered**: Replace `skills/` with `.agents/skills/`. Rejected: existing projects use `skills/`; breaking that would be confusing.

## Risks / Trade-offs

| Risk | Mitigation |
|------|-----------|
| Existing TUI skills panel may reference `skills_list`/`skill_view` tools | Update TUI to load catalog from registry directly — or keep a stripped-down `skills_list` tool that returns the catalog (non-LangChain wrapper) |
| `spawn` has different memory/resource semantics than `fork` | Add explicit `--max-old-space-size` for `.js` scripts; document memory limits in config |
| Progressive disclosure shifts activation logic into the system prompt | System prompt already handles skill discovery; this is a net simplification |
| Cross-client scanning may pick up unexpected skills from user profile | Trust gating: project-level skills require explicit trust flag; user-level always scanned |
| Removing `inputSchema`/`outputSchema` breaks backward compatibility | No skills in the repo currently define these schemas; safe removal with zero migration risk |

## Migration Plan

1. Implement all changes on this branch
2. Run full test suite — confirm 100% coverage
3. If any existing skills directory with `skill.yaml` files exists, convert to `SKILL.md` format
4. Update config.yaml `sandbox.paths` default to include `.agents/skills/`
5. Deploy and verify TUI skills panel updates
48 changes: 48 additions & 0 deletions openspec/changes/fix-skill-spec-compliance/proposal.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
## Why

The skill registry implementation uses a custom `skill.yaml`/`skill.json` metadata format that is incompatible with the Agent Skills specification defined at agentskills.io. This prevents discovery of skills from the broader ecosystem (Claude Code, VS Code, Cursor, Gemini CLI, etc.) and eliminates progressive disclosure — the agent must call explicit `skills_list`/`skill_view` tools instead of naturally loading skills based on user intent.

## What Changes

- **Rewrite skill discovery** to parse `SKILL.md` YAML frontmatter (the standard format) instead of `skill.yaml`/`skill.json`
- **Drop `inputSchema`/`outputSchema`** — these are not part of the Agent Skills spec and add unneeded complexity
- **Enforce spec frontmatter constraints**: `name` (1-64 chars, lowercase alphanumeric + hyphens, must match parent directory), `description` (1-1024 chars), `license`, `compatibility` (max 500 chars), `metadata` (string→string map), `allowed-tools` (space-separated pre-approved tool list)
- **Implement progressive disclosure**: catalog of `name`+`description` at startup, full `SKILL.md` body loaded by model's file-read tool only when activated
- **Replace model-driven skill tools** (`skills_list`, `skill_view` LangChain tools) with system-prompt skill catalog injection — the model decides relevance and reads SKILL.md via its built-in file-read capability
- **Replace `fork`** with generic `spawn` in sandbox runner to support `.py`, `.sh`, `.rb`, `.ts` and other non-Node.js scripts
- **Add cross-client directory scanning**: scan `.agents/skills/` alongside `skills/`, support project- and user-level scopes, handle name collisions (project overrides user), add trust gating for project-level skills
- **Add lenient YAML parsing fallback** for malformed YAML with unquoted colons
- **Remove `memoryLimit` hardcoding** — respect `ExecutionContextSchema` value for Node.js scripts
- **Update `openspec/specs/skills-registry/spec.md`** to reference `SKILL.md` format instead of `skill.yaml`/`skill.json`

## Capabilities

### Modified Capabilities
- `skills-registry`: Major change — switches from `skill.yaml` format to `SKILL.md` frontmatter, replaces tool-based activation with progressive disclosure, replaces `fork` with `spawn`, adds cross-client scanning

### New Capabilities
- `skill-discovery`: (replaces the discovery portion of skills-registry; this is the actual spec we modify at `specs/skills-registry/spec.md`)

### Capabilities with Delta Spec Files
- `skills-registry` — the existing spec file at `openspec/specs/skills-registry/spec.md` is modified with a delta spec

## Impact

**Modified files:**
- `src/registry/discoverer.js` — complete rewrite of frontmatter parsing
- `src/registry/validator.js` — new constraint enforcement + lenient YAML fallback
- `src/registry/types.js` — remove `inputSchema`/`outputSchema`, add standard fields
- `src/registry/registry.js` — minimal changes (new discoverer output shape)
- `src/sandbox/runner.js` — replace `fork` with `spawn`
- `src/tools/skills.js` — drop schema references from skill view output
- `src/tools/index.js` — remove `skills_list`/`skill_view` from tool permissions map
- `openspec/specs/skills-registry/spec.md` — update to SKILL.md format

**Removed capabilities:**
- `skills_list` and `skill_view` LangChain tools (replaced by progressive disclosure)

**Affected subsystems:**
- Agent activation flow (system prompt catalog → model-driven file read)
- Scheduled skill execution (sandbox runner must handle non-Node scripts)
- TUI skills panel (may still use skill_view or display catalog)
- config/sandbox.paths default scope (expands to include `.agents/skills/`)
Original file line number Diff line number Diff line change
@@ -0,0 +1,170 @@
## REMOVED Requirements

### Requirement: Skill Discovery
The system SHALL automatically discover all skills in the `skills/` directory by scanning for subdirectories and loading their metadata from a `skill.yaml` or `skill.json` file within each skill root.
**Reason**: Replaced by SKILL.md frontmatter parsing per Agent Specs specification.
**Migration**: Create a `SKILL.md` file in each skill directory with YAML frontmatter containing `name` and `description`.

#### Scenario: System discovers a new skill directory (OLD)
- **WHEN** a new directory is added to `skills/` containing a valid `skill.yaml` file
- **THEN** the registry registers the skill and makes it available for invocation on the next discovery cycle

#### Scenario: System ignores invalid skill directories (OLD)
- **WHEN** a directory in `skills/` lacks a `skill.yaml` or `skill.json` file
- **THEN** the system skips the directory and logs a warning at the debug level

### Requirement: Schema Validation
The system SHALL validate each skill's input schema against a zod-like schema definition before activation and reject skills with invalid or missing schemas.
**Reason**: `inputSchema`/`outputSchema` are not part of the Agent Skills specification.
**Migration**: No migration needed — these fields are removed from the skill metadata.

#### Scenario: Skill with valid schema is activated (OLD)
- **WHEN** a skill defines a valid `inputSchema` in its metadata
- **THEN** the system validates the schema at registry load time and activates the skill

#### Scenario: Skill with invalid schema is rejected (OLD)
- **WHEN** a skill's `inputSchema` fails validation
- **THEN** the system skips activation of the skill and logs the validation error

### Requirement: Input/Output Contracts
Every registered skill SHALL expose a defined input schema, a defined output schema, and a clear execution context specifying resource access boundaries.
**Reason**: `inputSchema`/`outputSchema` are not part of the Agent Skills specification and add unneeded complexity.
**Migration**: Skill inputs are now passed through the skill instructions directly. Output is structured data via the script's stdout.

#### Scenario: Tool invocation passes validated input to skill (OLD)
- **WHEN** the harness invokes a skill
- **THEN** the input is validated against the skill's input schema before execution

#### Scenario: Skill output conforms to declared output schema (OLD)
- **WHEN** a skill completes execution
- **THEN** the output is validated against the skill's output schema and an error is raised if it does not match

## ADDED Requirements

### Requirement: SKILL.md Frontmatter Discovery
The system SHALL automatically discover all skills by scanning for subdirectories containing a `SKILL.md` file. Metadata is extracted from the YAML frontmatter between `---` delimiters at the top of the file.
Required frontmatter fields: `name` (mandatory) and `description` (mandatory).
Optional fields: `license`, `compatibility` (max 500 chars), `metadata` (string-to-string map), `allowed-tools`, `disabled`.

#### Scenario: System discovers a SKILL.md with valid frontmatter
- **WHEN** a subdirectory of the skills scan path contains a `SKILL.md` with valid YAML frontmatter containing name and description
- **THEN** the registry registers the skill with the extracted metadata and makes it available for activation

#### Scenario: System discovers a SKILL.md without description
- **WHEN** a `SKILL.md` exists but its frontmatter lacks a non-empty description field
- **THEN** the system skips the skill and logs a warning

#### Scenario: System discovers a SKILL.md without YAML frontmatter
- **WHEN** a `SKILL.md` exists but has no YAML frontmatter between `---` delimiters
- **THEN** the system skips the skill and logs a warning

### Requirement: Name Validation
The system SHALL validate skill names against Agent Skills specification constraints: 1-64 characters, only lowercase ASCII alphanumeric characters and hyphens, no consecutive hyphens, must not start or end with a hyphen, must match the parent directory name.

#### Scenario: Valid name is accepted
- **WHEN** a skill name contains only lowercase alphanumeric characters and hyphens, is 1-64 characters, does not start or end with a hyphen, has no consecutive hyphens, and matches its parent directory name
- **THEN** the system accepts the name without warnings

#### Scenario: Name exceeds 64 characters
- **WHEN** a skill name exceeds 64 characters
- **THEN** the system logs a warning but still loads the skill (lenient validation)

#### Scenario: Name contains uppercase characters
- **WHEN** a skill name contains uppercase letters
- **THEN** the system logs a warning but still loads the skill (lenient validation)

#### Scenario: Name does not match parent directory
- **WHEN** the frontmatter `name` field does not match the parent directory name
- **THEN** the system logs a warning but still loads the skill (lenient validation)

#### Scenario: Name starts with hyphen
- **WHEN** a skill name starts with a hyphen
- **THEN** the system logs a warning but still loads the skill (lenient validation)

### Requirement: Description Validation
The system SHALL validate that descriptions are between 1 and 1024 characters and are non-empty. Empty or missing descriptions cause the skill to be skipped entirely.

#### Scenario: Valid description is accepted
- **WHEN** a description is present, non-empty, and 1-1024 characters
- **THEN** the system accepts the description without warnings

#### Scenario: Empty description is rejected
- **WHEN** a skill frontmatter has an empty or missing description field
- **THEN** the system skips the skill and logs an error

### Requirement: Progressive Disclosure
The system SHALL implement progressive disclosure for skill activation in three tiers: (1) load only `name` and `description` at startup into a catalog, (2) inject the catalog into the system prompt so the model can decide relevance, (3) the model loads the full `SKILL.md` body via its file-read tool when a task matches a skill's description.

#### Scenario: Catalog loaded at startup
- **WHEN** the system starts and discovers skills
- **THEN** only `name`, `description`, and `location` are loaded into the catalog — the full `SKILL.md` body is not loaded

#### Scenario: Model reads SKILL.md on activation
- **WHEN** the model determines a task matches a skill's description from the catalog
- **THEN** the model uses its file-read tool to load the `SKILL.md` at the catalog's location

#### Scenario: Multiple skills loaded simultaneously
- **WHEN** a task requires multiple skills
- **THEN** the model loads only the relevant SKILL.md files — unused skills remain at catalog level

### Requirement: Cross-Client Directory Scanning
The system SHALL scan multiple directory scopes for skills: project-level `skills/`, project-level `.agents/skills/`, user-level `~/.agents/skills/`, and any configured paths. When the same skill name is found in multiple scopes, project-level skills override user-level skills.

#### Scenario: Skills discovered in project-level skills directory
- **WHEN** a skill exists in `<project>/skills/<name>/SKILL.md`
- **THEN** the system discovers and registers the skill

#### Scenario: Skills discovered in .agents/skills directory
- **WHEN** a skill exists in `<project>/.agents/skills/<name>/SKILL.md` or `~/.agents/skills/<name>/SKILL.md`
- **THEN** the system discovers and registers the skill

#### Scenario: Project-level skill overrides user-level
- **WHEN** a skill named `code-review` exists in both `<project>/skills/` and `~/.agents/skills/`
- **THEN** the project-level skill takes precedence and the user-level skill is shadowed with a logged warning

### Requirement: SKILL.md Script Execution
The system SHALL execute `SKILL.md` scripts using `child_process.spawn` (not `fork`) to support non-Node.js scripts. The interpreter is detected by file extension and shebang line. Scripts in `scripts/` directories are resolved relative to the skill's base directory.

#### Scenario: Node.js script execution
- **WHEN** a skill references a `.js` script in its `scripts/` directory
- **THEN** the system executes it with Node.js, passing `--max-old-space-size` based on the configuration memory limit

#### Scenario: Python script execution
- **WHEN** a skill references a `.py` script in its `scripts/` directory
- **THEN** the system executes it with `python3`

#### Scenario: Bash script execution
- **WHEN** a skill references a `.sh` script in its `scripts/` directory
- **THEN** the system executes it with `bash`

#### Scenario: Script resolved relative to skill directory
- **WHEN** a skill's SKILL.md references `scripts/extract.py`
- **THEN** the system resolves the path as `<skill_directory>/scripts/extract.py`

### Requirement: Lenient YAML Parsing
The system SHALL parse YAML frontmatter from `SKILL.md` with lenient error handling. If initial YAML parsing fails, the system retries by wrapping values containing unquoted colons in double quotes. Skills with completely unparseable YAML are skipped with a logged error.

#### Scenario: Normal YAML parses correctly
- **WHEN** a SKILL.md frontmatter contains valid YAML
- **THEN** the system parses the frontmatter successfully

#### Scenario: YAML with unquoted colons retries
- **WHEN** a SKILL.md frontmatter contains unquoted colons like `description: Use when: the user asks about PDFs`
- **THEN** the system retries parsing with quoted values and loads the skill

#### Scenario: Completely unparseable YAML is skipped
- **WHEN** a SKILL.md frontmatter cannot be parsed even after lenient retry
- **THEN** the system skips the skill and logs an error

## MODIFIED Requirements

### Requirement: Permission Scoping
Each registered skill SHALL declare the permission scopes it requires, and the harness SHALL enforce these scopes during execution. Permission scopes now map to sandbox resource rules resolved through the capability enforcement system. The `filesystem:exec` permission scope grants read and execute access (previously only execute).

#### Scenario: Skill declares required permissions
- **WHEN** a skill's `SKILL.md` frontmatter specifies a `permissions` array
- **THEN** the system grants those scopes to the sandbox during execution

#### Scenario: Skill runs without declared permissions
- **WHEN** a skill does not declare permissions in its frontmatter
- **THEN** the system grants only the default read-only filesystem and env:read scopes
Loading