
Fix validate prompts pipeline #100

Merged
isolomatov-gd merged 7 commits into main from feature/fix-validate-prompts-pipeline
May 29, 2026

Conversation

@isolomatov-gd
Contributor

No description provided.

Signed-off-by: isolomatov-gd <isolomatov@griddynamics.com>
Signed-off-by: isolomatov-gd <isolomatov@griddynamics.com>
github-actions bot added the bug and enhancement labels on May 28, 2026
@github-actions
Contributor

Rosetta Triage Review

Summary: This PR fixes the previously disabled validate-prompts CI pipeline (triggers were commented out) and substantially refactors the Prompt Quality Auditor agent (prompt-comparison.md) to support a new structured input contract, per-file JSON array output, deleted/new file handling, and subagent-based orchestration for large PRs. Instruction reference files (pa-rosetta.md, pa-hardening.md, pa-patterns.md) and two workflow files receive targeted improvements alongside a new plans/sdlc-flow-skill/prompt-brief.md planning artifact.

Findings:

  • Empty PR body: The pull request description is blank — no explanation of what changed, why the pipeline was broken, or what the new agent input/output contract enforces. This makes review harder and leaves no audit trail.
  • Breaking input contract change: prompt-comparison.md replaces the BASE/NEW file-path pair input with a semicolon-separated structured prompt (Changed files ; Git base ref ; Changed count ; Output file ; Diff file). The YAML is updated accordingly, but any external callers or documentation referencing the old contract will silently break.
  • Output schema change: Output changed from a single JSON object ({gates, issues}) to a JSON array of per-file objects. Downstream consumers of .tmp/all-results.json must handle the new format — the jq expressions in the YAML are updated, which is good.
  • plans/ artifact in a pipeline-fix PR: plans/sdlc-flow-skill/prompt-brief.md (120-line planning document for a future skill) is included in what is otherwise a pipeline and instruction-authoring PR — minor scope mixing, but harmless.
  • Hardcoded local filesystem path retained: docs/TESTING-PLUGINS.md still references /Users/isolomatov/Sources/GAIN/rosetta/ (a developer-local absolute path). Not introduced by this PR, but the file was touched — worth cleaning up in a follow-up.
  • Duplicate step number in init-workspace-flow.md: Two list items are both numbered 5 in the new verification section (steps 5 and 5).
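The output schema change noted in the findings can be handled defensively on the consumer side. The following is a minimal Python sketch, not the pipeline's actual code: it assumes a consumer may still encounter the legacy single `{gates, issues}` object, and it assumes each new per-file object carries a `file` key (that key name and the `legacy_file` placeholder are hypothetical).

```python
import json
from typing import Any


def normalize_results(payload: Any, legacy_file: str = "<unknown>") -> list[dict]:
    """Return a list of per-file result objects for either output shape."""
    if isinstance(payload, list):
        # New shape: already a JSON array of per-file objects.
        return payload
    if isinstance(payload, dict) and {"gates", "issues"} <= payload.keys():
        # Legacy shape: one {gates, issues} object; wrap it in a list.
        # The "file" key is a hypothetical name, not confirmed by the PR.
        return [{"file": legacy_file, **payload}]
    raise ValueError("unrecognized results shape")


legacy = json.loads('{"gates": [], "issues": []}')
current = json.loads('[{"file": "a.md", "gates": [], "issues": []}]')
print(len(normalize_results(legacy)), len(normalize_results(current)))
```

Raising on an unrecognized shape keeps a contract mismatch loud rather than silent, which is the failure mode the finding warns about.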

Suggestions:

  • Add a PR description covering: (1) root cause of the broken pipeline, (2) summary of the new agent input/output contract, (3) migration notes for any existing callers.
  • Fix the duplicate step number 5 in init-workspace-flow.md verification section.
  • Consider removing or anonymizing the local absolute path in docs/TESTING-PLUGINS.md.

Automated triage by Rosetta agent

@github-actions
Contributor

📋 Prompt Quality Validation Report

✅ Validation Passed

Summary by File

| File | 🔴 Critical | 🟠 Very High | 🟡 High | 🔵 Medium | ⚪ Low | Status |
|---|---|---|---|---|---|---|
| instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md | 0 | 0 | 0 | 1 | 0 | ⚠️ Warning |
| instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-patterns.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-rosetta.md | 0 | 0 | 0 | 2 | 0 | ⚠️ Warning |
| instructions/r2/core/workflows/coding-agents-prompting-flow.md | 0 | 0 | 0 | 1 | 0 | ⚠️ Warning |
| instructions/r2/core/workflows/coding-flow.md | 0 | 0 | 0 | 1 | 0 | ⚠️ Warning |
| instructions/r2/core/workflows/init-workspace-flow.md | 0 | 0 | 0 | 1 | 0 | ⚠️ Warning |

⚠️ Issues Found

Severity Gate Details
🔵 Medium Precision & Explicitness Problem:
The Rosetta trigger was condensed from the BASE 'If prompt is for rosetta itself (repo with target prompt is rosetta or RulesOfPower), ACQUIRE ...' to NEW 'If rosetta prompt, MUST ACQUIRE ...'. The strengthening to MUST is good, but the parenthetical '(repo with target prompt is rosetta or RulesOfPower)' that operationally defined what qualifies as a 'rosetta prompt' was deleted.
Reason:
pa-patterns.md's own ai-issues catalog warns 'AI removes important clarifiers, specifiers, explanations'. Removing the repo-scoping clarifier makes the trigger condition rely on the reader already knowing the definition; the term is still resolvable via the ACQUIREd pa-rosetta, so impact is limited, but the net change is roughly neutral rather than a clear gain.
Solution:
Keep the strengthened 'MUST ACQUIRE' but restore a brief scoping clarifier, e.g. 'If rosetta prompt (target repo is rosetta/RulesOfPower/cto-ims-kb), MUST ACQUIRE ...', or rely on the pa-rosetta repo list and keep it consistent.

📄 instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md

📊 Gates Comparison

Gate Score Comparison
Decision Branching 4 ⬆️ Slightly better
Workflow Completeness 4 ⬆️ Slightly better
Example Grounding 4 ⬆️ Slightly better
Self-Validation 4 ⬆️ Slightly better

✅ No Issues Found

📄 instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-patterns.md

📊 Gates Comparison

Gate Score Comparison
Example Grounding 4 ⬆️ Slightly better
Failure Handling 4 ⬆️ Slightly better

⚠️ Issues Found

Severity Gate Details
🔵 Medium Bloat Control Problem:
A new '# Rosetta Principles' section (~13 bullets) was added that substantially duplicates docs/CONTEXT.md Design Philosophy (Agent-agnostic, Progressive disclosure, Classification-first, Rules-as-code, etc. appear in both).
Reason:
pa-hardening.md's own Root Cause Isolation flags 'Knowledge baked in (should be retrieved)'. Restating a large principle list in a reference file that is itself loaded into authoring context adds cognitive load and a second place to maintain, with limited incremental value over the existing canonical doc.
Solution:
Trim to the few principles that are uniquely needed for prompt-authoring decisions, or reference the canonical source instead of restating it inline, to avoid baking in knowledge that should be retrieved.
🔵 Medium Reference Integrity Problem:
The repo-name list was changed to 'Rosetta repo names are rosetta, cto-ims-kb, RulesOfPower.' The newly added cto-ims-kb appears nowhere else in the instructions tree and is inconsistent with the same skill's own SKILL.md line 99, which still enumerates the Rosetta repos as rosetta, RulesOfPower, instructions. Two canonical sources now disagree on which repos count as 'rosetta'.
Reason:
Divergent definitions of the trigger set ('which repos are rosetta') can cause an agent to apply or skip the pa-rosetta hardening path inconsistently depending on which file it read. Authoring decisions keyed on an unverified identifier risk being wrong.
Solution:
Confirm cto-ims-kb is a real Rosetta repo. If yes, propagate the same list to SKILL.md so the repo set is consistent across the skill; if not, remove it.
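The kind of drift described in the Reference Integrity finding can be caught mechanically. A minimal sketch, assuming the repo lists have already been extracted from each file by hand; the function name `repo_list_drift` is hypothetical and the two sets below are copied from the finding's wording.

```python
def repo_list_drift(sources: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each source to the repo names it lacks relative to the union of all sources."""
    union = set().union(*sources.values())
    return {name: union - repos for name, repos in sources.items() if union - repos}


sources = {
    "pa-rosetta.md": {"rosetta", "cto-ims-kb", "RulesOfPower"},
    "SKILL.md": {"rosetta", "RulesOfPower", "instructions"},
}
for name, missing in repo_list_drift(sources).items():
    print(name, "is missing", sorted(missing))
```

An empty result means every source agrees on the repo set.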

📄 instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-rosetta.md

📊 Gates Comparison

Gate Score Comparison
Instruction Ordering 4 ⬆️ Slightly better
Workflow Completeness 4 ⬆️ Slightly better
Precision & Explicitness 4 ⬆️ Slightly better
Reference Integrity 3 ⬇️ Slightly worse
Structural Coherence 5 ⬆️ Slightly better
Bloat Control 3 ⬇️ Slightly worse

⚠️ Issues Found

Severity Gate Details
🔵 Medium Cognitive Budget Problem:
Added prerequisite 'MUST load all skills as provided in phases when subagents are not used (thus better to use subagents).' The phrase 'load all skills as provided in phases' does not specify timing: it can be read as load-everything-upfront rather than per-phase just-in-time. On the no-subagent path there is no fresh-context mechanism, so an upfront reading front-loads multiple skill bodies into one context.
Reason:
The rule correctly closes a gap (skills previously could be skipped on the single-agent path), but the ambiguous timing risks context overload, which conflicts with the progressive-disclosure principle. Net change is still an improvement; this only refines it.
Solution:
Clarify timing, e.g. 'load each phase's skills when entering that phase (just-in-time) when subagents are not used', to keep the rule aligned with progressive disclosure.

📄 instructions/r2/core/workflows/coding-agents-prompting-flow.md

📊 Gates Comparison

Gate Score Comparison
Workflow Completeness 4 ⬆️ Slightly better

⚠️ Issues Found

Severity Gate Details
🔵 Medium Cognitive Budget Problem:
Added bullet 'MUST load all skills as provided in phases when subagents are not used (thus better to use subagents).' 'Load all skills as provided in phases' is ambiguous about timing (all-upfront vs per-phase just-in-time); the upfront reading front-loads multiple skill bodies into a single context on the no-subagent path.
Reason:
The rule beneficially prevents skipping per-phase skills in single-agent execution, but the unspecified timing risks context overload contrary to the progressive-disclosure principle. The change remains a net improvement.
Solution:
Specify just-in-time loading, e.g. 'when subagents are not used, load each phase's skills as you enter that phase', to preserve progressive disclosure.

📄 instructions/r2/core/workflows/coding-flow.md

📊 Gates Comparison

Gate Score Comparison
Workflow Completeness 4 ⬆️ Slightly better

⚠️ Issues Found

Severity Gate Details
🔵 Medium Workflow Completeness Problem:
The added verification steps introduce a duplicate step number: two consecutive steps are both labeled '5.' ('5. Request user to study https://...usage-guide/' and '5. Suggest examples for the next steps...'), and step '6' is skipped. The sequential numbering of the verification phase step list is broken.
Reason:
pa-hardening.md (new line) mandates 'Sequential activities use numbered list', and the new Rosetta Principles warn that AI tends to skip items in lists. A duplicate ordinal in a sequential phase raises the odds an agent merges the two distinct actions (study the usage guide; show slash-command examples) or drops one when executing the verification phase.
Solution:
Renumber the second '5.' to '6.' so the verification phase reads 1-2-3-4-5-6 sequentially.
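A duplicate or skipped ordinal like the one above is easy to detect before review. A minimal sketch, assuming steps are written as `N. text` at the start of a line; the helper name `numbering_problems` and the shortened step texts are hypothetical.

```python
import re

STEP = re.compile(r"\s*(\d+)\.\s")


def numbering_problems(lines: list[str]) -> list[tuple[int, int]]:
    """Return (expected, actual) pairs wherever a numbered step breaks the 1..n sequence."""
    ordinals = [int(m.group(1)) for line in lines if (m := STEP.match(line))]
    return [
        (expected, actual)
        for expected, actual in enumerate(ordinals, start=1)
        if expected != actual
    ]


steps = [
    "1. first check",
    "2. second check",
    "3. third check",
    "4. fourth check",
    "5. Request user to study the usage guide",
    "5. Suggest examples for the next steps",
]
print(numbering_problems(steps))
```

For the six steps shown, the only reported pair is the sixth item, which is expected to be 6 but is labeled 5.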

📄 instructions/r2/core/workflows/init-workspace-flow.md

📊 Gates Comparison

Gate Score Comparison
Example Grounding 5 ✅ Much better

@github-actions
Contributor

📋 Prompt Quality Validation Report

❌ Validation Failed

Summary by File

| File | 🔴 Critical | 🟠 Very High | 🟡 High | 🔵 Medium | ⚪ Low | Status |
|---|---|---|---|---|---|---|
| instructions/r2/core/agents/architect.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/agents/planner.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/agents/prompt-engineer.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/configure/cursor.md | 0 | 0 | 0 | 1 | 0 | ⚠️ Warning |
| instructions/r2/core/configure/github-copilot.md | 0 | 0 | 1 | 2 | 0 | ❌ Fail |
| instructions/r2/core/skills/coding-agents-farm/SKILL.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/skills/coding-agents-prompt-authoring/SKILL.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-patterns.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-rosetta.md | 0 | 0 | 0 | 0 | 1 | ⚠️ Warning |
| instructions/r2/core/skills/init-workspace-documentation/SKILL.md | 0 | 1 | 1 | 0 | 0 | ❌ Fail |
| instructions/r2/core/skills/init-workspace-patterns/SKILL.md | 0 | 0 | 0 | 1 | 0 | ⚠️ Warning |
| instructions/r2/core/skills/planning/SKILL.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/skills/reasoning/SKILL.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/skills/research/SKILL.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/templates/shell-schemas/agent-shell.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/templates/shell-schemas/skill-shell.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/workflows/adhoc-flow.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/workflows/coding-agents-prompting-flow.md | 0 | 0 | 0 | 1 | 0 | ⚠️ Warning |
| instructions/r2/core/workflows/coding-flow.md | 0 | 0 | 0 | 1 | 0 | ⚠️ Warning |
| instructions/r2/core/workflows/init-workspace-flow-documentation.md | 0 | 0 | 1 | 0 | 0 | ❌ Fail |
| instructions/r2/core/workflows/init-workspace-flow.md | 0 | 0 | 0 | 1 | 1 | ⚠️ Warning |
| instructions/r2/core/workflows/research-flow.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
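The Status column above appears to follow a simple threshold on the severity counts. The sketch below encodes that reading in Python. The rule is inferred only from the rows in these reports (any Critical, Very High, or High issue fails a file; Medium or Low alone produces a warning), not from the validator's source, and the names `file_status` and `overall_passed` are hypothetical.

```python
BLOCKING = ("critical", "very_high", "high")  # assumed to fail a file
ADVISORY = ("medium", "low")                  # assumed to produce only a warning


def file_status(counts: dict[str, int]) -> str:
    """Derive a per-file status from severity counts under the inferred rule."""
    if any(counts.get(level, 0) > 0 for level in BLOCKING):
        return "Fail"
    if any(counts.get(level, 0) > 0 for level in ADVISORY):
        return "Warning"
    return "Pass"


def overall_passed(per_file: dict[str, dict[str, int]]) -> bool:
    """Assume the whole report passes only when no file has status Fail."""
    return all(file_status(counts) != "Fail" for counts in per_file.values())


report = {
    "configure/cursor.md": {"medium": 1},
    "configure/github-copilot.md": {"high": 1, "medium": 2},
    "agents/architect.md": {},
}
print({name: file_status(counts) for name, counts in report.items()})
print(overall_passed(report))
```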

📄 instructions/r2/core/agents/architect.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison

📄 instructions/r2/core/agents/planner.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison

📄 instructions/r2/core/agents/prompt-engineer.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison

📄 instructions/r2/core/configure/cursor.md

⚠️ Issues Found

Severity Gate Details
🔵 Medium Precision & Explicitness Problem:
The model id was bumped to claude-4.8-opus-high-thinking but the human-readable description on the same line still reads "Anthropic Claude 4.7 Opus". Changed line: - claude-4.8-opus-high-thinking - Anthropic Claude 4.7 Opus (most capable, with extended reasoning, expensive). The id and its description now disagree on the version number (4.8 vs 4.7). (The same line is also the only Reference Integrity concern; the model id remains unique and selectable, so this is one defect — a stale description — not two.)
Reason:
An id/description mismatch in a model-catalog reference creates an ambiguous, self-contradictory entry that a reader or agent uses to choose a model; it also signals the bump was applied incompletely. The same PR correctly updated the parallel gpt-5.5-medium line from "Opus 4.7" to "Opus 4.8", confirming this line was missed.
Solution:
Update the description text to match the id: change "Anthropic Claude 4.7 Opus" to "Anthropic Claude 4.8 Opus" on that line.
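An id/description disagreement of this kind can be flagged with a small check. A minimal sketch, assuming both the id and the description contain a `major.minor` version; the helper name `version_mismatch` is hypothetical, and the sample strings are taken from the finding.

```python
import re

VERSION = re.compile(r"\d+\.\d+")


def version_mismatch(model_id: str, description: str) -> bool:
    """True when the first major.minor version in the id differs from the one in the description."""
    id_version = VERSION.search(model_id)
    desc_version = VERSION.search(description)
    if id_version is None or desc_version is None:
        return False  # nothing to compare
    return id_version.group() != desc_version.group()


print(version_mismatch(
    "claude-4.8-opus-high-thinking",
    "Anthropic Claude 4.7 Opus (most capable, with extended reasoning, expensive)",
))
```

The check only compares the first version found in each string, so a description that mentions another model's version first would also be flagged and would need manual review.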

📊 Gates Comparison

Gate Score Comparison
Precision & Explicitness 3 ⬇️ Slightly worse

📄 instructions/r2/core/configure/github-copilot.md

⚠️ Issues Found

Severity Gate Details
🟡 High Reference Integrity Problem:
The bump collapsed two distinct catalog rows into the same selectable label. The two consecutive changed lines are now: - Claude Opus 4.8 - Anthropic Claude 4.7 Opus (most capable, with extended reasoning) and - Claude Opus 4.8 - Anthropic Claude 4.6 Opus prev gen. Two entries share the identical model label Claude Opus 4.8 while describing different generations (current vs prev gen). Before the bump they were distinct ids (Claude Opus 4.7 and Claude Opus 4.6). Runtime impact: with two identical labels the previous-generation Opus becomes UNADDRESSABLE (no unique label selects it), and any consumer building a label->model map silently drops or overwrites one row. A first-match harness likely lands on the correct current-gen model, so it rarely mis-selects, but the one-to-one label contract is broken.
Reason:
The model field for Copilot agents selects by this label string; a duplicated label makes selection ambiguous (which entry wins is undefined) and breaks the catalog's role as a unique id reference. This is a behavioral defect, not a stylistic preference.
Solution:
Restore distinct labels: keep one row as Claude Opus 4.8 for the current generation and revert the second (prev gen) row to its correct distinct id, e.g. Claude Opus 4.7 / Claude Opus 4.6, instead of duplicating Claude Opus 4.8. Remove or merge the duplicate so each label maps to exactly one model.
🔵 Medium Dependency Management Problem:
This doc parameterizes the model names that Rosetta agent frontmatter depends on. The duplicate Claude Opus 4.8 label means the catalog no longer exposes a one-to-one set of valid model identifiers, so downstream references cannot reliably resolve a single model from this label.
Reason:
The model catalog is the dependency contract for model selection; a colliding identifier weakens that contract.
Solution:
Keep each model identifier unique in the catalog so dependents (agent model fields) resolve unambiguously to one model.
🔵 Medium Precision & Explicitness Problem:
Both duplicated entries also carry stale descriptions that contradict the new label: Claude Opus 4.8 is described as "Anthropic Claude 4.7 Opus" on one line and "Anthropic Claude 4.6 Opus prev gen" on the next, so the label and its descriptions disagree on the version.
Reason:
Self-contradictory id/description pairs reduce precision in a reference table that agents read to choose a model, and the contradiction makes the duplicate harder to detect and fix.
Solution:
After de-duplicating the labels, update each description to match its id (4.8 description for the 4.8 entry, and the correct version for the reverted prev-gen entry).
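The "silently drops or overwrites one row" effect described above is easy to reproduce. A minimal Python sketch using the two rows quoted in the finding; representing the catalog as `(label, description)` tuples is an assumption made for illustration, and `duplicate_labels` is a hypothetical helper.

```python
from collections import Counter

# Rows as quoted in the finding: (label, description).
rows = [
    ("Claude Opus 4.8", "Anthropic Claude 4.7 Opus (most capable, with extended reasoning)"),
    ("Claude Opus 4.8", "Anthropic Claude 4.6 Opus prev gen"),
]


def duplicate_labels(rows: list[tuple[str, str]]) -> list[str]:
    """Return every label that appears on more than one catalog row."""
    counts = Counter(label for label, _ in rows)
    return sorted(label for label, n in counts.items() if n > 1)


# Building a label -> description map keeps only the last row for a repeated label.
label_map = dict(rows)

print(duplicate_labels(rows))
print(len(rows), "rows ->", len(label_map), "map entries")
```

In a plain `dict` the later row wins, so which generation survives depends purely on row order.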

📊 Gates Comparison

Gate Score Comparison
Precision & Explicitness 2 ⬇️ Slightly worse
Reference Integrity 2 ⬇️ Slightly worse
Dependency Management 2 ⬇️ Slightly worse

📄 instructions/r2/core/skills/coding-agents-farm/SKILL.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison

📄 instructions/r2/core/skills/coding-agents-prompt-authoring/SKILL.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Reference Integrity 5 ⬆️ Slightly better
Dependency Management 5 ⬆️ Slightly better

📄 instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Decision Branching 4 ⬆️ Slightly better
Precision & Explicitness 5 ⬆️ Slightly better
Reference Integrity 5 ⬆️ Slightly better
Example Grounding 4 ⬆️ Slightly better
Dependency Management 5 ⬆️ Slightly better

📄 instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-patterns.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Precision & Explicitness 5 ⬆️ Slightly better
Example Grounding 5 ⬆️ Slightly better
Epistemic Honesty 5 ⬆️ Slightly better

📄 instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-rosetta.md

⚠️ Issues Found

Severity Gate Details
⚪ Low Epistemic Honesty Problem:
The PR adds the internal repo name cto-ims-kb to the Rosetta repo-identity list (Rosetta repo names are rosetta, cto-ims-kb, RulesOfPower.) and to the related ACQUIRE line in SKILL.md. The project's own CLAUDE.md states this is public OSS and to 'use Rosetta instead of KB, KnowledgeBase, IMS'.
Reason:
Low severity and possibly deliberate (these reference files are KB-internal, not the public website), but surfacing an internal repo name in a public-OSS repo is worth a quick intent check given Rosetta's stated IP-protection principle.
Solution:
Confirm cto-ims-kb is intended to be public. If it must be recognized as a real Rosetta repo, this is acceptable as-is; otherwise drop it or use a neutral name to align with the public-OSS naming and IP-protection guidance.

📊 Gates Comparison

Gate Score Comparison
Instruction Ordering 5 ⬆️ Slightly better
Workflow Completeness 5 ⬆️ Slightly better
Precision & Explicitness 5 ⬆️ Slightly better
Reference Integrity 5 ⬆️ Slightly better
Structural Coherence 5 ✅ Much better
Bloat Control 4 ⬆️ Slightly better
Cognitive Budget 5 ⬆️ Slightly better

📄 instructions/r2/core/skills/init-workspace-documentation/SKILL.md

⚠️ Issues Found

Severity Gate Details
🟠 Very High Conflict Resolution Problem:
The skill model was downgraded from a high-capability model to a mid-tier one: model: claude-sonnet-4-6, gpt-5.4-high, gemini-3.1-pro-preview (was claude-opus-4-6). In the SAME PR the driving workflow init-workspace-flow-documentation.md was upgraded the other direction to subagent_recommended_model="claude-opus-4-8,gpt-5.5-high,gemini-3.1-pro-preview", and init-workspace-flow.md phase 6 recommends claude-opus-4-8, gpt-5.4-high, gpt-5.5-high. The skill and the two workflows that invoke it now disagree on the model for the identical documentation step (skill says sonnet, workflows say opus-4-8).
Reason:
This skill's role is "recovers intent from code, not transcribes implementation": a reasoning-heavy synthesis task that benefits from the stronger model. In the base, the skill and workflow were aligned at opus-4-6; the PR breaks that alignment by moving them in opposite directions, creating a capability regression with a downstream conflict for the same step.
Solution:
Align the skill model field with the documentation phase recommendation (claude-opus-4-8 / gpt-5.5-high) unless there is a deliberate, documented reason to run this step cheaper; if cheaper is intended, also lower the workflow recommendation so all three agree.
🟡 High Dependency Management Problem:
Frontmatter line model: claude-sonnet-4-6, gpt-5.4-high, gemini-3.1-pro-preview is now inconsistent with the two workflow files that depend on this skill (init-workspace-flow-documentation.md and init-workspace-flow.md), both of which recommend claude-opus-4-8 for executing this exact skill.
Reason:
Cross-file model recommendations must agree so the agent does not pick a weaker model than the orchestrating workflow intends for the step.
Solution:
Treat the skill model and the workflow subagent_recommended_model as a single coupled value and keep them in sync whenever either is bumped.
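Treating the two values as coupled suggests a simple consistency check. A minimal sketch, assuming both fields are comma-separated model lists as quoted in the findings; the helper names `parse_models` and `model_drift` are hypothetical, and how the real harness resolves these fields is not shown in the report.

```python
def parse_models(value: str) -> list[str]:
    """Split a comma-separated model list, tolerating optional spaces."""
    return [model.strip() for model in value.split(",") if model.strip()]


def model_drift(skill_model: str, workflow_model: str) -> dict[str, list[str]]:
    """Report models that appear on only one side of the skill/workflow pair."""
    skill = set(parse_models(skill_model))
    workflow = set(parse_models(workflow_model))
    return {
        "only_in_skill": sorted(skill - workflow),
        "only_in_workflow": sorted(workflow - skill),
    }


drift = model_drift(
    "claude-sonnet-4-6, gpt-5.4-high, gemini-3.1-pro-preview",
    "claude-opus-4-8,gpt-5.5-high,gemini-3.1-pro-preview",
)
print(drift)
```

Two empty lists would mean the skill and the workflow recommend the same set of models.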

📊 Gates Comparison

Gate Score Comparison
Conflict Resolution 2 ⬇️ Slightly worse
Dependency Management 2 ⬇️ Slightly worse

📄 instructions/r2/core/skills/init-workspace-patterns/SKILL.md

⚠️ Issues Found

Severity Gate Details
🔵 Medium Dependency Management Problem:
The skill model was downgraded claude-opus-4-6 -> claude-sonnet-4-6. This is internally consistent with its own workflow phase (init-workspace-flow.md phase 5 recommends sonnet in both base and new, so no cross-file conflict), but it is an undocumented capability reduction on a reasoning-heavy task (extracting and generalizing recurring patterns into reusable templates), and it moves opposite to the sibling init-workspace-documentation phase whose workflow was bumped to opus-4-8 in the same PR.
Reason:
Pattern extraction and documentation are the same class of intent-recovery work; splitting their model class without rationale yields uneven init output quality and signals a possibly accidental edit (one sibling's workflow recommendation was bumped, the other's was not).
Solution:
Adopt one model policy for the init reverse-engineering phases (documentation + patterns): either both target a strong model (opus-4-8) for quality or both target sonnet for cost, and record the rationale. Keep the SKILL.md frontmatter and the matching init-workspace-flow.md phase aligned.

📊 Gates Comparison

Gate Score Comparison
Conflict Resolution 4 ⬆️ Slightly better

📄 instructions/r2/core/skills/planning/SKILL.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/skills/reasoning/SKILL.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/skills/research/SKILL.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/templates/shell-schemas/agent-shell.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Example Grounding 4 ⬆️ Slightly better
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/templates/shell-schemas/skill-shell.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Example Grounding 4 ⬆️ Slightly better
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/workflows/adhoc-flow.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison

📄 instructions/r2/core/workflows/coding-agents-prompting-flow.md

⚠️ Issues Found

Severity Gate Details
🔵 Medium Precision & Explicitness Problem:
The new prerequisite #4 mixes a hard directive with an informal aside: "4. MUST load all skills as provided in phases when subagents are not used (thus better to use subagents)." The parenthetical "(thus better to use subagents)" is an opinion embedded inside a MUST rule. It is not a directive, has no actor or condition, and leaves unclear whether using subagents is recommended, required, or merely preferred, weakening the precision of the rule.
Reason:
A MUST line should state exactly one obligation with a clear condition. Folding a soft preference into it as a parenthetical blurs whether subagent use is mandatory and can cause an agent to deprioritize the actual skill-loading obligation.
Solution:
Split the rule from the recommendation. For example: "4. When subagents are not used, the orchestrator MUST load all skills required by each phase before executing it. Prefer subagents over inline execution to keep context lean."

📊 Gates Comparison

Gate Score Comparison
Precision & Explicitness 3 ⬇️ Slightly worse

📄 instructions/r2/core/workflows/coding-flow.md

⚠️ Issues Found

Severity Gate Details
🔵 Medium Precision & Explicitness Problem:
The new bullet folds a hard directive and an informal aside into one line: "- MUST load each phase's skills when entering that phase (just-in-time) when subagents are not used (thus better to use subagents)." The trailing parenthetical "(thus better to use subagents)" is a non-directive opinion inside a MUST rule, leaving it ambiguous whether inline execution is permitted or discouraged.
Reason:
Each MUST should carry one clear obligation. Embedding a soft preference as a parenthetical dilutes the directive and can lead an agent to treat the skill-loading rule as optional alongside the subagent suggestion.
Solution:
Separate the obligation from the recommendation, e.g.: "- When subagents are not used, MUST load each phase's skills just-in-time when entering that phase. Prefer subagents to keep context lean."

📊 Gates Comparison

Gate Score Comparison
Precision & Explicitness 3 ⬇️ Slightly worse

📄 instructions/r2/core/workflows/init-workspace-flow-documentation.md

⚠️ Issues Found

Severity Gate Details
🟡 High Dependency Management Problem:
This phase's step 6.3 was bumped to recommend strong models for the documentation work: subagent_recommended_model="claude-opus-4-8,gpt-5.5-high,gemini-3.1-pro-preview". But in the same PR the skill it invokes (init-workspace-documentation/SKILL.md, acquired in step 6.2 and executed in 6.3) had its frontmatter model DOWNGRADED from claude-opus-4-6 to claude-sonnet-4-6. The workflow recommends a top-tier Claude model while the skill that actually does the work now recommends a mid-tier one, creating an inconsistent model expectation across the two coupled artifacts.
Reason:
The workflow phase and the skill it executes are a coupled dependency. Divergent model recommendations between them are a dependency-consistency defect: an operator following the workflow expects opus-class quality, but the executing skill self-recommends sonnet, so actual behavior may not match the workflow's stated expectation. Runtime angle: the divergence yields an undefined model class for the synthesis step depending on which signal the harness honors.
Solution:
Align the two. Either restore the skill frontmatter to a top-tier Claude model (e.g. claude-opus-4-8) to match this phase's recommendation, or lower this phase's claude recommendation to claude-sonnet-4-6 if the downgrade was intentional, so the workflow and the skill agree on the model class.

📊 Gates Comparison

Gate Score Comparison
Dependency Management 3 ⬇️ Slightly worse

📄 instructions/r2/core/workflows/init-workspace-flow.md

⚠️ Issues Found

Severity Gate Details
🔵 Medium Structural Coherence Problem:
The new step 6 embeds a fenced code block inside numbered list item 6. The fence and content sit at a consistent 3-space indent (correct for nesting under the 6. marker), but the blank separator lines are ragged: most are a single leading space ( ), the blank line before # Modernization is fully unindented, and the line before the closing fence carries trailing whitespace ( ). The inconsistent blank-line indentation and trailing whitespace inside the list-nested block can cause some markdown renderers to terminate or fail to reopen the fence at the unindented blank line, so the Modernization section may render as loose (un-fenced) text.
Reason:
Ragged blank-line indentation and trailing whitespace inside a list-nested fenced block is a known markdown ambiguity. An agent presenting this guidance to the user could show part of the example un-fenced, undermining the example it is meant to convey. The fence indent itself is fine; the whitespace inconsistency is the defect.
Solution:
Make all blank lines truly empty (no leading or trailing spaces) and keep every interior line at the same 3-space indent as the fence; remove the trailing whitespace before the closing fence so the block opens and closes unambiguously.
⚪ Low Bloat Control Problem:
Step 6 adds a ~30-line literal example block (three full slash-command sections with WHAT/WHY/FIRST/NOTE prose and multiple example invocations) directly inside the verification phase of an already large multi-phase workflow. The same value could be delivered by the usage-guide URL already added in step 5 ("Request user to study https://griddynamics.github.io/rosetta/docs/usage-guide/"), so the inline block partly duplicates external guidance and inflates the phase.
Reason:
The verification phase's job is to confirm completeness and point the user forward. A long embedded tutorial overlaps the just-added usage-guide link and adds cognitive load to the workflow file without adding orchestration logic. This is a minor bloat concern, not a blocker, given the examples are genuinely useful.
Solution:
Trim the embedded block to one or two short example invocations, or move the full multi-section examples into the linked usage guide and keep only a brief pointer in step 6, since step 5 already directs the user there.
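The whitespace cleanup suggested for the Structural Coherence issue above can be automated. A minimal sketch that strips trailing whitespace from every line, which also turns whitespace-only lines into truly empty ones; the function name `normalize_whitespace` and the sample text are hypothetical.

```python
def normalize_whitespace(text: str) -> str:
    """Strip trailing whitespace per line; whitespace-only lines become empty."""
    return "\n".join(line.rstrip() for line in text.split("\n"))


sample = "6. Suggest examples\n   ```\n   first example\n \n   second example  \n   ```\n"
print(repr(normalize_whitespace(sample)))
```

Leading indentation is left untouched, so the 3-space nesting under the list marker is preserved. Note that this also removes the two trailing spaces markdown uses for a hard line break, so it suits files that do not rely on that convention.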

📊 Gates Comparison

Gate Score Comparison
Workflow Completeness 4 ⬆️ Slightly better
Structural Coherence 3 ⬇️ Slightly worse
Example Grounding 4 ⬆️ Slightly better

📄 instructions/r2/core/workflows/research-flow.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison

@github-actions
Contributor

📋 Prompt Quality Validation Report

❌ Validation Failed

Summary by File

| File | 🔴 Critical | 🟠 Very High | 🟡 High | 🔵 Medium | ⚪ Low | Status |
|---|---|---|---|---|---|---|
| instructions/r2/core/agents/architect.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/agents/planner.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/agents/prompt-engineer.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/configure/cursor.md | 0 | 0 | 0 | 1 | 0 | ⚠️ Warning |
| instructions/r2/core/configure/github-copilot.md | 0 | 0 | 1 | 1 | 0 | ❌ Fail |
| instructions/r2/core/skills/coding-agents-prompt-authoring/SKILL.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md | 0 | 0 | 0 | 1 | 0 | ⚠️ Warning |
| instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-patterns.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-rosetta.md | 0 | 0 | 0 | 1 | 0 | ⚠️ Warning |
| instructions/r2/core/skills/coding-agents-farm/SKILL.md | 0 | 0 | 0 | 1 | 0 | ⚠️ Warning |
| instructions/r2/core/workflows/coding-agents-prompting-flow.md | 0 | 0 | 0 | 1 | 0 | ⚠️ Warning |
| instructions/r2/core/skills/init-workspace-documentation/SKILL.md | 0 | 0 | 1 | 0 | 0 | ❌ Fail |
| instructions/r2/core/skills/init-workspace-patterns/SKILL.md | 0 | 0 | 1 | 0 | 0 | ❌ Fail |
| instructions/r2/core/skills/planning/SKILL.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/skills/reasoning/SKILL.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/skills/research/SKILL.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/templates/shell-schemas/agent-shell.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/templates/shell-schemas/skill-shell.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/workflows/adhoc-flow.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/workflows/coding-flow.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/workflows/init-workspace-flow-documentation.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/workflows/init-workspace-flow.md | 0 | 0 | 0 | 2 | 0 | ⚠️ Warning |
| instructions/r2/core/workflows/research-flow.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |

📄 instructions/r2/core/agents/architect.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/agents/planner.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/agents/prompt-engineer.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/configure/cursor.md

⚠️ Issues Found

Severity Gate Details
🔵 Medium Precision & Explicitness Problem:
The model id was renamed to claude-4.8-opus-high-thinking but its description still reads "Anthropic Claude 4.7 Opus (most capable, with extended reasoning, expensive)". The id and its human-readable label now disagree on the version (4.8 vs 4.7).
Reason:
Inconsistent version labeling can mislead a human or agent choosing a model. It is low severity because the generator maps the id (claude-4.8-opus-high-thinking), not the prose, so model selection still resolves correctly. The sibling note on gpt-5.5-medium was correctly updated from "Opus 4.7" to "Opus 4.8", which makes the stale 4.7 label here stand out as an oversight.
Solution:
Update the description text on that line from "Claude 4.7 Opus" to "Claude 4.8 Opus" so the label matches the new id.
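
A mismatch like this one can be caught mechanically. The sketch below compares the first `major.minor` version in a model id with the first one in its description. It is a heuristic under that assumption only: `version_mismatch` is a hypothetical helper, and descriptions that mention another model's version first would need a stricter rule.

```python
import re

# Matches the first "major.minor" version number, e.g. "4.8".
VERSION = re.compile(r"\d+\.\d+")


def version_mismatch(model_id: str, description: str) -> bool:
    """Return True when the id and the description name different versions."""
    id_version = VERSION.search(model_id)
    desc_version = VERSION.search(description)
    # If either side carries no version, there is nothing to compare.
    if id_version is None or desc_version is None:
        return False
    return id_version.group() != desc_version.group()


stale = version_mismatch(
    "claude-4.8-opus-high-thinking",
    "Anthropic Claude 4.7 Opus (most capable, with extended reasoning, expensive)",
)
print(stale)
```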

📊 Gates Comparison

Gate Score Comparison
Precision & Explicitness 3 ⬇️ Slightly worse

📄 instructions/r2/core/configure/github-copilot.md

⚠️ Issues Found

Severity Gate Details
🟡 High Reference Integrity Problem:
Two model-catalog rows now read the identical id Claude Opus 4.8: "- Claude Opus 4.8 - Anthropic Claude 4.7 Opus (most capable...)" and "- Claude Opus 4.8 - Anthropic Claude 4.6 Opus prev gen". In BASE these were two distinct ids (Claude Opus 4.7 and Claude Opus 4.6), so the previous-generation entry is now lost and the catalog has a duplicate id.
Reason:
This file is the human/agent-facing catalog of valid Copilot model ids. A duplicate id with a vanished prev-gen entry misleads anyone who consults it to choose a value for the model: field. (Note: the plugin generator's model maps are hardcoded in Python and do not parse this file, so model resolution is not corrupted; the impact is catalog accuracy, not build-time mapping.)
Solution:
Make the ids unique: keep one Claude Opus 4.8 row for the current generation and restore the previous-generation row with its real distinct id (e.g. Claude Opus 4.6/Claude Opus 4.5), so each catalog id appears once.
🔵 Medium Precision & Explicitness Problem:
The two duplicate Claude Opus 4.8 rows carry contradictory descriptions: one says "Anthropic Claude 4.7 Opus (most capable...)" and the other "Anthropic Claude 4.6 Opus prev gen". The same id is described as both current-most-capable and previous-generation, and the prose still says 4.7/4.6 while the id says 4.8.
Reason:
Contradictory descriptions for one id give a reader conflicting guidance about what that model is, undermining reliable model choice.
Solution:
After deduplicating, align each row's description with its id version (e.g. Claude Opus 4.8 -> "Anthropic Claude 4.8 Opus (most capable...)") and drop the stale 4.7/4.6 prose.
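
The duplicate-id finding above lends itself to a small lint. The sketch below assumes catalog rows have the shape `- <id> - <description>` quoted in the finding, and `duplicate_ids` is a hypothetical helper rather than anything in the repository.

```python
from collections import Counter


def duplicate_ids(catalog: str) -> list[str]:
    """Return every model id that appears on more than one catalog row."""
    ids = []
    for line in catalog.splitlines():
        line = line.strip()
        # Only consider list rows shaped like "- <id> - <description>".
        if not line.startswith("- ") or " - " not in line[2:]:
            continue
        model_id = line[2:].split(" - ", 1)[0].strip()
        ids.append(model_id)
    return [model_id for model_id, count in Counter(ids).items() if count > 1]


# The two rows quoted in the finding.
rows = """
- Claude Opus 4.8 - Anthropic Claude 4.7 Opus (most capable...)
- Claude Opus 4.8 - Anthropic Claude 4.6 Opus prev gen
"""
dupes = duplicate_ids(rows)
print(dupes)
```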

📊 Gates Comparison

Gate Score Comparison
Precision & Explicitness 3 ⬇️ Slightly worse
Reference Integrity 3 ⬇️ Slightly worse

📄 instructions/r2/core/skills/coding-agents-prompt-authoring/SKILL.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Reference Integrity 4 ⬆️ Slightly better
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md

⚠️ Issues Found

Severity Gate Details
🔵 Medium Decision Branching Problem:
The changed bullet now reads If prompt is for rosetta itself, MUST ACQUIRE ... pa-rosetta.md. The base version carried the concrete trigger condition (repo with target prompt is rosetta or RulesOfPower) which was deleted. The new line keeps the branch but removes how the reviewer decides the branch applies.
Reason:
Without the trigger, the if/then branch is harder to evaluate, so a reviewer may not know when the MUST applies.
Solution:
Restore a short trigger cue for when 'prompt is for rosetta itself' is true (for example list the repo names rosetta / cto-ims-kb / RulesOfPower), or point to where that condition is defined.

📊 Gates Comparison

Gate Score Comparison
Decision Branching 3 ⬇️ Slightly worse
Structural Coherence 4 ⬆️ Slightly better
Example Grounding 4 ⬆️ Slightly better

📄 instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-patterns.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Precision & Explicitness 4 ⬆️ Slightly better
Example Grounding 4 ⬆️ Slightly better

📄 instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-rosetta.md

⚠️ Issues Found

Severity Gate Details
🔵 Medium Precision & Explicitness Problem:
New step 5 says AI coding agent makes a decision, plans execution flow, but step 3 already says to select the best matching workflow. The word 'decision' now appears at two stages with no distinction of what each decides.
Reason:
Two undistinguished 'decision' points violate one-term-per-concept and make the procedure ambiguous about where the choice actually happens.
Solution:
Use one term per concept: keep the workflow-selection decision in step 3, and label step 5 as 'plan execution within the loaded workflow' to remove the duplicate, undefined 'makes a decision'.

📊 Gates Comparison

Gate Score Comparison
Workflow Completeness 4 ⬆️ Slightly better
Structural Coherence 4 ⬆️ Slightly better

📄 instructions/r2/core/skills/coding-agents-farm/SKILL.md

⚠️ Issues Found

Severity Gate Details
🔵 Medium Precision & Explicitness Problem:
The frontmatter model list now reads claude-4.8-opus-high, gpt-5.5-high, adding an OpenAI option. But the body 'Model selection guidance' (OpenAI: gpt-5.4, gpt-5.3-codex-high) and all launch examples (Codex -m gpt-5.4-medium) never mention gpt-5.5, and the body explicitly warns 'Use ONLY the names and flags listed in this skill. Do NOT substitute from memory.'. A spawned-CLI selection of gpt-5.5 has no supporting name in the body.
Reason:
The skill orders agents to use only names listed in the body; introducing gpt-5.5 only in frontmatter creates a small gap between the agent's declared model and the names the body permits.
Solution:
Either add gpt-5.5/gpt-5.5-high to the OpenAI 'Model selection guidance' line and the Codex launch example, or note that the frontmatter model applies to the Farm Leader itself and is independent of the farmed-CLI model list.

📊 Gates Comparison

Gate Score Comparison

📄 instructions/r2/core/workflows/coding-agents-prompting-flow.md

⚠️ Issues Found

Severity Gate Details
🔵 Medium Precision & Explicitness Problem:
Added prerequisite 4: MUST load all skills as provided in phases when subagents are not used (thus better to use subagents).. The phrase 'all skills as provided in phases' is vague; the phases here name subagents and roles, not an explicit per-phase skill list, so it is unclear which skills the orchestrator must load when running without subagents.
Reason:
A MUST instruction that points at an undefined set is hard to satisfy reliably and may be skipped or guessed at by the orchestrator.
Solution:
Name the concrete skills to load in the non-subagent path (at minimum coding-agents-prompt-authoring per prerequisite 3) or point to where each phase's required skills are listed, so 'all skills as provided in phases' resolves to a definite set.

📊 Gates Comparison

Gate Score Comparison
Workflow Completeness 4 ⬆️ Slightly better

📄 instructions/r2/core/skills/init-workspace-documentation/SKILL.md

⚠️ Issues Found

Severity Gate Details
🟡 High Dependency Management Problem:
The frontmatter model list changed from claude-opus-4-6, gpt-5.4-high, gemini-3.1-pro-preview to claude-sonnet-4-6, gpt-5.4-high, gemini-3.1-pro-preview. Because the plugin generator picks the FIRST model per family, the Claude plugin now runs this skill on Sonnet instead of Opus. This skill is synthesis-heavy: it recovers intent from code, writes five complementary docs (CONTEXT, ARCHITECTURE, IMPLEMENTATION, ASSUMPTIONS, AGENT MEMORY), reverse-engineers domain, and tracks unknowns. The OpenAI family kept a high-reasoning model (gpt-5.4-high), so the downgrade is Claude-only and makes the recommended capability inconsistent across families for the same task.
Reason:
A first-in-list model downgrade to a less capable model for a synthesis-heavy reverse-engineering task is a behavioral capability consideration and creates cross-family inconsistency for the same workload.
Solution:
Confirm the Sonnet downgrade is an intentional cost choice. If so, leave it; if the goal is to keep deep synthesis quality, restore an Opus-class first model for the Claude family (e.g. claude-opus-4-8) so it matches the high-reasoning OpenAI choice. Note this is a recommended-model field, not a hard contract.
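
The finding hinges on the generator taking the first listed model in each family. The sketch below illustrates that selection on the frontmatter value quoted above. It assumes the family is the text before the first hyphen, which is a simplification: the report notes elsewhere that the real generator uses hardcoded Python maps, and `first_model_per_family` is a hypothetical name.

```python
def first_model_per_family(model_field: str) -> dict[str, str]:
    """Keep only the first model listed for each family prefix."""
    chosen: dict[str, str] = {}
    for name in (part.strip() for part in model_field.split(",")):
        if not name:
            continue
        # Assumption: the family is whatever precedes the first hyphen.
        family = name.split("-", 1)[0]
        # setdefault keeps the earliest entry and ignores later ones.
        chosen.setdefault(family, name)
    return chosen


before = first_model_per_family("claude-opus-4-6, gpt-5.4-high, gemini-3.1-pro-preview")
after = first_model_per_family("claude-sonnet-4-6, gpt-5.4-high, gemini-3.1-pro-preview")
print(before["claude"], "->", after["claude"])
```

Only the Claude entry changes between the two calls, which is the cross-family inconsistency the finding describes.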

📊 Gates Comparison

Gate Score Comparison
Dependency Management 3 ⬇️ Slightly worse

📄 instructions/r2/core/skills/init-workspace-patterns/SKILL.md

⚠️ Issues Found

Severity Gate Details
🟡 High Dependency Management Problem:
The frontmatter model list changed from claude-opus-4-6, gpt-5.4-high, gemini-3.1-pro-preview to claude-sonnet-4-6, gpt-5.4-high, gemini-3.1-pro-preview. Since the generator picks the first model per family, the Claude plugin now uses Sonnet rather than Opus. This skill extracts recurring architectural patterns and abstracts them into generalizable templates with extension points — an abstraction/synthesis task. The OpenAI family kept gpt-5.4-high (high reasoning), so only Claude is downgraded, leaving the recommended capability uneven across families.
Reason:
Downgrading the first-listed model for a pattern-abstraction task is a behavioral capability consideration and produces cross-family inconsistency for identical work.
Solution:
Verify the Sonnet choice is a deliberate cost optimization. If deep pattern abstraction quality matters, restore an Opus-class first model for Claude (e.g. claude-opus-4-8) to match the high-reasoning OpenAI selection. This is a recommended-model field, not a hard contract.

📊 Gates Comparison

Gate Score Comparison
Dependency Management 3 ⬇️ Slightly worse

📄 instructions/r2/core/skills/planning/SKILL.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/skills/reasoning/SKILL.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/skills/research/SKILL.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/templates/shell-schemas/agent-shell.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Precision & Explicitness 4 ⬆️ Slightly better
Example Grounding 4 ⬆️ Slightly better

📄 instructions/r2/core/templates/shell-schemas/skill-shell.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Precision & Explicitness 4 ⬆️ Slightly better
Example Grounding 4 ⬆️ Slightly better

📄 instructions/r2/core/workflows/adhoc-flow.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/workflows/coding-flow.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Workflow Completeness 4 ⬆️ Slightly better

📄 instructions/r2/core/workflows/init-workspace-flow-documentation.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison

📄 instructions/r2/core/workflows/init-workspace-flow.md

⚠️ Issues Found

Severity Gate Details
🔵 Medium Structural Coherence Problem:
The step 6 content is wrapped in a plain triple-backtick code fence, so its internal Markdown headers (# Coding Workflow) and bold labels render as literal raw text rather than structure. Mixing example slash-commands, prose guidance, and section headers inside one undifferentiated fence makes the additions less atomic than the surrounding numbered, tagged phases.
Reason:
The rest of the workflow uses clean numbered steps and phase tags; a mixed-content fenced blob breaks that pattern and is harder to scan, though it does not break behavior.
Solution:
Either drop the code fence and present the examples as plain quoted command lines under sub-bullets, or keep the fence but remove the embedded Markdown headers/bold so the block is purely example commands. Keep one concern per block.
🔵 Medium Bloat Control Problem:
Phase 8 step 6 adds an inline fenced block of about 30 lines of multi-flow example commands (coding, requirements, modernization), including prose headers (# Coding Workflow), WHAT/WHY/FIRST/NOTE labels, and ellipsis placeholders. This is onboarding/usage content embedded inside an orchestration workflow whose job is to verify completeness and suggest next steps. The same guidance already lives behind the usage-guide link added in step 5 (https://griddynamics.github.io/rosetta/docs/usage-guide/), so there is partial duplication.
Reason:
An execution workflow should stay dense and action-only. A large literal example block inflates the file and overlaps with the usage-guide link, lowering signal density for the agent running the phase.
Solution:
Trim the inline block to one short example per flow (or move the long examples to the linked usage guide and keep only the link plus a one-line pointer in step 6). Avoid restating modernization phase policy (FIRST/NOTE) here since that belongs to modernization-flow.

📊 Gates Comparison

Gate Score Comparison
Workflow Completeness 4 ⬆️ Slightly better
Example Grounding 4 ⬆️ Slightly better
Bloat Control 3 ⬇️ Slightly worse

📄 instructions/r2/core/workflows/research-flow.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison

@github-actions
Contributor

📋 Prompt Quality Validation Report

❌ Validation Failed

Summary by File

| File | 🔴 Critical | 🟠 Very High | 🟡 High | 🔵 Medium | ⚪ Low | Status |
|---|---|---|---|---|---|---|
| instructions/r2/core/agents/architect.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/agents/planner.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/agents/prompt-engineer.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/configure/cursor.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/configure/github-copilot.md | 0 | 0 | 0 | 0 | 1 | ⚠️ Warning |
| instructions/r2/core/templates/shell-schemas/agent-shell.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/templates/shell-schemas/skill-shell.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/skills/planning/SKILL.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/skills/reasoning/SKILL.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/skills/research/SKILL.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/skills/coding-agents-farm/SKILL.md | 0 | 0 | 0 | 1 | 1 | ⚠️ Warning |
| instructions/r2/core/skills/coding-agents-prompt-authoring/SKILL.md | 0 | 0 | 1 | 0 | 0 | ❌ Fail |
| instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md | 0 | 0 | 1 | 1 | 1 | ❌ Fail |
| instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-patterns.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-rosetta.md | 0 | 0 | 1 | 2 | 1 | ❌ Fail |
| instructions/r2/core/skills/requirements-authoring/SKILL.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/rules/bootstrap-rosetta-files.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/workflows/coding-agents-prompting-flow.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/skills/init-workspace-documentation/SKILL.md | 0 | 0 | 0 | 1 | 0 | ⚠️ Warning |
| instructions/r2/core/skills/init-workspace-patterns/SKILL.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/skills/init-workspace-verification/SKILL.md | 0 | 0 | 1 | 3 | 2 | ❌ Fail |
| instructions/r2/core/workflows/init-workspace-flow-documentation.md | 0 | 0 | 0 | 1 | 0 | ⚠️ Warning |
| instructions/r2/core/workflows/init-workspace-flow-patterns.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/workflows/init-workspace-flow.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/workflows/coding-flow.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/workflows/adhoc-flow.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |
| instructions/r2/core/workflows/research-flow.md | 0 | 0 | 0 | 0 | 0 | ✅ Pass |

📄 instructions/r2/core/agents/architect.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/agents/planner.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/agents/prompt-engineer.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/configure/cursor.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Precision & Explicitness 4 ⬆️ Slightly better
Reference Integrity 4 ⬆️ Slightly better
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/configure/github-copilot.md

⚠️ Issues Found

Severity Gate Details
⚪ Low Bloat Control Problem:
The reworded line - Claude Opus 4.6 - Anthropic Claude 4.6 Opus prev gen (4.7 existed but was not good) adds a non-operational version-history aside into a model reference list where every other entry follows a uniform <ModelName> - <Capability> format.
Reason:
Operational model lists are read by agents to pick a model. The historical note adds tokens without changing selection behavior and breaks the list convention. Low severity: no behavioral or safety impact.
Solution:
Drop the parenthetical so the entry reads - Claude Opus 4.6 - Anthropic Claude 4.6 Opus prev gen. If the history note is wanted, place it in a changelog, not inline in the selection list.

📊 Gates Comparison

Gate Score Comparison
Reference Integrity 4 ⬆️ Slightly better
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/templates/shell-schemas/agent-shell.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Precision & Explicitness 4 ⬆️ Slightly better
Reference Integrity 4 ⬆️ Slightly better
Example Grounding 4 ⬆️ Slightly better
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/templates/shell-schemas/skill-shell.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Precision & Explicitness 4 ⬆️ Slightly better
Reference Integrity 4 ⬆️ Slightly better
Example Grounding 4 ⬆️ Slightly better
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/skills/planning/SKILL.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/skills/reasoning/SKILL.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/skills/research/SKILL.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/skills/coding-agents-farm/SKILL.md

⚠️ Issues Found

Severity Gate Details
🔵 Medium Precision & Explicitness Problem:
Frontmatter now lists model: claude-4.8-opus-high, gpt-5.5-high, but the body "Model selection guidance" OpenAI row still reads only gpt-5.4 (workhorse), gpt-5.3-codex-high (agentic, complex). The newly declared gpt-5.5-high is never mentioned in the guidance, so an agent gets no signal on when to pick it.
Reason:
Frontmatter and body disagree on which OpenAI models exist, creating a silent inconsistency that weakens model-selection guidance.
Solution:
Add gpt-5.5-high to the OpenAI guidance row with a short usage note (e.g. complex, high-reasoning), matching the frontmatter.
⚪ Low Dependency Management Problem:
The Anthropic body guidance row was updated (claude-opus-4-6 → claude-opus-4-8) but the OpenAI row was not synchronized with the frontmatter's new gpt-5.5-high. Model dependencies are only partially aligned across the file.
Reason:
Partial sync leaves the file internally inconsistent; same root cause as the Precision finding, low severity.
Solution:
Synchronize all in-body model references with the frontmatter model list whenever the frontmatter changes.
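
Both findings above come down to a frontmatter model that the body never names. The sketch below checks for that. The body string is an illustrative sample assembled from the names quoted in the finding, not the skill's actual text, and `models_missing_from_body` is a hypothetical helper.

```python
def models_missing_from_body(frontmatter_models: str, body: str) -> list[str]:
    """List frontmatter model names that never appear in the body text."""
    names = [name.strip() for name in frontmatter_models.split(",") if name.strip()]
    # Plain substring match; a stricter check could require word boundaries.
    return [name for name in names if name not in body]


# Illustrative body, not the real file content.
sample_body = (
    "Anthropic: claude-4.8-opus-high\n"
    "OpenAI: gpt-5.4 (workhorse), gpt-5.3-codex-high (agentic, complex)\n"
)
missing = models_missing_from_body("claude-4.8-opus-high, gpt-5.5-high", sample_body)
print(missing)
```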

📊 Gates Comparison

Gate Score Comparison

📄 instructions/r2/core/skills/coding-agents-prompt-authoring/SKILL.md

⚠️ Issues Found

Severity Gate Details
🟡 High Reference Integrity Problem:
The acquire line now reads repos: rosetta, cto-ims-kb, RulesOfPower, instructions folder. The repo cto-ims-kb is an internal/private organization repo that does not exist anywhere else in this public OSS repository. External users reading this will be told their Rosetta prompts apply to a repo they do not have.
Reason:
A private repo name leaked into a public OSS instruction is incorrect for external users and exposes internal naming. This occurrence is a propagated copy of the pa-rosetta.md change.
Solution:
Remove cto-ims-kb from the parenthetical here. If internal builds need it, inject it via an org-layer override of pa-rosetta.md rather than baking it into the public SKILL.md. This mirrors the canonical pa-rosetta.md fix.
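
A leak of this kind could be caught before merge with a blocklist scan. The sketch below takes file contents as an in-memory mapping so it stays self-contained. `find_blocked_names` and the sample line are hypothetical, and a real check would read the changed files from disk.

```python
def find_blocked_names(files: dict[str, str], blocked: list[str]) -> list[tuple[str, int, str]]:
    """Return (path, line number, name) for each blocked name found."""
    hits = []
    for path, text in files.items():
        for number, line in enumerate(text.splitlines(), start=1):
            for name in blocked:
                if name in line:
                    hits.append((path, number, name))
    return hits


# Hypothetical file content modeled on the acquire line quoted in the finding.
sample_files = {
    "SKILL.md": "intro\nrepos: rosetta, cto-ims-kb, RulesOfPower, instructions folder\n",
}
leaks = find_blocked_names(sample_files, ["cto-ims-kb"])
print(leaks)
```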

📊 Gates Comparison

Gate Score Comparison

📄 instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md

⚠️ Issues Found

Severity Gate Details
🟡 High Conflict Resolution Problem:
A new bullet tells the author to compress text "including but not limited to using unicode characters and icons", while the same added block also enforces "Coding-agent-agnostic", and the same PR adds grep-friendly header rules elsewhere. Unicode and icons can break grep and may not render consistently across Cursor, Copilot, JetBrains, and plain-text pipelines. No priority is set between these two new rules. Because this guidance runs on every future hardening pass, an agent may insert unicode/icons into machine-readable sections (headers, numbered lists, rule lines) of existing Rosetta prompts.
Reason:
An unresolved conflict between two same-file rules will be applied systemically to all prompts reviewed with this skill, degrading greppability and cross-agent portability — a broad blast radius.
Solution:
Scope the unicode/icons suggestion to user-facing display text only and explicitly exclude grep-indexed headers, numbered checklists, and machine-readable rule lines. Or limit compression to plain-text techniques (short phrases, acronyms, markers).
🔵 Medium Precision & Explicitness Problem:
The trigger condition changed from If prompt is for rosetta itself (repo with target prompt is rosetta or RulesOfPower) to If prompt is for rosetta itself. The explicit repo list that operationalized "rosetta itself" was removed, so the condition now relies on inferring scope from another file (pa-rosetta.md).
Reason:
Removing the concrete operationalization makes the condition vaguer and forces a cross-file lookup to decide when the rule fires.
Solution:
Restore an inline repo cue or add (see pa-rosetta.md for repo list) so the trigger stays self-contained.
⚪ Low Reference Integrity Problem:
By pointing to pa-rosetta.md (which now lists the private cto-ims-kb) without naming repos itself, this trigger implicitly inherits the private-repo reference.
Reason:
Same private-repo root cause as pa-rosetta.md, inherited here; low severity since it is a propagation, not an independent defect.
Solution:
Resolving the canonical pa-rosetta.md occurrence of cto-ims-kb resolves this automatically; keep the trigger's in-scope repos explicit for OSS users.
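
The Conflict Resolution finding above recommends keeping unicode and icons out of grep-indexed headers. One way to enforce that scope is a lint that flags non-ASCII characters in header lines only. The sketch below assumes headers are ATX-style lines starting with `#` and skips fenced blocks. `non_ascii_headers` is a hypothetical helper.

```python
FENCE = "`" * 3  # a Markdown code-fence marker


def non_ascii_headers(markdown: str) -> list[tuple[int, str]]:
    """Return (line number, text) for headers containing non-ASCII characters."""
    hits = []
    in_fence = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        # Toggle fence state so '#' comments inside code are ignored.
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        if not in_fence and line.startswith("#") and not line.isascii():
            hits.append((number, line))
    return hits


sample = "# Rules \u2705\nbody text \u2705\n## Plain header\n"
flagged = non_ascii_headers(sample)
print(flagged)
```

Body text is left alone, so user-facing display text can still use icons.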

📊 Gates Comparison

Gate Score Comparison
Conflict Resolution 3 ⬇️ Slightly worse
Workflow Completeness 4 ⬆️ Slightly better
Reference Integrity 3 ⬇️ Slightly worse

📄 instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-patterns.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Workflow Completeness 4 ⬆️ Slightly better
Precision & Explicitness 4 ⬆️ Slightly better

📄 instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-rosetta.md

⚠️ Issues Found

Severity Gate Details
🟡 High Reference Integrity Problem:
The canonical repo list changed from rosetta, RulesOfPower to rosetta, cto-ims-kb, RulesOfPower. cto-ims-kb is a private/internal org repo introduced by this PR; it appears nowhere else in this public OSS repo (only in the files this PR touched and their generated plugin mirrors).
Reason:
This is the canonical source of the private-repo leak that SKILL.md and pa-hardening.md inherit. A private name in public OSS is wrong for external users and exposes internal naming.
Solution:
Remove cto-ims-kb from this public OSS file. If internal builds need it in scope, inject it via an org-layer override (e.g. an org-specific copy of pa-rosetta.md) following the layered customization pattern, not by hardcoding it in core.
🔵 Medium Epistemic Honesty Problem:
Step 2 now states AI loads few more skills based on skill description only (usually only 1-2). The usually only 1-2 is a quantified runtime assumption presented as a factual description of how Rosetta triggers.
Reason:
Hard numbers stated as fact in a meta-prompting reference can mislead authors into designing around a count that is not guaranteed.
Solution:
Soften to a small number of additional skills based on descriptions to avoid presenting an assumption as an architectural guarantee.
🔵 Medium Bloat Control Problem:
A new # Rosetta Principles section adds ~13 design-philosophy bullets (Progressive Disclosure, Classification-First, Agent-Agnostic, etc.) that largely overlap Rosetta's design-philosophy docs (CONTEXT.md / ARCHITECTURE.md / OVERVIEW.md).
Reason:
Duplicating the design philosophy in a just-in-time reference grows the file and creates drift risk when principles evolve elsewhere.
Solution:
Trim to principles unique to prompt authoring, or point the meta-prompt engineer to the design-philosophy docs, to avoid a second copy that can drift.
⚪ Low Cognitive Budget Problem:
The file now spans four domains in one reference: load procedure, folder structure, command aliases, and design principles, increasing the surface a prompt author must process.
Reason:
Same added content as the Bloat finding viewed from a load-size lens; the new section headers do aid navigation, so impact is low.
Solution:
Consider moving Rosetta Principles into a separate reference loaded only when needed, keeping pa-rosetta.md focused on operational context.

📊 Gates Comparison

Gate Score Comparison
Instruction Ordering 4 ⬆️ Slightly better
Workflow Completeness 4 ⬆️ Slightly better
Reference Integrity 3 ⬇️ Slightly worse
Structural Coherence 4 ⬆️ Slightly better
Epistemic Honesty 3 ⬇️ Slightly worse
Bloat Control 3 ⬇️ Slightly worse
Cognitive Budget 3 ⬇️ Slightly worse

📄 instructions/r2/core/skills/requirements-authoring/SKILL.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Precision & Explicitness 5 ⬆️ Slightly better
Structural Coherence 5 ⬆️ Slightly better

📄 instructions/r2/core/rules/bootstrap-rosetta-files.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Precision & Explicitness 5 ⬆️ Slightly better

📄 instructions/r2/core/workflows/coding-agents-prompting-flow.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Workflow Completeness 5 ⬆️ Slightly better
Precision & Explicitness 5 ⬆️ Slightly better
Dependency Management 5 ⬆️ Slightly better

📄 instructions/r2/core/skills/init-workspace-documentation/SKILL.md

⚠️ Issues Found

Severity Gate Details
🔵 Medium Conflict Resolution Problem:
The IMPLEMENTATION.md guidance now holds two bullets at once: Current state only VERY BRIEFLY and High-level change log, each change separate header with date and description. "Current state only" implies no history; a "change log" is accumulated history. The base bullet no change log was removed, but Current state only was left in place, so the two signals conflict and no rule says how to reconcile them.
Reason:
Mixed signals make agents produce inconsistent IMPLEMENTATION.md structures across sessions — some keep it lean, others grow a log that violates "VERY BRIEFLY". The change-log ownership is coordinated with bootstrap-rosetta-files.md, but the internal wording tension is unresolved.
Solution:
Reword Current state only VERY BRIEFLY to accommodate both, e.g. Brief current-state section plus a high-level change log (one header per change, with date and description). This keeps the intended single change-log location without the contradictory "only" wording.

📊 Gates Comparison

Gate Score Comparison
Goal Specification 4 ⬆️ Slightly better
Single Responsibility 4 ⬆️ Slightly better
Input Contract 4 ⬆️ Slightly better
Output Contract 4 ⬆️ Slightly better
Success Criteria 4 ⬆️ Slightly better
Conflict Resolution 3 ⬇️ Slightly worse
Decision Branching 4 ⬆️ Slightly better
Instruction Ordering 4 ⬆️ Slightly better
Workflow Completeness 4 ⬆️ Slightly better
Precision & Explicitness 4 ⬆️ Slightly better
Reference Integrity 4 ⬆️ Slightly better
Structural Coherence 4 ⬆️ Slightly better
Example Grounding 4 ⬆️ Slightly better
Safety Boundaries 4 ⬆️ Slightly better
Failure Handling 4 ⬆️ Slightly better
Epistemic Honesty 4 ⬆️ Slightly better
Self-Validation 4 ⬆️ Slightly better
Bloat Control 4 ⬆️ Slightly better
Cognitive Budget 4 ⬆️ Slightly better
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/skills/init-workspace-patterns/SKILL.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Goal Specification 4 ⬆️ Slightly better
Single Responsibility 4 ⬆️ Slightly better
Input Contract 4 ⬆️ Slightly better
Output Contract 4 ⬆️ Slightly better
Success Criteria 4 ⬆️ Slightly better
Conflict Resolution 4 ⬆️ Slightly better
Decision Branching 4 ⬆️ Slightly better
Instruction Ordering 4 ⬆️ Slightly better
Workflow Completeness 4 ⬆️ Slightly better
Precision & Explicitness 5 ✅ Much better
Reference Integrity 4 ⬆️ Slightly better
Structural Coherence 4 ⬆️ Slightly better
Example Grounding 5 ✅ Much better
Safety Boundaries 4 ⬆️ Slightly better
Failure Handling 4 ⬆️ Slightly better
Epistemic Honesty 4 ⬆️ Slightly better
Self-Validation 4 ⬆️ Slightly better
Bloat Control 4 ⬆️ Slightly better
Cognitive Budget 4 ⬆️ Slightly better
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/skills/init-workspace-verification/SKILL.md

⚠️ Issues Found

Severity Gate Details
🟡 High Precision & Explicitness Problem:
The added line DEMAND user (and orchestrator to demand too) to study ... directs the skill to prescribe orchestrator behavior. A skill has no awareness of which orchestrator invokes it, so it cannot "demand" from the orchestrator. In orchestrated runs this directive can also propagate up the call stack and be re-issued by the parent.
Reason:
This breaks the skill-boundary contract (a skill must not prescribe its orchestrator) and risks a demand feedback loop in orchestrated execution.
Solution:
Replace with Present to user: [content] scoped to the user only, and remove and orchestrator to demand too. Encode any orchestrator-level handoff in the workflow phase file instead.
🔵 Medium Structural Coherence Problem:
The DEMAND block sits inside <process> after the final --- separator and after the DEPRECATED ARTIFACTS section, untagged. It reads as an unintegrated epilogue rather than a numbered process step.
Reason:
Placement after the logical end of the process makes it ambiguous whether the content is part of the verification steps.
Solution:
If retained, wrap it in a distinct section (e.g. a handoff section) or move it to the phase file. Do not append substantive content untagged after the process concludes.
🔵 Medium Bloat Control Problem:
A ~33-line fenced markdown block (Coding Workflow, Requirements, Modernization) duplicates content already at the referenced usage-guide URL and embeds full workflow descriptions inside an audit skill.
Reason:
The duplicated block inflates the skill and will drift from the canonical usage guide.
Solution:
Compress to a single pointer such as Walk the user through the USAGE_GUIDE slash-command examples plus the reference, dropping the duplicated workflow prose.
🔵 Medium Single Responsibility Problem:
The appended DEMAND + fenced usage-examples block adds user-onboarding/next-steps content to a skill whose stated purpose is a completeness audit and catch-up. Teaching users which slash-commands to run next is a second, unrelated responsibility.
Reason:
Mixing onboarding into an audit skill violates SRP and makes the skill harder to reuse on its own.
Solution:
Move the handoff block to the workflow phase file (init-workspace-flow-verification) or a dedicated onboarding artifact. Keep this skill focused on audit and catch-up.
⚪ Low Dependency Management Problem:
The added line hardcodes a live external URL (https://griddynamics.github.io/rosetta/docs/usage-guide/) as a dependency inside the skill.
Reason:
External URLs are fragile; the page can move or restructure with no signal. Low severity since it is the canonical public docs site for this OSS repo.
Solution:
Reference the local USAGE_GUIDE by name/alias, or note that the URL must be verified each release so the dependency stays trackable.
⚪ Low Reference Integrity Problem:
The fenced examples reference sibling workflows /coding-flow, /requirements-authoring-flow, /modernization-flow inside a skill that should be sibling-unaware. They are framed as user examples in a fenced block, lowering the risk an agent treats them as actionable.
Reason:
Lateral workflow references break the no-sibling-awareness boundary; low severity because the fenced framing reduces auto-invocation risk. Same root cause as the SRP/Precision findings.
Solution:
If kept, label the examples 'user-facing suggestions only; do not invoke', or relocate them to the phase/handoff file.

📊 Gates Comparison

Gate Score Comparison
Goal Specification 4 ⬆️ Slightly better
Input Contract 4 ⬆️ Slightly better
Output Contract 4 ⬆️ Slightly better
Success Criteria 4 ⬆️ Slightly better
Conflict Resolution 4 ⬆️ Slightly better
Decision Branching 4 ⬆️ Slightly better
Workflow Completeness 4 ⬆️ Slightly better
Example Grounding 5 ✅ Much better
Safety Boundaries 4 ⬆️ Slightly better
Failure Handling 4 ⬆️ Slightly better
Epistemic Honesty 4 ⬆️ Slightly better
Self-Validation 5 ✅ Much better
Cognitive Budget 4 ⬆️ Slightly better

📄 instructions/r2/core/workflows/init-workspace-flow-documentation.md

⚠️ Issues Found

Severity Gate Details
🔵 Medium Success Criteria Problem:
The validation checklist changed from 'All 5 doc files exist and are non-empty' to 'All doc files exist and are non-empty'. Removing the count makes the check open-ended: an agent that created only 3 or 4 files can still satisfy "all doc files exist". This is most dangerous in partial-init runs where this phase check is the only gate (the downstream verification skill's explicit file list runs only in full init).
Reason:
The count is load-bearing and still asserted elsewhere — this same file's description says "Proof: five doc files exist" and the skill says it creates five docs — so the weakened checklist removes a detectable failure signal and is inconsistent with its siblings.
Solution:
Restore the count and enumerate the set: All 5 doc files exist and are non-empty (CONTEXT.md, ARCHITECTURE.md, IMPLEMENTATION.md, ASSUMPTIONS.md, AGENT MEMORY.md).
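
A minimal sketch of the enumerated check this solution asks for, assuming the five docs live in a single directory (the directory layout and the helper name `missing_or_empty` are illustrative, not taken from the repository). Because each file is named explicitly, a run that produced only three or four files is reported instead of passing an open-ended "all doc files exist" check.

```python
from pathlib import Path

# File names come from the finding above; where they live is an assumption.
REQUIRED_DOCS = [
    "CONTEXT.md",
    "ARCHITECTURE.md",
    "IMPLEMENTATION.md",
    "ASSUMPTIONS.md",
    "AGENT MEMORY.md",
]


def missing_or_empty(docs_dir, required=REQUIRED_DOCS):
    """Return the required file names that are absent or have zero size."""
    root = Path(docs_dir)
    problems = []
    for name in required:
        path = root / name
        # A file fails the check if it does not exist or contains no bytes.
        if not path.is_file() or path.stat().st_size == 0:
            problems.append(name)
    return problems
```

An empty return value corresponds to the checklist item being satisfied; any other result names exactly which files block the phase.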

📊 Gates Comparison

Gate Score Comparison
Goal Specification 4 ⬆️ Slightly better
Single Responsibility 4 ⬆️ Slightly better
Input Contract 4 ⬆️ Slightly better
Output Contract 4 ⬆️ Slightly better
Conflict Resolution 4 ⬆️ Slightly better
Decision Branching 4 ⬆️ Slightly better
Instruction Ordering 4 ⬆️ Slightly better
Workflow Completeness 4 ⬆️ Slightly better
Precision & Explicitness 4 ⬆️ Slightly better
Reference Integrity 4 ⬆️ Slightly better
Structural Coherence 4 ⬆️ Slightly better
Example Grounding 4 ⬆️ Slightly better
Safety Boundaries 4 ⬆️ Slightly better
Failure Handling 4 ⬆️ Slightly better
Epistemic Honesty 4 ⬆️ Slightly better
Self-Validation 4 ⬆️ Slightly better
Bloat Control 4 ⬆️ Slightly better
Cognitive Budget 4 ⬆️ Slightly better
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/workflows/init-workspace-flow-patterns.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Goal Specification 4 ⬆️ Slightly better
Single Responsibility 4 ⬆️ Slightly better
Input Contract 4 ⬆️ Slightly better
Output Contract 4 ⬆️ Slightly better
Success Criteria 4 ⬆️ Slightly better
Conflict Resolution 4 ⬆️ Slightly better
Decision Branching 4 ⬆️ Slightly better
Instruction Ordering 4 ⬆️ Slightly better
Workflow Completeness 4 ⬆️ Slightly better
Precision & Explicitness 4 ⬆️ Slightly better
Reference Integrity 4 ⬆️ Slightly better
Structural Coherence 4 ⬆️ Slightly better
Example Grounding 4 ⬆️ Slightly better
Safety Boundaries 4 ⬆️ Slightly better
Failure Handling 4 ⬆️ Slightly better
Epistemic Honesty 4 ⬆️ Slightly better
Self-Validation 4 ⬆️ Slightly better
Bloat Control 4 ⬆️ Slightly better
Cognitive Budget 4 ⬆️ Slightly better
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/workflows/init-workspace-flow.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Goal Specification 4 ⬆️ Slightly better
Single Responsibility 4 ⬆️ Slightly better
Input Contract 4 ⬆️ Slightly better
Output Contract 4 ⬆️ Slightly better
Success Criteria 4 ⬆️ Slightly better
Conflict Resolution 4 ⬆️ Slightly better
Decision Branching 4 ⬆️ Slightly better
Instruction Ordering 4 ⬆️ Slightly better
Workflow Completeness 4 ⬆️ Slightly better
Precision & Explicitness 4 ⬆️ Slightly better
Reference Integrity 4 ⬆️ Slightly better
Structural Coherence 4 ⬆️ Slightly better
Example Grounding 4 ⬆️ Slightly better
Safety Boundaries 4 ⬆️ Slightly better
Failure Handling 4 ⬆️ Slightly better
Epistemic Honesty 4 ⬆️ Slightly better
Self-Validation 4 ⬆️ Slightly better
Bloat Control 4 ⬆️ Slightly better
Cognitive Budget 4 ⬆️ Slightly better
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/workflows/coding-flow.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Goal Specification 4 ⬆️ Slightly better
Single Responsibility 4 ⬆️ Slightly better
Input Contract 4 ⬆️ Slightly better
Output Contract 4 ⬆️ Slightly better
Success Criteria 4 ⬆️ Slightly better
Conflict Resolution 4 ⬆️ Slightly better
Decision Branching 5 ✅ Much better
Instruction Ordering 4 ⬆️ Slightly better
Workflow Completeness 5 ✅ Much better
Precision & Explicitness 4 ⬆️ Slightly better
Reference Integrity 4 ⬆️ Slightly better
Structural Coherence 4 ⬆️ Slightly better
Example Grounding 4 ⬆️ Slightly better
Safety Boundaries 4 ⬆️ Slightly better
Failure Handling 4 ⬆️ Slightly better
Epistemic Honesty 4 ⬆️ Slightly better
Self-Validation 4 ⬆️ Slightly better
Bloat Control 4 ⬆️ Slightly better
Cognitive Budget 4 ⬆️ Slightly better
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/workflows/adhoc-flow.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Goal Specification 4 ⬆️ Slightly better
Single Responsibility 4 ⬆️ Slightly better
Input Contract 4 ⬆️ Slightly better
Output Contract 4 ⬆️ Slightly better
Success Criteria 4 ⬆️ Slightly better
Conflict Resolution 4 ⬆️ Slightly better
Decision Branching 4 ⬆️ Slightly better
Instruction Ordering 4 ⬆️ Slightly better
Workflow Completeness 4 ⬆️ Slightly better
Precision & Explicitness 4 ⬆️ Slightly better
Reference Integrity 4 ⬆️ Slightly better
Structural Coherence 4 ⬆️ Slightly better
Example Grounding 4 ⬆️ Slightly better
Safety Boundaries 4 ⬆️ Slightly better
Failure Handling 4 ⬆️ Slightly better
Epistemic Honesty 4 ⬆️ Slightly better
Self-Validation 4 ⬆️ Slightly better
Bloat Control 4 ⬆️ Slightly better
Cognitive Budget 4 ⬆️ Slightly better
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/workflows/research-flow.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Goal Specification 4 ⬆️ Slightly better
Single Responsibility 4 ⬆️ Slightly better
Input Contract 4 ⬆️ Slightly better
Output Contract 4 ⬆️ Slightly better
Success Criteria 4 ⬆️ Slightly better
Conflict Resolution 4 ⬆️ Slightly better
Decision Branching 4 ⬆️ Slightly better
Instruction Ordering 4 ⬆️ Slightly better
Workflow Completeness 4 ⬆️ Slightly better
Precision & Explicitness 4 ⬆️ Slightly better
Reference Integrity 4 ⬆️ Slightly better
Structural Coherence 4 ⬆️ Slightly better
Example Grounding 4 ⬆️ Slightly better
Safety Boundaries 4 ⬆️ Slightly better
Failure Handling 4 ⬆️ Slightly better
Epistemic Honesty 4 ⬆️ Slightly better
Self-Validation 4 ⬆️ Slightly better
Bloat Control 4 ⬆️ Slightly better
Cognitive Budget 4 ⬆️ Slightly better
Dependency Management 4 ⬆️ Slightly better

Signed-off-by: isolomatov-gd <isolomatov@griddynamics.com>
@isolomatov-gd isolomatov-gd merged commit a7ad41c into main May 29, 2026
1 check failed
@github-actions
Contributor

📋 Prompt Quality Validation Report

❌ Validation Failed

Summary by File

File 🔴 Critical 🟠 Very High 🟡 High 🔵 Medium ⚪ Low Status
instructions/r2/core/skills/coding-agents-farm/SKILL.md 0 0 0 0 0 ✅ Pass
instructions/r2/core/skills/coding-agents-prompt-authoring/SKILL.md 0 0 0 0 0 ✅ Pass
instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md 0 0 0 0 0 ✅ Pass
instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-patterns.md 0 0 0 0 0 ✅ Pass
instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-rosetta.md 0 0 0 0 0 ✅ Pass
instructions/r2/core/workflows/coding-agents-prompting-flow.md 0 0 0 0 0 ✅ Pass
instructions/r2/core/workflows/coding-flow.md 0 0 0 0 0 ✅ Pass
instructions/r2/core/skills/init-workspace-documentation/SKILL.md 0 1 3 1 0 ❌ Fail
instructions/r2/core/skills/init-workspace-patterns/SKILL.md 0 0 0 0 0 ✅ Pass
instructions/r2/core/skills/init-workspace-verification/SKILL.md 0 1 1 1 0 ❌ Fail
instructions/r2/core/workflows/init-workspace-flow-documentation.md 0 2 2 1 0 ❌ Fail
instructions/r2/core/workflows/init-workspace-flow-patterns.md 0 0 0 0 0 ✅ Pass
instructions/r2/core/workflows/init-workspace-flow-verification.md 0 0 1 1 0 ❌ Fail
instructions/r2/core/workflows/init-workspace-flow.md 0 0 1 0 0 ❌ Fail
instructions/r2/core/agents/architect.md 0 0 0 0 0 ✅ Pass
instructions/r2/core/agents/planner.md 0 0 0 0 0 ✅ Pass
instructions/r2/core/agents/prompt-engineer.md 0 0 0 0 0 ✅ Pass
instructions/r2/core/skills/planning/SKILL.md 0 0 0 0 0 ✅ Pass
instructions/r2/core/skills/reasoning/SKILL.md 0 0 0 0 0 ✅ Pass
instructions/r2/core/skills/requirements-authoring/SKILL.md 0 0 0 0 0 ✅ Pass
instructions/r2/core/skills/research/SKILL.md 0 0 0 0 0 ✅ Pass
instructions/r2/core/configure/cursor.md 0 0 0 0 0 ✅ Pass
instructions/r2/core/configure/github-copilot.md 0 0 0 1 0 ⚠️ Warning
instructions/r2/core/rules/bootstrap-rosetta-files.md 0 0 0 0 0 ✅ Pass
instructions/r2/core/templates/shell-schemas/agent-shell.md 0 0 0 0 0 ✅ Pass
instructions/r2/core/templates/shell-schemas/skill-shell.md 0 0 0 0 0 ✅ Pass
instructions/r2/core/workflows/adhoc-flow.md 0 0 0 0 0 ✅ Pass
instructions/r2/core/workflows/research-flow.md 0 0 0 0 0 ✅ Pass
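
The Status column above appears to follow from the severity counts. The sketch below encodes one rule that is consistent with every row in this table: any Critical, Very High, or High issue fails the file, and Medium-only issues produce a warning. This rule is inferred from the rows, not taken from the pipeline source, and treating Low-only files as a warning is a further assumption since no such row appears here. The function name `file_status` is hypothetical.

```python
def file_status(critical, very_high, high, medium, low):
    """Map per-severity issue counts to a status label.

    Assumed rule (inferred from the summary table, not from the pipeline):
    Critical/Very High/High issues fail the file; Medium or Low issues
    alone produce a warning; no issues at all is a pass.
    """
    if critical + very_high + high > 0:
        return "Fail"
    if medium + low > 0:
        return "Warning"
    return "Pass"
```

For example, the `init-workspace-documentation/SKILL.md` row (0, 1, 3, 1, 0) maps to "Fail" and the `github-copilot.md` row (0, 0, 0, 1, 0) maps to "Warning" under this rule.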

📄 instructions/r2/core/skills/coding-agents-farm/SKILL.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison

📄 instructions/r2/core/skills/coding-agents-prompt-authoring/SKILL.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Reference Integrity 5 ⬆️ Slightly better

📄 instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Precision & Explicitness 4 ⬆️ Slightly better
Reference Integrity 5 ⬆️ Slightly better
Example Grounding 4 ⬆️ Slightly better

📄 instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-patterns.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Precision & Explicitness 4 ⬆️ Slightly better
Example Grounding 4 ⬆️ Slightly better
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-rosetta.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Instruction Ordering 4 ⬆️ Slightly better
Structural Coherence 5 ⬆️ Slightly better

📄 instructions/r2/core/workflows/coding-agents-prompting-flow.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Decision Branching 4 ⬆️ Slightly better
Workflow Completeness 5 ⬆️ Slightly better

📄 instructions/r2/core/workflows/coding-flow.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Workflow Completeness 5 ⬆️ Slightly better

📄 instructions/r2/core/skills/init-workspace-documentation/SKILL.md

⚠️ Issues Found

Severity Gate Details
🟠 Very High Goal Specification Problem:
The skill now creates TODO.md (new) plus the pre-existing README.md, but the frontmatter description and <when_to_use_skill> still say the skill creates only five docs ('CONTEXT.md, ARCHITECTURE.md, IMPLEMENTATION.md, ASSUMPTIONS.md, and AGENT MEMORY.md' / 'creates five foundational docs ... all five docs exist'). The stated objective no longer matches the actual outputs.
Reason:
An agent reading the description/when_to_use may skip TODO.md or treat 'all five exist' as the success bar, leaving the new file uncreated.
Solution:
Update the frontmatter description and when_to_use_skill text to reflect the full set of files now produced (add TODO.md, and account for README.md) so the count is consistent.
🟡 High Conflict Resolution Problem:
IMPLEMENTATION.md guidance now mandates a two-part 'Baseline' + 'dated high-level change log' structure and says not to use the word 'current' (misleading), yet downstream consumers (init-workspace-verification checkpoint 6) still describe the same file as 'current state'. The producing skill does not state the old 'current state' framing is retired.
Reason:
Producer now requires a change-log section the verifier still tests as plain current-state, so the same file is described two incompatible ways across the flow.
Solution:
State explicitly that IMPLEMENTATION.md has exactly two sections (Baseline, then dated change log) and that 'current state' phrasing is retired, so consuming checkpoints can be aligned.
🟡 High Success Criteria Problem:
The when_to_use completion bar and validation_checklist still target the old five-doc set, but the process now also produces TODO.md and redefines IMPLEMENTATION.md to need a Baseline section plus a dated change log. Neither TODO.md existence nor the IMPLEMENTATION two-part structure is in the success conditions.
Reason:
If success criteria say 'all five docs exist', an agent stops after five and marks done, silently skipping TODO.md and the new change-log section.
Solution:
Add TODO.md to the success bar and validation_checklist, and add a checkable item for the IMPLEMENTATION.md Baseline + dated change-log structure.
🟡 High Output Contract Problem:
IMPLEMENTATION.md now must contain a 'High-level change log, each change separate header with date and description'. This is the first place in the init flow that defines IMPLEMENTATION.md as carrying dated change entries, but neither <when_to_use_skill> nor the validation_checklist mentions verifying the baseline-vs-changelog two-part structure.
Reason:
Without a self-check, the agent may write only the baseline (as before) and silently drop the new change-log requirement.
Solution:
Add a validation_checklist item confirming IMPLEMENTATION.md has the Baseline section plus the dated change-log section so the new contract is checkable.
🔵 Medium Bloat Control Problem:
Every one of the seven doc blocks now repeats two near-identical lines: 'What this doc is for and what it should contain, self-defining style' and 'Self-defines purpose, content type, style'. These say the same thing twice per block, fourteen near-duplicate lines total.
Reason:
Redundant repetition inflates the prompt and dilutes the distinct per-doc instructions that actually differ.
Solution:
Collapse the two near-duplicate self-definition lines ('What this doc is for...' and 'Self-defines purpose, content type, style') into one line per doc block; per-block self-definition is intentional, the duplication is not.
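
The IMPLEMENTATION.md findings above ask for a checkable item covering the Baseline section plus the dated change log. A rough sketch of such a check follows. It assumes markdown headers, a header whose text starts with "Baseline", and ISO-style `YYYY-MM-DD` dates in change-log headers; the skill text quoted in the findings does not specify a header or date format, so all three are assumptions, as is the function name.

```python
import re

# Assumption: the baseline section is introduced by a markdown header
# whose text starts with "Baseline" (case-insensitive).
BASELINE_HEADER = re.compile(r"^#{1,6}\s+Baseline\b", re.MULTILINE | re.IGNORECASE)

# Assumption: each change-log entry is a markdown header containing
# an ISO-style date such as 2026-05-28.
DATED_HEADER = re.compile(r"^#{1,6}\s+.*\b\d{4}-\d{2}-\d{2}\b", re.MULTILINE)


def has_two_part_structure(text):
    """True if a Baseline header is followed by at least one dated header."""
    baseline = BASELINE_HEADER.search(text)
    if baseline is None:
        return False
    # Only dated headers that come after the Baseline header count.
    return DATED_HEADER.search(text, baseline.end()) is not None
```

A document containing only a baseline returns `False`, which is the silent-drop case the Output Contract finding warns about.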

📊 Gates Comparison

Gate Score Comparison
Goal Specification 3 ⬇️ Slightly worse
Output Contract 3 ⬇️ Slightly worse
Conflict Resolution 3 ⬇️ Slightly worse
Precision & Explicitness 4 ⬆️ Slightly better
Self-Validation 4 ⬆️ Slightly better
Bloat Control 3 ⬇️ Slightly worse
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/skills/init-workspace-patterns/SKILL.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Output Contract 4 ⬆️ Slightly better
Precision & Explicitness 4 ⬆️ Slightly better
Example Grounding 4 ⬆️ Slightly better
Self-Validation 4 ⬆️ Slightly better

📄 instructions/r2/core/skills/init-workspace-verification/SKILL.md

⚠️ Issues Found

Severity Gate Details
🟠 Very High Workflow Completeness Problem:
The checklist was renumbered to 28 items and new checks (item 16 'INDEX.md is consistent', item 20 'greppable headers used in all files') were added, but the FILE EXISTENCE block (items 1-9) was NOT updated to include TODO.md, which the init-workspace-documentation skill now creates. The verification phase therefore never confirms TODO.md exists.
Reason:
A new required output (TODO.md) can be silently skipped because the completeness audit has no checkpoint for it.
Solution:
Add a FILE EXISTENCE checkpoint for TODO.md so the renumbered audit covers every file the documentation phase now produces.
🟡 High Conflict Resolution Problem:
Checkpoint 6 still reads 'IMPLEMENTATION.md - current state, DRY references', but the documentation skill now forbids the word 'current' (calls it misleading) and mandates a Baseline section plus a dated change log. An agent that built IMPLEMENTATION.md per the new instruction is checked against a stale criterion describing the old single 'current state' shape.
Reason:
Producer skill and verifier disagree on the same file's required shape, making the gate non-deterministic — a correct file may be flagged or a non-conforming one waved through.
Solution:
Update checkpoint 6 to verify the Baseline section plus the dated change-log structure, removing the word 'current'.
🔵 Medium Success Criteria Problem:
Checkpoint 6 still reads 'IMPLEMENTATION.md — current state, DRY references' while the documentation skill now explicitly forbids the word 'current' (says it is misleading) and requires a Baseline section plus a dated change log. The verification criterion no longer matches what the file must contain.
Reason:
The completion check tests for a structure the producing skill was told not to use, so a correctly built IMPLEMENTATION.md could be flagged or a wrong one passed.
Solution:
Update checkpoint 6 to verify the Baseline section and the dated change-log structure instead of 'current state'.
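
The Workflow Completeness finding above is a set-difference problem: every file the documentation phase produces should have an existence checkpoint in the verifier. A small sketch, with hypothetical example lists (the actual contents of checkpoints 1-9 are not shown in this report):

```python
def unverified_outputs(produced, verified):
    """Return produced files that have no existence checkpoint, sorted."""
    return sorted(set(produced) - set(verified))


# Illustrative data only: the produced list follows the findings above,
# the verified list is a stand-in for the verifier's FILE EXISTENCE block.
produced = [
    "CONTEXT.md", "ARCHITECTURE.md", "IMPLEMENTATION.md",
    "ASSUMPTIONS.md", "AGENT MEMORY.md", "README.md", "TODO.md",
]
verified = [
    "CONTEXT.md", "ARCHITECTURE.md", "IMPLEMENTATION.md",
    "ASSUMPTIONS.md", "AGENT MEMORY.md", "README.md",
]

gaps = unverified_outputs(produced, verified)
```

With these example lists, `gaps` contains only `TODO.md`, matching the missing checkpoint described in the finding.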

📊 Gates Comparison

Gate Score Comparison
Success Criteria 3 ⬇️ Slightly worse
Conflict Resolution 3 ⬇️ Slightly worse
Workflow Completeness 3 ⬇️ Slightly worse
Precision & Explicitness 4 ⬆️ Slightly better
Structural Coherence 4 ⬆️ Slightly better
Self-Validation 4 ⬆️ Slightly better

📄 instructions/r2/core/workflows/init-workspace-flow-documentation.md

⚠️ Issues Found

Severity Gate Details
🟠 Very High Output Contract Problem:
The phase Output (workflow_context) and update_state inventory list exactly five files (CONTEXT, ARCHITECTURE, IMPLEMENTATION, ASSUMPTIONS, AGENT MEMORY), but the validation_checklist now demands 'All 7 doc files exist'. The two extra files are never named anywhere in this phase.
Reason:
An output contract that promises 7 files but names only 5 cannot be relied on by update_state or the verification phase; the new files get dropped from the inventory and propagate as missing downstream.
Solution:
Enumerate every file the skill now produces (add TODO.md and README.md) in the Output and update_state inventory so the named outputs match the 7-count, or revert the checklist to the enumerated set.
🟠 Very High Success Criteria Problem:
The validation_checklist was changed to 'All 7 doc files exist and are non-empty', but the same file's <description_and_purpose> still says 'five doc files exist', and <workflow_context> Output and <update_state> step 6.4 both still list only the five files (CONTEXT, ARCHITECTURE, IMPLEMENTATION, ASSUMPTIONS, AGENT MEMORY). The 7-count has no corresponding enumerated outputs in this file, so the agent cannot tell which two extra files are meant.
Reason:
A success bar of '7 files' with only 5 named files is unverifiable; the agent guesses which files satisfy the check, causing inconsistent results.
Solution:
Make the count consistent within the file: update description, Output list, and update_state to enumerate all files now produced (add TODO.md and README.md), or revert the checklist to the actual enumerated set.
🟡 High Workflow Completeness Problem:
update_state step 6.4 instructs the agent to update the file inventory only for the five original files, so the TODO.md the skill now creates is never recorded in state, leaving an implicit untracked step.
Reason:
A file created but not inventoried in state is treated as missing by the verification phase that reads the same state file.
Solution:
Add TODO.md (and README.md where tracked) to the step 6.4 inventory-update list so state reflects every file produced.
🟡 High Goal Specification Problem:
description_and_purpose and the frontmatter description still state the phase produces five docs and use 'Proof: five doc files exist', contradicting the same file's '7 doc files' checklist and the skill it invokes.
Reason:
The phase's stated objective undercounts its deliverables, so a reader treats five files as the goal and never produces the rest.
Solution:
Align the description and Proof statement with the actual file set produced by the documentation skill (add TODO.md, account for README.md).
🔵 Medium Precision & Explicitness Problem:
The added sentence 'this documentation will be loaded every signle time in every single user session' contains a typo ('signle') and the broader added clause duplicates model-selection guidance already carried in the subagent_recommended_model attribute.
Reason:
Typo plus duplicated rationale add noise without improving the instruction.
Solution:
Fix the typo and drop the redundant prose justification, relying on subagent_recommended_model for model choice.
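
The Output Contract and Success Criteria findings above both come down to a declared count that disagrees with the enumerated file list. A minimal sketch of a consistency check, assuming the checklist wording 'All N doc files ...' quoted in the finding (the function names are hypothetical):

```python
import re


def declared_count(checklist_line):
    """Extract N from a line of the form 'All N doc files ...', else None."""
    match = re.search(r"All (\d+) doc files", checklist_line)
    return int(match.group(1)) if match else None


def count_matches_enumeration(checklist_line, enumerated):
    """True only if the declared count equals the number of distinct names."""
    count = declared_count(checklist_line)
    return count is not None and count == len(set(enumerated))
```

A line with no explicit count yields `None` and fails the comparison, so an open-ended 'All doc files exist' would be flagged as well.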

📊 Gates Comparison

Gate Score Comparison
Goal Specification 3 ⬇️ Slightly worse
Output Contract 2 ⬇️ Slightly worse
Success Criteria 2 ⬇️ Slightly worse
Workflow Completeness 3 ⬇️ Slightly worse
Precision & Explicitness 3 ⬇️ Slightly worse

📄 instructions/r2/core/workflows/init-workspace-flow-patterns.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Goal Specification 4 ⬆️ Slightly better
Precision & Explicitness 4 ⬆️ Slightly better
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/workflows/init-workspace-flow-verification.md

⚠️ Issues Found

Severity Gate Details
🟡 High Reference Integrity Problem:
Step 8.3 item 3 uses reversed markdown link syntax: '(USAGE GUIDE)[https://griddynamics.github.io/rosetta/docs/usage-guide/]'. The text and URL are swapped; correct markdown is '[text](url)'. Because this block is explicitly a user-facing message the agent is told to DEMAND the user study, the broken link will render as literal text, not a clickable link.
Reason:
Reversed link syntax renders unusably in the user-facing output, undermining the 'study the usage guide' demand.
Solution:
Swap to '[USAGE GUIDE](https://griddynamics.github.io/rosetta/docs/usage-guide/)'.
🔵 Medium Bloat Control Problem:
Step 8.3 item 4 embeds a large fenced markdown block (Coding/Requirements/Modernization examples with many slash-command samples) inline in the phase prompt. This sizeable user-facing block lives in a verification phase whose job is auditing, inflating the phase file every time it loads.
Reason:
A long embedded user-facing block enlarges the verification phase context for content used only once at the very end.
Solution:
Move the examples block to a small dedicated reference acquired only when emitting next steps, or trim to the few most representative examples.
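
The reversed-link finding above could be caught mechanically. The sketch below detects the `(text)[url]` shape and rewrites it as `[text](url)`. It is deliberately narrow: it only matches when the bracketed part starts with `http://` or `https://`, which is an assumption chosen to avoid touching ordinary parenthesized text followed by a bracket.

```python
import re

# Matches "(some text)[http(s)://...]", the reversed form of a markdown link.
REVERSED_LINK = re.compile(r"\(([^()\[\]]+)\)\[(https?://[^\]\s]+)\]")


def fix_reversed_links(text):
    """Rewrite '(text)[url]' occurrences as '[text](url)'."""
    return REVERSED_LINK.sub(r"[\1](\2)", text)
```

Correctly formed links are left unchanged, since a well-formed link never has a parenthesized span immediately followed by a bracketed URL.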

📊 Gates Comparison

Gate Score Comparison
Decision Branching 4 ⬆️ Slightly better
Workflow Completeness 4 ⬆️ Slightly better
Reference Integrity 2 ⬇️ Slightly worse
Example Grounding 4 ⬆️ Slightly better
Bloat Control 3 ⬇️ Slightly worse

📄 instructions/r2/core/workflows/init-workspace-flow.md

⚠️ Issues Found

Severity Gate Details
🟡 High Output Contract Problem:
Phase 6 step 2 still lists Output as only 'CONTEXT.md, ARCHITECTURE.md, IMPLEMENTATION.md, ASSUMPTIONS.md, AGENT MEMORY.md' (5 files), but the documentation skill it invokes now also produces TODO.md, and the documentation phase file's checklist demands '7 doc files'. The top-level workflow's enumerated output is out of sync with the phase it orchestrates.
Reason:
Downstream phases and verification rely on the orchestrator's output list; an undercount can cause the new file to be missed or not propagated to state inventory.
Solution:
Add TODO.md (and README.md if counted) to the Phase 6 Output enumeration so the workflow's stated outputs match the phase and skill.

📊 Gates Comparison

Gate Score Comparison
Output Contract 3 ⬇️ Slightly worse
Workflow Completeness 4 ⬆️ Slightly better
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/agents/architect.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Reference Integrity 4 ⬆️ Slightly better
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/agents/planner.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Reference Integrity 4 ⬆️ Slightly better
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/agents/prompt-engineer.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Reference Integrity 4 ⬆️ Slightly better
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/skills/planning/SKILL.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Reference Integrity 4 ⬆️ Slightly better
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/skills/reasoning/SKILL.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Reference Integrity 4 ⬆️ Slightly better
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/skills/requirements-authoring/SKILL.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Precision & Explicitness 5 ⬆️ Slightly better
Structural Coherence 5 ⬆️ Slightly better

📄 instructions/r2/core/skills/research/SKILL.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Reference Integrity 4 ⬆️ Slightly better
Dependency Management 4 ⬆️ Slightly better

📄 instructions/r2/core/configure/cursor.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Precision & Explicitness 4 ⬆️ Slightly better
Reference Integrity 4 ⬆️ Slightly better

📄 instructions/r2/core/configure/github-copilot.md

⚠️ Issues Found

Severity Gate Details
🔵 Medium Bloat Control Problem:
The new line 'Claude Opus 4.6 - Anthropic Claude 4.6 Opus prev gen (4.7 existed but was not good)' adds a provenance/history annotation. pa-hardening and pa-patterns both say to remove non-operational clarifications, history, and origin labels from target prompts; the parenthetical explains why a model is absent rather than describing a selectable model.
Reason:
A catalog entry should state what the model is, not narrate why an unlisted version was skipped; the note carries no operational value for model selection.
Solution:
Drop the '(4.7 existed but was not good)' parenthetical; keep only the operational 'prev gen' label that the agent acts on.

📊 Gates Comparison

Gate Score Comparison
Precision & Explicitness 4 ⬆️ Slightly better

📄 instructions/r2/core/rules/bootstrap-rosetta-files.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Conflict Resolution 4 ⬆️ Slightly better
Precision & Explicitness 5 ⬆️ Slightly better
Reference Integrity 5 ⬆️ Slightly better

📄 instructions/r2/core/templates/shell-schemas/agent-shell.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Precision & Explicitness 4 ⬆️ Slightly better

📄 instructions/r2/core/templates/shell-schemas/skill-shell.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
Precision & Explicitness 4 ⬆️ Slightly better

📄 instructions/r2/core/workflows/adhoc-flow.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison

📄 instructions/r2/core/workflows/research-flow.md

✅ No Issues Found

📊 Gates Comparison

Gate Score Comparison
