A public dashboard that ranks open-source repos by how friendly they are for AI coding agents — per model.
Next.js 16 + SQLite (better-sqlite3), styled with Tailwind CSS 4. Spans GitHub, GitLab, and Bitbucket out of the box. Current release: 0.5.0.
AI coding agents — Claude Code, Cursor, Devin, GPT-5 Codex, Gemini CLI, Aider, OpenHands, Pi — succeed dramatically more often on some repos than others. The difference is rarely the agent; it's the repo. A codebase with fast tests, a clear AGENTS.md, a Makefile, and CI is a massively different environment than one without.
Goal: a public leaderboard where anyone can look up a repo and see:
- How agent-friendly is it overall?
- How friendly is it for my agent? (Claude Code weights `AGENTS.md` heavily; Devin cares about CI + reproducible envs; Cursor prefers strong types + a good README.)
- Why does it rank there, and what would it take to improve for my agent? — top-3 gaps ranked by score-gain.
Two audiences:
- Maintainers — a ranked checklist of what to fix to make their repo friendlier to agents.
- Agent users — when picking between forks, packages, or alternatives, agent-friendliness is a real dependency-choice signal.
| Project | What it does | What we do differently |
|---|---|---|
| Factory.ai Agent Readiness | Single-tenant scanner: you point it at your repo and get a score with auto-fix PRs. 8 pillars, 5 maturity levels. | Public + cross-forge + per-model. Factory rates your own repo in isolation — we rank across repos, on GitHub/GitLab/Bitbucket, and the ranking changes based on which agent you care about. |
| `kodustech/agent-readiness` | OSS alternative to Factory — static checks, local scan. | We're a public ranking service, not a local scanner. Scoring logic is similar in spirit; the product is the leaderboard and the per-model lens. |
| `jpequegn/agent-readiness-score` | Explicitly "inspired by Factory.ai" — OSS framework to measure codebase readiness. | Same delta as above — single-tenant vs. public + per-model. |
| `viktor-silakov/ai-ready` | 39 checks, 7 pillars, 10+ languages. Scanner. | Same delta. |
| `ambient-code/agentready` | Assesses git repos against evidence-based attributes. | Same delta. |
| Cloudflare Agent Readiness | Rates websites for agent consumption. | Wrong object — we rate code repos. |
| Fern Agent Score | Public leaderboard rating documentation sites for AI-readiness. | Closest in shape — public + leaderboard — but scores docs, not code. |
| Clarvia | Scoring platform for MCP servers (Agent Experience Optimization). | Adjacent — rates tools, not repos. |
| SWE-Bench (Verified / Pro), GitTaskBench, FeatureBench, HAL, PR Arena | Rank agents on a fixed set of repos. | We want the transpose: rank repos per agent. Our measurement story (once the benchmark harness lands) looks a lot like these, with the axes flipped. |
| GitHub Trending, ossinsight | Popularity / activity rankings. | Stars ≠ agent-friendliness. |
Our differentiators, in one line: cross-forge, public, per-model, and explainable — every score decomposes to signals, and every repo page shows what to improve next for the selected model.
Not pretending the idea is free of risk:
- Per-model scoring is the hardest part and the easiest to fake. Per-model rationales are now sourced from each agent's published docs (see `MODELS[].sources` in `lib/scoring/weights.ts`), but the weight values themselves are still pre-benchmark (a sketch of what such weights might look like follows this list). Real "Claude ranks this higher than GPT-5" requires actually running each agent on each repo. That's `tasks/1.0.0/03-benchmark-harness.md`.
- Factory.ai is already in this space. Differentiation has to stay sharp.
- Public-shaming risk. Ranking a repo #47,823 without consent invites angry maintainers. Mitigation is planned via `tasks/0.7.0/01-opt-out-claim-flow.md`.
- Score gaming. Once public, people will add boilerplate `AGENTS.md` files to pass the rubric without being genuinely useful. Dynamic (actually-run-an-agent) checks are the counter — see the benchmark harness.
- Freshness. Scores decay with every push. Webhook-driven rescoring is on the roadmap.
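
To make the per-model mechanism concrete, here is a minimal sketch of how per-model weights could be expressed. The type names, signal ids, numbers, and source URL are illustrative assumptions, not the actual contents of `lib/scoring/weights.ts`.

```ts
// Hypothetical shape only — the real definitions live in lib/scoring/weights.ts
// and may differ; signal ids and values here are illustrative.
type SignalId = "agents_md" | "ci_config" | "fast_tests" | "readme_quality";

interface ModelProfile {
  id: string;                         // e.g. "claude-code"
  weights: Record<SignalId, number>;  // relative emphasis per signal
  sources: string[];                  // published docs backing the rationale
}

const CLAUDE_CODE: ModelProfile = {
  id: "claude-code",
  weights: { agents_md: 3, ci_config: 1, fast_tests: 2, readme_quality: 1 },
  sources: ["https://example.com/claude-code-docs"], // placeholder URL
};

// A per-model score is then a weighted average of the shared signal scores.
function modelScore(signalScores: Record<SignalId, number>, profile: ModelProfile): number {
  const entries = Object.entries(profile.weights) as [SignalId, number][];
  const total = entries.reduce((sum, [, w]) => sum + w, 0);
  return entries.reduce((sum, [id, w]) => sum + (signalScores[id] ?? 0) * w, 0) / total;
}
```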
See /methodology in the running app for a candid walkthrough of what's measured today and what isn't.
Short answer: low risk. The app:
- Only reads files after a shallow clone; never executes anything from the cloned tree (no `npm install`, no post-clone hooks).
- Uses `--depth 1 --single-branch` and never clones submodules.
- Runs all SQL via prepared statements.
- Renders through React (auto-escaping); the only `dangerouslySetInnerHTML` use is server-built JSON-LD with `<` characters escaped (sketched after this list).
- Has no auth and no writable API endpoints — read-only dashboard.
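
For illustration, here is a minimal sketch of the standard escaping technique referenced above; the helper name and usage are assumptions, not the app's actual code.

```ts
// Minimal sketch: escape "<" inside server-built JSON-LD so the serialized
// string can never close the surrounding <script> tag early.
// Helper name is hypothetical.
function jsonLdForScript(data: object): string {
  return JSON.stringify(data).replace(/</g, "\\u003c");
}

// Usage inside a server component (React auto-escapes everything else):
// <script
//   type="application/ld+json"
//   dangerouslySetInnerHTML={{ __html: jsonLdForScript({ "@type": "SoftwareSourceCode" }) }}
// />
```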
Operational concerns for a public launch (not code-level security):
- Disk quotas for the clone workspace.
- Rate limiting the public API.
- Sandboxing the cloner in a container (future-proofing against hypothetical git CVEs).
Auth and per-maintainer controls land with the opt-out / claim flow in v0.7.0.
bun install
bun run prepare-hooks # once — installs lefthook pre-commit (Biome + tsc + test + file-length)
bun run seed # score the curated set across GH / GL / BB + cache popular package aliases
bun run dev # http://localhost:3000

Score a single repo:
bun run score https://github.com/vercel/next.js
bun run score https://gitlab.com/gitlab-org/cli
bun run score https://bitbucket.org/snakeyaml/snakeyaml
bun run score /path/to/local/checkout

Optional: `GITHUB_TOKEN` / `GITLAB_TOKEN` in env to raise API rate limits.
Run the unit tests with `bun run test` (uses `node --test` + `tsx`; requires Node ≥20.9.0).
lib/version.ts and package.json carry the current release number (currently 0.5.0). Bumps happen only when we actually cut a release — never when merging intermediate work. The version pill in the header surfaces the number directly; /changelog lists what each release shipped.
| Choice | Why | When we'd revisit |
|---|---|---|
| Next.js 16 (App Router) | Future features (filters, charts, auth, diff views) are React's territory. File-based routing + API routes replace hand-rolled HTTP cleanly. | Unlikely. The core scorer is stack-agnostic, only app/ depends on this. |
| Node runtime (with `tsx` for CLI scripts) | Matches Vercel's serverless runtime — no Bun-only imports in prod. Bun still works locally as a fast package manager. | Unlikely — only if the deployment target changes. |
| Tailwind CSS 4 | Zero-config via `@theme` tokens, no `tailwind.config.*` needed. Tight bundle output. | Would only leave for something with a stronger design-system story. |
| `better-sqlite3` | Single file, inspectable, zero ops overhead. Node-native so Vercel's serverless runtime can load it directly. | Postgres when concurrent writers / access control arrive (`tasks/1.0.0/01-postgres-migration.md`). |
| Server-first; client islands where needed | Cheap, fast, SEO-friendly. Client components only where interactivity demands — mobile nav, search, selects, copy, back-to-top. | When client islands grow past presentational interactivity → reach for client state mgmt. |
| Shallow git clones (`--depth 1 --single-branch`) | Bandwidth + speed. Current signals don't need history. | If commit-history ever enters scoring → host APIs or `--filter=blob:none` partial clones. |
| Exact-pinned deps | Deterministic scoring across environments. | Never. |
| One file per signal | Each signal is a small, independent concern — keeps git log and code review focused. | When we bundle signals into dynamic checks (then the unit becomes the bundle). |
- Static signals need to read file contents (`AGENTS.md` length, `pyproject.toml` `[tool.X]` sections, `package.json` scripts count) — not just existence (see the sketch after this list).
- One clone is faster than N API calls for content-heavy scoring, and respects rate limits.
- Any real version of this dashboard needs dynamic signals (run tests, run an agent). Those absolutely need code on disk.
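
As an illustration of the one-file-per-signal layout, here is a minimal sketch of what a single content-reading signal might look like. The function name, thresholds, and return shape are assumptions, not the project's actual signal API.

```ts
import { readFile } from "node:fs/promises";
import { join } from "node:path";

// Hypothetical signal: reward a substantive AGENTS.md, not just its existence.
// Thresholds and return shape are illustrative; the real signals live in lib/scoring/.
export async function agentsMdSignal(cloneDir: string): Promise<{ score: number; detail: string }> {
  try {
    const text = await readFile(join(cloneDir, "AGENTS.md"), "utf8");
    const words = text.trim().split(/\s+/).length;
    if (words >= 300) return { score: 1, detail: `AGENTS.md present (${words} words)` };
    if (words >= 50) return { score: 0.5, detail: `AGENTS.md is thin (${words} words)` };
    return { score: 0.25, detail: "AGENTS.md exists but is near-empty" };
  } catch {
    return { score: 0, detail: "No AGENTS.md found" };
  }
}
```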
app/ Next.js App Router — pages + API + SEO
layout.tsx root layout, root metadata (OG + Twitter cards)
page.tsx leaderboard
repo/[id]/ repo detail (generateMetadata + per-repo OG image)
methodology/ how scoring works today
roadmap/ upcoming versions (from lib/roadmap.ts)
changelog/ what's shipped (from lib/changelog.ts)
about/ independent project, no vendor affiliation (footer-linked, E-E-A-T)
action/ PR-diff GitHub Action explainer + install snippet
skill/ agent-skill explainer + install command
package/ registry → repo lookup (form + per-package state pages)
api/ /repos, /repo/[id], /score, /badge/<host>/<owner>/<name>, /package/<registry>/<name>
robots.ts /robots.txt — allows "/", blocks "/api/"
sitemap.ts /sitemap.xml — static routes + every repo
llms.txt/ markdown manifest for LLM crawlers
globals.css Tailwind import + @theme tokens
components/ React components (Tailwind-styled)
lib/
scoring/ signals, weights, scorer — pure, no I/O outside the cloned tree
clients/ git clone, host API, npm/PyPI/Cargo registries
constants/ thresholds, host labels, sort keys
utils/ format + score-tier helpers, SVG badge renderer, package-request URL builder
db.ts better-sqlite3 schema + queries (all SQL lives here)
package-lookup.ts shared registry → repo lookup (used by /api/package + /package page)
version.ts app + sibling URLs, install snippets (ACTION_USES, SKILL_INSTALL_CMD), SIBLING_VERSION pin
changelog.ts / roadmap.ts / skill-content.ts
scripts/ CLI entries run via `tsx` (Node) — score, seed, init-db
tests/ `node --test` unit tests — scorer, signals, URL parser, formatters
tasks/ Per-version task breakdown (agent-readable)
public/ Static assets — demo/ screenshots used by the README + OG image
.claude/ settings.json, hooks/ (Stop guard), skills/
data/ rank.db (committed — shipped as a build artifact; rescoring runs locally)
AGENTS.md Agent instructions (source of truth)
CONTRIBUTING.md Human-contributor guide — PR workflow, review bar
CLAUDE.md Pointer → AGENTS.md
LICENSE MIT
`hsnice16/agent-friendly-action` runs the same scorer inside your CI and posts a per-PR score-delta comment — "this PR drops your Claude Code score by 4.1 points because it removed CI config." Opt-in via an `AGENTS_BADGE_TOKEN` secret; it no-ops silently when the secret is unset. Each repo detail page on the dashboard ships a copy-paste workflow snippet under "Catch score regressions on every PR".
`hsnice16/agent-friendly-skill` is a portable agent skill installable in one command — `npx skills add hsnice16/agent-friendly-skill#v0` — that scores the user's current repo locally and recommends a model. It profiles the same 8 agents this dashboard does (Claude Code, Cursor, Devin, GPT-5 Codex, Gemini CLI, Aider, OpenHands, Pi), installs into any vercel-labs/skills-compatible host (Cline, Copilot, Continue, Roo Code, …), and produces identical output regardless of which host invokes it — the recommendation is score-driven, not host-driven. Same self-contained property as the action: vendored scorer, no service dependency, works offline. The dashboard's `/skill` page hosts the install command, the score → model mapping, and optional `SessionStart` hook snippets for Claude Code and Codex.
Read-only JSON endpoints for external integrators (skills, hooks, browser overlays, third-party tools):
- `GET /api/score?host=<host>&repo=<owner>/<name>` — look up an indexed repo by host + owner/name. Returns `{ repo, signals, modelScores }` on 200; `{ error: "not_indexed" }` with status 404 when the repo isn't in our DB. The natural lookup endpoint for any tool that has a repo URL but not our internal id.
- `GET /api/repos` — full leaderboard (id, owner, name, host, stars, overall_score, per-model scores).
- `GET /api/repo/<id>` — per-repo detail (signals, model scores, top improvements). Requires the internal id; use `/api/score` first if you only have host + owner/name.
- `GET /api/badge/<host>/<owner>/<name>.svg` — embeddable SVG badge. `?model=<id>` for per-model variants.
- `GET /api/package/<registry>/<name>` — resolve npm / PyPI / Cargo package → source-repo score (or `unresolved` when the registry doesn't expose a repo URL).
Neither the action nor the agent skill calls these at runtime — both vendor the scorer and run locally. The endpoints exist so any third party can build on top of the dashboard without a network round-trip becoming a critical-path dependency.
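
For example, a third-party tool that only knows a repo URL could resolve scores with a single request. This is a sketch under the response shape documented above; the base URL is a placeholder.

```ts
// Sketch of an external integration: look up an indexed repo by host + owner/name.
// BASE_URL is a placeholder; field names follow the documented response shape.
const BASE_URL = "https://<your-deployment>";

async function lookupScore(host: string, owner: string, name: string) {
  const url = `${BASE_URL}/api/score?host=${host}&repo=${owner}/${name}`;
  const res = await fetch(url);
  if (res.status === 404) return null; // { error: "not_indexed" }
  const { repo, signals, modelScores } = await res.json();
  return { repo, signals, modelScores };
}

// lookupScore("github", "vercel", "next.js").then(console.log);
```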
See /roadmap in the running app or the per-version tasks/ folders for the full picture.
Versions are sequenced cheap-first so the highest-impact small additions don't get gated on heavy infra:
- 0.6.0 — auto-refresh + smarter matching: webhook-driven rescoring (keep scores fresh on every push) + alternatives via README embeddings (cross-language matches the v0.3.0 SQL heuristic misses).
- 0.7.0 — maintainer ownership + at-scale discovery: OAuth opt-out / claim flow for maintainers + at-scale package overlay (per-registry leaderboards + userscript that renders the badge inline on npmjs.com / PyPI / crates.io).
- 1.0.0 — production cut: Postgres migration for concurrent writers + auto-discovered crawl (target 10k repos) + benchmark harness that derives per-model weights from measured agent success. From here on, breaking API changes require a MAJOR bump.
The score isn't defensible. The evaluation harness + the cross-forge dataset + the maintainer network are. Open-source the harness, publish the weights, keep the data + dashboard + badge network as the product.
MIT — see LICENSE.
See CONTRIBUTING.md for setup, branch/commit style, the PR description template, and the changelog discipline.
If this project is useful to you, consider sponsoring its development: github.com/sponsors/hsnice16.

