Ship a /plan-agent:review-plan skill that spins up a Claude Code Agent Team to critique an HTML implementation plan in parallel: five core reviewers always, plus two UI-conditional reviewers (UX/design and accessibility) spawned only when the plan shows UI signals. The lead then synthesises their findings and applies the improvements directly back into the plan file. This turns every plan into a peer-reviewed, self-improving document, with UX and a11y coverage automatically scaled to whether the plan actually touches a user interface.
Read and implement all steps in the plan at docs/plans/add-plan-review-team-skill.html — Add an agent-team skill that reviews and updates implementation plans
docs/plans/add-plan-review-team-skill.html
Context
The plan-agent plugin generates rich, self-contained HTML implementation plans (implementation-plan) and marks them complete (finalize-plan), but there is no automated way to critique and improve a plan before it is implemented. Today a plan's quality depends entirely on the single session that wrote it.
The Claude Code Agent Teams feature is purpose-built for exactly this: a lead session spawns independent teammates that each work in their own context window, share a task list, and message each other directly. The docs call out "research and review" as the strongest use case — multiple reviewers investigate different angles simultaneously and challenge each other's findings before converging.
The sibling plugin product-plans already proves the pattern with plan-review-agents: a six-role Agent Team that reviews product plans (Markdown) in place using reusable subagent definitions. This plan adapts that proven shape for technical implementation plans — five plan-specific reviewer lenses operating on the plugin's own HTML plan format, applying edits to HTML step cards and acceptance-criteria items rather than Markdown sections.
Because implementation plans are often backend, CLI, or infrastructure work, UX and accessibility lenses would be dead weight on most reviews — but invaluable when a plan describes a UI feature. So instead of permanent UX/a11y teammates, the lead spawns a UX/design reviewer and an accessibility reviewer only when the plan shows UI signals, reusing the exact heuristic the implementation-plan skill already applies in its interview UI override (references to React/Vue/Svelte, .tsx/.jsx/.css/.html, className/Tailwind, or UX terms like button, modal, form, dialog, page, component). Backend plans stay lean at five reviewers; UI plans automatically get all seven.
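The UI-signal heuristic described above can be sketched in a few lines. This is an illustration only, not the skill's actual logic (the skill is a Markdown prompt, not Python): the pattern lists are transcribed from the paragraph above, and the names `has_ui_signals` and `reviewer_count`, the case-insensitive matching, and the tag stripping are all assumptions.

```python
import re

# Hypothetical pattern groups, transcribed from the heuristic described above.
FRAMEWORKS = r"\b(?:React|Vue|Svelte)\b"
EXTENSIONS = r"\.(?:tsx|jsx|css|html)\b"
STYLING = r"\b(?:className|Tailwind)\b"
UX_TERMS = r"\b(?:button|modal|form|dialog|page|component)\b"
UI_SIGNAL = re.compile(
    "|".join([FRAMEWORKS, EXTENSIONS, STYLING, UX_TERMS]), re.IGNORECASE
)

# Drop <style>/<script> blocks so the plan's own CSS/JS cannot trigger a match.
NON_CONTENT = re.compile(r"<(style|script)\b.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
# Assumption: also strip markup so tag names such as <button> are not counted.
TAGS = re.compile(r"<[^>]+>")


def has_ui_signals(plan_html: str) -> bool:
    """Return True when the visible plan text mentions any UI signal."""
    body = NON_CONTENT.sub(" ", plan_html)
    text = TAGS.sub(" ", body)
    return UI_SIGNAL.search(text) is not None


def reviewer_count(plan_html: str) -> int:
    """Five core reviewers, plus the two UI reviewers when signals are present."""
    return 7 if has_ui_signals(plan_html) else 5
```

A heuristic this broad will produce false positives (a backend plan that mentions a "page" of results or a `.html` plan path would count as UI), so erring towards seven reviewers is the expected failure mode.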
Agent Teams are experimental and disabled by default (they require CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 and Claude Code ≥ 2.1.32), so the skill must hard-gate on availability and never silently fall back to single-session role-play.
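The two conditions behind the hard gate can be expressed as a small check. The sketch below only illustrates the decision: how the running Claude Code version is obtained is left to the caller, the message wording is hypothetical, and the version string is assumed to be plain dotted numbers.

```python
from typing import Mapping, Optional

FLAG = "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS"
MIN_VERSION = (2, 1, 32)


def parse_version(text: str) -> tuple:
    # Assumes a plain dotted version such as "2.1.32" (no suffixes).
    return tuple(int(part) for part in text.strip().split(".")[:3])


def gate_error(env: Mapping[str, str], version: str) -> Optional[str]:
    """Return None when Agent Teams can run, otherwise the message to stop with."""
    if env.get(FLAG) != "1":
        return f"Agent Teams are disabled. Set {FLAG}=1 and restart Claude Code."
    if parse_version(version) < MIN_VERSION:
        return f"Claude Code {version} is too old; Agent Teams need >= 2.1.32."
    return None
```

Returning a message instead of a boolean keeps the "never silently fall back" rule easy to follow: any non-None result means stop and show the text, with no alternative code path.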
Files to Modify
- .claude-plugin/marketplace.json (modified): bump plan-agent 1.8.0 → 1.9.0 + description
- kit/plugins/plan-agent/
  - README.md (modified): document review-plan skill
  - CHANGELOG.md (modified): add 1.9.0 entry
  - .claude-plugin/
    - plugin.json (modified): mention review-plan in description
  - agents/ (new directory)
    - plan-reviewer-architecture.md (new): core, feasibility / ordering lens
    - plan-reviewer-completeness.md (new): core, objective coverage / gaps lens
    - plan-reviewer-testability.md (new): core, verify lines + Tests section lens
    - plan-reviewer-risk.md (new): core, failure modes / regressions lens
    - plan-reviewer-conventions.md (new): core, structure / clarity / conventions lens
    - plan-reviewer-ux.md (new): conditional, UX / flows / states lens
    - plan-reviewer-accessibility.md (new): conditional, WCAG / keyboard / ARIA lens
  - skills/review-plan/ (new directory)
    - SKILL.md (new): team orchestration workflow
    - references/
      - role-prompts.md (new): 7 spawn prompts (5 core + 2 UI-conditional)
      - output-template.md (new): synthesis report + inline-edits table
Diagram
[Diagram: workflow, with the labels find HTML plan → verify Agent Teams → scan plan for UI signals → 5 core (+2 if UI) → findings via mailbox → output-template.md → Edit the HTML plan → <stem>-review.html]
Steps
1. agents/ is brand-new to this plugin: create kit/plugins/plan-agent/agents/ and kit/plugins/plan-agent/skills/review-plan/references/ before authoring any files.
   Verify: ls kit/plugins/plan-agent/agents kit/plugins/plan-agent/skills/review-plan/references lists both directories without error.
2. [...] agents/*.md) lets it serve as both a delegated subagent and a team teammate, mirroring product-plans' six product-reviewer-*.md files. Each file needs frontmatter (name matching the filename, a third-person description, a restrictive tools allowlist of Read, Glob, Grep, Bash, and an explicit model) plus a body defining that lens's mandate for implementation plans. The five core lenses always run; the two UI lenses (ux, accessibility) are authored the same way but only spawned when UI signals are present (Step 6).
   Verify: [...] plan-reviewer-architecture.md, -completeness.md, -testability.md, -risk.md, -conventions.md; conditional: plan-reviewer-ux.md, plan-reviewer-accessibility.md. For each, the name: frontmatter equals the filename stem and the description states what it reviews and when to use it. The two conditional descriptions note they apply to UI-bearing plans.
3. [...] <ABSOLUTE_PATH> placeholder for the resolved plan, tells the reviewer to read the HTML plan, apply its lens, and report findings back via SendMessage with severity ratings, keeping the heavy prose out of SKILL.md (progressive disclosure). Include all seven (five core + the two UI prompts); group the UX and accessibility prompts under a clearly-marked "conditional: UI plans only" heading so the lead knows they are spawned selectively.
   Verify: grep -c "<ABSOLUTE_PATH>" references/role-prompts.md returns 7, and the file contains one clearly-headed prompt block per reviewer lens, with the UX and accessibility blocks marked conditional.
4. [...] step-card #3 verify-body, #criteria-list append, objective-card) plus an action (edit / append / insert). This table is what drives the in-place update in Step 6.
   Verify: [...] SKILL.md Step 6 reads.
5. [...] name: review-plan, a three-part description (≤200 chars), and allowed-tools: Read, Glob, Grep, Bash, Edit, AskUserQuestion, TodoWrite, ToolSearch, ExitPlanMode, SendUserFile. Body steps mirror plan-review-agents: (0) exit plan mode via ToolSearch select:ExitPlanMode then ExitPlanMode, treating "not in plan mode" as success; (1) resolve the HTML plan (--dir → plansDirectory → docs/plans/ → newest *.html excluding index.html); (2) default to update-in-place + artifact; (3) gate on Agent Teams; (3b) detect UI signals; (4) spawn the team (5 core, +2 UI reviewers when signals present); (5) collect; (6) synthesise; (7) integrate; (8) emit artifact; (9) clean up.
   Verify: head shows valid frontmatter with all listed allowed-tools (including both ToolSearch and ExitPlanMode); the body is under 500 lines and contains the Step 0 exit-plan-mode bootstrap and a numbered Step 1–9 workflow.
6. [...] <style>/<script>) for UI signals: references to React/Vue/Svelte/Angular, .tsx/.jsx/.css/.html file extensions, className/style/Tailwind, or UX terms (button, modal, form, dialog, dropdown, page, component), reusing the implementation-plan interview UI-override heuristic.
   (b) Spawn step: a directive that creates the team and spawns the five core plan-reviewer-* subagent types in parallel, plus plan-reviewer-ux and plan-reviewer-accessibility when UI signals were detected; brief each with its role-prompts.md template (<ABSOLUTE_PATH> substituted via realpath), and wait for all spawned teammates before synthesis. Announce which mode ran (e.g. "UI signals detected: running 7 reviewers" vs "no UI signals: running 5 core reviewers").
   (c) Integrate step: walk the synthesis "Inline Edits to Apply" table and apply one Edit per row against the HTML plan, targeting .step-card bodies, .verify-body, #criteria-list <li> items, .objective-card, and the Verification paragraph. HTML-escape all inserted content, never touch <style> / <script>, and append a collapsible <details> "Team Review (timestamp)" block before </main>.
   Verify: SKILL.md defines the UI-signal detection list, the spawn step names the five core plan-reviewer-* agent types plus the two conditional ones with their gating condition and "wait for all spawned", and the integrate step names the specific HTML selectors above, the HTML-escape rule, and the appended Team Review <details>.
7. [...] marketplace.md. In .claude-plugin/marketplace.json raise the plan-agent entry from 1.8.0 to 1.9.0 and extend its description to mention review-plan; update kit/plugins/plan-agent/.claude-plugin/plugin.json description to match. Set version only in marketplace.json, never in plugin.json.
   Verify: grep -A2 '"plan-agent"' .claude-plugin/marketplace.json shows "version": "1.9.0"; plugin.json has no version key; the project's settings hook reports marketplace.json as valid JSON.
8. [...] 1.9.0 CHANGELOG entry, and a README section documenting /plan-agent:review-plan: the five reviewer roles, the Agent Teams requirement, the enable flag, and the update-in-place default.
   Verify: CHANGELOG.md has a 1.9.0 heading dated 2026-06-06; README.md lists review-plan with its invocation, roles, and the CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS requirement.
9. [...] /validate-plugin plan-agent (or the bundled validation skill) and confirm the new skill and all seven agents parse (five core + two UI-conditional), the homepage URL is correct, and marketplace.json registration is intact.
   Verify: validate-plugin reports no errors for plan-agent; node -e "JSON.parse(require('fs').readFileSync('.claude-plugin/marketplace.json'))" exits 0.
Tests
Tier 1 — the plan creates skill and agent definition files plus a marketplace registration that change the plugin's runtime behavior. There is no unit-test runner for Markdown/JSON plugin definitions in this repo, so "tests" here are the plugin's real validation harness plus a manual team-run smoke test.
Smoke test — review-plan updates a plan in place. With Agent Teams enabled, run /plan-agent:review-plan tests/fixtures/sample-plan-backend.html against a copied non-UI fixture. Assert the run (1) spawns exactly five core teammates, (2) finishes with the fixture modified — at least one inline edit applied to a step card or criterion AND a "Team Review" <details> appended before </main>, and (3) writes a sibling sample-plan-backend-review.html artifact. Then run against tests/fixtures/sample-plan-ui.html (a fixture referencing a .tsx component) and assert seven teammates spawn (core + plan-reviewer-ux + plan-reviewer-accessibility). Finally, with the flag unset, re-run and assert the skill hard-stops with the enable message and makes no edits. This proves the objective: a team that reviews and updates an implementation plan, scaling UX/a11y coverage to the plan's content.
Fixtures: tests/fixtures/sample-plan-backend.html (no UI signals) and tests/fixtures/sample-plan-ui.html (UI signals). These fixture files must be created before the test can run — copy any existing docs/plans/*.html to those scratch paths (backend: no UI terms; UI: include at least one .tsx reference in the body, not inside a <style>/<script> block). Run manually — Agent Team runs are interactive and not yet CI-automatable.
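Assertions (2) and (3) of the smoke test are mechanical enough to check with a script once a manual run has finished. A minimal sketch, assuming the plan keeps a single closing `</main>` and that the appended block's text contains the words "Team Review"; both helper names are hypothetical.

```python
from pathlib import Path


def review_artifact_path(plan: Path) -> Path:
    # Sibling <stem>-review.html, following the naming used in this plan.
    return plan.with_name(f"{plan.stem}-review.html")


def team_review_appended(html: str) -> bool:
    """True when the last <details> before </main> is a Team Review block."""
    main_end = html.rfind("</main>")
    if main_end == -1:
        return False
    details_start = html.rfind("<details", 0, main_end)
    return details_start != -1 and "Team Review" in html[details_start:main_end]
```

After a run against the backend fixture, `review_artifact_path(Path("tests/fixtures/sample-plan-backend.html")).exists()` and `team_review_appended(...)` on the fixture's contents would cover the file-level half of the assertions; the teammate count still has to be observed in the session.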
Unit-level (structure validation)
- File: existing validate-plugin skill / structural checks.
- Targets: review-plan/SKILL.md frontmatter and the seven plan-reviewer-*.md agent definitions (5 core + ux + accessibility).
- Key cases: each name: equals its filename stem; descriptions are third-person and non-empty; allowed-tools includes both ToolSearch and ExitPlanMode; no version key in plugin.json.
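The first two key cases can be illustrated with a tiny standalone check. This is not the validate-plugin skill, only a sketch of what it is expected to catch, assuming each agent file opens with flat `key: value` frontmatter between `---` fences; `frontmatter` and `check_agent` are hypothetical helpers.

```python
from pathlib import Path


def frontmatter(text: str) -> dict:
    """Parse flat `key: value` frontmatter between the leading '---' fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def check_agent(path: Path, text: str) -> list:
    """Return a list of problems for one agent definition (empty means OK)."""
    meta = frontmatter(text)
    problems = []
    if meta.get("name") != path.stem:
        problems.append(f"{path.name}: name must equal '{path.stem}'")
    if not meta.get("description"):
        problems.append(f"{path.name}: description is empty")
    return problems
```

Running `check_agent` over every file matching `agents/plan-reviewer-*.md` and expecting seven empty lists would cover the name and description cases; whether a description is third-person still needs a human or the real validator.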
Integration-level (registration)
- File: .claude-plugin/marketplace.json + scripts/check-version-bump.sh.
- Targets: plan-agent marketplace entry and version guard.
- Key cases: marketplace.json parses as valid JSON; the plan-agent version is 1.9.0; the version-bump guard passes against main.
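The first two registration cases amount to parsing the file and reading one field. The sketch below assumes marketplace.json has a top-level `plugins` array whose entries carry `name` and `version` keys; that layout is not shown in this plan, so treat it as an assumption and adjust the lookup to the real file.

```python
import json


def plugin_version(marketplace_json: str, plugin: str):
    """Return the registered version of `plugin`, or None if it is not listed."""
    data = json.loads(marketplace_json)  # raises ValueError on invalid JSON
    for entry in data.get("plugins", []):  # assumed layout: {"plugins": [...]}
        if entry.get("name") == plugin:
            return entry.get("version")
    return None
```

With the file's text loaded, `plugin_version(text, "plan-agent") == "1.9.0"` checks both that the JSON parses and that the bump landed; the version-bump guard against main remains the job of scripts/check-version-bump.sh.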
Acceptance Criteria
Verification
End-to-end, backend plan (teams enabled): set CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1, restart Claude Code (≥ 2.1.32), copy a non-UI plan from docs/plans/*.html to a scratch path, and run /plan-agent:review-plan <scratch>.html. Confirm exactly five core teammates spawn (no UX/a11y), the lead synthesises, the scratch plan is edited in place (visible inline improvements + a "Team Review" collapsible at the bottom), a <stem>-review.html companion is written, and the team is cleaned up with no orphaned teammates.
End-to-end, UI plan: repeat with a plan that references UI (e.g. a .tsx component or terms like modal/form) and confirm seven teammates spawn — the five core plus plan-reviewer-ux and plan-reviewer-accessibility — and that the resulting inline edits include UX/accessibility improvements.
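The "Team Review" collapsible that both end-to-end runs look for is appended under two rules stated in the steps: inserted content is HTML-escaped, and the block goes before `</main>`. A minimal sketch of that append, assuming a single-paragraph summary; the function name and the exact markup inside `<details>` are illustrative, and in the skill itself the change is made with the Edit tool, not Python.

```python
import html


def append_team_review(plan_html: str, summary: str, date: str) -> str:
    """Insert a collapsible Team Review block just before the last </main>."""
    idx = plan_html.rfind("</main>")
    if idx == -1:
        raise ValueError("plan has no </main>; refusing to edit")
    block = (
        "<details>"
        f"<summary>Team Review ({html.escape(date)})</summary>"
        f"<p>{html.escape(summary)}</p>"  # escape reviewer text before insertion
        "</details>"
    )
    return plan_html[:idx] + block + plan_html[idx:]
```

Refusing to edit when `</main>` is missing is a deliberate choice in this sketch: a plan whose skeleton has drifted is better left untouched than patched in the wrong place.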
Negative path (teams disabled): unset the flag and re-run — the skill must stop immediately with the enable instructions and leave the plan file byte-identical.
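"Byte-identical" is easiest to confirm by hashing the plan before and after the run. A small sketch using SHA-256; `unchanged_after` is a hypothetical helper, and the `action` callback stands in for whatever triggers the disabled-flag run, which in practice is a manual step.

```python
import hashlib
from pathlib import Path


def digest(path: Path) -> str:
    # Hash the raw bytes so whitespace or encoding changes are also detected.
    return hashlib.sha256(path.read_bytes()).hexdigest()


def unchanged_after(path: Path, action) -> bool:
    """Run `action` and report whether the file's bytes are unchanged."""
    before = digest(path)
    action()
    return digest(path) == before
```

For a fully manual check, recording `digest(plan)` before unsetting the flag and comparing it after the skill stops gives the same answer.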
Reusability: confirm each reviewer also works as a plain subagent by asking Claude to "use the plan-reviewer-testability agent to review <plan>.html" without a team.
Completion Checklist
Completion Report
No items to report — all requirements met.
Unresolved Questions
- Should review-plan be model-invocable or command-only?
  For the plan-agent review-plan skill in the agentics repo, recommend whether it should be model-invocable (auto-activates on "review my plan" intent like product-plans:plan-review-agents) or command-only with disable-model-invocation (like finalize-plan). Consider that it spawns a costly Agent Team and mutates the plan file. Weigh discoverability vs accidental expensive runs, and recommend one with reasoning plus the exact frontmatter to use.
- How should the team handle very large or multi-plan reviews?
  For the plan-agent review-plan Agent Team skill, recommend how to handle (a) a single very large HTML plan that may exceed a teammate's useful context and (b) a request to review several plans at once. Should the lead chunk a large plan by section across teammates, or keep one-plan-per-run? Recommend an approach and the guardrails to document in SKILL.md.
Team Review (2026-06-07)
Executive Summary
The plan is architecturally sound and the overall design is well-matched to the Agent Teams topology, but the team identified two high-severity issues requiring follow-up: (1) the plan is already shipped as v1.9.0 (commit 1352236), so the version bump is a no-op and this review is retrospective; (2) the hero smoke test references fixture files that were never created, leaving the primary verification path unrunnable. A recurring count drift (5 vs 7 reviewers) was present in the Files note, Diagram collect node, and Step 9 — all three have been corrected inline. The conventions reviewer confirmed the shipped agent files use allowed-tools: where the agent runtime expects tools: — a functional correctness issue to address in v1.10.0. UX and accessibility reviewers flagged the lack of progress feedback during the multi-agent run, the absence of an undo/preview gate before in-place mutation, and several contrast failures in the plan document's own CSS. A new acceptance criterion (AC12) for team cleanup completeness has been added.
Role-by-Role Findings
- Architecture: Sound topology; key concerns: lossy synthesis data flow (prose → table → HTML edits with no schema), brittle CSS-selector mutation that silently breaks on skeleton drift, duplicated UI-signal heuristic, and no minimum-quorum gate for partial reviewer failures. Recommend a thin structured contract between reviewer and lead, a selector-sanity guard in Step 7, and an idempotency guard to prevent double-review corruption.
- Completeness: Highly specific plan; gaps: hero test fixtures never created (high), internal 5/7 count drift now fixed (medium), SendUserFile in allowed-tools unanchored, unresolved questions block concrete step decisions.
- Testability: Critical: hero smoke test cannot run without fixtures. High: UI-signal detection heuristic has no boundary-case tests; AC6/AC11 lack an assertable machine-readable signal. Recommend a deterministic mode-announcement line the test can grep.
- Risk: Medium overall. Key risks: plan already shipped (version collision if re-run), self-mutating file with no backup or atomicity guarantee, HTML-injection risk from LLM-synthesised edits, Bash grant on reviewers is a latent write vector.
- Conventions: Mostly compliant. Critical gap: shipped agent files use allowed-tools: instead of tools:; the agent runtime reads tools:, so the restriction may be silently ignored. AC ID sequence has ac11 mid-list (cosmetic). Step 9 "five agents" typo now fixed inline.
- UX: No live progress feedback during multi-agent run (high); in-place mutation with no preview or undo (high); disabled-state error under-specified (high). Recommend task-list-as-progress surface, a consent gate before mutation, and verbatim disabled-state message in SKILL.md.
- Accessibility: Plan document: --subtle (#9ca3af, 2.54:1) and state colors (#16a34a 3.3:1, #d97706 3.19:1) fail WCAG AA contrast. Copy buttons lack aria-live feedback; touch targets below 44×44px. Feature: the plan-reviewer-accessibility agent lacks a concrete WCAG rubric, so it will produce shallow findings without one.
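The contrast ratios in the accessibility finding can be rechecked with the standard WCAG relative-luminance formula. The sketch below assumes the colors are rendered as text on a white background, which the finding does not state; the function names are illustrative.

```python
def _linear(channel: int) -> float:
    # sRGB channel (0-255) to linear light. WCAG's published threshold of
    # 0.03928 gives identical results for 8-bit channel values.
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def luminance(hex_color: str) -> float:
    """Relative luminance of a #rrggbb color."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast(fg: str, bg: str = "#ffffff") -> float:
    """WCAG contrast ratio between two colors (always >= 1)."""
    hi, lo = sorted((luminance(fg), luminance(bg)), reverse=True)
    return (hi + 0.05) / (lo + 0.05)
```

Against white, `contrast("#9ca3af")`, `contrast("#16a34a")` and `contrast("#d97706")` should come out close to the 2.54, 3.3 and 3.19 quoted above, all below the 4.5:1 that WCAG AA requires for normal-size text.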
Confirmed Concerns (cross-reviewer agreement)
- Missing fixtures: Completeness + Testability + Risk all flagged independently. Create tests/fixtures/sample-plan-backend.html and sample-plan-ui.html in a follow-up commit.
- 5/7 reviewer count drift: Completeness + Conventions + Testability. Fixed inline in Files note, Diagram, and Step 9.
- Unresolved model-invocable question: UX + Risk both recommend command-only / disable-model-invocation given cost and file mutation. Resolve in next iteration.
- No partial-failure quorum: Architecture + UX. Add minimum-quorum rule to Step 5 in v1.10.0.
Highest-Risk Issues for v1.10.0
- Agent frontmatter key: Change allowed-tools: to tools: in all 7 plan-reviewer-*.md files. Functional correctness, not style. (Conventions)
- Create test fixtures: tests/fixtures/sample-plan-backend.html + sample-plan-ui.html. Hero test is unrunnable without them. (Completeness, Testability, Risk)
- Add backup before mutation: Copy plan to <stem>.bak or verify clean git state before first Edit call; treat any failed Edit as abort. (Risk)
- Raise CSS contrast tokens: Darken --subtle to ≥4.5:1 for text use; darken state-as-text colors. (Accessibility)
- Resolve model-invocability: Recommend disable-model-invocation; document in SKILL.md and README. (UX, Risk)
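The backup recommendation above could look like the following. It is a sketch of the `<stem>.bak` option only (not the clean-git-state alternative), with hypothetical helper names; in the skill the equivalent would more likely be a Bash copy issued by the lead before its first Edit.

```python
import shutil
from pathlib import Path


def backup_plan(plan: Path) -> Path:
    """Copy the plan to <stem>.bak beside it and return the backup path."""
    backup = plan.with_suffix(".bak")
    shutil.copy2(plan, backup)
    return backup


def restore_plan(plan: Path, backup: Path) -> None:
    """Put the original bytes back, for example after a failed Edit."""
    shutil.copy2(backup, plan)
```

Pairing the two gives the "treat any failed Edit as abort" rule a concrete meaning: on the first failure, restore from the backup and stop, so the plan is never left half-edited.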
Reviewed by 7-member Agent Team (architecture, completeness, testability, risk, conventions, ux, accessibility) · 2026-06-07 · plan-agent v1.9.0