Plan: Add a plan-review Agent Team skill to plan-agent

Objective

Ship a /plan-agent:review-plan skill that spins up a Claude Code Agent Team — five core reviewers always, plus two UI-conditional reviewers (UX/design and accessibility) spawned only when the plan shows UI signals — to critique an HTML implementation plan in parallel, then has the lead synthesise their findings and apply the improvements directly back into the plan file. Turning every plan into a peer-reviewed, self-improving document, with UX and a11y coverage automatically scaled to whether the plan actually touches a user interface.

Implement

Read and implement all steps in the plan at docs/plans/add-plan-review-team-skill.html — Add an agent-team skill that reviews and updates implementation plans

Run as workflow — launch parallel subagents

Run a workflow to implement the plan at docs/plans/add-plan-review-team-skill.html — Add an agent-team skill that reviews and updates implementation plans

File add-plan-review-team-skill.html

Path docs/plans/add-plan-review-team-skill.html

Acceptance criteria 0 / 11 done

Context

The plan-agent plugin generates rich, self-contained HTML implementation plans (implementation-plan) and marks them complete (finalize-plan), but there is no automated way to critique and improve a plan before it is implemented. Today a plan's quality depends entirely on the single session that wrote it.

The Claude Code Agent Teams feature is purpose-built for exactly this: a lead session spawns independent teammates that each work in their own context window, share a task list, and message each other directly. The docs call out "research and review" as the strongest use case — multiple reviewers investigate different angles simultaneously and challenge each other's findings before converging.

The sibling plugin product-plans already proves the pattern with plan-review-agents: a six-role Agent Team that reviews product plans (Markdown) in place using reusable subagent definitions. This plan adapts that proven shape for technical implementation plans — five plan-specific reviewer lenses operating on the plugin's own HTML plan format, applying edits to HTML step cards and acceptance-criteria items rather than Markdown sections.

Because implementation plans are often backend, CLI, or infrastructure work, UX and accessibility lenses would be dead weight on most reviews — but invaluable when a plan describes a UI feature. So instead of permanent UX/a11y teammates, the lead spawns a UX/design reviewer and an accessibility reviewer only when the plan shows UI signals, reusing the exact heuristic the implementation-plan skill already applies in its interview UI override (references to React/Vue/Svelte, .tsx/.jsx/.css/.html, className/Tailwind, or UX terms like button, modal, form, dialog, page, component). Backend plans stay lean at five reviewers; UI plans automatically get all seven.

Agent Teams are experimental and disabled by default (require CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 and Claude Code ≥ 2.1.32), so the skill must hard-gate on availability and never silently fall back to single-session role-play.

Files to Modify

agentics/

.claude-plugin/marketplace.json modified bump plan-agent 1.8.0 → 1.9.0 + description
kit/plugins/plan-agent/

README.md modified document review-plan skill
CHANGELOG.md modified add 1.9.0 entry
.claude-plugin/

plugin.json modified mention review-plan in description

agents/ (new directory)

plan-reviewer-architecture.md new core: feasibility / ordering lens
plan-reviewer-completeness.md new core: objective coverage / gaps lens
plan-reviewer-testability.md new core: verify lines + Tests section lens
plan-reviewer-risk.md new core: failure modes / regressions lens
plan-reviewer-conventions.md new core: structure / clarity / conventions lens
plan-reviewer-ux.md new conditional: UX / flows / states lens
plan-reviewer-accessibility.md new conditional: WCAG / keyboard / ARIA lens

skills/review-plan/ (new directory)

SKILL.md new team orchestration workflow
references/

role-prompts.md new 7 spawn prompts (5 core + 2 UI-conditional)
output-template.md new synthesis report + inline-edits table

Diagram

review-plan team lifecycle

Resolve

find HTML plan

arg, settings, or newest in docs/plans/

Gate

verify Agent Teams

flag = 1 AND version ≥ 2.1.32 — else hard-stop

Detect

scan plan for UI signals

React/.tsx/.css/.html or UX terms → add UX + a11y reviewers

Spawn

5 core (+2 if UI)

architecture · completeness · testability · risk · conventions (+ ux · accessibility)

Debate & collect

findings via mailbox

teammates challenge each other; lead waits for all spawned teammates (5 core or 7 with UI)

Synthesise

output-template.md

agreements, conflicts, inline-edits table

Update in place

Edit the HTML plan

step cards, #criteria-list, append Team Review <details>

Emit & clean up

<stem>-review.html

shareable artifact, then lead cleans up the team

Steps

todo Scaffold the new directories under the plan-agent plugin

Write tools require parent paths to exist, and agents/ is brand-new to this plugin — create kit/plugins/plan-agent/agents/ and kit/plugins/plan-agent/skills/review-plan/references/ before authoring any files.

Verify

ls kit/plugins/plan-agent/agents kit/plugins/plan-agent/skills/review-plan/references lists both directories without error.

todo Author the seven reviewer subagent definitions (5 core + 2 UI-conditional)

Per the Agent Teams guide, defining each role as a reusable subagent (agents/*.md) lets it serve as both a delegated subagent and a team teammate — mirroring product-plans' six product-reviewer-*.md files. Each file needs frontmatter (name matching the filename, a third-person description, a restrictive tools allowlist of Read, Glob, Grep, Bash, and an explicit model) plus a body defining that lens's mandate for implementation plans. The five core lenses always run; the two UI lenses (ux, accessibility) are authored the same way but only spawned when UI signals are present (Step 6).

Verify

All seven files exist: core — plan-reviewer-architecture.md, -completeness.md, -testability.md, -risk.md, -conventions.md; conditional — plan-reviewer-ux.md, plan-reviewer-accessibility.md. For each, the name: frontmatter equals the filename stem and the description states what it reviews and when to use it. The two conditional descriptions note they apply to UI-bearing plans.

todo Write references/role-prompts.md with seven lens-specific spawn prompts

The lead briefs each teammate with a tailored spawn prompt carrying task-specific context (teammates do not inherit the lead's conversation). Each template uses a <ABSOLUTE_PATH> placeholder for the resolved plan, tells the reviewer to read the HTML plan, apply its lens, and report findings back via SendMessage with severity ratings — keeping the heavy prose out of SKILL.md (progressive disclosure). Include all seven (five core + the two UI prompts); group the UX and accessibility prompts under a clearly-marked "conditional — UI plans only" heading so the lead knows they are spawned selectively.

Verify

grep -c "<ABSOLUTE_PATH>" references/role-prompts.md returns 7, and the file contains one clearly-headed prompt block per reviewer lens, with the UX and accessibility blocks marked conditional.

todo Write references/output-template.md (synthesis report + inline-edits table)

A fixed synthesis template makes the lead's output deterministic. It must include an Executive Summary, Role-by-Role findings, an Agreements/Conflicts section, a Highest-Risk list, and — critically — an Inline Edits to Apply table whose rows map each accepted improvement to a concrete HTML target (e.g. step-card #3 verify-body, #criteria-list append, objective-card) plus an action (edit / append / insert). This table is what drives the in-place update in Step 6.

Verify

The file contains an "Inline Edits to Apply" table with columns for target HTML element, action, and new content; section headings match what SKILL.md Step 6 reads.

todo Write skills/review-plan/SKILL.md — the team orchestration workflow

This is the core. Frontmatter: name: review-plan, a three-part description (≤200 chars), and allowed-tools: Read, Glob, Grep, Bash, Edit, AskUserQuestion, TodoWrite, ToolSearch, ExitPlanMode, SendUserFile. Body steps mirror plan-review-agents: (0) exit plan mode via ToolSearch select:ExitPlanMode then ExitPlanMode, treating "not in plan mode" as success; (1) resolve the HTML plan (--dir → plansDirectory → docs/plans/ → newest *.html excluding index.html); (2) default to update-in-place + artifact; (3) gate on Agent Teams; (3b) detect UI signals; (4) spawn the team (5 core, +2 UI reviewers when signals present); (5) collect; (6) synthesise; (7) integrate; (8) emit artifact; (9) clean up.

Verify

head shows valid frontmatter with all listed allowed-tools (including both ToolSearch and ExitPlanMode); the body is under 500 lines and contains the Step 0 exit-plan-mode bootstrap and a numbered Step 1–9 workflow.

todo Specify the HTML-aware spawn and integrate logic inside SKILL.md

The key adaptation from the Markdown-based sibling. (a) UI-signal detection: before spawning, scan the resolved plan's HTML (excluding <style>/<script>) for UI signals — references to React/Vue/Svelte/Angular, .tsx/.jsx/.css/.html file extensions, className/style/Tailwind, or UX terms (button, modal, form, dialog, dropdown, page, component) — reusing the implementation-plan interview UI-override heuristic. (b) Spawn step: a directive that creates the team and spawns the five core plan-reviewer-* subagent types in parallel, plus plan-reviewer-ux and plan-reviewer-accessibility when UI signals were detected; brief each with its role-prompts.md template (<ABSOLUTE_PATH> substituted via realpath), and wait for all spawned teammates before synthesis. Announce which mode ran (e.g. "UI signals detected — running 7 reviewers" vs "no UI signals — running 5 core reviewers"). (c) Integrate step: walk the synthesis "Inline Edits to Apply" table and apply one Edit per row against the HTML plan — targeting .step-card bodies, .verify-body, #criteria-list <li> items, .objective-card, and the Verification paragraph — HTML-escaping all inserted content, never touching <style> / <script>, and appending a collapsible <details> "Team Review (timestamp)" block before </main>.

Verify

SKILL.md defines the UI-signal detection list, the spawn step names the five core plan-reviewer-* agent types plus the two conditional ones with their gating condition and "wait for all spawned", and the integrate step names the specific HTML selectors above, the HTML-escape rule, and the appended Team Review <details>.

todo Register the skill: bump version and update plugin descriptions

A new skill is a MINOR bump per marketplace.md. In .claude-plugin/marketplace.json raise the plan-agent entry from 1.8.0 to 1.9.0 and extend its description to mention review-plan; update kit/plugins/plan-agent/.claude-plugin/plugin.json description to match. Set version only in marketplace.json, never in plugin.json.

Verify

grep -A2 '"plan-agent"' .claude-plugin/marketplace.json shows "version": "1.9.0"; plugin.json has no version key; the project's settings hook reports marketplace.json as valid JSON.

todo Document the new skill in CHANGELOG.md and README.md

Project convention requires a CHANGELOG entry and README update for any new component. Add a dated 1.9.0 CHANGELOG entry, and a README section documenting /plan-agent:review-plan — the five reviewer roles, the Agent Teams requirement, the enable flag, and the update-in-place default.

Verify

CHANGELOG.md has a 1.9.0 heading dated 2026-06-06; README.md lists review-plan with its invocation, roles, and the CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS requirement.

todo Validate the plugin structure

Catch frontmatter, JSON, and registration errors before commit. Run /validate-plugin plan-agent (or the bundled validation skill) and confirm the new skill and all seven agents parse (five core + two UI-conditional), the homepage URL is correct, and marketplace.json registration is intact.

Verify

validate-plugin reports no errors for plan-agent; node -e "JSON.parse(require('fs').readFileSync('.claude-plugin/marketplace.json'))" exits 0.

Tests

Tier 1 — the plan creates skill and agent definition files plus a marketplace registration that change the plugin's runtime behavior. There is no unit-test runner for Markdown/JSON plugin definitions in this repo, so "tests" here are the plugin's real validation harness plus a manual team-run smoke test.

Objective-Verification Test (hero)

Smoke test — review-plan updates a plan in place. With Agent Teams enabled, run /plan-agent:review-plan tests/fixtures/sample-plan-backend.html against a copied non-UI fixture. Assert the run (1) spawns exactly five core teammates, (2) finishes with the fixture modified — at least one inline edit applied to a step card or criterion AND a "Team Review" <details> appended before </main>, and (3) writes a sibling sample-plan-backend-review.html artifact. Then run against tests/fixtures/sample-plan-ui.html (a fixture referencing a .tsx component) and assert seven teammates spawn (core + plan-reviewer-ux + plan-reviewer-accessibility). Finally, with the flag unset, re-run and assert the skill hard-stops with the enable message and makes no edits. This proves the objective: a team that reviews and updates an implementation plan, scaling UX/a11y coverage to the plan's content.

Fixtures: tests/fixtures/sample-plan-backend.html (no UI signals) and tests/fixtures/sample-plan-ui.html (UI signals). These fixture files must be created before the test can run — copy any existing docs/plans/*.html to those scratch paths (backend: no UI terms; UI: include at least one .tsx reference in the body, not inside a <style>/<script> block). Run manually — Agent Team runs are interactive and not yet CI-automatable.

Unit-level (structure validation)

File: existing validate-plugin skill / structural checks.
Targets: review-plan/SKILL.md frontmatter and the seven plan-reviewer-*.md agent definitions (5 core + ux + accessibility).
Key cases: each name: equals its filename stem; descriptions are third-person and non-empty; allowed-tools includes both ToolSearch and ExitPlanMode; no version key in plugin.json.

Integration-level (registration)

File: .claude-plugin/marketplace.json + scripts/check-version-bump.sh.
Targets: plan-agent marketplace entry and version guard.
Key cases: marketplace.json parses as valid JSON; the plan-agent version is 1.9.0; the version-bump guard passes against main.

Verification

End-to-end, backend plan (teams enabled): set CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1, restart Claude Code (≥ 2.1.32), copy a non-UI plan from docs/plans/*.html to a scratch path, and run /plan-agent:review-plan <scratch>.html. Confirm exactly five core teammates spawn (no UX/a11y), the lead synthesises, the scratch plan is edited in place (visible inline improvements + a "Team Review" collapsible at the bottom), a <stem>-review.html companion is written, and the team is cleaned up with no orphaned teammates.

End-to-end, UI plan: repeat with a plan that references UI (e.g. a .tsx component or terms like modal/form) and confirm seven teammates spawn — the five core plus plan-reviewer-ux and plan-reviewer-accessibility — and that the resulting inline edits include UX/accessibility improvements.

Negative path (teams disabled): unset the flag and re-run — the skill must stop immediately with the enable instructions and leave the plan file byte-identical.

Reusability: confirm each reviewer also works as a plain subagent by asking Claude to "use the plan-reviewer-testability agent to review <plan>.html" without a team.

Completion Checklist

Required

All step TODOs marked as done
All acceptance criteria verified and checked off
Plan status updated to completed

Completion Report

No items to report — all requirements met.

Next Steps

Add a background-mode variant of review-plan

Paste this prompt into Claude to execute this follow-up:

In the agentics repo, add a `--background` mode to the plan-agent `review-plan` skill, mirroring how product-plans ships `product-plans-bg`. In background mode it must require an explicit plan path argument, skip all AskUserQuestion prompts, default to update-in-place + artifact, and be safe to run unattended. Update the SKILL.md, add a background command wrapper if the plugin uses one, bump the plan-agent version in .claude-plugin/marketplace.json, and update the CHANGELOG and README.

Chain review-plan before finalize-plan

Paste this prompt into Claude to execute this follow-up:

In the agentics repo plan-agent plugin, add an optional handoff so that when a user finalizes a plan, finalize-plan can first offer to run review-plan (the Agent Team review) when CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS is enabled. Use AskUserQuestion to offer "review then finalize" vs "finalize only". Do not make it mandatory. Update finalize-plan/SKILL.md, the README, CHANGELOG, and bump the version in marketplace.json.

Wish List

"Reviewed" badge + filter in the plans gallery Wish List

Speculative / blue-sky idea — not on the critical path. Paste into Claude when ready to explore:

Explore extending the plan-agent plans-library gallery so plans that have been reviewed by review-plan (detectable via the appended "Team Review" details block or a new  tag) get a "Reviewed" badge and a filter chip in index.html. Propose how to mark a plan as reviewed during the review-plan run, then how plans-library should detect and surface it. Return a short design before implementing.

Unresolved Questions

Should review-plan be model-invocable or command-only?

For the plan-agent review-plan skill in the agentics repo, recommend whether it should be model-invocable (auto-activates on "review my plan" intent like product-plans:plan-review-agents) or command-only with disable-model-invocation (like finalize-plan). Consider that it spawns a costly Agent Team and mutates the plan file. Weigh discoverability vs accidental expensive runs, and recommend one with reasoning plus the exact frontmatter to use.

How should the team handle very large or multi-plan reviews?

For the plan-agent review-plan Agent Team skill, recommend how to handle (a) a single very large HTML plan that may exceed a teammate's useful context and (b) a request to review several plans at once. Should the lead chunk a large plan by section across teammates, or keep one-plan-per-run? Recommend an approach and the guardrails to document in SKILL.md.

Team Review (2026-06-07)

Executive Summary

The plan is architecturally sound and the overall design is well-matched to the Agent Teams topology, but the team identified two high-severity issues requiring follow-up: (1) the plan is already shipped as v1.9.0 (commit 1352236), so the version bump is a no-op and this review is retrospective; (2) the hero smoke test references fixture files that were never created, leaving the primary verification path unrunnable. A recurring count drift (5 vs 7 reviewers) was present in the Files note, Diagram collect node, and Step 9 — all three have been corrected inline. The conventions reviewer confirmed the shipped agent files use allowed-tools: where the agent runtime expects tools: — a functional correctness issue to address in v1.10.0. UX and accessibility reviewers flagged the lack of progress feedback during the multi-agent run, the absence of an undo/preview gate before in-place mutation, and several contrast failures in the plan document's own CSS. A new acceptance criterion (AC12) for team cleanup completeness has been added.

Role-by-Role Findings

Architecture: Sound topology; key concerns: lossy synthesis data flow (prose → table → HTML edits with no schema), brittle CSS-selector mutation that silently breaks on skeleton drift, duplicated UI-signal heuristic, and no minimum-quorum gate for partial reviewer failures. Recommend a thin structured contract between reviewer and lead, a selector-sanity guard in Step 7, and an idempotency guard to prevent double-review corruption.
Completeness: Highly specific plan; gaps: hero test fixtures never created (high), internal 5/7 count drift now fixed (medium), SendUserFile in allowed-tools unanchored, unresolved questions block concrete step decisions.
Testability: Critical: hero smoke test cannot run without fixtures. High: UI-signal detection heuristic has no boundary-case tests; AC6/AC11 lack an assertable machine-readable signal. Recommend a deterministic mode-announcement line the test can grep.
Risk: Medium overall. Key risks: plan already shipped (version collision if re-run), self-mutating file with no backup or atomicity guarantee, HTML-injection risk from LLM-synthesised edits, Bash grant on reviewers is a latent write vector.
Conventions: Mostly compliant. Critical gap: shipped agent files use allowed-tools: instead of tools: — agent runtime reads tools:, so the restriction may be silently ignored. AC ID sequence has ac11 mid-list (cosmetic). Step 9 "five agents" typo now fixed inline.
UX: No live progress feedback during multi-agent run (high); in-place mutation with no preview or undo (high); disabled-state error under-specified (high). Recommend task-list-as-progress surface, a consent gate before mutation, and verbatim disabled-state message in SKILL.md.
Accessibility: Plan document: --subtle (#9ca3af, 2.54:1) and state colors (#16a34a 3.3:1, #d97706 3.19:1) fail WCAG AA contrast. Copy buttons lack aria-live feedback; touch targets below 44×44px. Feature: plan-reviewer-accessibility agent lacks a concrete WCAG rubric — will produce shallow findings without one.

Confirmed Concerns (cross-reviewer agreement)

Missing fixtures — Completeness + Testability + Risk all flagged independently. Create tests/fixtures/sample-plan-backend.html and sample-plan-ui.html in a follow-up commit.
5/7 reviewer count drift — Completeness + Conventions + Testability. Fixed inline in Files note, Diagram, and Step 9.
Unresolved model-invocable question — UX + Risk both recommend command-only / disable-model-invocation given cost and file mutation. Resolve in next iteration.
No partial-failure quorum — Architecture + UX. Add minimum-quorum rule to Step 5 in v1.10.0.

Highest-Risk Issues for v1.10.0

Agent frontmatter key — Change allowed-tools: to tools: in all 7 plan-reviewer-*.md files. Functional correctness, not style. (Conventions)
Create test fixtures — tests/fixtures/sample-plan-backend.html + sample-plan-ui.html. Hero test is unrunnable without them. (Completeness, Testability, Risk)
Add backup before mutation — Copy plan to <stem>.bak or verify clean git state before first Edit call; treat any failed Edit as abort. (Risk)
Raise CSS contrast tokens — Darken --subtle to ≥4.5:1 for text use; darken state-as-text colors. (Accessibility)
Resolve model-invocability — Recommend disable-model-invocation; document in SKILL.md and README. (UX, Risk)

Reviewed by 7-member Agent Team (architecture, completeness, testability, risk, conventions, ux, accessibility) · 2026-06-07 · plan-agent v1.9.0

Context

Files to Modify

Diagram

Steps

Tests

Unit-level (structure validation)

Integration-level (registration)

Acceptance Criteria

Verification

Completion Checklist

Completion Report

Next Steps

Executive Summary

Role-by-Role Findings

Confirmed Concerns (cross-reviewer agreement)

Highest-Risk Issues for v1.10.0