Ship a Research Agent Team phase (Step 0c) that spawns 4–6 specialized researcher teammates to investigate the plan objective from multiple angles in parallel, synthesizes findings into a Research Brief that feeds Steps 1–2, and gracefully degrades to single-session Explore when Agent Teams are unavailable
docs/plans/add-research-agent-team.html
Context
The implementation-plan skill currently runs a single-session sequential pipeline — one Claude instance explores the codebase (Step 0b), clarifies requirements (Step 1), and writes the plan (Step 2). For complex plans spanning multiple subsystems, research breadth is limited by what one context window can hold, and there is no adversarial tension: the same agent that researches also writes the plan.
The review-plan skill already demonstrates a proven Agent Team pattern in this plugin: it spawns 5–7 parallel reviewers with structured output, collects findings via SendMessage, and synthesizes results. The same pattern can be applied earlier in the pipeline — during research, before plan creation — to produce richer, more thoroughly investigated plans. Agent Teams (experimental, v2.1.32+) are the right tool because researchers need to communicate with each other: a codebase analyst discovering an API constraint can message the dependency researcher directly to adjust the investigation.
When Agent Teams are unavailable (flag unset, old version), the skill must degrade gracefully to the existing single-session Explore — not hard-stop like review-plan does. Plan creation is too critical to block on an experimental feature.
Files to Modify
- kit/plugins/plan-agent/agents/
  - plan-researcher-codebase.md (new): codebase architecture researcher agent
  - plan-researcher-dependencies.md (new): library and API dependency researcher
  - plan-researcher-devils-advocate.md (new): adversarial assumption challenger (conditional)
  - plan-researcher-prior-art.md (new): web best-practices researcher
  - plan-researcher-test-strategy.md (new): test infrastructure analyst
  - plan-researcher-ux.md (new): UI pattern researcher (conditional)
- kit/plugins/plan-agent/skills/implementation-plan/references/
  - research-prompts.md (new): role-prompt templates for 6 researchers
  - research-brief-template.md (new): synthesis template for Research Brief
- kit/plugins/plan-agent/skills/implementation-plan/
  - SKILL.md (modified): add Step 0c, --research flag, allowed-tools
- kit/plugins/plan-agent/
  - CHANGELOG.md (modified): add version entry for research team
  - README.md (modified): document research team feature
- kit/plugins/plan-agent/.claude-plugin/plugin.json (modified): mention research team in description
- .claude-plugin/marketplace.json (modified): bump plan-agent minor version
Diagram
Explore → Research Agent Team → Clarify → Create
- Researchers message each other directly
- Adversarial tension between teammates
- Shared task list for coordination
- Proven pattern in review-plan skill
- Simpler, no flag needed
- Lower token overhead
- No inter-agent communication
- Each reports in isolation
- One context window limit
- No adversarial challenge
- Sequential research only
- Already in place as fallback
| Role | Focus | Tools | When |
|---|---|---|---|
| codebase-analyst | Map architecture, find patterns, identify integration points | Read, Glob, Grep, Bash | Always |
| dependency-researcher | Libraries, APIs, external services, version compatibility | Read, Glob, Grep, Bash, WebSearch, WebFetch | Always |
| prior-art-researcher | Best practices, known pitfalls, reference implementations | WebSearch, WebFetch, Read | Always |
| test-strategist | Test infrastructure, coverage gaps, recommended approach | Read, Glob, Grep, Bash | Always |
| ux-researcher | UI patterns, accessibility requirements, responsive design | Read, Glob, Grep, WebSearch, WebFetch | UI signals |
| devils-advocate | Challenge assumptions, find failure modes, identify risks | Read, Glob, Grep, Bash | Complex plans |
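The roster above can be expressed as data plus a small selection function. This is a minimal sketch, not the skill's actual implementation: the type names, the `selectResearchers` function, and the `hasUiSignals` / `isComplexPlan` inputs are hypothetical, and only the role names, tool lists, and spawn conditions come from the table.

```typescript
// Hypothetical types for illustration; role names, tools, and conditions
// are copied from the roster table.
type SpawnCondition = "always" | "ui-signals" | "complex-plans";

interface ResearcherRole {
  name: string;
  tools: readonly string[];
  when: SpawnCondition;
}

const RESEARCHER_ROLES: readonly ResearcherRole[] = [
  { name: "codebase-analyst", tools: ["Read", "Glob", "Grep", "Bash"], when: "always" },
  { name: "dependency-researcher", tools: ["Read", "Glob", "Grep", "Bash", "WebSearch", "WebFetch"], when: "always" },
  { name: "prior-art-researcher", tools: ["WebSearch", "WebFetch", "Read"], when: "always" },
  { name: "test-strategist", tools: ["Read", "Glob", "Grep", "Bash"], when: "always" },
  { name: "ux-researcher", tools: ["Read", "Glob", "Grep", "WebSearch", "WebFetch"], when: "ui-signals" },
  { name: "devils-advocate", tools: ["Read", "Glob", "Grep", "Bash"], when: "complex-plans" },
];

// Returns the 4 core roles plus whichever conditional roles the signals enable.
function selectResearchers(signals: { hasUiSignals: boolean; isComplexPlan: boolean }): ResearcherRole[] {
  return RESEARCHER_ROLES.filter(
    (role) =>
      role.when === "always" ||
      (role.when === "ui-signals" && signals.hasUiSignals) ||
      (role.when === "complex-plans" && signals.isComplexPlan),
  );
}
```

With both signals off the function returns the 4 core roles; each conditional signal adds one role, for a maximum of 6.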
Steps
Step 1: Create references/research-prompts.md with role-prompt templates for all 6 researcher types
Mirrors review-plan/references/role-prompts.md and keeps the main SKILL.md focused on orchestration logic. Each prompt needs <OBJECTIVE> and <CODEBASE_CONTEXT> placeholders plus a structured [Role Report] output format with SendMessage. Panel recommendation: the agent definition files (Steps 3–4) should carry the full role prompt, making this reference file the canonical template source that agents are generated from, not a parallel source of truth. The orchestrator reads prompts from agent definitions at spawn time, matching the review-plan pattern exactly.
Verify
Check that the kit/plugins/plan-agent/skills/implementation-plan/references/ directory exists (create it if not). Then read the file; confirm 6 distinct role sections (codebase-analyst, dependency-researcher, prior-art-researcher, test-strategist, ux-researcher, devils-advocate) with <OBJECTIVE> and <CODEBASE_CONTEXT> placeholders and SendMessage reporting format in each.
Step 2: Create references/research-brief-template.md with the synthesis template
Step 3: Create plan-researcher-codebase.md, plan-researcher-dependencies.md, plan-researcher-prior-art.md, plan-researcher-test-strategy.md
Follows the plan-reviewer-*.md pattern: frontmatter with name, description, allowed-tools, model: sonnet; body with mandate, how-to-research instructions, and SendMessage reporting format. Codebase-analyst gets Read, Glob, Grep, Bash; dependency-researcher gets Read, Glob, Grep, Bash, WebSearch, WebFetch; prior-art gets WebSearch, WebFetch, Read; test-strategist gets Read, Glob, Grep, Bash.
Verify
Check that the kit/plugins/plan-agent/agents/ directory exists (create it if not; existing plan-reviewer-*.md agent files already live here, so the directory should exist, but verify). Then read each of the 4 files; confirm valid frontmatter with model: sonnet and the correct allowed-tools per role. Confirm each has a mandate section and a SendMessage output format block.
Step 4: Create plan-researcher-ux.md and plan-researcher-devils-advocate.md
Both roles are conditional, following the review-plan pattern where UX and accessibility reviewers are conditional on UI signals. UX researcher gets Read, Glob, Grep, WebSearch, WebFetch; devil’s advocate gets Read, Glob, Grep, Bash.
Step 5: Add --research and --no-research flags to SKILL.md argument parsing
--research forces the Research Agent Team phase. --no-research skips it. The --quick shorthand expands to include --no-research. Panel recommendation: document the conflict resolution rule explicitly in the SKILL.md flags section: when both --research and --no-research are present, last-wins (consistent with standard CLI flag conventions). This must be visible in the flags documentation, not just tested.
Verify
Confirm the --quick description now includes --no-research in its shorthand expansion.
Step 6: Add Step 0c (Research Agent Team) to SKILL.md
Step 0c runs the following sequence:
- (a) Check whether --research is set, or auto-detect complexity. Concrete heuristic: spawn when any of the following holds: the objective mentions 3+ distinct subsystem keywords (e.g. frontend+backend+database, or auth+API+UI); the file tree from Step 0b spans 3+ top-level directories; or the objective contains cross-layer verbs (migrate, integrate, bridge, sync). Default to skip otherwise (conservative, opt-in bias).
- (b) Check Agent Teams availability (version >= 2.1.32 and CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS set).
- (c) If unavailable and --research was explicitly passed, emit a visible warning (not just a log note): “Research Agent Team requested but Agent Teams unavailable — falling back to single-session Explore”. If the phase was auto-detected, log silently and fall through.
- (d) Detect UI signals (same heuristic as review-plan Step 3b).
- (e) Read agent definitions from agents/plan-researcher-*.md and substitute placeholders.
- (f) Spawn 4 core + up to 2 conditional researchers.
- (g) Wait for all via SendMessage. Partial-failure rule: if fewer than 3 of 4 core researchers report within the turn limit, proceed with available findings and note missing roles in the Research Brief (e.g. “dependency-researcher: no findings received”).
- (h) Read references/research-brief-template.md and synthesize.
- (i) Inject the Research Brief into context for Steps 1–2.
Progress reporting: as each researcher completes, emit a status line (e.g. “Research: codebase-analyst done (2/4 complete)”).
Verify
Confirm that --no-research and --quick skip the step entirely.
Step 7: Render the Research Brief in Step 2 (Create)
Render the brief as a <details open> block (open by default for discoverability) titled “Research Brief” inside the Context section; omit it entirely when no brief was generated. Panel recommendation: default to open so users don’t miss the research findings; they can collapse it after reading.
Verify
Confirm the <details> pattern and the omit-when-empty rule are documented.
Step 8: Update allowed-tools frontmatter to include Agent Team tools
Without these tools in allowed-tools, the skill will trigger permission prompts when orchestrating the research team. Panel recommendation: before adding tool names, verify the exact spellings against the current Agent Teams API (check the review-plan skill’s allowed-tools as the authoritative reference). The plan assumes TeamCreate and TeamDelete, but these must be confirmed: wrong names cause silent permission failures, not clear errors.
Verify
Read the allowed-tools line; confirm it includes SendMessage, TeamCreate, and TeamDelete.
Step 9: Update plugin.json description to mention the Research Agent Team capability
Verify
Read .claude-plugin/plugin.json; confirm the description mentions “Research Agent Team” or “research team”.
Step 10: Update README.md with Research Agent Team documentation
Document the feature, how to force (--research) or skip (--no-research) it, and the graceful degradation behavior.
Verify
Read README.md; confirm a new section documents the research team feature with flag documentation and degradation behavior.
Step 11: Update CHANGELOG.md
Verify
Read CHANGELOG.md; confirm the new version entry exists and describes the Research Agent Team feature under an “Added” heading.
Step 12: Bump plan-agent minor version in .claude-plugin/marketplace.json
The version in marketplace.json is bumped for every plugin change. New feature = minor bump.
Verify
Read .claude-plugin/marketplace.json; confirm the plan-agent version is higher than the current value on main and follows semver minor-bump convention.
Tests
File: tests/plan-agent/research-team-smoke.test.ts
Type: Smoke test
Asserts: Invoking /plan-agent:implementation-plan --research on a multi-subsystem objective spawns at least 4 researcher teammates, collects structured findings via SendMessage, and produces a Research Brief block in the plan’s Context section. Also verifies that --no-research skips the team entirely and produces no Research Brief.
Run: claude --plugin-dir kit/plugins/plan-agent -p "/plan-agent:implementation-plan add auth middleware --research --no-clarify --no-align --no-interview"
File: tests/plan-agent/research-flags.test.ts
Targets: SKILL.md argument parsing logic for new flags
Key cases: --research sets research mode; --no-research disables it; --quick implies --no-research; both flags absent triggers auto-detection; conflicting --research --no-research uses last-wins
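The key cases above could be exercised against a resolver like the following. This is a sketch under stated assumptions: `resolveResearchMode` and `ResearchMode` are hypothetical names, and treating --quick as --no-research at its position in the argument list is one reading of "shorthand expands to include --no-research".

```typescript
// Hypothetical resolver; the skill's real argument parsing may differ.
type ResearchMode = "force" | "skip" | "auto";

function resolveResearchMode(args: readonly string[]): ResearchMode {
  // Neither flag present means Step 0c falls back to auto-detection.
  let mode: ResearchMode = "auto";
  for (const arg of args) {
    // Later flags overwrite earlier ones, which gives last-wins semantics.
    if (arg === "--research") {
      mode = "force";
    } else if (arg === "--no-research" || arg === "--quick") {
      // Assumption: --quick behaves like --no-research where it appears.
      mode = "skip";
    }
  }
  return mode;
}
```

Because the loop overwrites on each match, `--research --no-research` resolves to skip and `--no-research --research` resolves to force.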
File: tests/plan-agent/researcher-agents-valid.test.ts
Targets: All 6 plan-researcher-*.md agent definition files
Key cases: Each file has valid YAML frontmatter with required fields (name, description, allowed-tools, model); model is sonnet; allowed-tools matches the role’s expected tool set; body contains SendMessage reporting instructions
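A validator for these cases might look like the sketch below. It assumes flat `key: value` frontmatter with allowed-tools written as a comma-separated list; both are assumptions about the file format, and `splitFrontmatter` and `validateResearcherAgent` are hypothetical helpers, not an existing API. A real test would likely use a proper YAML parser.

```typescript
// Splits a markdown file into flat frontmatter fields and the body.
// Assumption: frontmatter is simple "key: value" lines between --- fences.
function splitFrontmatter(markdown: string): { fields: Record<string, string>; body: string } | null {
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(markdown);
  if (!match) return null;
  const fields: Record<string, string> = {};
  for (const line of (match[1] ?? "").split(/\r?\n/)) {
    const sep = line.indexOf(":");
    if (sep > 0) fields[line.slice(0, sep).trim()] = line.slice(sep + 1).trim();
  }
  return { fields, body: markdown.slice(match[0].length) };
}

// Returns a list of problems; an empty list means the file passes.
function validateResearcherAgent(markdown: string, expectedTools: readonly string[]): string[] {
  const parsed = splitFrontmatter(markdown);
  if (!parsed) return ["missing frontmatter"];
  const problems: string[] = [];
  for (const field of ["name", "description", "allowed-tools", "model"]) {
    if (!parsed.fields[field]) problems.push(`missing field: ${field}`);
  }
  const model = parsed.fields["model"];
  if (model !== undefined && model !== "sonnet") problems.push("model is not sonnet");
  // Compare tool sets order-insensitively.
  const actual = (parsed.fields["allowed-tools"] ?? "")
    .split(",")
    .map((tool) => tool.trim())
    .filter((tool) => tool.length > 0)
    .sort();
  if (actual.join(",") !== [...expectedTools].sort().join(",")) problems.push("allowed-tools mismatch");
  if (!parsed.body.includes("SendMessage")) problems.push("no SendMessage reporting instructions");
  return problems;
}
```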
File: tests/plan-agent/research-team-degradation.test.ts
Targets: Step 0c Agent Teams availability check and fallback path
Key cases: When CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS is unset, Step 0c logs a note and falls through to Step 1 (not hard-stop); when Claude Code version < 2.1.32, same behavior; plan is still produced successfully without the Research Brief
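The availability check and fallback decision these cases target can be sketched as pure functions, which also sidesteps the question of how a test manipulates the environment: the version string and env map are passed in. All function and type names here are hypothetical, and treating any non-empty flag value as "set" is an assumption.

```typescript
// Compares dotted numeric versions, e.g. "2.10.0" >= "2.1.32".
function versionAtLeast(actual: string, minimum: string): boolean {
  const parse = (v: string): number[] | null => {
    const m = /^(\d+)\.(\d+)\.(\d+)/.exec(v.trim());
    return m ? [Number(m[1]), Number(m[2]), Number(m[3])] : null;
  };
  const a = parse(actual);
  const b = parse(minimum);
  if (!a || !b) return false; // unparseable versions count as unavailable
  for (let i = 0; i < 3; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    if (x !== y) return x > y;
  }
  return true;
}

// Assumption: any non-empty value of the flag counts as enabled.
function agentTeamsAvailable(claudeVersion: string, env: Record<string, string | undefined>): boolean {
  return versionAtLeast(claudeVersion, "2.1.32") && Boolean(env["CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS"]);
}

type Step0cOutcome = "run-research-team" | "fallback-with-warning" | "fallback-silent";

// Never hard-stops: every unavailable path falls through to single-session Explore.
function decideStep0c(explicitResearch: boolean, available: boolean): Step0cOutcome {
  if (available) return "run-research-team";
  return explicitResearch ? "fallback-with-warning" : "fallback-silent";
}
```

A test can then cover the unset-flag and old-version cases by passing different arguments, without touching the real process environment.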
File: tests/plan-agent/research-brief-injection.test.ts
Targets: Step 2 (Create) Research Brief handling
Key cases: When Research Brief exists, a collapsible <details> block appears in Context; when no brief was generated, the block is absent; the brief content is HTML-escaped
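A rendering helper that satisfies these cases could look like the following sketch. The `<details open>` element and the "Research Brief" title come from the plan; the function names and the `<pre>` wrapper around the escaped content are illustrative assumptions.

```typescript
// Escapes the five characters that are unsafe in HTML text and attributes.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Returns an empty string when no brief was generated, so the block is omitted entirely.
function renderResearchBrief(brief: string | null): string {
  if (brief === null || brief.trim() === "") return "";
  return [
    "<details open>",
    "<summary>Research Brief</summary>",
    `<pre>${escapeHtml(brief)}</pre>`, // <pre> wrapper is an assumption
    "</details>",
  ].join("\n");
}
```

Escaping `&` first matters: doing it later would double-escape the entities produced by the other replacements.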
Acceptance Criteria
Verification
- Full-feature path: Set CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1, run /plan-agent:implementation-plan "Add WebSocket real-time notifications across frontend, backend, and database layers" --research --no-clarify --no-align --no-interview. Confirm: Step 0c spawns at least 4 researchers, collects findings, produces a Research Brief in the plan’s Context section as a collapsible <details> block. The plan itself is well-formed HTML with all required sections.
- Graceful degradation path: Unset CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS, run the same command with --research. Confirm: Step 0c logs “Agent Teams unavailable — falling back to single-session Explore” (or similar) and proceeds to Step 1 without error. The plan is produced successfully without a Research Brief block.
- Skip path: Run /plan-agent:implementation-plan "Fix typo in README" --no-research --quick. Confirm: Step 0c is skipped entirely. No Agent Team is created. Plan is produced as normal.
- Conditional researchers: Run a plan with UI signals (e.g. “Add React dashboard with charts and filters”) using --research. Confirm: UX researcher is spawned (6 total researchers). Run a complex non-UI plan. Confirm: devil’s advocate is spawned but UX researcher is not (5 total).
- Agent definitions: Run ls kit/plugins/plan-agent/agents/plan-researcher-*.md | wc -l and confirm output is 6. Grep each for model: sonnet and SendMessage.
Completion Checklist
Completion Report
No items to report — all requirements met.
Unresolved Questions
- Should complexity auto-detection default to spawning the team or skipping it?
  The implementation-plan skill needs a complexity heuristic to decide when to auto-spawn the Research Agent Team (when neither --research nor --no-research is set). Two options: (A) conservative default: only spawn when complexity clearly exceeds a threshold (3+ subsystems, cross-layer changes), meaning most plans skip the research phase unless explicitly requested; (B) aggressive default: spawn for any plan that touches 2+ files, meaning most non-trivial plans get research. Consider: Agent Teams are expensive (5–10x tokens), experimental (flag required), and the single-session Explore is already decent for simple plans. Recommend one approach with reasoning, and draft the specific heuristic rules for the chosen approach.
- Should researchers use the existing agent definitions or spawn as plain teammates?
  Agent Teams can spawn teammates using subagent definitions (AGENT.md files) via the agentType option, which gives each teammate a scoped tool allowlist and system prompt. Alternatively, teammates can be spawned as plain sessions with the role prompt passed directly in the spawn instruction. The review-plan skill uses agent definitions (plan-reviewer-*.md). Compare: (A) using AGENT.md files (consistent with review-plan, tool scoping, reusable as standalone subagents) vs (B) plain spawn prompts (simpler, fewer files, prompts live in research-prompts.md only). Given that this plan already creates 6 agent definition files AND a research-prompts.md, is that redundant? Should the agent definitions contain the full prompt and research-prompts.md be eliminated, or should agent definitions be minimal (tools + model only) with research-prompts.md carrying the role instructions? Recommend one approach.
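For the first question, the conservative option (A) with the thresholds named in the plan (3+ subsystem keywords, 3+ top-level directories, or a cross-layer verb) can be sketched as below. The keyword lists contain only the examples the plan mentions and are assumptions rather than a vetted vocabulary; `shouldAutoSpawn` is a hypothetical name.

```typescript
// Illustrative keyword lists built from the plan's examples only.
const SUBSYSTEM_KEYWORDS = ["frontend", "backend", "database", "auth", "api", "ui"];
const CROSS_LAYER_VERBS = ["migrate", "integrate", "bridge", "sync"];

// Lowercased word set, split on anything that is not a letter or digit.
function wordsOf(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 0));
}

// Collects the first path segment of every file that lives inside a directory.
function topLevelDirs(paths: readonly string[]): Set<string> {
  const dirs = new Set<string>();
  for (const path of paths) {
    const [first, ...rest] = path.split("/").filter((p) => p.length > 0);
    if (first !== undefined && rest.length > 0) dirs.add(first);
  }
  return dirs;
}

// Conservative default: spawn only when at least one signal clearly fires.
function shouldAutoSpawn(objective: string, filePaths: readonly string[]): boolean {
  const words = wordsOf(objective);
  const subsystemHits = SUBSYSTEM_KEYWORDS.filter((k) => words.has(k)).length;
  const hasCrossLayerVerb = CROSS_LAYER_VERBS.some((v) => words.has(v));
  return subsystemHits >= 3 || topLevelDirs(filePaths).size >= 3 || hasCrossLayerVerb;
}
```

Matching is on exact words, so inflected forms such as "migrating" do not trigger the verb rule in this sketch; a real heuristic would need to decide whether to stem or list variants.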
Team Review (2026-06-08 13:51:00 UTC)
Executive Summary
The plan is sound with revisions. All 7 reviewers agree the design follows the proven review-plan Agent Team pattern and inserts cleanly into the existing pipeline as Step 0c with proper graceful degradation. However, the team identified three high-priority issues that should be resolved before implementation: (1) the dual-artifact redundancy between agent definition files and research-prompts.md — flagged by 5 of 7 reviewers — needs a single source of truth; (2) the complexity auto-detection heuristic is underspecified, leaving the most common invocation path (no flags) with an undefined branch; and (3) the test infrastructure is incomplete — test files are declared but no runner, framework, or directory creation steps exist.
Role-by-Role Findings
Architecture Review
Fit: Follows the established Agent Team pattern with well-scoped component boundaries and proper graceful degradation.
- Dual-artifact redundancy between agent definitions and research-prompts.md (medium): recommends agent definitions as single source of truth, matching review-plan
- Context injection is implicit: Research Brief handoff between 0c and Step 1 has no defined serialization boundary (medium)
- Auto-detection heuristic underspecified: no programmatic detection logic defined (medium)
- Unverified tool names TeamCreate/TeamDelete (low)
Completeness Review
Completeness: Well-structured with 12 steps covering all named files, but several gaps could cause implementation friction.
- No directory creation steps for agents/ and references/ before writing files (medium)
- No step to read existing SKILL.md before Steps 5–8 edits (medium)
- Auto-detection heuristic buried in Unresolved Question, not in Step 6 (high)
- Test files declared but no creation steps — tests appear in Tests section but not in Steps (high)
- Unresolved Questions affect implementation — two open design decisions directly impact Step 6 (high)
- No test runner specified for .ts test files (medium)
Testability Review
Test coverage: Adequate for Tier 1 with objective-verification, unit, and integration tests, but several critical paths lack coverage.
- No test for auto-detection heuristic (high)
- Objective-verification test is manual CLI invocation, not a runnable automated test (high)
- No test runner or framework specified for .ts files (high)
- No test for conditional researcher spawning based on UI signals (medium)
- Integration test doesn’t specify how to simulate env var manipulation (medium)
Risk Review
Risk level: Medium overall.
- No partial-failure handling for researcher agents — if one hangs, synthesis blocks indefinitely (high)
- Unverified tool names in allowed-tools could cause silent permission failures (high)
- Dual-artifact prompt drift risk between agent files and reference file (medium)
- No token budget ceiling for 4–6 parallel researchers (medium)
- Experimental flag dependency — silent removal could break the feature (medium)
Conventions Review
Fit: Closely follows established project patterns — agent definition structure, file placement, kebab-case naming all align.
- Redundant prompt storage conflicts with review-plan single-source pattern (medium)
- Test file placement in non-existent tests/plan-agent/ directory (medium)
- allowed-tools casing for TeamCreate/TeamDelete needs verification (medium)
UX Review
User fit: Well-structured plan, but the research phase UX lacks progress feedback and auto-detection predictability.
- No progress indication during research phase — users face silent wait (high)
- Auto-detection UX undefined — users cannot predict when research spawns (high)
- Graceful degradation is too quiet when --research was explicitly requested (medium)
- Research Brief hidden by default in collapsed <details> (low)
- No empty-state handling for partial researcher failure in the Brief (low)
Accessibility Review
A11y compliance: Largely WCAG 2.1 AA compliant with strong foundations, but medium-severity gaps in the plan HTML template itself.
- status-badge lacks accessible label tying it to the plan title (medium)
- <details>/<summary> marker removal may break VoiceOver disclosure announcement (medium)
- Disabled checkboxes removed from tab order without aria-disabled backup (medium)
- compare-grid uses divs instead of semantic table structure (medium)
- Pulse-dot animation missing prefers-reduced-motion suppression (medium)
Agreements & Conflicts
Confirmed concerns (multiple reviewers agree):
- Dual-artifact redundancy — flagged by architecture, completeness, risk, conventions, and accessibility (5/7). Consensus: use agent definitions as single source of truth.
- Auto-detection heuristic underspecified — flagged by architecture, completeness, UX, risk (4/7). Consensus: define concrete rules in the step, default to conservative (opt-in bias).
- Unverified tool names — flagged by architecture, risk, conventions (3/7). Consensus: verify against review-plan’s allowed-tools before implementation.
- No partial-failure handling — flagged by risk, UX, architecture (3/7). Consensus: proceed with available findings after turn limit, note missing roles.
- Test infrastructure incomplete — flagged by completeness, testability (2/7). Consensus: specify test runner and add test creation steps.
No conflicts — all reviewers were directionally aligned. No contradictory recommendations were surfaced.
Highest-Risk Issues (priority order)
- Dual-artifact prompt drift (5 reviewers) — resolve before implementation by making agent definitions the single canonical prompt source
- Auto-detection heuristic undefined (4 reviewers) — embed concrete detection rules in Step 6 with conservative default
- No partial-failure / timeout handling (3 reviewers) — add turn-limit rule and missing-role notation to synthesis
- Unverified TeamCreate/TeamDelete tool names (3 reviewers): cross-reference with actual Agent Teams API before writing allowed-tools
- Test infrastructure gaps (2 reviewers): specify test runner, add test creation steps, clarify manual vs automated test expectations
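The missing-role notation and progress line recommended for the partial-failure issue can be sketched as two small formatters. The output strings follow the examples given in the plan's Step 6; the function names and the `CORE_ROLES` constant are hypothetical, and the turn-limit wait itself is left out because the plan does not specify how it is measured.

```typescript
// The 4 always-on roles from the researcher roster.
const CORE_ROLES = [
  "codebase-analyst",
  "dependency-researcher",
  "prior-art-researcher",
  "test-strategist",
] as const;

// Status line emitted as each researcher completes.
function progressLine(role: string, completed: number, total: number): string {
  return `Research: ${role} done (${completed}/${total} complete)`;
}

// Notes for the Research Brief listing every expected role that never reported.
function missingRoleNotes(expected: readonly string[], reported: readonly string[]): string[] {
  const seen = new Set(reported);
  return expected.filter((role) => !seen.has(role)).map((role) => `${role}: no findings received`);
}
```

Once the turn limit expires, synthesis can proceed with whatever findings arrived and append the notes from `missingRoleNotes` so gaps are visible rather than silent.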
Inline Edits Applied
| Target | Action | Change |
|---|---|---|
| Step 1 .step-why | edit | Added panel recommendation: agent definitions as single source of truth for role prompts |
| Step 6 .step-why | edit | Added concrete auto-detection heuristic, partial-failure rule, progress reporting, and stronger degradation warning |
| Step 5 .step-why | edit | Added last-wins conflict resolution documentation requirement |
| Step 7 .step-why | edit | Changed <details> to <details open> for Research Brief discoverability |
| Step 8 .step-why | edit | Added tool name verification requirement against Agent Teams API |
| Step 1 .verify-body | edit | Added directory pre-check for references/ |
| Step 3 .verify-body | edit | Added directory pre-check for agents/ |
| #criteria-list | append | Added AC11 (partial-failure handling), AC12 (progress reporting), AC13 (flag conflict docs) |