Plan: Add Research Agent Team to implementation-plan skill

Objective

Ship a Research Agent Team phase (Step 0c) that spawns 4–6 specialized researcher teammates to investigate the plan objective from multiple angles in parallel, synthesizes findings into a Research Brief that feeds Steps 1–2, and gracefully degrades to single-session Explore when Agent Teams are unavailable.

File add-research-agent-team.html

Path docs/plans/add-research-agent-team.html

Acceptance criteria 0 / 10 done

Context

The implementation-plan skill currently runs a single-session sequential pipeline — one Claude instance explores the codebase (Step 0b), clarifies requirements (Step 1), and writes the plan (Step 2). For complex plans spanning multiple subsystems, research breadth is limited by what one context window can hold, and there is no adversarial tension: the same agent that researches also writes the plan.

The review-plan skill already demonstrates a proven Agent Team pattern in this plugin: it spawns 5–7 parallel reviewers with structured output, collects findings via SendMessage, and synthesizes results. The same pattern can be applied earlier in the pipeline — during research, before plan creation — to produce richer, more thoroughly investigated plans. Agent Teams (experimental, v2.1.32+) are the right tool because researchers need to communicate with each other: a codebase analyst discovering an API constraint can message the dependency researcher directly to adjust the investigation.

When Agent Teams are unavailable (flag unset, old version), the skill must degrade gracefully to the existing single-session Explore — not hard-stop like review-plan does. Plan creation is too critical to block on an experimental feature.

Files to Modify

agentics/

kit/plugins/plan-agent/agents/

plan-researcher-codebase.md (new): codebase architecture researcher agent
plan-researcher-dependencies.md (new): library and API dependency researcher
plan-researcher-devils-advocate.md (new): adversarial assumption challenger (conditional)
plan-researcher-prior-art.md (new): web best-practices researcher
plan-researcher-test-strategy.md (new): test infrastructure analyst
plan-researcher-ux.md (new): UI pattern researcher (conditional)

kit/plugins/plan-agent/skills/implementation-plan/references/

research-prompts.md (new): role-prompt templates for 6 researchers
research-brief-template.md (new): synthesis template for Research Brief

kit/plugins/plan-agent/skills/implementation-plan/

SKILL.md (modified): add Step 0c, --research flag, allowed-tools

kit/plugins/plan-agent/

CHANGELOG.md (modified): add version entry for research team
README.md (modified): document research team feature

kit/plugins/plan-agent/.claude-plugin/plugin.json (modified): mention research team in description
.claude-plugin/marketplace.json (modified): bump plan-agent minor version

Diagram

Implementation-plan workflow with Research Agent Team

Step 0b: Explore. Quick codebase scan (single session)
Step 0c (NEW): Research Agent Team. 4–6 parallel researchers → Research Brief
Step 1: Clarify. Informed by Research Brief findings
Step 2: Create. Write plan with deeper context

Why Agent Teams over alternatives

Agent Teams (chosen)

Researchers message each other directly
Adversarial tension between teammates
Shared task list for coordination
Proven pattern in review-plan skill

Subagents

Simpler, no flag needed
Lower token overhead
No inter-agent communication
Each reports in isolation

Single session (status quo)

One context window limit
No adversarial challenge
Sequential research only
Already in place as fallback

Researcher roles

Research Agent Team composition — 4 core + 2 conditional
Role	Focus	Tools	When
codebase-analyst	Map architecture, find patterns, identify integration points	Read, Glob, Grep, Bash	Always
dependency-researcher	Libraries, APIs, external services, version compatibility	Read, Glob, Grep, Bash, WebSearch, WebFetch	Always
prior-art-researcher	Best practices, known pitfalls, reference implementations	WebSearch, WebFetch, Read	Always
test-strategist	Test infrastructure, coverage gaps, recommended approach	Read, Glob, Grep, Bash	Always
ux-researcher	UI patterns, accessibility requirements, responsive design	Read, Glob, Grep, WebSearch, WebFetch	UI signals
devils-advocate	Challenge assumptions, find failure modes, identify risks	Read, Glob, Grep, Bash	Complex plans
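
The composition table can also be read as data. The sketch below is one hypothetical way to encode it in TypeScript and pick the roster for a given plan. The names `RESEARCHERS`, `PlanSignals`, and `selectResearchers` are illustrative and do not come from the plugin, and the detection of UI signals or plan complexity is deliberately left out.

```typescript
// Roles and tool allowlists are copied from the composition table above.
type ResearcherRole =
  | "codebase-analyst"
  | "dependency-researcher"
  | "prior-art-researcher"
  | "test-strategist"
  | "ux-researcher"
  | "devils-advocate";

interface ResearcherSpec {
  role: ResearcherRole;
  tools: string[];
  when: "always" | "ui-signals" | "complex-plans";
}

const RESEARCHERS: ResearcherSpec[] = [
  { role: "codebase-analyst", tools: ["Read", "Glob", "Grep", "Bash"], when: "always" },
  { role: "dependency-researcher", tools: ["Read", "Glob", "Grep", "Bash", "WebSearch", "WebFetch"], when: "always" },
  { role: "prior-art-researcher", tools: ["WebSearch", "WebFetch", "Read"], when: "always" },
  { role: "test-strategist", tools: ["Read", "Glob", "Grep", "Bash"], when: "always" },
  { role: "ux-researcher", tools: ["Read", "Glob", "Grep", "WebSearch", "WebFetch"], when: "ui-signals" },
  { role: "devils-advocate", tools: ["Read", "Glob", "Grep", "Bash"], when: "complex-plans" },
];

// Hypothetical input shape: the two booleans stand in for whatever detection the skill performs.
interface PlanSignals {
  hasUiSignals: boolean;
  isComplex: boolean;
}

// Returns the 4 core roles plus any conditional role whose condition holds.
function selectResearchers(signals: PlanSignals): ResearcherRole[] {
  return RESEARCHERS.filter((spec) => {
    if (spec.when === "ui-signals") return signals.hasUiSignals;
    if (spec.when === "complex-plans") return signals.isComplex;
    return true;
  }).map((spec) => spec.role);
}
```

Keeping the "When" column as a field on each entry means the 4–6 team size falls out of the data rather than being hard-coded in the orchestration logic.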

Steps

Step 1 (todo): Create references/research-prompts.md with role-prompt templates for all 6 researcher types

Externalizing prompts into a reference file follows the proven pattern from review-plan/references/role-prompts.md and keeps the main SKILL.md focused on orchestration logic. Each prompt needs <OBJECTIVE> and <CODEBASE_CONTEXT> placeholders plus a structured [Role Report] output format with SendMessage. Panel recommendation: the agent definition files (Steps 3–4) should carry the full role prompt, which makes this reference file the canonical template source that agents are generated from, not a parallel source of truth. The orchestrator reads prompts from agent definitions at spawn time, matching the review-plan pattern exactly.

Verify

Pre-check: confirm kit/plugins/plan-agent/skills/implementation-plan/references/ directory exists (create it if not). Then read the file; confirm 6 distinct role sections (codebase-analyst, dependency-researcher, prior-art-researcher, test-strategist, ux-researcher, devils-advocate) with <OBJECTIVE> and <CODEBASE_CONTEXT> placeholders and SendMessage reporting format in each.
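
As an illustration of the placeholder contract, here is a minimal sketch of filling a role prompt and checking that a template carries both placeholders. The helper names `fillRolePrompt` and `missingPlaceholders` are hypothetical, and the sample template text in the usage note is invented for the example.

```typescript
// The two placeholder tokens named in the step above.
const ROLE_PROMPT_PLACEHOLDERS = ["<OBJECTIVE>", "<CODEBASE_CONTEXT>"];

interface PromptValues {
  objective: string;
  codebaseContext: string;
}

// Replaces every occurrence of each placeholder. split/join is used instead of
// String.replace so that "$" sequences in the inserted text are not interpreted.
// Caveat: an objective that itself contains the literal text <CODEBASE_CONTEXT>
// would be substituted in the second pass.
function fillRolePrompt(template: string, values: PromptValues): string {
  return template
    .split("<OBJECTIVE>").join(values.objective)
    .split("<CODEBASE_CONTEXT>").join(values.codebaseContext);
}

// Lists the placeholders a template is missing, for use in the Verify check.
function missingPlaceholders(template: string): string[] {
  return ROLE_PROMPT_PLACEHOLDERS.filter((token) => template.indexOf(token) === -1);
}
```

For example, `fillRolePrompt("Goal: <OBJECTIVE>\nContext: <CODEBASE_CONTEXT>", { objective: "add auth", codebaseContext: "Express app" })` yields a prompt with both tokens replaced, and `missingPlaceholders("Goal: <OBJECTIVE>")` reports that `<CODEBASE_CONTEXT>` is absent.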

Step 2 (todo): Create references/research-brief-template.md with the synthesis template

A structured template ensures synthesis is consistent regardless of which researchers were spawned, and bridges directly into the Clarify and Create steps. Sections: Executive Summary, Per-Role Findings (6 slots), Key Constraints Discovered, Recommended Approach, Open Questions for Clarify Step.

Verify

Read the file; confirm it has all 6 role-finding sections, an Executive Summary section, Key Constraints, Recommended Approach, and Open Questions sections.
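
To make the section list concrete, the following sketch renders a plain-text skeleton of the brief from the section names listed in this step. The layout, the indentation, and the names `BRIEF_SECTIONS`, `RoleFinding`, and `renderBriefSkeleton` are assumptions for illustration; the real template file may look quite different.

```typescript
// Section names as listed in the step above.
const BRIEF_SECTIONS = [
  "Executive Summary",
  "Per-Role Findings",
  "Key Constraints Discovered",
  "Recommended Approach",
  "Open Questions for Clarify Step",
];

// Hypothetical shape: findings is null when a role was spawned but sent no report.
interface RoleFinding {
  role: string;
  findings: string | null;
}

function renderBriefSkeleton(perRole: RoleFinding[]): string {
  const lines: string[] = [];
  for (const section of BRIEF_SECTIONS) {
    lines.push(section);
    if (section === "Per-Role Findings") {
      for (const entry of perRole) {
        // Roles without a report still get a slot, so gaps stay visible.
        const body = entry.findings === null ? "no findings received" : entry.findings;
        lines.push(`  ${entry.role}: ${body}`);
      }
    } else {
      lines.push("  (to be filled in during synthesis)");
    }
  }
  return lines.join("\n");
}
```

Because every section is emitted regardless of which researchers ran, the brief keeps the same shape whether 4, 5, or 6 roles were spawned.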

Step 3 (todo): Create 4 core agent definitions: plan-researcher-codebase.md, plan-researcher-dependencies.md, plan-researcher-prior-art.md, plan-researcher-test-strategy.md

Agent definitions let the Agent Team system use these as subagent types when spawning teammates, giving each researcher a scoped tool allowlist and a focused system prompt. Follow the existing plan-reviewer-*.md pattern: frontmatter with name, description, allowed-tools, model: sonnet; body with mandate, how-to-research instructions, and SendMessage reporting format. Codebase-analyst gets Read, Glob, Grep, Bash; dependency-researcher gets Read, Glob, Grep, Bash, WebSearch, WebFetch; prior-art gets WebSearch, WebFetch, Read; test-strategist gets Read, Glob, Grep, Bash.

Verify

Pre-check: confirm kit/plugins/plan-agent/agents/ directory exists (create it if not — existing plan-reviewer-*.md agent files already live here, so the directory should exist, but verify). Then read each of the 4 files; confirm valid frontmatter with model: sonnet and the correct allowed-tools per role. Confirm each has a mandate section and a SendMessage output format block.

Step 4 (todo): Create 2 conditional agent definitions: plan-researcher-ux.md and plan-researcher-devils-advocate.md

Conditional agents avoid wasting tokens on UI research for backend-only plans or adversarial challenge for simple plans. Mirrors the review-plan pattern where UX and accessibility reviewers are conditional on UI signals. UX researcher gets Read, Glob, Grep, WebSearch, WebFetch; devil’s advocate gets Read, Glob, Grep, Bash.

Verify

Read both files; confirm frontmatter and body structure match the core agents from Step 3. Confirm the UX researcher mentions UI signal activation and the devil’s advocate mentions complex-plan activation.

Step 5 (todo): Add --research and --no-research flags to SKILL.md argument parsing

Explicit opt-in/opt-out keeps the feature controllable and avoids surprise token consumption from Agent Teams. --research forces the Research Agent Team phase. --no-research skips it. --quick shorthand expands to include --no-research. Panel recommendation: document the conflict resolution rule explicitly in the SKILL.md flags section: when both --research and --no-research are present, last-wins (consistent with standard CLI flag conventions). This must be visible in the flags documentation, not just tested.

Verify

Read the SKILL.md Invocation & Arguments section; confirm both flags are documented with the same structure as existing flags. Confirm the --quick description now includes --no-research in its shorthand expansion.
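
The flag rules in this step can be sketched as a single pass over the arguments. This is a hypothetical illustration, not the skill's parser: `resolveResearchMode` and `ResearchMode` are invented names, and treating `--quick` as if `--no-research` appeared at the same position is an assumption about how the shorthand interacts with last-wins.

```typescript
// "force": --research won; "skip": --no-research (or --quick) won; "auto": neither flag present.
type ResearchMode = "force" | "skip" | "auto";

function resolveResearchMode(args: string[]): ResearchMode {
  let mode: ResearchMode = "auto";
  for (const arg of args) {
    if (arg === "--research") {
      mode = "force";
    } else if (arg === "--no-research" || arg === "--quick") {
      // Assumption: --quick behaves as if --no-research were written in its place.
      mode = "skip";
    }
    // Later flags overwrite earlier ones, which gives the last-wins rule.
  }
  return mode;
}
```

With this shape, `--research --no-research` resolves to skip and `--no-research --research` resolves to force, while an invocation with neither flag falls through to auto-detection.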

Step 6 (todo): Add Step 0c (Research Agent Team) to SKILL.md Workflow between Step 0b and Step 1

This is the core feature. The step must:

(a) check if --research is set or auto-detect complexity. Concrete heuristic: spawn when any of the following holds: the objective mentions 3+ distinct subsystem keywords (e.g. frontend+backend+database, or auth+API+UI), the file-tree from Step 0b spans 3+ top-level directories, or the objective contains cross-layer verbs (migrate, integrate, bridge, sync); default to skip otherwise (conservative, opt-in bias);
(b) check Agent Teams availability (version >= 2.1.32 + CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS);
(c) if unavailable and --research was explicitly passed, emit a visible warning (not just a log note): “Research Agent Team requested but Agent Teams unavailable — falling back to single-session Explore”; if auto-detected, log silently and fall through;
(d) detect UI signals (same heuristic as review-plan Step 3b);
(e) read agent definitions from agents/plan-researcher-*.md, substitute placeholders;
(f) spawn 4 core + up to 2 conditional researchers;
(g) wait for all via SendMessage. Partial-failure rule: if fewer than 3 of 4 core researchers report within the turn limit, proceed with available findings and note missing roles in the Research Brief (e.g. “dependency-researcher: no findings received”);
(h) read references/research-brief-template.md and synthesize;
(i) inject Research Brief into context for Steps 1–2.

Progress reporting: as each researcher completes, emit a status line (e.g. “Research: codebase-analyst done (2/4 complete)”).
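
The auto-detection heuristic in sub-step (a) is small enough to sketch directly. The version below is an assumption-laden illustration: the keyword lists contain only the examples named above, matching is on whole lowercase words (so "migrating" would not match "migrate"), and "file-tree spans 3+ top-level directories" is read as the number of distinct first path segments among the paths Step 0b surfaced. All function and constant names are hypothetical.

```typescript
// Only the example terms from the step; a real list would likely be longer.
const SUBSYSTEM_KEYWORDS = ["frontend", "backend", "database", "auth", "api", "ui"];
const CROSS_LAYER_VERBS = ["migrate", "integrate", "bridge", "sync"];

// Lowercases and splits on anything that is not a letter or digit.
function tokenizeObjective(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((word) => word.length > 0);
}

// Counts how many vocabulary terms appear at least once among the words.
function countDistinctMatches(words: string[], vocabulary: string[]): number {
  return vocabulary.filter((term) => words.indexOf(term) !== -1).length;
}

// Distinct first path segments; root-level files (no slash) are ignored.
function topLevelDirectories(paths: string[]): string[] {
  const dirs: string[] = [];
  for (const path of paths) {
    const slash = path.indexOf("/");
    if (slash <= 0) continue;
    const dir = path.slice(0, slash);
    if (dirs.indexOf(dir) === -1) dirs.push(dir);
  }
  return dirs;
}

// Spawn when any one rule fires; otherwise skip (conservative default).
function shouldAutoSpawn(objective: string, exploredPaths: string[]): boolean {
  const words = tokenizeObjective(objective);
  const manySubsystems = countDistinctMatches(words, SUBSYSTEM_KEYWORDS) >= 3;
  const manyDirectories = topLevelDirectories(exploredPaths).length >= 3;
  const crossLayerVerb = countDistinctMatches(words, CROSS_LAYER_VERBS) >= 1;
  return manySubsystems || manyDirectories || crossLayerVerb;
}
```

Keeping each rule as a named boolean makes it straightforward to log which rule triggered the team, which may help users predict when research will spawn.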

Verify

Read SKILL.md; confirm Step 0c exists between 0b and 1 with all 9 sub-steps. Confirm the Agent Teams check is soft (logs and falls through) not hard (stops). Confirm --no-research and --quick skip the step entirely.
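
Sub-step (g) and the progress-reporting rule can be illustrated with a pure function over "who was spawned" and "who reported before the turn limit". This is a sketch under assumptions: `summarizeCollection` and `CollectionResult` are hypothetical names, real reports would arrive via SendMessage rather than as an array, and the function always proceeds with whatever arrived rather than encoding the 3-of-4 threshold.

```typescript
interface CollectionResult {
  statusLines: string[];   // one progress line per report, in arrival order
  missingRoles: string[];  // spawned roles that never reported
}

// `spawned` lists every researcher that was started; `reported` lists the roles
// whose report arrived before the turn limit, in arrival order.
function summarizeCollection(spawned: string[], reported: string[]): CollectionResult {
  const statusLines: string[] = [];
  const seen: string[] = [];
  for (const role of reported) {
    // Ignore reports from unknown roles and duplicate reports from the same role.
    if (spawned.indexOf(role) === -1 || seen.indexOf(role) !== -1) continue;
    seen.push(role);
    statusLines.push(`Research: ${role} done (${seen.length}/${spawned.length} complete)`);
  }
  const missingRoles = spawned.filter((role) => seen.indexOf(role) === -1);
  return { statusLines, missingRoles };
}
```

The `missingRoles` list is what synthesis would use to write entries such as "dependency-researcher: no findings received" into the Research Brief.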

Step 7 (todo): Update SKILL.md Step 2 (Create) to inject the Research Brief into the plan’s Context section

Preserving the research findings in the plan makes it self-documenting — readers can see what investigation informed the plan’s decisions. When a Research Brief was generated, render it as a collapsible <details open> block (open by default for discoverability) titled “Research Brief” inside the Context section; omit entirely when no brief was generated. Panel recommendation: default to open so users don’t miss the research findings; they can collapse it after reading.

Verify

Read SKILL.md Step 2; confirm it references Research Brief injection with the collapsible <details> pattern and the omit-when-empty rule.
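
A minimal sketch of the injection rule follows, assuming the brief is available as a plain string. `renderResearchBrief` and `escapeHtml` are hypothetical helpers, and wrapping the escaped text in `<pre>` is an illustrative choice rather than something this plan specifies.

```typescript
// Escapes the five characters that are unsafe in HTML text and attribute contexts.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Returns the block to place inside the Context section, or "" when no brief exists.
function renderResearchBrief(brief: string | null): string {
  if (brief === null || brief.trim() === "") return ""; // omit-when-empty rule
  return [
    "<details open>", // open by default so the findings are not missed
    "  <summary>Research Brief</summary>",
    `  <pre>${escapeHtml(brief)}</pre>`,
    "</details>",
  ].join("\n");
}
```

Escaping `&` first matters: doing it later would re-escape the ampersands introduced by the other replacements.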

Step 8 (todo): Update SKILL.md allowed-tools frontmatter to include Agent Team tools

Without Agent Team tools in allowed-tools, the skill will trigger permission prompts when orchestrating the research team. Panel recommendation: before adding tool names, verify the exact spellings against the current Agent Teams API (check the review-plan skill’s allowed-tools as the authoritative reference). The plan assumes TeamCreate and TeamDelete but these must be confirmed — wrong names cause silent permission failures, not clear errors.

Verify

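Read the SKILL.md frontmatter allowed-tools line; confirm it includes SendMessage plus the team lifecycle tools, using the spellings verified against the review-plan skill’s allowed-tools (the plan assumes TeamCreate and TeamDelete).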

Step 9 (todo): Update plugin.json description to mention the Research Agent Team capability

The plugin description is shown in marketplace listings and skill discovery; it should reflect the new capability so users know the feature exists.

Verify

Read kit/plugins/plan-agent/.claude-plugin/plugin.json; confirm the description mentions “Research Agent Team” or “research team”.

Step 10 (todo): Update README.md with Research Agent Team documentation

The README is the primary user-facing documentation. Add a section covering: when the research phase activates, what researchers are spawned, how to force (--research) or skip (--no-research) it, and the graceful degradation behavior.

Verify

Read README.md; confirm a new section documents the research team feature with flag documentation and degradation behavior.

Step 11 (todo): Add a new version entry to CHANGELOG.md

The CHANGELOG tracks all plugin changes per the project’s versioning conventions. This is a minor-bump feature addition.

Verify

Read CHANGELOG.md; confirm the new version entry exists and describes the Research Agent Team feature under an “Added” heading.

Step 12 (todo): Bump plan-agent minor version in .claude-plugin/marketplace.json

The project convention requires a manual version bump in marketplace.json for every plugin change. New feature = minor bump.

Verify

Read .claude-plugin/marketplace.json; confirm the plan-agent version is higher than the current value on main and follows semver minor-bump convention.
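
The minor-bump check can be expressed in a few lines. This sketch assumes plain MAJOR.MINOR.PATCH version strings with no pre-release or build suffix; `bumpMinor` and `isMinorBumpOf` are hypothetical helpers, and the version numbers in the usage note are made up.

```typescript
// Semver minor bump: increment MINOR and reset PATCH to 0.
function bumpMinor(version: string): string {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (match === null) {
    throw new Error(`not a plain MAJOR.MINOR.PATCH version: ${version}`);
  }
  const major = Number(match[1]);
  const minor = Number(match[2]);
  return `${major}.${minor + 1}.0`;
}

// True when `candidate` is exactly one minor bump above `base`.
function isMinorBumpOf(candidate: string, base: string): boolean {
  return candidate === bumpMinor(base);
}
```

For instance, if main were at a hypothetical 1.4.2, `isMinorBumpOf("1.5.0", "1.4.2")` holds while `isMinorBumpOf("1.4.3", "1.4.2")` does not.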

Tests

Tier 1 — Code-touching plan

Objective: Research Agent Team spawns, collects, and synthesizes a Research Brief

File: tests/plan-agent/research-team-smoke.test.ts

Type: Smoke test

Asserts: Invoking /plan-agent:implementation-plan --research on a multi-subsystem objective spawns at least 4 researcher teammates, collects structured findings via SendMessage, and produces a Research Brief block in the plan’s Context section. Also verifies that --no-research skips the team entirely and produces no Research Brief.

Run: claude --plugin-dir kit/plugins/plan-agent -p "/plan-agent:implementation-plan add auth middleware --research --no-clarify --no-align --no-interview"

Unit: Argument parsing for the --research and --no-research flags

File: tests/plan-agent/research-flags.test.ts

Targets: SKILL.md argument parsing logic for new flags

Key cases: --research sets research mode; --no-research disables it; --quick implies --no-research; both flags absent triggers auto-detection; conflicting --research --no-research uses last-wins

Unit: Agent definition files have valid frontmatter

File: tests/plan-agent/researcher-agents-valid.test.ts

Targets: All 6 plan-researcher-*.md agent definition files

Key cases: Each file has valid YAML frontmatter with required fields (name, description, allowed-tools, model); model is sonnet; allowed-tools matches the role’s expected tool set; body contains SendMessage reporting instructions
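
The key cases above could be checked by a validator along these lines. It is a sketch under assumptions: it handles only flat `key: value` frontmatter between `---` delimiters rather than full YAML, it assumes `allowed-tools` is a comma-separated list, and `parseAgentFile` and `validateResearcherAgent` are hypothetical names. The test runner and file loading are left out because the plan does not specify them.

```typescript
interface ParsedAgent {
  fields: { [key: string]: string };
  body: string;
}

// Minimal frontmatter reader: flat "key: value" lines between two "---" lines.
function parseAgentFile(markdown: string): ParsedAgent | null {
  if (markdown.indexOf("---") !== 0) return null;
  const lines = markdown.split("\n");
  const fields: { [key: string]: string } = {};
  let index = 0;
  for (const line of lines.slice(1)) {
    index += 1;
    if (line.trim() === "---") {
      return { fields, body: lines.slice(index + 1).join("\n") };
    }
    const colon = line.indexOf(":");
    if (colon <= 0) continue;
    fields[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
  }
  return null; // closing delimiter never found
}

// Returns a list of problems; an empty list means the file passes every key case.
function validateResearcherAgent(markdown: string, expectedTools: string[]): string[] {
  const parsed = parseAgentFile(markdown);
  if (parsed === null) return ["missing or unterminated frontmatter"];
  const problems: string[] = [];
  for (const field of ["name", "description", "allowed-tools", "model"]) {
    if (!parsed.fields[field]) problems.push(`missing field: ${field}`);
  }
  if (parsed.fields["model"] && parsed.fields["model"] !== "sonnet") {
    problems.push("model is not sonnet");
  }
  // Assumption: tools are comma-separated; order is ignored by sorting both sides.
  const actualTools = (parsed.fields["allowed-tools"] || "")
    .split(",").map((tool) => tool.trim()).filter((tool) => tool.length > 0).sort();
  if (actualTools.join(",") !== expectedTools.slice().sort().join(",")) {
    problems.push("allowed-tools does not match the expected tool set");
  }
  if (parsed.body.indexOf("SendMessage") === -1) {
    problems.push("body has no SendMessage reporting instructions");
  }
  return problems;
}
```

Checking `SendMessage` against the body only, not the whole file, avoids a false pass if the tool name ever appears in the frontmatter.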

Integration: Graceful degradation when Agent Teams are unavailable

File: tests/plan-agent/research-team-degradation.test.ts

Targets: Step 0c Agent Teams availability check and fallback path

Key cases: When CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS is unset, Step 0c logs a note and falls through to Step 1 (not hard-stop); when Claude Code version < 2.1.32, same behavior; plan is still produced successfully without the Research Brief
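
One way to make these cases testable without manipulating the real environment is to pass the version and flag value into a pure decision function. The sketch below is hypothetical: `decideStep0c`, `Step0cEnvironment`, and the three decision labels are invented names, and treating any non-empty flag value as "set" is an assumption.

```typescript
// Compares dotted numeric versions; returns -1, 0, or 1.
function compareVersions(a: string, b: string): number {
  const partsA = a.split(".").map((part) => Number(part));
  const partsB = b.split(".").map((part) => Number(part));
  for (let i = 0; i < 3; i++) {
    const diff = (partsA[i] || 0) - (partsB[i] || 0);
    if (diff !== 0) return diff < 0 ? -1 : 1;
  }
  return 0;
}

// Injected instead of read from the process, so tests can simulate any setup.
interface Step0cEnvironment {
  claudeCodeVersion: string;
  agentTeamsFlag: string | undefined; // value of CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS
}

type Step0cDecision = "run-team" | "fallback-with-warning" | "fallback-silent";

// Soft check: never stops the workflow, only chooses how loudly to fall back.
function decideStep0c(env: Step0cEnvironment, researchExplicit: boolean): Step0cDecision {
  const versionOk = compareVersions(env.claudeCodeVersion, "2.1.32") >= 0;
  const flagSet = env.agentTeamsFlag !== undefined && env.agentTeamsFlag !== "";
  if (versionOk && flagSet) return "run-team";
  return researchExplicit ? "fallback-with-warning" : "fallback-silent";
}
```

Both fallback outcomes continue to Step 1; they differ only in whether the user sees a warning, which depends on whether `--research` was passed explicitly.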

Integration: Research Brief injection into plan Context section

File: tests/plan-agent/research-brief-injection.test.ts

Targets: Step 2 (Create) Research Brief handling

Key cases: When Research Brief exists, a collapsible <details> block appears in Context; when no brief was generated, the block is absent; the brief content is HTML-escaped

Verification

Full-feature path: Set CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1, run /plan-agent:implementation-plan "Add WebSocket real-time notifications across frontend, backend, and database layers" --research --no-clarify --no-align --no-interview. Confirm: Step 0c spawns at least 4 researchers, collects findings, produces a Research Brief in the plan’s Context section as a collapsible <details> block. The plan itself is well-formed HTML with all required sections.
Graceful degradation path: Unset CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS, run the same command with --research. Confirm: because --research was passed explicitly, Step 0c emits a visible warning that Agent Teams are unavailable and that it is falling back to single-session Explore, then proceeds to Step 1 without error. The plan is produced successfully without a Research Brief block.
Skip path: Run /plan-agent:implementation-plan "Fix typo in README" --no-research --quick. Confirm: Step 0c is skipped entirely. No Agent Team is created. Plan is produced as normal.
Conditional researchers: Run a complex plan with UI signals (e.g. “Add React dashboard with charts and filters”) using --research. Confirm: the UX researcher and the devil’s advocate are both spawned (6 total researchers). Run a complex non-UI plan. Confirm: devil’s advocate is spawned but UX researcher is not (5 total).
Agent definitions: Run ls kit/plugins/plan-agent/agents/plan-researcher-*.md | wc -l and confirm output is 6. Grep each for model: sonnet and SendMessage.

Completion Checklist

Required

All step TODOs marked as done
All acceptance criteria verified and checked off
Plan status updated to completed

Completion Report

No items to report — all requirements met.

Next Steps

Add a --research-bg background research mode

Paste this prompt into Claude to execute this follow-up:

Add a --research-bg flag to the implementation-plan skill that spawns the Research Agent Team in the background (via the agent-review-plan background pattern) and proceeds immediately to Steps 1-2 with whatever Explore context is available. When the research team completes, inject the Research Brief into the plan retroactively via an Edit pass. This lets users start planning immediately without waiting for the full research phase. Model the background dispatch on kit/plugins/plan-agent/agents/agent-review-plan.md.

Run the review-plan Agent Team on plans produced with research

Paste this prompt into Claude to execute this follow-up:

After the Research Agent Team feature is shipped, run /plan-agent:review-plan on 3 plans produced with --research and 3 produced without. Compare the review findings: do researched plans have fewer completeness gaps, better architecture fit, and lower risk ratings? Report the comparison as a table. This validates whether the research phase measurably improves plan quality.

Update the CLAUDE.md reference table for plan-agent

Paste this prompt into Claude to execute this follow-up:

Update the plan-agent row in the CLAUDE.md Reference Implementations table to mention the Research Agent Team capability. Add a note about the --research flag and the 4-6 researcher teammates. Keep the entry concise (single table cell) and consistent with the other plugin descriptions.

Wish List

Researchers that learn from plan review feedback

Speculative / blue-sky idea — not on the critical path. Paste into Claude when ready to explore:

Design a feedback loop between the review-plan Agent Team and the Research Agent Team. When a review finds a gap (e.g. "missing migration rollback strategy"), store that finding category in a persistent researcher-hints file. On subsequent plan creations, the relevant researcher (e.g. devils-advocate or dependency-researcher) loads those hints to proactively investigate areas that past reviews flagged. This creates a learning loop where the research phase improves over time based on actual review outcomes. Investigate whether CLAUDE.md project memory or a dedicated .claude/plan-agent/research-hints.json would be the better persistence mechanism.

Token budget awareness for research team sizing

Speculative / blue-sky idea — not on the critical path. Paste into Claude when ready to explore:

Add token budget awareness to Step 0c so the research team size adapts to the user's context. When the user passes a "+500k" budget directive, spawn all 6 researchers with deeper investigation prompts. At default budget, spawn only 4 core researchers with focused prompts. At low budget or --quick, skip the team entirely. Investigate whether the Workflow tool's budget.total / budget.remaining() API could be used here, or whether a simpler heuristic based on plan complexity alone is sufficient.

Unresolved Questions

Should complexity auto-detection default to spawning the team or skipping it?

The implementation-plan skill needs a complexity heuristic to decide when to auto-spawn the Research Agent Team (when neither --research nor --no-research is set). Two options: (A) conservative default — only spawn when complexity clearly exceeds a threshold (3+ subsystems, cross-layer changes), meaning most plans skip the research phase unless explicitly requested; (B) aggressive default — spawn for any plan that touches 2+ files, meaning most non-trivial plans get research. Consider: Agent Teams are expensive (5-10x tokens), experimental (flag required), and the single-session Explore is already decent for simple plans. Recommend one approach with reasoning, and draft the specific heuristic rules for the chosen approach.

Should researchers use the existing agent definitions or spawn as plain teammates?

Agent Teams can spawn teammates using subagent definitions (AGENT.md files) via the agentType option, which gives each teammate a scoped tool allowlist and system prompt. Alternatively, teammates can be spawned as plain sessions with the role prompt passed directly in the spawn instruction. The review-plan skill uses agent definitions (plan-reviewer-*.md). Compare: (A) using AGENT.md files (consistent with review-plan, tool scoping, reusable as standalone subagents) vs (B) plain spawn prompts (simpler, fewer files, prompts live in research-prompts.md only). Given that this plan already creates 6 agent definition files AND a research-prompts.md, is that redundant? Should the agent definitions contain the full prompt and research-prompts.md be eliminated, or should agent definitions be minimal (tools + model only) with research-prompts.md carrying the role instructions? Recommend one approach.

Team Review (2026-06-08 13:51:00 UTC)

Executive Summary

The plan is sound with revisions. All 7 reviewers agree the design follows the proven review-plan Agent Team pattern and inserts cleanly into the existing pipeline as Step 0c with proper graceful degradation. However, the team identified three high-priority issues that should be resolved before implementation: (1) the dual-artifact redundancy between agent definition files and research-prompts.md — flagged by 5 of 7 reviewers — needs a single source of truth; (2) the complexity auto-detection heuristic is underspecified, leaving the most common invocation path (no flags) with an undefined branch; and (3) the test infrastructure is incomplete — test files are declared but no runner, framework, or directory creation steps exist.

Role-by-Role Findings

Architecture Review

Fit: Follows the established Agent Team pattern with well-scoped component boundaries and proper graceful degradation.

Dual-artifact redundancy between agent definitions and research-prompts.md (medium) — recommends agent definitions as single source of truth, matching review-plan
Context injection is implicit — Research Brief handoff between 0c and Step 1 has no defined serialization boundary (medium)
Auto-detection heuristic underspecified — no programmatic detection logic defined (medium)
Unverified tool names TeamCreate/TeamDelete (low)

Completeness Review

Completeness: Well-structured with 12 steps covering all named files, but several gaps could cause implementation friction.

No directory creation steps for agents/ and references/ before writing files (medium)
No step to read existing SKILL.md before Steps 5–8 edits (medium)
Auto-detection heuristic buried in Unresolved Question, not in Step 6 (high)
Test files declared but no creation steps — tests appear in Tests section but not in Steps (high)
Unresolved Questions affect implementation — two open design decisions directly impact Step 6 (high)
No test runner specified for .ts test files (medium)

Testability Review

Test coverage: Adequate for Tier 1 with objective-verification, unit, and integration tests, but several critical paths lack coverage.

No test for auto-detection heuristic (high)
Objective-verification test is manual CLI invocation, not a runnable automated test (high)
No test runner or framework specified for .ts files (high)
No test for conditional researcher spawning based on UI signals (medium)
Integration test doesn’t specify how to simulate env var manipulation (medium)

Risk Review

Risk level: Medium overall.

No partial-failure handling for researcher agents — if one hangs, synthesis blocks indefinitely (high)
Unverified tool names in allowed-tools could cause silent permission failures (high)
Dual-artifact prompt drift risk between agent files and reference file (medium)
No token budget ceiling for 4–6 parallel researchers (medium)
Experimental flag dependency — silent removal could break the feature (medium)

Conventions Review

Fit: Closely follows established project patterns — agent definition structure, file placement, kebab-case naming all align.

Redundant prompt storage conflicts with review-plan single-source pattern (medium)
Test file placement in non-existent tests/plan-agent/ directory (medium)
allowed-tools casing for TeamCreate/TeamDelete needs verification (medium)

UX Review

User fit: Well-structured plan, but the research phase UX lacks progress feedback and auto-detection predictability.

No progress indication during research phase — users face silent wait (high)
Auto-detection UX undefined — users cannot predict when research spawns (high)
Graceful degradation is too quiet when --research was explicitly requested (medium)
Research Brief hidden by default in collapsed <details> (low)
No empty-state handling for partial researcher failure in the Brief (low)

Accessibility Review

A11y compliance: Largely WCAG 2.1 AA compliant with strong foundations, but medium-severity gaps in the plan HTML template itself.

status-badge lacks accessible label tying it to the plan title (medium)
<details>/<summary> marker removal may break VoiceOver disclosure announcement (medium)
Disabled checkboxes removed from tab order without aria-disabled backup (medium)
compare-grid uses divs instead of semantic table structure (medium)
Pulse-dot animation missing prefers-reduced-motion suppression (medium)

Agreements & Conflicts

Confirmed concerns (multiple reviewers agree):

Dual-artifact redundancy — flagged by architecture, completeness, risk, conventions, and accessibility (5/7). Consensus: use agent definitions as single source of truth.
Auto-detection heuristic underspecified — flagged by architecture, completeness, UX, risk (4/7). Consensus: define concrete rules in the step, default to conservative (opt-in bias).
Unverified tool names — flagged by architecture, risk, conventions (3/7). Consensus: verify against review-plan’s allowed-tools before implementation.
No partial-failure handling — flagged by risk, UX, architecture (3/7). Consensus: proceed with available findings after turn limit, note missing roles.
Test infrastructure incomplete — flagged by completeness, testability (2/7). Consensus: specify test runner and add test creation steps.

No conflicts — all reviewers were directionally aligned. No contradictory recommendations were surfaced.

Highest-Risk Issues (priority order)

Dual-artifact prompt drift (5 reviewers) — resolve before implementation by making agent definitions the single canonical prompt source
Auto-detection heuristic undefined (4 reviewers) — embed concrete detection rules in Step 6 with conservative default
No partial-failure / timeout handling (3 reviewers) — add turn-limit rule and missing-role notation to synthesis
Unverified TeamCreate/TeamDelete tool names (3 reviewers) — cross-reference with actual Agent Teams API before writing allowed-tools
Test infrastructure gaps (2 reviewers) — specify test runner, add test creation steps, clarify manual vs automated test expectations

Inline Edits Applied

Target	Action	Change
Step 1 `.step-why`	edit	Added panel recommendation: agent definitions as single source of truth for role prompts
Step 6 `.step-why`	edit	Added concrete auto-detection heuristic, partial-failure rule, progress reporting, and stronger degradation warning
Step 5 `.step-why`	edit	Added last-wins conflict resolution documentation requirement
Step 7 `.step-why`	edit	Changed `<details>` to `<details open>` for Research Brief discoverability
Step 8 `.step-why`	edit	Added tool name verification requirement against Agent Teams API
Step 1 `.verify-body`	edit	Added directory pre-check for `references/`
Step 3 `.verify-body`	edit	Added directory pre-check for `agents/`
`#criteria-list`	append	Added AC11 (partial-failure handling), AC12 (progress reporting), AC13 (flag conflict docs)
