Plan: Add Tests section to implementation plans

Objective

Ship a mandatory Tests section in every implementation plan that specifies real-world application tests: actual unit, integration, E2E, and other test files written for the application or feature, run by the project's test runner, and committed to the codebase. Every plan must also include an objective-verification mock/smoke test, a real test file that runs against the application and confirms the plan's stated goal is actually accomplished. The section uses a two-tier system: Tier 1 (code-touching plans) requires all applicable test sub-sections; Tier 2 (non-code plans such as docs or file-move chores) requires only the mandatory objective-verification test. The author selects the tier based on what the steps actually do, not on the type: frontmatter field.

Implement

Read and implement all steps in the plan at docs/plans/add-tests-section-to-plans.html — Add a Tests section to the plan template with unit, integration, E2E, and objective-verification tests

File: add-tests-section-to-plans.html

Path: docs/plans/add-tests-section-to-plans.html

Acceptance criteria: 12 / 12 done

Context

The current implementation plan template (plan-mode.md, reference/SKELETON.md, and the HTML SKELETON.html) defines a robust verification chain: per-step verification confirms each step ran correctly, acceptance criteria confirm the plan meets the definition of done, and end-to-end verification confirms the whole plan executed correctly. However, none of these layers require real application tests — they are all prose-based assertions written inside the plan document itself.

This gap means a plan can be marked "completed" without any automated proof that the changes actually work in the running application. A developer could implement all steps, check all criteria boxes, and ship code that has zero test coverage. The missing layer is a Tests section that specifies real-world tests — actual test files (unit, integration, E2E) that are written for the application or feature, run by the project's test runner, and ship with the codebase. These are not plan-level prose checks; they are the same tests a developer would write and commit alongside feature code.

Additionally, every plan must include a mandatory objective-verification test — a real mock or smoke test that runs against the actual application and directly asserts the plan's stated objective is accomplished. Like all tests in this section, the objective-verification test is a real executable test file, not a prose assertion in the plan document. This closes the loop between "what we said we'd do" and "what the application actually does."

Tier decision (resolved): Investigation of 22 existing plans (17 feature, 4 chore, 1 fix, 1 refactor, 0 docs) showed that chore plans span a wide spectrum — from pure file moves (clean-up-docs) to code-mutating changes (enable-model-invocation). A conditionally-present rule (skip Tests for docs/chore) would either under-test code-touching chores or force a fuzzy "is this code-touching?" classification on every author. Instead, the Tests section is always present with a two-tier depth model: Tier 1 (code-touching plans) includes all applicable sub-sections (unit, integration, E2E) plus the mandatory objective-verification test; Tier 2 (non-code plans: docs, file-move chores) requires only the objective-verification test, with other sub-sections omitted entirely rather than left as empty stubs. The author selects the tier based on what the steps actually do, not the type: frontmatter field — a type: chore plan that changes import paths is Tier 1, while a type: chore plan that moves directories is Tier 2.

Files to Modify

agentics/

~/.claude/rules/
- plan-mode.md (modified): add Tests to Required Structure
~/.claude/rules/reference/
- SKELETON.md (modified): add Tests section template
kit/plugins/plan-agent/skills/implementation-plan/
- SKILL.md (modified): add Tests to Required Structure and HTML Output Requirements
- reference/SKELETON.html (modified): add Tests section HTML, CSS, and nav link

Diagram

Verification layers after this change

- Layer 1, Per-step verification: local check; did this step do what it should?
- Layer 2 (new), Tests section: real application tests (unit, integration, E2E, and objective-verification test files) that ship with the code
- Layer 3, Acceptance criteria: definition of done as falsifiable conditions
- Layer 4, End-to-end verification: whole-plan confirmation

Test types, tiers, and when to include them

Each test type below is a real application test — an actual test file written for the feature, run by the project's test runner, and committed to the codebase
Test Type	Scope	Tier	Include When	Required?
Unit	Real test file targeting a single function, method, or module in isolation	Tier 1 only	Plan adds or modifies business logic, utilities, helpers, data transformations	When applicable
Integration	Real test file exercising multiple modules, services, or layers working together	Tier 1 only	Plan touches API routes, database queries, middleware chains, service boundaries	When applicable
E2E	Real test file driving a full user flow through the running application	Tier 1 only	Plan affects user-facing flows, page navigation, form submissions, auth flows	When applicable
Objective-verification	Real mock/smoke test file that runs against the application and asserts the plan's stated objective is accomplished	Both tiers	Always — every plan must include one; always appears first in the Tests section as a highlighted hero card	Mandatory
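As a minimal sketch, the table above can be modeled in code. The `TestCard` shape, the tier map, and the `orderTestCards` helper are hypothetical and not part of the existing skeleton; the sketch records which tiers each test type may appear in and sorts cards so the objective-verification hero card always comes first, followed by unit, integration, and E2E.

```typescript
// Hypothetical model of the four test types in the table above.
type TestType = "objective" | "unit" | "integration" | "e2e";
type Tier = 1 | 2;

interface TestCard {
  type: TestType;
  file: string; // path of the real test file to be written and committed
  description: string;
}

// Tiers in which each test type may appear (objective-verification: both).
const ALLOWED_TIERS: Record<TestType, Tier[]> = {
  objective: [1, 2],
  unit: [1],
  integration: [1],
  e2e: [1],
};

// Display order: hero card first, then unit, integration, E2E.
const DISPLAY_ORDER: TestType[] = ["objective", "unit", "integration", "e2e"];

function orderTestCards(cards: TestCard[]): TestCard[] {
  // Array.prototype.sort is stable (ES2019+), so cards of the same type keep their relative order.
  return [...cards].sort(
    (a, b) => DISPLAY_ORDER.indexOf(a.type) - DISPLAY_ORDER.indexOf(b.type),
  );
}

function isAllowedInTier(type: TestType, tier: Tier): boolean {
  return ALLOWED_TIERS[type].includes(tier);
}
```

Sorting a copy rather than the input keeps the author's original card list untouched, which matters if the same list is reused to render other views.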

Tier selection guide

The author selects the tier based on what the steps do, not the `type:` frontmatter field
Tier	When to use	Required sub-sections	Examples
Tier 1 (code-touching)	Any step creates, modifies, or deletes application source files, configs that affect runtime, or test files	Unit, Integration, E2E (when applicable) + Objective-verification (always)	feature, fix, refactor; chores that change imports, bump deps with code changes
Tier 2 (non-code)	Steps only move, rename, or delete files; write docs; update metadata; or perform other non-code operations	Objective-verification only (other sub-sections omitted, not left as empty stubs)	docs plans; chores that move directories, rename files, update non-runtime metadata
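The tier rule above is mechanical enough to sketch as a function. This is a minimal illustration that assumes each step can be summarized as a list of file operations with a hypothetical `kind` field. How those operations are extracted from the step prose is left open, and the plan's type: frontmatter is deliberately not an input.

```typescript
// Hypothetical summary of what a single plan step does to a file.
type FileKind = "source" | "runtime-config" | "test" | "doc" | "metadata";
type Operation = "create" | "modify" | "delete" | "move" | "rename";

interface FileOp {
  path: string;
  kind: FileKind;
  op: Operation;
}

// File kinds whose creation, modification, or deletion makes a plan Tier 1.
const CODE_KINDS: ReadonlySet<FileKind> = new Set<FileKind>(["source", "runtime-config", "test"]);
const MUTATING_OPS: ReadonlySet<Operation> = new Set<Operation>(["create", "modify", "delete"]);

// Tier 1 if any step creates, modifies, or deletes a code file; Tier 2 otherwise.
// Moves and renames alone stay Tier 2, matching the "moves directories" example.
function selectTier(steps: FileOp[][]): 1 | 2 {
  const touchesCode = steps.some((ops) =>
    ops.some((f) => CODE_KINDS.has(f.kind) && MUTATING_OPS.has(f.op)),
  );
  return touchesCode ? 1 : 2;
}
```

Under these assumptions, the guide's two chore examples come out as expected: a chore whose step modifies a source file to update imports is Tier 1, and a chore that only moves directories is Tier 2.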

Steps

done Add tests to the Required Structure in plan-mode.md

This is the authoritative rule file that all plans must follow. Adding the Tests section here makes it a first-class structural requirement alongside context, objective, steps, acceptance-criteria, and verification. The bullet must introduce the two-tier model so all downstream templates and skills inherit it.

Verify

Re-read ~/.claude/rules/plan-mode.md § Required Structure. Confirm a new tests bullet exists between steps and acceptance-criteria, and that it mentions the two-tier depth model (Tier 1 for code-touching plans, Tier 2 for non-code plans). At this point it only needs to name the section, its purpose, and the tier distinction — the detailed specification (sub-categories, mandatory vs. conditional rules) is Step 2's deliverable. No other sections should be altered.

done Define the Tests section specification in plan-mode.md

The bullet alone names the section; this step specifies the rules for how to populate it — the two-tier depth model, which test types are conditional vs. mandatory, what qualifies as an objective-verification test, and how these real application tests relate to the existing prose-based verification chain.

Verify

Read the new tests bullet and confirm it specifies: (1) the two-tier model — Tier 1 (code-touching) includes unit/integration/E2E "when applicable" plus mandatory objective-verification; Tier 2 (non-code) includes only the mandatory objective-verification test with other sub-sections omitted entirely; (2) the author selects the tier based on what the steps do, not the type: field; (3) the objective-verification test is a mandatory real mock/smoke test that runs against the application and asserts the plan's stated objective; (4) the objective-verification test always appears first in the Tests section as a highlighted hero card, before any unit/integration/E2E sub-sections; (5) the Tests section is explicitly distinct from per-step verification (prose) and end-to-end verification (prose) — Tests are real test files that ship with the code.
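Rules (1) through (4) can also be checked mechanically once a plan's Tests section is parsed into cards. Below is a minimal sketch of such a linter. The `TestCard` shape, the violation messages, and the non-empty file-path check are assumptions; rule (5) is omitted because it concerns the wording of the spec rather than the content of a plan.

```typescript
type TestType = "objective" | "unit" | "integration" | "e2e";

interface TestCard {
  type: TestType;
  file: string; // path of the real test file (assumed field)
}

// Returns human-readable rule violations; an empty array means the section passes.
function lintTestsSection(tier: 1 | 2, cards: TestCard[]): string[] {
  const problems: string[] = [];
  const objectives = cards.filter((c) => c.type === "objective");

  // The objective-verification test is mandatory in both tiers.
  if (objectives.length === 0) {
    problems.push("missing objective-verification test");
  } else if (cards[0].type !== "objective") {
    // Hero-card-first layout.
    problems.push("objective-verification test must appear first");
  }

  // Tier 2 allows only the objective-verification test.
  if (tier === 2 && cards.some((c) => c.type !== "objective")) {
    problems.push("Tier 2 plans must omit unit/integration/E2E tests");
  }

  // Each card should name a real test file rather than a prose check.
  for (const c of cards) {
    if (c.file.trim() === "") problems.push(`${c.type} test has no file path`);
  }
  return problems;
}
```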

done Add the Tests template to reference/SKELETON.md

The skeleton is copied as the starter for every new markdown plan. Without the Tests section in the skeleton, authors will produce plans missing the new required section despite the rule.

Verify

Open ~/.claude/rules/reference/SKELETON.md and confirm a ## Tests section exists between ## Steps and ## Acceptance Criteria. The objective-verification test placeholder must appear first (before unit/integration/E2E sub-headings), reflecting the hero-card-first layout. Confirm no existing sections were removed or reordered.
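This check is easy to script. The sketch below assumes that the objective placeholder sits under a `### Objective verification` sub-heading and that the other placeholders sit under `### Unit`, `### Integration`, and `### E2E`. Those heading names are hypothetical and should be adjusted to whatever SKELETON.md actually uses.

```typescript
// Collect level-2 and level-3 markdown headings (e.g. "## Tests") in document order.
function headings(markdown: string): string[] {
  return markdown
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => /^#{2,3} /.test(line));
}

// True if ## Tests sits between ## Steps and ## Acceptance Criteria and the
// objective-verification sub-heading precedes the unit/integration/E2E ones.
function skeletonHasTestsSection(markdown: string): boolean {
  const hs = headings(markdown);
  const at = (h: string) => hs.indexOf(h);
  const steps = at("## Steps");
  const tests = at("## Tests");
  const criteria = at("## Acceptance Criteria");
  if (steps < 0 || tests < 0 || criteria < 0 || !(steps < tests && tests < criteria)) {
    return false;
  }
  // Hypothetical sub-heading names; all must lie inside the Tests section.
  const objective = at("### Objective verification");
  if (objective < 0 || objective < tests || objective > criteria) return false;
  const others = ["### Unit", "### Integration", "### E2E"].map(at);
  return others.every((i) => i > objective && i < criteria);
}
```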

done Add Tests section CSS to the HTML SKELETON.html

The HTML plan skeleton needs dedicated styling for the Tests section — test-type badges (unit/integration/E2E/objective), a test list layout consistent with step cards, and a visually distinct objective-verification hero card that appears first in the section.

Verify

Open kit/plugins/plan-agent/skills/implementation-plan/reference/SKELETON.html <style> block and confirm new CSS rules exist for: .test-list, .test-card, .test-badge (with variants .test-badge-unit, .test-badge-integration, .test-badge-e2e, .test-badge-objective), and .objective-test-card. Confirm the objective test card has a visually distinct treatment (e.g. accent border, highlighted background).
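One quick way to confirm the new rules exist is to extract the class selectors from the `<style>` block and compare them against the required list. Below is a minimal sketch that assumes the stylesheet is available as a string. It checks only that the classes are present, not their visual treatment, which still needs a human look.

```typescript
// Class names that this step requires in the SKELETON.html <style> block.
const REQUIRED_CLASSES = [
  "test-list", "test-card", "test-badge",
  "test-badge-unit", "test-badge-integration", "test-badge-e2e", "test-badge-objective",
  "objective-test-card",
];

// Extract every ".class-name" token in the CSS text (numbers like ".75rem" are skipped).
function classSelectors(css: string): Set<string> {
  const found = new Set<string>();
  for (const m of css.matchAll(/\.(-?[_a-zA-Z][\w-]*)/g)) found.add(m[1]);
  return found;
}

// Required classes that do not appear anywhere in the stylesheet.
function missingClasses(css: string): string[] {
  const found = classSelectors(css);
  return REQUIRED_CLASSES.filter((c) => !found.has(c));
}
```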

done Add Tests section HTML markup to SKELETON.html

The HTML body needs the actual section element with placeholders for test items, positioned between the Steps section and Acceptance Criteria. This includes the nav sidebar link and the beaker icon reference.

Verify

Open kit/plugins/plan-agent/skills/implementation-plan/reference/SKELETON.html and confirm: (1) a <section class="section-card card-tests" id="tests"> element exists between the Steps and Acceptance Criteria sections; (2) the .objective-test-card appears first inside the section as a highlighted hero card, before the .test-list container with unit/integration/E2E placeholder .test-card items; (3) a nav sidebar <li> with href="#tests" and a beaker icon is present between Steps and Criteria links.
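The three checks can be approximated with plain string positions. The `<section>` tag, `.objective-test-card`, and `.test-list` come from this step. The sketch also assumes that the Steps and Acceptance Criteria sections carry `id="steps"` and `id="criteria"` and that the nav links use matching `href` values; those ids are assumptions about SKELETON.html. The beaker icon is left to a visual check.

```typescript
// True if all three positions were found and appear in the order a < b < c.
function inOrder(a: number, b: number, c: number): boolean {
  return a >= 0 && b >= 0 && c >= 0 && a < b && b < c;
}

interface MarkupReport {
  sectionPlacement: boolean;
  heroFirst: boolean;
  navLink: boolean;
}

function checkTestsMarkup(html: string): MarkupReport {
  // (1) Section placement; the "steps" and "criteria" ids are assumptions.
  const steps = html.indexOf('id="steps"');
  const tests = html.indexOf('<section class="section-card card-tests" id="tests">');
  const criteria = html.indexOf('id="criteria"');

  // (2) Hero card before the .test-list container, searching from the Tests section on.
  const hero = html.indexOf("objective-test-card", tests);
  const list = html.indexOf("test-list", tests);

  // (3) Nav sidebar link order.
  const navSteps = html.indexOf('href="#steps"');
  const navTests = html.indexOf('href="#tests"');
  const navCriteria = html.indexOf('href="#criteria"');

  return {
    sectionPlacement: inOrder(steps, tests, criteria),
    heroFirst: inOrder(tests, hero, list) && list < criteria,
    navLink: inOrder(navSteps, navTests, navCriteria),
  };
}
```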

done Update the SKILL.md Required Structure and HTML Output Requirements

The implementation-plan skill file is the authoritative spec for HTML plan generation. It must document the Tests section alongside the other required sections, specify the two-tier model, describe which real application test types to auto-generate from step analysis, and describe the mandatory objective-verification mock/smoke test — all of which are real test files for the application, not plan-level prose.

Verify

Read kit/plugins/plan-agent/skills/implementation-plan/SKILL.md Required Structure and HTML Output Requirements sections. Confirm: (1) tests is listed as a required section between steps and acceptance-criteria; (2) the description specifies the two-tier model — Tier 1 (code-touching) includes unit/integration/E2E as conditional plus objective-verification as mandatory; Tier 2 (non-code) includes only objective-verification with other sub-sections omitted; (3) the tier is selected by step content analysis, not the type: field; (4) the HTML Output Requirements mention the new CSS classes and the section's placement.

done Add test-generation guidance to the SKILL.md workflow

The plan-agent skill needs a workflow instruction telling it when and how to populate the Tests section during plan creation. It should analyze the plan's steps to determine the tier (Tier 1 if any step creates, modifies, or deletes application source files, runtime-affecting configs, or test files; Tier 2 otherwise), select which real application test types apply, and auto-draft the objective-verification mock/smoke test from the objective statement. All generated test entries must describe real test files to be written and committed, not prose assertions.

Verify

Read the Workflow section in kit/plugins/plan-agent/skills/implementation-plan/SKILL.md and confirm that a test-generation instruction exists, either as a sub-step of an existing step or as a new step. It should describe: (1) scanning step content to classify the tier, with Tier 1 if any step creates, modifies, or deletes application source files, runtime-affecting configs, or test files, and Tier 2 otherwise; (2) for Tier 1, selecting applicable test types (unit/integration/E2E) based on what the steps modify; (3) for Tier 2, omitting unit/integration/E2E sub-sections entirely; (4) always generating an objective-verification test entry regardless of tier; (5) populating the {test-items} and {objective-test} placeholders in the skeleton.
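Here is a minimal sketch of the last part of that instruction, filling the `{test-items}` and `{objective-test}` placeholders. The placeholder names come from this step and the class names follow Steps 4 and 5. The card markup, the `data-type` attribute, and the helper names are assumptions.

```typescript
interface TestItem {
  type: "unit" | "integration" | "e2e";
  file: string;
  description: string;
}

// Minimal HTML escaping for text inserted into the skeleton.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function renderCard(item: TestItem): string {
  return (
    `<div class="test-card" data-type="${item.type}">` +
    `<span class="test-badge test-badge-${item.type}">${item.type}</span>` +
    `<code>${escapeHtml(item.file)}</code> ${escapeHtml(item.description)}</div>`
  );
}

// Tier 2 plans pass an empty list, so no unit/integration/E2E cards are rendered;
// removing an empty .test-list wrapper is left to the caller.
function fillTestsPlaceholders(template: string, objective: string, items: TestItem[]): string {
  const objectiveHtml = escapeHtml(objective);
  const itemsHtml = items.map(renderCard).join("\n");
  // Single pass, so placeholder-like text inside the inserted content is never re-expanded.
  return template.replace(/\{(objective-test|test-items)\}/g, (_match: string, name: string) =>
    name === "objective-test" ? objectiveHtml : itemsHtml,
  );
}
```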

Verification

After implementing all steps, generate a new plan using /plan-agent:implementation-plan for any feature objective (e.g. "Add a search bar to the dashboard"). Open the resulting HTML plan and confirm:

A Tests section appears between Steps and Acceptance Criteria in the rendered page.
The section contains at least one test card for the applicable test types — each describing a real test file to be written for the application (the search-bar plan should include unit tests for the search logic, integration tests for the API, and E2E tests for the user flow).
An objective-verification test hero card is always present, visually distinct, positioned first in the Tests section (before any unit/integration/E2E sub-sections), and describes a real mock/smoke test file that runs against the application and asserts the plan's objective statement is accomplished.
The sidebar navigation includes a "Tests" link that scrolls to the correct section.
Tier 1 verification: Generating a plan for a feature or fix (e.g. "Add a search bar") produces a Tier 1 Tests section with unit/integration/E2E sub-sections populated based on step analysis, plus the mandatory objective-verification test.
Tier 2 verification: Generating a plan for a non-code change (e.g. a docs-only plan or a file-move chore) produces a Tier 2 Tests section with only the mandatory objective-verification test — unit/integration/E2E sub-sections are omitted entirely, not rendered as empty stubs.
Tier boundary test: Generating a type: chore plan whose steps modify application source files (e.g. "Bump dependency and update import paths") correctly selects Tier 1, not Tier 2 — confirming tier is driven by step content, not the type: field.

Separately, create a markdown plan using the updated SKELETON.md and confirm the ## Tests section template renders with all four test-type placeholders.

Completion Checklist

Required

All step TODOs marked as done
All acceptance criteria verified and checked off
Plan status updated to completed

Completion Report

No items to report — all requirements met.

Next Steps

Backfill Tests sections into existing HTML plans

Paste this prompt into Claude to execute this follow-up:

Scan every .html plan file under docs/plans/ in the agentics repo. For each plan that lacks a <section id="tests">, analyze the plan's steps to determine which test types apply (unit, integration, E2E) and generate a Tests section with an objective-verification test derived from the plan's objective. Insert the section between the Steps and Acceptance Criteria sections in the HTML. Report a list of files modified and which test types were added to each.
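The insertion part of that prompt might look like the sketch below. It assumes that the Acceptance Criteria section in existing plans carries `id="criteria"` (an assumption about those files) and that the generated Tests section markup has already been built. It returns `null` when nothing was inserted, so the caller can report which files changed.

```typescript
// Insert sectionHtml immediately before the <section> that carries id="criteria".
// Returns null if the plan already has a Tests section or the anchor is missing.
function insertTestsSection(html: string, sectionHtml: string): string | null {
  if (html.includes('id="tests"')) return null; // already has a Tests section
  const anchor = html.indexOf('id="criteria"');
  if (anchor < 0) return null;
  const sectionStart = html.lastIndexOf("<section", anchor);
  if (sectionStart < 0) return null;
  return html.slice(0, sectionStart) + sectionHtml + "\n" + html.slice(sectionStart);
}
```

Returning `null` for an already-backfilled file also makes the pass idempotent, so rerunning it does not duplicate sections.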

Add a test-coverage gate to the completion checklist

Paste this prompt into Claude to execute this follow-up:

Update the plan-agent implementation-plan skill's completion checklist (the three mandatory disabled checkboxes in SKELETON.html) to include a fourth checkbox: "All tests in the Tests section have been written and pass." Update the completion-checklist auto-update JavaScript to check this fourth condition by looking for a data attribute or class on the Tests section indicating all test cards are marked as done. Update the SKILL.md to document the new fourth completion requirement.
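The fourth condition could be computed from per-card state. Below is a minimal sketch. It assumes each .test-card carries a hypothetical `data-status` attribute that reads `done` once its test is written and passing, and that the result is exposed through a hypothetical `data-all-tests-done` attribute on the Tests section. DOM access is kept in a comment so the sketch runs without a browser.

```typescript
// Fourth completion condition: at least one test card, and every card is done.
// An empty list fails, because every plan must contain an objective-verification test.
function allTestsDone(statuses: (string | null)[]): boolean {
  return statuses.length > 0 && statuses.every((s) => s === "done");
}

// Value for the hypothetical data-all-tests-done attribute on <section id="tests">.
function testsGateAttribute(statuses: (string | null)[]): "true" | "false" {
  return allTestsDone(statuses) ? "true" : "false";
}

// In the page (not run here), the statuses might be gathered with:
// Array.from(document.querySelectorAll("#tests .test-card"), (el) => el.getAttribute("data-status"))
```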

Add test-runner integration for objective-verification tests

Paste this prompt into Claude to execute this follow-up:

Add a "Run test" button to the objective-verification test card in the HTML plan template. When clicked, it should copy a CLI command (e.g. the test file path with the project's test runner) to the clipboard. Update the SKILL.md to specify that the plan-agent should detect the project's test framework (Jest, Vitest, pytest, etc.) and generate the appropriate run command. The button should be hidden in print and when status is "completed".
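The detection logic might be sketched as follows. It assumes the skill can read the project's package.json dependencies and list the files at the repository root. The commands are the standard CLI invocations for running a single test file (`npx vitest run <file>`, `npx jest <file>`, `pytest <file>`). The `RunContext` shape, the priority order, and the use of common Python config files as a pytest signal are heuristics and assumptions.

```typescript
interface RunContext {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
  rootFiles: string[]; // file names at the repository root
}

// Returns the clipboard command for the objective-verification test, or null if unknown.
function runCommand(ctx: RunContext, testFile: string): string | null {
  const deps = { ...ctx.dependencies, ...ctx.devDependencies };
  if ("vitest" in deps) return `npx vitest run ${testFile}`;
  if ("jest" in deps) return `npx jest ${testFile}`;
  // Heuristic: these files suggest a Python project that may use pytest.
  const pythonMarkers = ["pytest.ini", "pyproject.toml", "setup.cfg", "conftest.py"];
  if (ctx.rootFiles.some((f) => pythonMarkers.includes(f))) return `pytest ${testFile}`;
  return null;
}

// Visibility rule from the prompt: hidden in print and once the plan is completed.
function showRunButton(status: string, printing: boolean): boolean {
  return !printing && status !== "completed";
}
```

In the page itself, the print case is usually simpler to handle with an `@media print` rule than in script.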

Wish List

Auto-generate test file stubs from the Tests section (Wish List)

Speculative / blue-sky idea — not on the critical path. Paste into Claude when ready to explore:

Explore adding a "Generate test stubs" button to the Tests section in HTML plans. When the user clicks "Implement now" on a plan that has a Tests section, the implementation phase should auto-scaffold test files alongside the source changes — creating one test file per test card with the test description as a pending/todo test case. Investigate how to detect the project's test framework and directory conventions (e.g. __tests__/, *.test.ts, *.spec.ts) to place files correctly. Prototype with Jest and Vitest first.
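A Jest/Vitest prototype of the scaffolding step might start from the sketch below. Both frameworks support `test.todo(name)` for pending cases. Vitest needs `test` imported from `vitest` unless globals are enabled, while Jest provides it as a global. The `__tests__/<name>.test.ts` placement and the `slug` helper are assumptions that stand in for real convention detection.

```typescript
type Framework = "jest" | "vitest";

interface StubFile {
  path: string;
  contents: string;
}

// Turn a card description into a file-name-safe slug (hypothetical naming rule).
function slug(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "") || "test";
}

// One stub file per test card, each holding a single pending test.todo case.
function stubForCard(description: string, framework: Framework): StubFile {
  const header = framework === "vitest" ? 'import { test } from "vitest";\n\n' : "";
  return {
    path: `__tests__/${slug(description)}.test.ts`,
    // JSON.stringify yields a safely quoted string literal for the description.
    contents: `${header}test.todo(${JSON.stringify(description)});\n`,
  };
}
```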

Visual test-coverage heatmap in the plan sidebar (Wish List)

Speculative / blue-sky idea — not on the critical path. Paste into Claude when ready to explore:

Design a compact visual indicator in the HTML plan sidebar that shows test coverage density per step — which steps have associated tests and which are untested. Render as a small vertical bar chart or dot grid next to each step in the nav. Use pure CSS (no charting library). When a step has zero associated tests, show a red dot; when it has tests, show a green dot. This gives a glanceable "coverage health" view without leaving the plan page.
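The data side of the heatmap is a small mapping from steps to dot classes. Below is a sketch that assumes each test card records the step it covers in a hypothetical `step` field (for example, read from a `data-step` attribute). The class names `coverage-dot-red` and `coverage-dot-green` are placeholders for the pure-CSS dots.

```typescript
interface CardRef {
  step: string; // id of the step this test covers (hypothetical association)
}

// Count tests per step and pick a dot class: red for zero tests, green otherwise.
function coverageDots(stepIds: string[], cards: CardRef[]): Map<string, string> {
  const counts = new Map<string, number>();
  for (const c of cards) counts.set(c.step, (counts.get(c.step) ?? 0) + 1);
  const dots = new Map<string, string>();
  for (const id of stepIds) {
    dots.set(id, (counts.get(id) ?? 0) > 0 ? "coverage-dot-green" : "coverage-dot-red");
  }
  return dots;
}
```

Keeping the counts in a separate map also leaves room for the bar-chart variant, which would size each bar by count instead of choosing a color.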

Unresolved Questions

~~Should non-code plans (docs, chore) require the full Tests section?~~ Resolved

Decision: Always-present Tests section with a two-tier depth model. Tier 1 (code-touching plans) includes all applicable sub-sections; Tier 2 (non-code plans) requires only the objective-verification test. The author selects the tier based on step content, not the type: field. Rationale: chore plans span a wide spectrum (file moves to code mutations), making a conditionally-present rule unreliable; the objective-verification test is always mandatory so the section is never truly empty; and consistency reduces cognitive load for authors.

~~Where should the objective-verification test appear when tests exist in multiple categories?~~ Resolved

Decision: Always first, as a highlighted hero card. The objective-verification test is the one test you'd write if you could only write one — placing it at the top mirrors that priority. A fixed position eliminates ambiguity: "last" sends the wrong signal (capstone implies "do after everything else" when it should be written first), and "categorized" forces a scope classification ("is this E2E or integration?") that adds friction without value while making the mandatory test harder to find. The layout reads: hero card (the goal) → unit tests (details) → integration tests → E2E tests — mirroring how the plan's objective appears at the top of the document before context and steps.