Implementation Plan

Fix ship-autonomous skill chaining (claude-code#43809)

todo
2026-06-10 agentics fix

Make /git-agent:ship-autonomous immune to claude-code#43809 — chain branch-agent, commit-agent, and pr-agent by Reading their SKILL.md files instead of gated Skill-tool calls, keeping disable-model-invocation: true (the guard against unsolicited git mutations) fully intact.

Implement: Read and implement all steps in the plan at docs/plans/fix-ship-autonomous-skill-chaining.html (make ship-autonomous immune to issue #43809 via Read-based skill chaining)
File: fix-ship-autonomous-skill-chaining.html
Path: docs/plans/fix-ship-autonomous-skill-chaining.html
Acceptance criteria: 0 / 9 done

Context

claude-code#43809 reports that disable-model-invocation: true blocks every model-driven Skill-tool invocation — not just unsolicited auto-triggering. Any nested call fails with Error: Skill <name> cannot be used with Skill tool due to disable-model-invocation, even when the user explicitly requested the parent workflow. The issue was filed against Claude Code 2.1.92 and auto-closed as inactive — the behavior was never fixed.

The flag is deliberate in git-agent: the commit-agent, pr-agent, branch-agent, and ship skills stage all changes and mutate git state, so they are command-only, triggered by explicit /git-agent:<name> invocation and never by fuzzy intent matching. Keeping that guard is a hard requirement of this plan.

The exposure: ship-autonomous chains its pipeline by instructing Skill-tool invocations of three flagged skills at five sites: Step 2 branch-agent (line 89), Step 3 commit-agent (line 99), Step 4 pr-agent (line 109), and the autofix loop's commit references (lines 229 and 238). The user typed /git-agent:ship-autonomous, not the child commands, so every nested call is model-driven and is refused by the gate. The plain ship skill and the agent-commit background agent are immune only because they inline duplicate copies of the same workflows, a maintenance tax this plan avoids.

The fix: Read-based chaining. The flag gates only the Skill tool — SKILL.md files are plain markdown on disk, and ${CLAUDE_PLUGIN_ROOT} expands inside skill content at load time (verified empirically this session). Reading keeps a single source of truth. One behavioral consequence: under Skill-tool chaining each child ran under its own allowed-tools; under Read-chaining the parent's list governs — so ship-autonomous must absorb Bash(date *) for branch-agent's auto-naming. (Branch-agent's model: Haiku pin likewise no longer applies — inlined steps run on the session model.)

Files to Modify

agentics/
  • kit/plugins/git-agent/skills/ship-autonomous/SKILL.md (modified): swap 5 Skill-tool calls to Read chaining; update allowed-tools
  • kit/plugins/git-agent/CHANGELOG.md (modified): add v3.10.7 entry
  • .claude-plugin/marketplace.json (modified): bump git-agent 3.10.6 → 3.10.7
  • tests/plugins/test-ship-autonomous-read-chaining.sh (new): chaining-contract smoke test

Diagram

Chaining mechanism — before vs after
Before — Skill-tool chaining
  • Invoke git-agent:commit-agent via Skill tool
  • Gated: refuses disable-model-invocation skills
  • "cannot be used with Skill tool" (#43809)
  • Child skill's allowed-tools govern the sub-flow
After — Read-based chaining
  • Read ${CLAUDE_PLUGIN_ROOT}/skills/<name>/SKILL.md
  • Ungated: Read is a plain file read
  • Single source of truth — no duplicated logic
  • Parent allowed-tools govern (+ Bash(date *))

Steps

1
todo Swap the three pipeline invocations (Steps 2–4) in kit/plugins/git-agent/skills/ship-autonomous/SKILL.md to Read-based chaining — at the branch-agent (~line 89), commit-agent (~line 99), and pr-agent (~line 109) sites, replace "Invoke the existing git-agent:<name> skill" with: "Read ${CLAUDE_PLUGIN_ROOT}/skills/<name>/SKILL.md and execute its steps inline — do not use the Skill tool. Treat every STOP in the child's content — including its terminal "do not take further action" clause — as ending that sub-workflow only, then continue this pipeline; skip its Step 0 (plan-mode exit is already handled here). If ${CLAUDE_PLUGIN_ROOT} is unexpanded, locate the file with Glob **/git-agent/skills/<name>/SKILL.md (on multiple matches prefer the running plugin's own root; if still ambiguous, STOP and report). If the file cannot be Read at all, STOP the pipeline and report — never improvise the child workflow from memory." Also tighten the PR URL capture (lines ~117–118): take the URL from the gh pr create output, with gh pr view --json url as fallback, instead of "pr-agent's final output". Special-case re-runs: when pr-agent's inlined steps find an existing OPEN PR, capture that PR's URL and continue to Step 5 (CI watch) — its "STOP, do not create a duplicate" is not a pipeline halt.
The Skill tool refuses every model-driven invocation of a disable-model-invocation: true skill (claude-code#43809); the Read tool is ungated and preserves a single source of truth instead of inlining duplicate copies of the child workflows.
Verify
grep -ci "invoke the existing" kit/plugins/git-agent/skills/ship-autonomous/SKILL.md returns 0 — case-insensitive, because line 89's lowercase "invoke" would slip past a capital-I grep; a Read instruction referencing ${CLAUDE_PLUGIN_ROOT}/skills/<name>/SKILL.md exists for each of branch-agent, commit-agent, and pr-agent (three explicit checks, not an aggregate count); the failed-Read STOP rule is present; the PR URL capture references the gh pr create output rather than "pr-agent's final output"; the existing-OPEN-PR continue-to-CI-watch rule is present.
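The grep-based part of this Verify can be sketched as a few shell helpers. This is a minimal sketch, not the shipped test: the function names count_legacy_phrase, has_read_reference, and check_step1 are hypothetical, and it assumes each Read instruction sits on the same line as its ${CLAUDE_PLUGIN_ROOT} path. The failed-Read STOP rule, the PR URL capture, and the OPEN-PR rule are not covered here.

```shell
# Hypothetical helpers sketching Step 1's grep checks; names are illustrative.

# Print how many lines contain the legacy phrase, ignoring case.
count_legacy_phrase() {
  # grep -c exits 1 when the count is 0, so mask that status.
  grep -ci "invoke the existing" "$1" || true
}

# Succeed only if some line mentions Read together with the literal
# ${CLAUDE_PLUGIN_ROOT}/skills/<child>/SKILL.md path for the given child.
# Assumption: the Read verb and the path share one line.
has_read_reference() {
  # The backslash keeps ${CLAUDE_PLUGIN_ROOT} literal; -F disables regex matching.
  grep -F "\${CLAUDE_PLUGIN_ROOT}/skills/$2/SKILL.md" "$1" | grep "Read" >/dev/null
}

# Run the zero-count check plus three explicit per-child checks.
check_step1() {
  skill_file=$1
  status=0
  if [ "$(count_legacy_phrase "$skill_file")" -ne 0 ]; then
    echo "FAIL: legacy phrase still present"
    status=1
  fi
  for child in branch-agent commit-agent pr-agent; do
    if has_read_reference "$skill_file" "$child"; then
      echo "PASS: Read reference for $child"
    else
      echo "FAIL: no Read reference for $child"
      status=1
    fi
  done
  return "$status"
}
```

In the repo this would be called as check_step1 kit/plugins/git-agent/skills/ship-autonomous/SKILL.md. Looping over the three child names keeps the checks explicit per child rather than relying on an aggregate count.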
2
todo Swap the two autofix commit references (Steps 6c and 6d, lines ~229 and ~238) to the same pattern — "commit by re-applying the commit-agent workflow (Read ${CLAUDE_PLUGIN_ROOT}/skills/commit-agent/SKILL.md again if its steps are no longer in context)".
Autofix commits hit the same Skill-tool gate on every CI-failure event; pointing back to the already-loaded workflow stays gate-free without a redundant re-read per event.
Verify
grep -n "commit-agent" kit/plugins/git-agent/skills/ship-autonomous/SKILL.md shows only Read-phrased references — no git-agent:commit-agent mention on a line lacking Read or ${CLAUDE_PLUGIN_ROOT} (lines 229/238's "commit via git-agent:commit-agent" phrasing would slip past the Step 1 grep alone); the file-wide case-insensitive count of "invoke the existing" is 0.
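The "no gated mention outside a Read line" check could look like the sketch below. The function names are hypothetical, and the sketch reads "a line lacking Read or ${CLAUDE_PLUGIN_ROOT}" as a line that contains neither marker; that reading is an assumption.

```shell
# Print (with line numbers) every git-agent:<child> mention on a line that
# contains neither "Read" nor the literal ${CLAUDE_PLUGIN_ROOT}.
# Hypothetical helper; the neither-marker reading is an assumption.
list_gated_references() {
  grep -nE 'git-agent:(branch|commit|pr)-agent' "$1" \
    | grep -v 'Read' \
    | grep -vF '${CLAUDE_PLUGIN_ROOT}' \
    || true   # an empty result is the passing case, not an error
}

# Succeed when the file has no offending line.
no_gated_references() {
  [ -z "$(list_gated_references "$1")" ]
}
```

Printing the offending lines instead of only a count makes a failure actionable: the output points straight at the site that still uses the old phrasing.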
3
todo Update the frontmatter and add a regression-guard note — on line 4 remove Skill from allowed-tools and add Bash(date *); below the intro add one note: child skills are command-only (disable-model-invocation: true) — chain them by Reading their SKILL.md, never via the Skill tool (claude-code#43809). The note also records three standing facts: the Bash(glab *) exclusion is intentional (GitHub-only pipeline); branch-agent's model: Haiku pin does not apply under Read-chaining; and any future addition to a child's allowed-tools must be mirrored here.
Under Skill-tool chaining each child ran under its own allowed-tools; under Read-chaining the parent's list governs, and branch-agent's auto-naming runs date. Dropping Skill enforces least privilege; the note stops future edits from regressing. Bash(glab *) is deliberately not added — ship-autonomous is GitHub-only (Step 1 hard-requires gh auth), so pr-agent's GitLab path is unreachable here.
Verify
head -5 kit/plugins/git-agent/skills/ship-autonomous/SKILL.md shows allowed-tools containing Bash(date *) and no Skill token, with all previously listed tools otherwise intact; the guard note appears in the body and names the glab exclusion, the Haiku-pin loss, and the mirror obligation.
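A sketch of the frontmatter check follows. It assumes allowed-tools is a single comma-separated line within the first five lines of the file, which matches the head -5 check above but is not confirmed for every skill; the function names are hypothetical.

```shell
# Print the allowed-tools value found in the first five lines of a SKILL.md.
# Assumption: the value is one comma-separated line.
allowed_tools_line() {
  head -5 "$1" | sed -n 's/^allowed-tools:[[:space:]]*//p'
}

# Succeed if $2 appears as a whole entry in the comma-separated list $1.
has_tool() {
  printf '%s\n' "$1" \
    | tr ',' '\n' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | grep -Fx -- "$2" >/dev/null
}

# Bash(date *) must be present and the Skill token must be gone.
check_frontmatter() {
  tools=$(allowed_tools_line "$1")
  has_tool "$tools" 'Bash(date *)' && ! has_tool "$tools" 'Skill'
}
```

Matching whole entries with grep -Fx avoids a false hit on any other tool name that merely contains the substring Skill.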
4
todo Add the chaining-contract smoke test tests/plugins/test-ship-autonomous-read-chaining.sh, modeled on tests/plugins/test-step8-review-option.sh (bash, set -euo pipefail, numbered PASS/FAIL checks, FAILURES counter, non-zero exit on failure). Assertions: (1) case-insensitive count of "invoke the existing" is 0; (2) no git-agent:(branch|commit|pr)-agent reference on a line lacking Read or ${CLAUDE_PLUGIN_ROOT}; (3) a literal ${CLAUDE_PLUGIN_ROOT}/skills/<name>/SKILL.md Read reference per child — three explicit checks, not an aggregate count; (4) allowed-tools has Bash(date *) and no Skill token; (5) each child still declares disable-model-invocation: true and is unchanged from main (git diff --quiet origin/main -- <child paths>); (6) the failed-Read STOP rule is present; (7) the parent's allowed-tools is a superset of each child's, excluding the intentional Bash(glab *). Note: tests/plugins/ is manual-run by repo convention — not wired into CI.
This is the objective-verification test — it pins the Read-chaining contract so a future edit cannot silently reintroduce gated Skill-tool calls or drop the children's command-only flag.
Verify
bash tests/plugins/test-ship-autonomous-read-chaining.sh prints PASS for all seven assertion groups and exits 0; spot-check the guard by deliberately breaking one contract line (e.g. rewording a Read reference) and confirming the test fails before reverting.
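The numbered PASS/FAIL structure described in this step might be built on a small harness like the one below. It is a sketch under assumptions: the helper names check, absent, and finish are hypothetical and are not taken from tests/plugins/test-step8-review-option.sh, and the set -euo pipefail line the real script would start with is left out so the snippet can be sourced safely.

```shell
FAILURES=0   # number of failed checks
CHECK_NO=0   # running check number

# Run a command as one numbered check and record the result.
# Usage: check "label" command [args...]
check() {
  label=$1
  shift
  CHECK_NO=$((CHECK_NO + 1))
  if "$@" >/dev/null 2>&1; then
    echo "PASS $CHECK_NO: $label"
  else
    echo "FAIL $CHECK_NO: $label"
    FAILURES=$((FAILURES + 1))
  fi
}

# Invert a command's status; used for "must be absent" checks.
absent() {
  ! "$@"
}

# Print a summary and return non-zero if any check failed.
finish() {
  if [ "$FAILURES" -gt 0 ]; then
    echo "$FAILURES check(s) failed"
    return 1
  fi
  echo "all checks passed"
}

# Example wiring (not executed here):
#   SKILL=kit/plugins/git-agent/skills/ship-autonomous/SKILL.md
#   check "legacy phrase is gone" absent grep -qi "invoke the existing" "$SKILL"
#   finish || exit 1
```

Running every check before exiting, rather than stopping at the first failure, means one run reports all broken contract lines at once.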
5
todo Bump git-agent from 3.10.6 to 3.10.7 in .claude-plugin/marketplace.json.
Internal fix = PATCH per .claude/rules/marketplace.md; the new value must be higher than main's, and the settings hook auto-validates JSON syntax on save.
Verify
jq -r '.plugins[] | select(.name=="git-agent").version' .claude-plugin/marketplace.json prints 3.10.7; the same query against git show origin/main:.claude-plugin/marketplace.json prints 3.10.6.
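The two jq queries yield a pair of version strings; comparing them can be sketched in plain shell. The helper name is_patch_bump is hypothetical, and requiring exactly one PATCH increment is a stricter reading than the rule's "higher than main's", chosen here because this plan bumps 3.10.6 to 3.10.7.

```shell
# Succeed only if $2 is exactly one PATCH above $1 with the same MAJOR.MINOR.
# Hypothetical helper; assumes plain MAJOR.MINOR.PATCH strings.
is_patch_bump() {
  old=$1
  new=$2
  old_prefix=${old%.*}    # MAJOR.MINOR of the old version
  new_prefix=${new%.*}    # MAJOR.MINOR of the new version
  old_patch=${old##*.}    # PATCH of the old version
  new_patch=${new##*.}    # PATCH of the new version
  [ "$old_prefix" = "$new_prefix" ] && [ "$new_patch" -eq $((old_patch + 1)) ]
}

# Example wiring with the queries from this step (not executed here):
#   q='.plugins[] | select(.name=="git-agent").version'
#   new=$(jq -r "$q" .claude-plugin/marketplace.json)
#   old=$(git show origin/main:.claude-plugin/marketplace.json | jq -r "$q")
#   is_patch_bump "$old" "$new" || echo "FAIL: $old -> $new is not a patch bump"
```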
6
todo Prepend the CHANGELOG entry — ## v3.10.7 — 2026-06-10 — Read-based skill chaining in ship-autonomous with a ### Fixed list covering the five swapped sites, the allowed-tools change, the guard note, the new smoke test, and a link to claude-code#43809 — to kit/plugins/git-agent/CHANGELOG.md. Include a one-line note that 3.10.6 shipped without a CHANGELOG entry — the v3.10.7 → v3.10.5 lineage gap predates this fix and is acknowledged, not introduced, here.
marketplace.md requires a CHANGELOG entry with every version bump; this entry preserves the rationale for abandoning Skill-tool chaining beyond this session.
Verify
head -15 kit/plugins/git-agent/CHANGELOG.md shows the v3.10.7 entry above v3.10.5, dated 2026-06-10, containing the issue link and the one-line v3.10.6 gap acknowledgement.
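A scripted version of this Verify could look like the sketch below. The function names are hypothetical, and the sketch only looks for the strings 43809 and 3.10.6 within the first 15 lines; the exact wording of the link and of the gap note is not pinned down here.

```shell
# Print the first "## v" heading within the first 15 lines of the changelog.
top_changelog_heading() {
  head -15 "$1" | sed -n '/^## v/{p;q;}'
}

# Check the newest entry's version and date, the issue number, and the
# v3.10.6 gap acknowledgement. Hypothetical helper.
check_changelog() {
  changelog=$1
  heading=$(top_changelog_heading "$changelog")
  case $heading in
    '## v3.10.7 '*'2026-06-10'*) ;;
    *) echo "FAIL: top entry is '$heading'"; return 1 ;;
  esac
  if ! head -15 "$changelog" | grep -F '43809' >/dev/null; then
    echo "FAIL: issue reference missing"
    return 1
  fi
  if ! head -15 "$changelog" | grep -F '3.10.6' >/dev/null; then
    echo "FAIL: v3.10.6 gap note missing"
    return 1
  fi
  echo "PASS: changelog entry looks complete"
}
```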

Tests

Tier 1 — Code-touching plan
Objective: ship-autonomous chains child skills via Read, not the gated Skill tool

File: tests/plugins/test-ship-autonomous-read-chaining.sh

Type: smoke test — static contract assertions, run with bash like the rest of tests/plugins/

Asserts: case-insensitive zero count of "invoke the existing"; no gated git-agent:<child> reference outside a Read line; a literal ${CLAUDE_PLUGIN_ROOT}/skills/<name>/SKILL.md reference per child (three explicit checks); allowed-tools drops Skill, gains Bash(date *), and is a superset of each child's tools (minus the intentional Bash(glab *) exclusion); all three children unchanged from main and still disable-model-invocation: true; the failed-Read STOP rule is present.

Run: bash tests/plugins/test-ship-autonomous-read-chaining.sh

Unit, integration, and E2E suites are not applicable — the change edits declarative skill markdown and plugin metadata with no executable application code. The smoke test above is the executable contract; a live pipeline run is covered in Verification.
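The superset assertion is the least obvious of the checks listed above, so a sketch may help. It assumes each allowed-tools value has already been extracted as a comma-separated string; split_tools and missing_tools are hypothetical names.

```shell
# Print each entry of a comma-separated tool list on its own line, trimmed.
split_tools() {
  printf '%s\n' "$1" \
    | tr ',' '\n' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | sed '/^$/d'
}

# Print every child tool that the parent list lacks, skipping one
# intentionally excluded entry.
# $1 = parent tools, $2 = child tools, $3 = excluded tool
missing_tools() {
  parent_tools=$1
  child_tools=$2
  excluded=$3
  split_tools "$child_tools" | while IFS= read -r tool; do
    if [ "$tool" = "$excluded" ]; then
      continue
    fi
    if ! split_tools "$parent_tools" | grep -Fx -- "$tool" >/dev/null; then
      printf '%s\n' "$tool"
    fi
  done
}
```

The assertion passes when missing_tools prints nothing for each of the three children, with 'Bash(glab *)' passed as the excluded entry. A non-empty result names exactly which tool must be mirrored into the parent.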

Acceptance Criteria

Verification

Run the smoke test and the metadata checks, then exercise the live pipeline:

  1. bash tests/plugins/test-ship-autonomous-read-chaining.sh — every check prints PASS, exit code 0.
  2. jq -r '.plugins[] | select(.name=="git-agent").version' .claude-plugin/marketplace.json prints 3.10.7; the same query against git show origin/main:.claude-plugin/marketplace.json prints 3.10.6.
  3. git diff --name-only origin/main lists exactly: the ship-autonomous SKILL.md, the git-agent CHANGELOG, .claude-plugin/marketplace.json, the new test script, this plan file, and the auto-rebuilt docs/plans/index.html.
  4. Live smoke (required — the static test cannot catch runtime failure modes like a STOP-halt mid-pipeline or an unexpanded ${CLAUDE_PLUGIN_ROOT}): run claude --plugin-dir ./kit/plugins/git-agent on a scratch branch with a trivial change, invoke /git-agent:ship-autonomous, and confirm Steps 2–4 Read the child SKILL.md files — no "cannot be used with Skill tool due to disable-model-invocation" error — and the pipeline reaches PR creation without halting at any inlined child STOP. Re-run once on a branch that already has an OPEN PR and confirm the pipeline continues to CI watch instead of halting.

Commit all changed files together — repo convention ships the plan file with the plugin change.
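Check 3 above ("lists exactly") can be made mechanical with an order-insensitive comparison. This is a sketch: same_file_set and EXPECTED_FILES are hypothetical names, and the six paths are the ones named in this plan.

```shell
# Succeed when two newline-separated path lists hold the same entries,
# regardless of order.
same_file_set() {
  [ "$(printf '%s\n' "$1" | sort)" = "$(printf '%s\n' "$2" | sort)" ]
}

# The six paths this plan expects to differ from origin/main.
EXPECTED_FILES='kit/plugins/git-agent/skills/ship-autonomous/SKILL.md
kit/plugins/git-agent/CHANGELOG.md
.claude-plugin/marketplace.json
tests/plugins/test-ship-autonomous-read-chaining.sh
docs/plans/fix-ship-autonomous-skill-chaining.html
docs/plans/index.html'

# Example wiring (not executed here):
#   same_file_set "$(git diff --name-only origin/main)" "$EXPECTED_FILES" \
#     || echo "FAIL: unexpected set of changed files"
```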

Completion Checklist

Required

Completion Report

No items to report — all requirements met.

Next Steps

Audit other plugins for gated Skill-tool chaining

Paste this prompt into Claude to execute this follow-up:

In the shawn-sandy/agentics repo, audit every skill, command, and agent under kit/plugins/ for instructions that invoke another skill via the Skill tool where the target SKILL.md sets disable-model-invocation: true. Grep for phrases like "Invoke the existing", "Skill(skill:", and the names of flagged skills (commit-agent, pr-agent, branch-agent, ship). Skip docs/plans/archive/. For each exposed site, report plugin, file, and line, then convert it to Read-based chaining via ${CLAUDE_PLUGIN_ROOT}/skills/<name>/SKILL.md following the pattern in kit/plugins/git-agent/skills/ship-autonomous/SKILL.md. Bump each touched plugin's version (patch) in .claude-plugin/marketplace.json and add a CHANGELOG entry per plugin.

Deduplicate the commit workflow between skill and background agent

Paste this prompt into Claude to execute this follow-up:

In kit/plugins/git-agent of the shawn-sandy/agentics repo, the commit workflow is duplicated between skills/commit-agent/SKILL.md and agents/agent-commit.md — the agent mirrors the skill because subagents cannot invoke disable-model-invocation skills (claude-code issue #43809). Extract the shared workflow (guards, git add -A staging, conventional-commit message rules, pre-commit hook failure policy) into one reference file inside the plugin and have both files Read it, keeping their intentional divergences local (Step 0 ExitPlanMode in the skill; the fire-and-forget caveat in the agent). Bump the plugin version (patch) in .claude-plugin/marketplace.json and add a CHANGELOG entry.

Wish List

Revert to Skill-tool chaining if upstream fixes #43809

Speculative / blue-sky idea — not on the critical path. Paste into Claude when ready to explore:

Check the latest Claude Code release notes and re-test whether a skill body can invoke a disable-model-invocation: true skill via the Skill tool — the block reported in https://github.com/anthropics/claude-code/issues/43809. Build a minimal repro: a scratch plugin with a flagged child skill and a caller skill that invokes it via the Skill tool; run the caller and observe whether the nested call succeeds. If the gate now inherits user authorization, propose reverting kit/plugins/git-agent/skills/ship-autonomous/SKILL.md in shawn-sandy/agentics to direct Skill-tool chaining and removing the Read-based workaround.

Generated by plan-agent · 2026-06-10 · agentics

Team Review (2026-06-10 22:32:41 UTC) — 5 core reviewers, plan updated in place

Executive Summary

Sound with revisions — now applied. Five core reviewers (architecture, completeness, testability, risk, conventions; no UI signals, so UX/accessibility reviewers were not spawned) found no objection to the Read-based chaining direction; architecture confirmed it strictly improves on the repo's existing duplicate-the-workflow pattern. Two high-severity findings drove the revisions: the plan's own verification grep (capital-I "Invoke the existing") matched only 2 of the 5 sites it guards, and the inlined child skills' emphatic STOP directives (14/5/8 occurrences) risk halting the pipeline mid-run. Both are addressed in the updated plan, along with a newly surfaced re-entrancy bug (pr-agent's existing-OPEN-PR STOP) and a documented CHANGELOG lineage gap.

Role-by-Role Findings

  • Architecture: Mechanism sound and consistent with the codebase; flagged the permanent allowed-tools flattening (parent must carry the union of child tool needs) and recommended a machine-checkable superset assertion in the smoke test, plus recording the Bash(glab *) exclusion as intentional and disambiguating the Glob fallback in multi-checkout layouts. All adopted.
  • Completeness: Steps highly specific; line numbers and version arithmetic verified accurate against the repo. Found the high-severity case-sensitivity blind spot at line 89 (lowercase "invoke"), the unpinned Read anchor string between edit and test, the missing executable check for ac4, and the pre-existing missing v3.10.6 CHANGELOG entry. All adopted.
  • Testability: Confirmed the line-89 and lines-229/238 blind spots live in the file; recommended case-insensitive zero-count, a no-gated-reference-outside-a-Read-line check, per-child positive assertions, and pinning the failed-Read STOP rule. Noted tests/plugins/ is not CI-wired (manual-run convention, now stated in Step 4). All adopted.
  • Risk (level: medium): Highest risk is the model obeying an inlined child's terminal STOP and ending the pipeline; second, pr-agent's existing-OPEN-PR STOP aborting re-runs; third, environment-dependent ${CLAUDE_PLUGIN_ROOT} expansion in skill content. Mitigations adopted: strengthened STOP-override wording, the continue-to-CI-watch rule (new ac9), and the live --plugin-dir smoke promoted from optional to required. Rollback risk assessed low (declarative files only; marketplace merge driver covers version conflicts).
  • Conventions: Version arithmetic, CHANGELOG header format, test naming, and restricted Bash(...) allowed-tools style all match repo practice. Flagged the v3.10.6 changelog gap (now acknowledged in Step 6) and, cosmetically, Step 1's multi-clause density (declined — see Conflicts).

Agreements & Conflicts

Confirmed concerns (multiple reviewers): the case-sensitive grep blind spot (completeness + testability, both high); the v3.10.6 CHANGELOG gap (completeness + risk + conventions); per-child positive Read assertions over an aggregate count (completeness + testability, with architecture's superset check as the stronger variant — both adopted).

Conflict resolved: conventions suggested splitting Step 1 into two cards for one-idea-per-item style; declined to preserve the user-aligned six-step structure — the added safeguards stay consolidated in Step 1, accepted as a readability tradeoff.

Highest-Risk Issues (priority order)

  1. Verification blind spot (high, completeness + testability) — capital-I grep missed 3 of 5 guarded sites. Fixed: case-insensitive counts, per-child literal Read assertions, and a no-gated-reference-outside-a-Read-line check across Step 1/2 verifies, ac1, and the smoke test.
  2. Inlined child STOP halts the pipeline (high, risk) — children contain 14/5/8 STOP tokens. Fixed: Step 1 override now reads "treat every STOP in the child's content — including its terminal clause — as ending that sub-workflow only"; live smoke (now required) is the runtime check.
  3. pr-agent existing-OPEN-PR abort (medium, risk) — re-runs would halt at "do not create a duplicate". Fixed: explicit capture-URL-and-continue-to-CI-watch rule in Step 1 plus new acceptance criterion ac9.
  4. ${CLAUDE_PLUGIN_ROOT} expansion is environment-dependent (medium, risk + architecture) — mitigated by the Glob fallback (now with multi-match disambiguation), the failed-Read STOP rule, and the required live smoke.
  5. allowed-tools superset drift (medium, architecture) — fixed: smoke-test assertion 7 (parent superset of each child, minus intentional glab) and the guard note's mirror obligation.
  6. CHANGELOG v3.10.6 lineage gap (low, three reviewers) — acknowledged with a one-line note in the v3.10.7 entry rather than a fabricated backfill.
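The decision logic behind item 4 (direct path first, then a search, then STOP when the result is missing or ambiguous) can be illustrated in shell. This is only an illustration: in the skill itself the logic is a natural-language instruction carried out with the Read and Glob tools, and resolve_child_skill is a hypothetical name. The sketch assumes that when the plugin root is unknown, several matches cannot be disambiguated.

```shell
# Print the path of a child SKILL.md, or fail when it cannot be pinned down.
# $1 = child name, $2 = plugin root (may be empty), $3 = directory to search
resolve_child_skill() {
  name=$1
  root=$2
  search_dir=$3

  # Preferred: a known plugin root points straight at the file.
  if [ -n "$root" ] && [ -f "$root/skills/$name/SKILL.md" ]; then
    printf '%s\n' "$root/skills/$name/SKILL.md"
    return 0
  fi

  # Fallback: search for the file, mirroring the Glob pattern from Step 1.
  matches=$(find "$search_dir" -type f \
    -path "*/git-agent/skills/$name/SKILL.md" 2>/dev/null)
  count=$(printf '%s' "$matches" | grep -c . || true)

  if [ "$count" -eq 1 ]; then
    printf '%s\n' "$matches"
    return 0
  fi

  # Zero matches (nothing to Read) or several (ambiguous): stop, never guess.
  echo "STOP: found $count candidates for $name" >&2
  return 1
}
```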

Inline Edits Applied

Edits applied to this plan during the team review
Target | Action | Change
Step 1 action | edit | Strengthened STOP override (every STOP is sub-workflow-local), Glob multi-match disambiguation, existing-OPEN-PR continue-to-CI-watch rule
Step 1 verify | edit | Case-insensitive zero-count; per-child Read assertions; OPEN-PR rule check
Step 2 verify | edit | No gated commit-agent mention outside a Read line; case-insensitive count
Step 3 action + verify | edit | Guard note now records glab exclusion, Haiku-pin loss, and the mirror obligation
Step 4 action + verify | edit | Seven explicit test assertions incl. parent-superset check and children-unchanged git diff; manual-run convention noted
Step 6 action + verify | edit | One-line acknowledgement of the pre-existing v3.10.6 CHANGELOG gap
Objective test card | edit | Asserts rewritten to match the seven assertion groups
#criteria-list ac1, ac4 | edit | ac1 case-insensitive + outside-Read-line clause; ac4 gains executable git diff command
#criteria-list | append | New ac9: OPEN-PR re-run continues to CI watch (progress count 8 → 9)
#verification item 4 | edit | Live --plugin-dir smoke promoted from optional to required, plus an OPEN-PR re-run check
#context | append | Noted branch-agent's model: Haiku pin does not apply under Read-chaining