diff --git a/README.md b/README.md index ea17e30e..d7d4d11a 100644 --- a/README.md +++ b/README.md @@ -14,7 +14,7 @@ Once it's teased a spec out of the conversation, it shows it to you in chunks sh After you've signed off on the design, your agent puts together an implementation plan that's clear enough for an enthusiastic junior engineer with poor taste, no judgement, no project context, and an aversion to testing to follow. It emphasizes true red/green TDD, YAGNI (You Aren't Gonna Need It), and DRY. -Next up, once you say "go", it launches a *subagent-driven-development* process, having agents work through each engineering task, inspecting and reviewing their work, and continuing forward. It's not uncommon for Claude to be able to work autonomously for a couple hours at a time without deviating from the plan you put together. +Next up, once you say "go", it launches a *subagent-driven-development* process, having agents work through each engineering task, inspecting and reviewing their work, and continuing forward. It's not uncommon for your agent to work autonomously for a couple hours at a time without deviating from the plan you put together. There's a bunch more to it, but that's the core of the system. And because the skills trigger automatically, you don't need to do anything special. Your coding agent just has Superpowers. diff --git a/docs/superpowers/specs/2026-05-05-platform-neutral-prose-design.md b/docs/superpowers/specs/2026-05-05-platform-neutral-prose-design.md new file mode 100644 index 00000000..34066b1c --- /dev/null +++ b/docs/superpowers/specs/2026-05-05-platform-neutral-prose-design.md @@ -0,0 +1,94 @@ +# Platform-neutral prose — Phase A design + +## Background + +Superpowers ships to multiple agent runtimes (Claude Code, Codex, Cursor, OpenCode, Copilot CLI, Gemini CLI). Skill content and supporting docs were written first for Claude Code and use "Claude" in places where any runtime's agent applies. OpenAI's vendored fork (openai/plugins#217) attempted a wholesale rewrite that was actively wrong in places — rewriting historical attribution paths, model names, and platform-specific install instructions — and we want to avoid that mistake while still removing platform-centric prose where it is genuinely incidental. + +The full effort is broken into phases by reference category. **This spec covers Phase A only:** generic third-person prose mentioning "Claude" in non-platform-specific contexts. Later phases (config-file references, marketing copy, tool-name references) are out of scope here and will get their own specs. + +## In scope + +Generic prose mentions of "Claude" in: + +- `skills/*/SKILL.md` and supporting `.md` files in active skill directories +- `skills/writing-skills/anthropic-best-practices.md` +- `README.md` (only where the mention is generic prose, not platform marketing) + +Plus one coined-term rename: **Claude Search Optimization (CSO) → Skill Discovery Optimization (SDO)** in `skills/writing-skills/SKILL.md`. + +## Out of scope + +- **Platform/runtime statements** — "In Claude Code:", install instructions, tool-mapping references. (Phase D candidate.) +- **Config-file references** — CLAUDE.md, AGENTS.md, GEMINI.md priority lists and "where to put project conventions" callouts. (Phase B.) +- **Tool-name references** — `Skill`, `Bash`, `Read`, `Task`, `TodoWrite`. Skills are written in Claude Code's tool vocabulary; the existing `references/{codex,copilot,gemini}-tools.md` files map them. 
(At the time this spec was written, the plan was to defer or skip these. Phase E ended up doing them — replacing tool names with action language across active skills and unifying the platform-tools refs around the same vocabulary.) +- **Marketing copy** in README — "Superpowers for Claude Code", platform-named install sections. (Phase C.) +- **Historical artifacts** — `docs/plans/*.md`, `docs/superpowers/specs/*.md`, `CREATION-LOG.md`. These are dated, point-in-time documents; rewriting them rewrites history. +- **Model identifiers** — Claude Haiku / Sonnet / Opus. These are real product names. +- **Filename / URL references** — `CLAUDE.md`, `claude.com`, `claude-plugin/`, paths under `~/.claude/`. +- **`anthropic-best-practices.md` filename** — the file remains named after its source even though we rewrite the prose inside it. + +## Replacement style + +Use a mix that reads naturally in English: + +- **Second person — "your agent"** when addressing the skill author about *their* runtime + - "your agent reads the description" +- **Third person — "the agent" / "agents" / "an agent"** when describing system behavior generically + - "Future agents find your skills" + - "Use words an agent would search for" + - "Agents read SKILL.md only when the skill becomes relevant" + +Pick whichever fits the surrounding sentence; do not force consistency at the cost of awkward phrasing. Pluralize when natural ("future agents", "agents read") rather than always saying "the agent". + +### Carve-outs that stay as "Claude" + +- Model names: Claude Haiku, Claude Sonnet, Claude Opus +- Filenames and URLs: `CLAUDE.md`, `claude.com`, `~/.claude/` +- Branded platform name "Claude Code" wherever it refers to the runtime as such (handled in later phases) + +### Coined-term rename + +- **Claude Search Optimization (CSO) → Skill Discovery Optimization (SDO)** + - Appears in `skills/writing-skills/SKILL.md` as a section heading and in nearby prose. Rename the heading, the acronym, and any in-file cross-references. + +## Files affected + +Approximate counts based on a `grep` filtered to exclude carve-outs: + +| File | Generic-prose mentions | +|------|------------------------| +| `skills/writing-skills/SKILL.md` | ~12 (includes CSO heading + body) | +| `skills/writing-skills/anthropic-best-practices.md` | ~30 | +| `skills/writing-skills/examples/CLAUDE_MD_TESTING.md` | ~1 — filename stays (it's a CLAUDE.md test artifact); the "Variant C: Claude.AI Emphatic Style" heading also stays (it's a label naming a specific style) | +| `README.md` | ~1 | + +Final list confirmed during implementation by re-running the filtered grep. + +## Commit plan + +Four atomic commits, in order: + +1. **Rename CSO → SDO** in `skills/writing-skills/SKILL.md`. Mechanical, isolated, easy to revert if we change our minds about the term. +2. **Active skills prose** — generic "Claude" → "agent" forms across `skills/*/SKILL.md` and supporting `.md`, excluding `anthropic-best-practices.md`. +3. **`anthropic-best-practices.md` prose** — same substitution rules. Separate commit because this file is a vendored adaptation of an external doc; isolating the change makes future reconciliation with upstream easier to read. +4. **README.md prose** *(only if any generic-prose mentions remain after filtering)*. Skipped if empty. + +Each commit message names the phase ("Phase A") and the slice ("rename CSO to SDO", "agent prose in active skills", etc.) so the series is self-documenting. 
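For concreteness, the carve-out-filtered grep behind the counts above (and re-run per commit during verification) might look like the sketch below. The exclusion patterns are illustrative only; the authoritative carve-out list is the one documented in this spec, re-derived during implementation:

```bash
# Candidate generic-prose mentions of "Claude", minus documented carve-outs.
# Exclusion patterns are examples only -- platform name, model names, and
# filename/URL carve-outs are enumerated earlier in this spec.
grep -rn "Claude" skills/ README.md \
  | grep -v "Claude Code" \
  | grep -v "Claude Haiku\|Claude Sonnet\|Claude Opus" \
  | grep -v "CLAUDE\.md\|claude\.com\|claude-plugin\|~/.claude/"
```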
## Verification

After each commit:

- `grep -rn "Claude"` — every remaining hit must fall into a documented carve-out (model name, filename, URL, "Claude Code" platform name, historical artifact).
- Read the touched file end-to-end — substitutions should not have broken sentence flow, pronoun agreement, or list parallelism.
- No tests to run; this is prose-only.

After the final commit:

- Skim each modified skill in a live session to confirm nothing reads awkwardly.

## Non-goals

- Do not change behavior, structure, headings (other than CSO→SDO), examples, code blocks, or YAML frontmatter.
- Do not introduce new sections, callouts, or compatibility notes.
- Do not "improve" prose beyond the substitution while editing.

diff --git a/skills/dispatching-parallel-agents/SKILL.md b/skills/dispatching-parallel-agents/SKILL.md
index a6a3f5a0..75e7e22c 100644
--- a/skills/dispatching-parallel-agents/SKILL.md
+++ b/skills/dispatching-parallel-agents/SKILL.md
@@ -65,14 +65,17 @@ Each agent gets:

### 3. Dispatch in Parallel

-```typescript
-// In Claude Code / AI environment
-Task("Fix agent-tool-abort.test.ts failures")
-Task("Fix batch-completion-behavior.test.ts failures")
-Task("Fix tool-approval-race-conditions.test.ts failures")
-// All three run concurrently
+Issue all three subagent dispatches in the same response — they run in parallel:
+
+```text
+Subagent (general-purpose): "Fix agent-tool-abort.test.ts failures"
+Subagent (general-purpose): "Fix batch-completion-behavior.test.ts failures"
+Subagent (general-purpose): "Fix tool-approval-race-conditions.test.ts failures"
+# All three run concurrently.
```

+Multiple dispatch calls in one response = parallel execution. One per response = sequential.
+
### 4. Review and Integrate

When agents return:

diff --git a/skills/writing-skills/SKILL.md b/skills/writing-skills/SKILL.md
index c3b73d8b..300da991 100644
--- a/skills/writing-skills/SKILL.md
+++ b/skills/writing-skills/SKILL.md
@@ -9,7 +9,7 @@ description: Use when creating new skills, editing existing skills, or verifying

**Writing skills IS Test-Driven Development applied to process documentation.**

-**Personal skills live in agent-specific directories (`~/.claude/skills` for Claude Code, `~/.agents/skills/` for Codex)**
+**Personal skills live in your runtime's skills directory** — see `../using-superpowers/references/<runtime>-tools.md` (where `<runtime>` is `claude-code`, `codex`, `copilot`, or `gemini`) for the path on your runtime. Codex, Copilot CLI, and Gemini CLI all recognize `~/.agents/skills/` as a cross-runtime alias.

You write test cases (pressure scenarios with subagents), watch them fail (baseline behavior), write the skill (documentation), watch tests pass (agents comply), and refactor (close loopholes).

@@ -21,7 +21,7 @@ You write test cases (pressure scenarios with subagents), watch them fail (basel

## What is a Skill?

-A **skill** is a reference guide for proven techniques, patterns, or tools. Skills help future Claude instances find and apply effective approaches.
+A **skill** is a reference guide for proven techniques, patterns, or tools. Skills help future agents find and apply effective approaches.

**Skills are:** Reusable techniques, patterns, tools, reference guides

@@ -55,7 +55,7 @@ The entire skill creation process follows RED-GREEN-REFACTOR.
**Don't create for:** - One-off solutions - Standard practices well-documented elsewhere -- Project-specific conventions (put in CLAUDE.md) +- Project-specific conventions (put in your instructions file) - Mechanical constraints (if it's enforceable with regex/validation, automate it—save documentation for judgment calls) ## Skill Types @@ -99,7 +99,7 @@ skills/ - `description`: Third-person, describes ONLY when to use (NOT what it does) - Start with "Use when..." to focus on triggering conditions - Include specific symptoms, situations, and contexts - - **NEVER summarize the skill's process or workflow** (see CSO section for why) + - **NEVER summarize the skill's process or workflow** (see SDO section for why) - Keep under 500 characters if possible ```markdown @@ -137,13 +137,13 @@ Concrete results ``` -## Claude Search Optimization (CSO) +## Skill Discovery Optimization (SDO) -**Critical for discovery:** Future Claude needs to FIND your skill +**Critical for discovery:** Future agents need to FIND your skill ### 1. Rich Description Field -**Purpose:** Claude reads description to decide which skills to load for a given task. Make it answer: "Should I read this skill right now?" +**Purpose:** Your agent reads the description to decide which skills to load for a given task. Make it answer: "Should I read this skill right now?" **Format:** Start with "Use when..." to focus on triggering conditions @@ -151,14 +151,14 @@ Concrete results The description should ONLY describe triggering conditions. Do NOT summarize the skill's process or workflow in the description. -**Why this matters:** Testing revealed that when a description summarizes the skill's workflow, Claude may follow the description instead of reading the full skill content. A description saying "code review between tasks" caused Claude to do ONE review, even though the skill's flowchart clearly showed TWO reviews (spec compliance then code quality). +**Why this matters:** Testing revealed that when a description summarizes the skill's workflow, an agent may follow the description instead of reading the full skill content. A description saying "code review between tasks" caused an agent to do ONE review, even though the skill's flowchart clearly showed TWO reviews (spec compliance then code quality). -When the description was changed to just "Use when executing implementation plans with independent tasks" (no workflow summary), Claude correctly read the flowchart and followed the two-stage review process. +When the description was changed to just "Use when executing implementation plans with independent tasks" (no workflow summary), the agent correctly read the flowchart and followed the two-stage review process. -**The trap:** Descriptions that summarize workflow create a shortcut Claude will take. The skill body becomes documentation Claude skips. +**The trap:** Descriptions that summarize workflow create a shortcut agents will take. The skill body becomes documentation agents skip. ```yaml -# ❌ BAD: Summarizes workflow - Claude may follow this instead of reading skill +# ❌ BAD: Summarizes workflow - agents may follow this instead of reading skill description: Use when executing plans - dispatches subagent per task with code review between tasks # ❌ BAD: Too much process detail @@ -198,7 +198,7 @@ description: Use when using React Router and handling authentication redirects ### 2. 
Keyword Coverage -Use words Claude would search for: +Use words an agent would search for: - Error messages: "Hook timed out", "ENOTEMPTY", "race condition" - Symptoms: "flaky", "hanging", "zombie", "pollution" - Synonyms: "timeout/hang/freeze", "cleanup/teardown/afterEach" @@ -275,7 +275,7 @@ wc -w skills/path/SKILL.md - `creating-skills`, `testing-skills`, `debugging-with-logs` - Active, describes the action you're taking -### 4. Cross-Referencing Other Skills +### 5. Cross-Referencing Other Skills **When writing documentation that references other skills:** @@ -313,7 +313,7 @@ digraph when_flowchart { - Linear instructions → Numbered lists - Labels without semantic meaning (step1, helper2) -See @graphviz-conventions.dot for graphviz style rules. +See `graphviz-conventions.dot` in this directory for graphviz style rules. **Visualizing for your human partner:** Use `render-graphs.js` in this directory to render a skill's flowcharts to SVG: ```bash @@ -522,7 +522,7 @@ Make it easy for agents to self-check when rationalizing: **All of these mean: Delete code. Start over with TDD.** ``` -### Update CSO for Violation Symptoms +### Update SDO for Violation Symptoms Add to description: symptoms of when you're ABOUT to violate the rule: @@ -595,7 +595,7 @@ Deploying untested skills = deploying untested code. It's a violation of quality ## Skill Creation Checklist (TDD Adapted) -**IMPORTANT: Use TodoWrite to create todos for EACH checklist item below.** +**IMPORTANT: Create a todo for EACH checklist item below.** **RED Phase - Write Failing Test:** - [ ] Create pressure scenarios (3+ combined pressures for discipline skills) @@ -634,9 +634,10 @@ Deploying untested skills = deploying untested code. It's a violation of quality ## Discovery Workflow -How future Claude finds your skill: +How future agents find your skill: 1. **Encounters problem** ("tests are flaky") +2. **Searches skills** (greps descriptions, browses categories) 3. **Finds SKILL** (description matches) 4. **Scans overview** (is this relevant?) 5. **Reads patterns** (quick reference table) diff --git a/skills/writing-skills/anthropic-best-practices.md b/skills/writing-skills/anthropic-best-practices.md index 9f3f6ecf..f767f9fe 100644 --- a/skills/writing-skills/anthropic-best-practices.md +++ b/skills/writing-skills/anthropic-best-practices.md @@ -1,8 +1,8 @@ # Skill authoring best practices -> Learn how to write effective Skills that Claude can discover and use successfully. +> Learn how to write effective Skills that agents can discover and use successfully. -Good Skills are concise, well-structured, and tested with real usage. This guide provides practical authoring decisions to help you write Skills that Claude can discover and use effectively. +Good Skills are concise, well-structured, and tested with real usage. This guide provides practical authoring decisions to help you write Skills that agents can discover and use effectively. For conceptual background on how Skills work, see the [Skills overview](/en/docs/agents-and-tools/agent-skills/overview). @@ -10,21 +10,21 @@ For conceptual background on how Skills work, see the [Skills overview](/en/docs ### Concise is key -The [context window](https://platform.claude.com/docs/en/build-with-claude/context-windows) is a public good. Your Skill shares the context window with everything else Claude needs to know, including: +The [context window](https://platform.claude.com/docs/en/build-with-claude/context-windows) is a public good. 
Your Skill shares the context window with everything else your agent needs to know, including: * The system prompt * Conversation history * Other Skills' metadata * Your actual request -Not every token in your Skill has an immediate cost. At startup, only the metadata (name and description) from all Skills is pre-loaded. Claude reads SKILL.md only when the Skill becomes relevant, and reads additional files only as needed. However, being concise in SKILL.md still matters: once Claude loads it, every token competes with conversation history and other context. +Not every token in your Skill has an immediate cost. At startup, only the metadata (name and description) from all Skills is pre-loaded. Agents read SKILL.md only when the Skill becomes relevant, and read additional files only as needed. However, being concise in SKILL.md still matters: once an agent loads it, every token competes with conversation history and other context. -**Default assumption**: Claude is already very smart +**Default assumption**: Agents are already very smart -Only add context Claude doesn't already have. Challenge each piece of information: +Only add context agents don't already have. Challenge each piece of information: -* "Does Claude really need this explanation?" -* "Can I assume Claude knows this?" +* "Does the agent really need this explanation?" +* "Can I assume the agent knows this?" * "Does this paragraph justify its token cost?" **Good example: Concise** (approximately 50 tokens): @@ -54,7 +54,7 @@ recommend pdfplumber because it's easy to use and handles most cases well. First, you'll need to install it using pip. Then you can use the code below... ``` -The concise version assumes Claude knows what PDFs are and how libraries work. +The concise version assumes the agent knows what PDFs are and how libraries work. ### Set appropriate degrees of freedom @@ -124,10 +124,10 @@ python scripts/migrate.py --verify --backup Do not modify the command or add additional flags. ```` -**Analogy**: Think of Claude as a robot exploring a path: +**Analogy**: Think of the agent as a robot exploring a path: * **Narrow bridge with cliffs on both sides**: There's only one safe way forward. Provide specific guardrails and exact instructions (low freedom). Example: database migrations that must run in exact sequence. -* **Open field with no hazards**: Many paths lead to success. Give general direction and trust Claude to find the best route (high freedom). Example: code reviews where context determines the best approach. +* **Open field with no hazards**: Many paths lead to success. Give general direction and trust the agent to find the best route (high freedom). Example: code reviews where context determines the best approach. ### Test with all models you plan to use @@ -196,7 +196,7 @@ The `description` field enables Skill discovery and should include both what the **Be specific and include key terms**. Include both what the Skill does and specific triggers/contexts for when to use it. -Each Skill has exactly one description field. The description is critical for skill selection: Claude uses it to choose the right Skill from potentially 100+ available Skills. Your description must provide enough detail for Claude to know when to select this Skill, while the rest of SKILL.md provides the implementation details. +Each Skill has exactly one description field. The description is critical for skill selection: agents use it to choose the right Skill from potentially 100+ available Skills. 
Your description must provide enough detail for an agent to know when to select this Skill, while the rest of SKILL.md provides the implementation details. Effective examples: @@ -234,7 +234,7 @@ description: Does stuff with files ### Progressive disclosure patterns -SKILL.md serves as an overview that points Claude to detailed materials as needed, like a table of contents in an onboarding guide. For an explanation of how progressive disclosure works, see [How Skills work](/en/docs/agents-and-tools/agent-skills/overview#how-skills-work) in the overview. +SKILL.md serves as an overview that points agents to detailed materials as needed, like a table of contents in an onboarding guide. For an explanation of how progressive disclosure works, see [How Skills work](/en/docs/agents-and-tools/agent-skills/overview#how-skills-work) in the overview. **Practical guidance:** @@ -248,7 +248,7 @@ A basic Skill starts with just a SKILL.md file containing metadata and instructi Simple SKILL.md file showing YAML frontmatter and markdown body -As your Skill grows, you can bundle additional content that Claude loads only when needed: +As your Skill grows, you can bundle additional content that agents load only when needed: Bundling additional reference files like reference.md and forms.md. @@ -292,11 +292,11 @@ with pdfplumber.open("file.pdf") as pdf: **Examples**: See [EXAMPLES.md](EXAMPLES.md) for common patterns ```` -Claude loads FORMS.md, REFERENCE.md, or EXAMPLES.md only when needed. +Agents load FORMS.md, REFERENCE.md, or EXAMPLES.md only when needed. #### Pattern 2: Domain-specific organization -For Skills with multiple domains, organize content by domain to avoid loading irrelevant context. When a user asks about sales metrics, Claude only needs to read sales-related schemas, not finance or marketing data. This keeps token usage low and context focused. +For Skills with multiple domains, organize content by domain to avoid loading irrelevant context. When a user asks about sales metrics, the agent only needs to read sales-related schemas, not finance or marketing data. This keeps token usage low and context focused. ``` bigquery-skill/ @@ -348,13 +348,13 @@ For simple edits, modify the XML directly. **For OOXML details**: See [OOXML.md](OOXML.md) ``` -Claude reads REDLINING.md or OOXML.md only when the user needs those features. +Agents read REDLINING.md or OOXML.md only when the user needs those features. ### Avoid deeply nested references -Claude may partially read files when they're referenced from other referenced files. When encountering nested references, Claude might use commands like `head -100` to preview content rather than reading entire files, resulting in incomplete information. +Agents may partially read files when they're referenced from other referenced files. When encountering nested references, an agent might use commands like `head -100` to preview content rather than reading entire files, resulting in incomplete information. -**Keep references one level deep from SKILL.md**. All reference files should link directly from SKILL.md to ensure Claude reads complete files when needed. +**Keep references one level deep from SKILL.md**. All reference files should link directly from SKILL.md to ensure agents read complete files when needed. **Bad example: Too deep**: @@ -382,7 +382,7 @@ Here's the actual information... ### Structure longer reference files with table of contents -For reference files longer than 100 lines, include a table of contents at the top. 
This ensures Claude can see the full scope of available information even when previewing with partial reads. +For reference files longer than 100 lines, include a table of contents at the top. This ensures agents can see the full scope of available information even when previewing with partial reads. **Example**: @@ -403,7 +403,7 @@ For reference files longer than 100 lines, include a table of contents at the to ... ``` -Claude can then read the complete file or jump to specific sections as needed. +Agents can then read the complete file or jump to specific sections as needed. For details on how this filesystem-based architecture enables progressive disclosure, see the [Runtime environment](#runtime-environment) section in the Advanced section below. @@ -411,7 +411,7 @@ For details on how this filesystem-based architecture enables progressive disclo ### Use workflows for complex tasks -Break complex operations into clear, sequential steps. For particularly complex workflows, provide a checklist that Claude can copy into its response and check off as it progresses. +Break complex operations into clear, sequential steps. For particularly complex workflows, provide a checklist that the agent can copy into its response and check off as it progresses. **Example 1: Research synthesis workflow** (for Skills without code): @@ -498,7 +498,7 @@ Run: `python scripts/verify_output.py output.pdf` If verification fails, return to Step 2. ```` -Clear steps prevent Claude from skipping critical validation. The checklist helps both Claude and you track progress through multi-step workflows. +Clear steps prevent agents from skipping critical validation. The checklist helps both you and the agent track progress through multi-step workflows. ### Implement feedback loops @@ -524,7 +524,7 @@ This pattern greatly improves output quality. 5. Finalize and save the document ``` -This shows the validation loop pattern using reference documents instead of scripts. The "validator" is STYLE\_GUIDE.md, and Claude performs the check by reading and comparing. +This shows the validation loop pattern using reference documents instead of scripts. The "validator" is STYLE\_GUIDE.md, and the agent performs the check by reading and comparing. **Example 2: Document editing process** (for Skills with code): @@ -593,7 +593,7 @@ Choose one term and use it throughout the Skill: * Mix "field", "box", "element", "control" * Mix "extract", "pull", "get", "retrieve" -Consistency helps Claude understand and follow instructions. +Consistency helps agents understand and follow instructions. ## Common patterns @@ -688,11 +688,11 @@ chore: update dependencies and refactor error handling Follow this style: type(scope): brief description, then detailed explanation. ```` -Examples help Claude understand the desired style and level of detail more clearly than descriptions alone. +Examples help agents understand the desired style and level of detail more clearly than descriptions alone. ### Conditional workflow pattern -Guide Claude through decision points: +Guide agents through decision points: ```markdown theme={null} ## Document modification workflow @@ -715,7 +715,7 @@ Guide Claude through decision points: ``` - If workflows become large or complicated with many steps, consider pushing them into separate files and tell Claude to read the appropriate file based on the task at hand. 
+ If workflows become large or complicated with many steps, consider pushing them into separate files and tell the agent to read the appropriate file based on the task at hand. ## Evaluation and iteration @@ -726,9 +726,9 @@ Guide Claude through decision points: **Evaluation-driven development:** -1. **Identify gaps**: Run Claude on representative tasks without a Skill. Document specific failures or missing context +1. **Identify gaps**: Run your agent on representative tasks without a Skill. Document specific failures or missing context 2. **Create evaluations**: Build three scenarios that test these gaps -3. **Establish baseline**: Measure Claude's performance without the Skill +3. **Establish baseline**: Measure the agent's performance without the Skill 4. **Write minimal instructions**: Create just enough content to address the gaps and pass evaluations 5. **Iterate**: Execute evaluations, compare against baseline, and refine @@ -753,51 +753,51 @@ This approach ensures you're solving actual problems rather than anticipating re This example demonstrates a data-driven evaluation with a simple testing rubric. We do not currently provide a built-in way to run these evaluations. Users can create their own evaluation system. Evaluations are your source of truth for measuring Skill effectiveness. -### Develop Skills iteratively with Claude +### Develop Skills iteratively with the agent -The most effective Skill development process involves Claude itself. Work with one instance of Claude ("Claude A") to create a Skill that will be used by other instances ("Claude B"). Claude A helps you design and refine instructions, while Claude B tests them in real tasks. This works because Claude models understand both how to write effective agent instructions and what information agents need. +The most effective Skill development process involves the agent itself. Work with one instance ("Agent A") to create a Skill that will be used by other instances ("Agent B"). Agent A helps you design and refine instructions, while Agent B tests them in real tasks. This works because the underlying models understand both how to write effective agent instructions and what information agents need. **Creating a new Skill:** -1. **Complete a task without a Skill**: Work through a problem with Claude A using normal prompting. As you work, you'll naturally provide context, explain preferences, and share procedural knowledge. Notice what information you repeatedly provide. +1. **Complete a task without a Skill**: Work through a problem with Agent A using normal prompting. As you work, you'll naturally provide context, explain preferences, and share procedural knowledge. Notice what information you repeatedly provide. 2. **Identify the reusable pattern**: After completing the task, identify what context you provided that would be useful for similar future tasks. **Example**: If you worked through a BigQuery analysis, you might have provided table names, field definitions, filtering rules (like "always exclude test accounts"), and common query patterns. -3. **Ask Claude A to create a Skill**: "Create a Skill that captures this BigQuery analysis pattern we just used. Include the table schemas, naming conventions, and the rule about filtering test accounts." +3. **Ask Agent A to create a Skill**: "Create a Skill that captures this BigQuery analysis pattern we just used. Include the table schemas, naming conventions, and the rule about filtering test accounts." - Claude models understand the Skill format and structure natively. 
You don't need special system prompts or a "writing skills" skill to get Claude to help create Skills. Simply ask Claude to create a Skill and it will generate properly structured SKILL.md content with appropriate frontmatter and body content. + Modern agents understand the Skill format and structure natively. You don't need special system prompts or a "writing skills" skill to get help creating Skills. Simply ask the agent to create a Skill and it will generate properly structured SKILL.md content with appropriate frontmatter and body content. -4. **Review for conciseness**: Check that Claude A hasn't added unnecessary explanations. Ask: "Remove the explanation about what win rate means - Claude already knows that." +4. **Review for conciseness**: Check that Agent A hasn't added unnecessary explanations. Ask: "Remove the explanation about what win rate means - the agent already knows that." -5. **Improve information architecture**: Ask Claude A to organize the content more effectively. For example: "Organize this so the table schema is in a separate reference file. We might add more tables later." +5. **Improve information architecture**: Ask Agent A to organize the content more effectively. For example: "Organize this so the table schema is in a separate reference file. We might add more tables later." -6. **Test on similar tasks**: Use the Skill with Claude B (a fresh instance with the Skill loaded) on related use cases. Observe whether Claude B finds the right information, applies rules correctly, and handles the task successfully. +6. **Test on similar tasks**: Use the Skill with Agent B (a fresh instance with the Skill loaded) on related use cases. Observe whether Agent B finds the right information, applies rules correctly, and handles the task successfully. -7. **Iterate based on observation**: If Claude B struggles or misses something, return to Claude A with specifics: "When Claude used this Skill, it forgot to filter by date for Q4. Should we add a section about date filtering patterns?" +7. **Iterate based on observation**: If Agent B struggles or misses something, return to Agent A with specifics: "When the agent used this Skill, it forgot to filter by date for Q4. Should we add a section about date filtering patterns?" **Iterating on existing Skills:** The same hierarchical pattern continues when improving Skills. You alternate between: -* **Working with Claude A** (the expert who helps refine the Skill) -* **Testing with Claude B** (the agent using the Skill to perform real work) -* **Observing Claude B's behavior** and bringing insights back to Claude A +* **Working with Agent A** (the expert who helps refine the Skill) +* **Testing with Agent B** (the agent using the Skill to perform real work) +* **Observing Agent B's behavior** and bringing insights back to Agent A -1. **Use the Skill in real workflows**: Give Claude B (with the Skill loaded) actual tasks, not test scenarios +1. **Use the Skill in real workflows**: Give Agent B (with the Skill loaded) actual tasks, not test scenarios -2. **Observe Claude B's behavior**: Note where it struggles, succeeds, or makes unexpected choices +2. **Observe Agent B's behavior**: Note where it struggles, succeeds, or makes unexpected choices - **Example observation**: "When I asked Claude B for a regional sales report, it wrote the query but forgot to filter out test accounts, even though the Skill mentions this rule." 
+ **Example observation**: "When I asked Agent B for a regional sales report, it wrote the query but forgot to filter out test accounts, even though the Skill mentions this rule." -3. **Return to Claude A for improvements**: Share the current SKILL.md and describe what you observed. Ask: "I noticed Claude B forgot to filter test accounts when I asked for a regional report. The Skill mentions filtering, but maybe it's not prominent enough?" +3. **Return to Agent A for improvements**: Share the current SKILL.md and describe what you observed. Ask: "I noticed Agent B forgot to filter test accounts when I asked for a regional report. The Skill mentions filtering, but maybe it's not prominent enough?" -4. **Review Claude A's suggestions**: Claude A might suggest reorganizing to make rules more prominent, using stronger language like "MUST filter" instead of "always filter", or restructuring the workflow section. +4. **Review Agent A's suggestions**: Agent A might suggest reorganizing to make rules more prominent, using stronger language like "MUST filter" instead of "always filter", or restructuring the workflow section. -5. **Apply and test changes**: Update the Skill with Claude A's refinements, then test again with Claude B on similar requests +5. **Apply and test changes**: Update the Skill with Agent A's refinements, then test again with Agent B on similar requests 6. **Repeat based on usage**: Continue this observe-refine-test cycle as you encounter new scenarios. Each iteration improves the Skill based on real agent behavior, not assumptions. @@ -807,18 +807,18 @@ The same hierarchical pattern continues when improving Skills. You alternate bet 2. Ask: Does the Skill activate when expected? Are instructions clear? What's missing? 3. Incorporate feedback to address blind spots in your own usage patterns -**Why this approach works**: Claude A understands agent needs, you provide domain expertise, Claude B reveals gaps through real usage, and iterative refinement improves Skills based on observed behavior rather than assumptions. +**Why this approach works**: Agent A understands agent needs, you provide domain expertise, Agent B reveals gaps through real usage, and iterative refinement improves Skills based on observed behavior rather than assumptions. -### Observe how Claude navigates Skills +### Observe how agents navigate Skills -As you iterate on Skills, pay attention to how Claude actually uses them in practice. Watch for: +As you iterate on Skills, pay attention to how agents actually use them in practice. Watch for: -* **Unexpected exploration paths**: Does Claude read files in an order you didn't anticipate? This might indicate your structure isn't as intuitive as you thought -* **Missed connections**: Does Claude fail to follow references to important files? Your links might need to be more explicit or prominent -* **Overreliance on certain sections**: If Claude repeatedly reads the same file, consider whether that content should be in the main SKILL.md instead -* **Ignored content**: If Claude never accesses a bundled file, it might be unnecessary or poorly signaled in the main instructions +* **Unexpected exploration paths**: Does the agent read files in an order you didn't anticipate? This might indicate your structure isn't as intuitive as you thought +* **Missed connections**: Does the agent fail to follow references to important files? 
Your links might need to be more explicit or prominent +* **Overreliance on certain sections**: If the agent repeatedly reads the same file, consider whether that content should be in the main SKILL.md instead +* **Ignored content**: If the agent never accesses a bundled file, it might be unnecessary or poorly signaled in the main instructions -Iterate based on these observations rather than assumptions. The 'name' and 'description' in your Skill's metadata are particularly critical. Claude uses these when deciding whether to trigger the Skill in response to the current task. Make sure they clearly describe what the Skill does and when it should be used. +Iterate based on these observations rather than assumptions. The 'name' and 'description' in your Skill's metadata are particularly critical. Agents use these when deciding whether to trigger the Skill in response to the current task. Make sure they clearly describe what the Skill does and when it should be used. ## Anti-patterns to avoid @@ -854,7 +854,7 @@ The sections below focus on Skills that include executable scripts. If your Skil ### Solve, don't punt -When writing scripts for Skills, handle error conditions rather than punting to Claude. +When writing scripts for Skills, handle error conditions rather than punting to the agent. **Good example: Handle errors explicitly**: @@ -876,15 +876,15 @@ def process_file(path): return '' ``` -**Bad example: Punt to Claude**: +**Bad example: Punt to the agent**: ```python theme={null} def process_file(path): - # Just fail and let Claude figure it out + # Just fail and let the agent figure it out return open(path).read() ``` -Configuration parameters should also be justified and documented to avoid "voodoo constants" (Ousterhout's law). If you don't know the right value, how will Claude determine it? +Configuration parameters should also be justified and documented to avoid "voodoo constants" (Ousterhout's law). If you don't know the right value, how will the agent determine it? **Good example: Self-documenting**: @@ -907,7 +907,7 @@ RETRIES = 5 # Why 5? ### Provide utility scripts -Even if Claude could write a script, pre-made scripts offer advantages: +Even if your agent could write a script, pre-made scripts offer advantages: **Benefits of utility scripts**: @@ -918,9 +918,9 @@ Even if Claude could write a script, pre-made scripts offer advantages: Bundling executable scripts alongside instruction files -The diagram above shows how executable scripts work alongside instruction files. The instruction file (forms.md) references the script, and Claude can execute it without loading its contents into context. +The diagram above shows how executable scripts work alongside instruction files. The instruction file (forms.md) references the script, and the agent can execute it without loading its contents into context. 
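For instance, the instruction file might point at the script like this (a minimal sketch reusing the `analyze_form.py` example from below; filenames are illustrative):

```markdown theme={null}
## Extract form fields

Run the bundled script; its code never needs to enter context:

`python scripts/analyze_form.py input.pdf > fields.json`
```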
-**Important distinction**: Make clear in your instructions whether Claude should: +**Important distinction**: Make clear in your instructions whether the agent should: * **Execute the script** (most common): "Run `analyze_form.py` to extract fields" * **Read it as reference** (for complex logic): "See `analyze_form.py` for the field extraction algorithm" @@ -962,7 +962,7 @@ python scripts/fill_form.py input.pdf fields.json output.pdf ### Use visual analysis -When inputs can be rendered as images, have Claude analyze them: +When inputs can be rendered as images, have the agent analyze them: ````markdown theme={null} ## Form layout analysis @@ -973,20 +973,20 @@ When inputs can be rendered as images, have Claude analyze them: ``` 2. Analyze each page image to identify form fields -3. Claude can see field locations and types visually +3. The agent can see field locations and types visually ```` In this example, you'd need to write the `pdf_to_images.py` script. -Claude's vision capabilities help understand layouts and structures. +Agent vision capabilities help understand layouts and structures. ### Create verifiable intermediate outputs -When Claude performs complex, open-ended tasks, it can make mistakes. The "plan-validate-execute" pattern catches errors early by having Claude first create a plan in a structured format, then validate that plan with a script before executing it. +When agents perform complex, open-ended tasks, they can make mistakes. The "plan-validate-execute" pattern catches errors early by having the agent first create a plan in a structured format, then validate that plan with a script before executing it. -**Example**: Imagine asking Claude to update 50 form fields in a PDF based on a spreadsheet. Without validation, Claude might reference non-existent fields, create conflicting values, miss required fields, or apply updates incorrectly. +**Example**: Imagine asking the agent to update 50 form fields in a PDF based on a spreadsheet. Without validation, it might reference non-existent fields, create conflicting values, miss required fields, or apply updates incorrectly. **Solution**: Use the workflow pattern shown above (PDF form filling), but add an intermediate `changes.json` file that gets validated before applying changes. The workflow becomes: analyze → **create plan file** → **validate plan** → execute → verify. @@ -994,12 +994,12 @@ When Claude performs complex, open-ended tasks, it can make mistakes. The "plan- * **Catches errors early**: Validation finds problems before changes are applied * **Machine-verifiable**: Scripts provide objective verification -* **Reversible planning**: Claude can iterate on the plan without touching originals +* **Reversible planning**: The agent can iterate on the plan without touching originals * **Clear debugging**: Error messages point to specific problems **When to use**: Batch operations, destructive changes, complex validation rules, high-stakes operations. -**Implementation tip**: Make validation scripts verbose with specific error messages like "Field 'signature\_date' not found. Available fields: customer\_name, order\_total, signature\_date\_signed" to help Claude fix issues. +**Implementation tip**: Make validation scripts verbose with specific error messages like "Field 'signature\_date' not found. Available fields: customer\_name, order\_total, signature\_date\_signed" to help the agent fix issues. 
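A minimal sketch of such a validator (the file name, JSON shape, and field list are assumptions for illustration, not part of the pattern):

```python theme={null}
# validate_changes.py -- check a planned changes.json against the form's actual fields.
import json
import sys

def validate(plan_path, known_fields):
    with open(plan_path) as f:
        changes = json.load(f)  # assumed shape: {"field_name": "new value", ...}
    errors = []
    for field in changes:
        if field not in known_fields:
            # Verbose, specific messages help the agent repair the plan.
            errors.append(
                f"Field '{field}' not found. Available fields: "
                + ", ".join(sorted(known_fields))
            )
    return errors

if __name__ == "__main__":
    # Stub field list; a real script would extract this from the PDF itself.
    known = {"customer_name", "order_total", "signature_date_signed"}
    problems = validate(sys.argv[1], known)
    for problem in problems:
        print(problem)
    sys.exit(1 if problems else 0)
```

Exiting non-zero lets the workflow treat validation as a gate before any changes are applied.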
### Package dependencies @@ -1016,24 +1016,24 @@ Skills run in a code execution environment with filesystem access, bash commands **How this affects your authoring:** -**How Claude accesses Skills:** +**How agents access Skills:** 1. **Metadata pre-loaded**: At startup, the name and description from all Skills' YAML frontmatter are loaded into the system prompt -2. **Files read on-demand**: Claude uses bash Read tools to access SKILL.md and other files from the filesystem when needed +2. **Files read on-demand**: Agents use their file-reading tools to access SKILL.md and other files from the filesystem when needed 3. **Scripts executed efficiently**: Utility scripts can be executed via bash without loading their full contents into context. Only the script's output consumes tokens 4. **No context penalty for large files**: Reference files, data, or documentation don't consume context tokens until actually read -* **File paths matter**: Claude navigates your skill directory like a filesystem. Use forward slashes (`reference/guide.md`), not backslashes +* **File paths matter**: Agents navigate your skill directory like a filesystem. Use forward slashes (`reference/guide.md`), not backslashes * **Name files descriptively**: Use names that indicate content: `form_validation_rules.md`, not `doc2.md` * **Organize for discovery**: Structure directories by domain or feature * Good: `reference/finance.md`, `reference/sales.md` * Bad: `docs/file1.md`, `docs/file2.md` * **Bundle comprehensive resources**: Include complete API docs, extensive examples, large datasets; no context penalty until accessed -* **Prefer scripts for deterministic operations**: Write `validate_form.py` rather than asking Claude to generate validation code +* **Prefer scripts for deterministic operations**: Write `validate_form.py` rather than asking the agent to generate validation code * **Make execution intent clear**: * "Run `analyze_form.py` to extract fields" (execute) * "See `analyze_form.py` for the extraction algorithm" (read as reference) -* **Test file access patterns**: Verify Claude can navigate your directory structure by testing with real requests +* **Test file access patterns**: Verify the agent can navigate your directory structure by testing with real requests **Example:** @@ -1046,7 +1046,7 @@ bigquery-skill/ └── product.md (usage analytics) ``` -When the user asks about revenue, Claude reads SKILL.md, sees the reference to `reference/finance.md`, and invokes bash to read just that file. The sales.md and product.md files remain on the filesystem, consuming zero context tokens until needed. This filesystem-based model is what enables progressive disclosure. Claude can navigate and selectively load exactly what each task requires. +When the user asks about revenue, the agent reads SKILL.md, sees the reference to `reference/finance.md`, and invokes bash to read just that file. The sales.md and product.md files remain on the filesystem, consuming zero context tokens until needed. This filesystem-based model is what enables progressive disclosure. Agents can navigate and selectively load exactly what each task requires. For complete details on the technical architecture, see [How Skills work](/en/docs/agents-and-tools/agent-skills/overview#how-skills-work) in the Skills overview. 
@@ -1068,7 +1068,7 @@ Where: * `BigQuery` and `GitHub` are MCP server names * `bigquery_schema` and `create_issue` are the tool names within those servers -Without the server prefix, Claude may fail to locate the tool, especially when multiple MCP servers are available. +Without the server prefix, agents may fail to locate the tool, especially when multiple MCP servers are available. ### Avoid assuming tools are installed @@ -1117,7 +1117,7 @@ Before sharing a Skill, verify: ### Code and scripts -* [ ] Scripts solve problems rather than punt to Claude +* [ ] Scripts solve problems rather than punt to the agent * [ ] Error handling is explicit and helpful * [ ] No "voodoo constants" (all values justified) * [ ] Required packages listed in instructions and verified as available