AI Agents · SKILL.md · Claude Code · March 2026 · Andy

How AI Agents Learn New Skills (SKILL.md Explained)

AI models can't be retrained every time you want them to do something new. But they can read instructions. The SKILL.md file format is a structured way to give an AI agent a new capability — a defined behavior it can activate on demand, with reliable outputs, without touching the base model or the main system prompt.

In this article
  1. What SKILL.md is
  2. How skills trigger
  3. Anatomy of a SKILL.md file
  4. A complete example
  5. Best practices
  6. When to use skills vs system prompt

What SKILL.md Is

A SKILL.md file is a markdown document that teaches an AI agent a specific, reusable capability. Where a system prompt (or CLAUDE.md) defines the agent's general identity and behavior, a SKILL.md defines one skill in precise detail — the trigger conditions that activate it, the exact output format it produces, and the constraints it operates under.

The core insight is that language models learn from context. If you put a well-structured skill definition into an agent's context at the right moment, the agent will follow it with high consistency — especially when the skill includes examples. The SKILL.md format is a standardized way to structure those definitions so they're reusable, auditable, and improvable over time.

In Claude Code, skills are loaded automatically when placed in the right directory. In other agent frameworks, they're loaded into context at invocation time or registered in the agent's skill library for on-demand retrieval.
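Directory-based loading is simple to sketch: collect every SKILL.md under a skills directory and hand the contents to whatever assembles the agent's context. A minimal sketch (the `load_skills` helper and directory layout are illustrative, not a documented API):

```python
from pathlib import Path

def load_skills(skills_dir: str) -> list[str]:
    """Read every SKILL.md file under skills_dir, in a stable order."""
    skills = []
    for path in sorted(Path(skills_dir).rglob("SKILL.md")):
        skills.append(path.read_text(encoding="utf-8"))
    return skills
```

A framework doing on-demand retrieval would index these strings by their frontmatter instead of concatenating them all into context.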

What skills are not

A skill is not a plugin, an API call, or a function. It doesn't add new tools or external capabilities to the agent. It adds a defined pattern of behavior — a reliable way the agent responds to a specific class of request. The agent still works within its existing capabilities; the skill tells it exactly how to apply those capabilities to this type of task.

How Skills Trigger

Every SKILL.md file defines trigger conditions — the circumstances under which the skill should activate. When the agent receives a request, it matches the request against its loaded skill triggers. If a trigger matches, the skill's output template and constraints take precedence over default behavior.

Trigger conditions come in three forms:

1. Explicit invocation. The user or system calls the skill by name: "Run the code review skill on this file." This is the simplest trigger to implement — the skill activates when its name appears in the request. Reliable, but it requires the caller to know the skill exists.

2. Intent-based trigger. The agent recognizes that the user's intent matches the skill's domain. "Review this Python file for security issues" triggers a code review skill even without the word "skill." The skill's trigger conditions define which intent patterns should activate it — and crucially, which similar-looking patterns should not.

3. Context-based trigger. The skill activates based on the content or structure of the input, not the user's words. A data analysis skill might trigger when the input contains a CSV table, regardless of how the request is phrased. Context triggers are powerful but require careful negative conditions to avoid false activations.
The critical importance of negative conditions
Every trigger definition needs explicit negative conditions — the cases that look like the skill's domain but aren't. Without them, the skill activates on requests it wasn't designed for and produces wrong outputs. A research skill without negative conditions will fire on simple factual questions it should answer from memory, not from a structured research workflow.
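In practice the model itself performs the intent matching, but the precedence rule is mechanical: negative conditions always win. A sketch of that rule (the keyword matcher here is a stand-in for real intent recognition, and the function name is illustrative):

```python
def should_activate(request: str,
                    activate_terms: list[str],
                    block_terms: list[str]) -> bool:
    """Negative conditions take precedence: any blocking term
    suppresses activation, no matter how many positive terms match."""
    text = request.lower()
    if any(term in text for term in block_terms):
        return False
    return any(term in text for term in activate_terms)
```

The ordering is the point: check the DO NOT ACTIVATE list first, so a request that matches both lists stays with default behavior.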

Anatomy of a SKILL.md File

A complete SKILL.md file has six sections. Omitting any of them degrades reliability:

---
name: [Skill Name]
description: [One sentence on what this skill does]
version: 1.0
---

## Trigger Conditions

ACTIVATE WHEN:
- [Condition 1 — specific and behavioral]
- [Condition 2]

DO NOT ACTIVATE WHEN:
- [Negative condition 1]
- [Negative condition 2]

TRIGGER EXAMPLES:
- "[Example prompt that should activate this skill]"
- "[Another example]"

NOT TRIGGER EXAMPLES:
- "[Example that looks similar but should not trigger]"

## Output Format

[Describe the exact structure of the output. Use headers, sections, or templates
as appropriate. This section is the spec — the model uses it as a direct template.]

## Iron Laws

1. NEVER [specific failure mode] — instead [recovery path]
2. NEVER [specific failure mode] — instead [recovery path]

## Error Handling

- If [error condition]: [what to do]
- If [error condition]: [what to do]

## Example

INPUT:
[A realistic example input]

OUTPUT:
[The complete expected output, matching the output format exactly]

The frontmatter (name, description, version) is used by skill registries and linters. The trigger section is the most important for reliability. The output format is the spec the model follows. The iron laws prevent the most common failure modes. The error handling prevents confusing refusals. The example is the regression test — without it, "matching the output format exactly" means something different every session.
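Registries and linters typically read only the frontmatter. A minimal parser sketch, assuming the `---` delimiters shown in the template (the `parse_frontmatter` helper is illustrative):

```python
import re

def parse_frontmatter(skill_text: str) -> dict:
    """Extract key: value pairs between the leading --- markers."""
    match = re.match(r"---\n(.*?)\n---", skill_text, re.DOTALL)
    if not match:
        return {}
    meta = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines with no colon
            meta[key.strip()] = value.strip()
    return meta
```

A real registry would also validate that `name`, `description`, and `version` are all present before accepting the skill.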

A Complete Example

Here's a complete SKILL.md for a code review skill:

---
name: Code Review
description: Reviews code for bugs, security issues, and style problems
version: 1.2
---

## Trigger Conditions

ACTIVATE WHEN:
- User asks to review, check, or audit a code file or snippet
- User asks what's wrong with a piece of code
- User pastes code and asks for feedback

DO NOT ACTIVATE WHEN:
- User asks to explain how code works (use explanation, not review format)
- User asks to write new code from scratch
- User asks a single question about one line (answer directly, skip review format)

TRIGGER EXAMPLES:
- "Review this Python function for any issues"
- "What's wrong with this code?"
- "Can you audit this authentication handler?"

NOT TRIGGER EXAMPLES:
- "How does this sorting algorithm work?"
- "Write me a function that parses CSV"
- "What does line 12 do?"

## Output Format

### Summary
One sentence: what the code does and overall quality assessment.

### Issues Found
List each issue with:
- Severity: CRITICAL / HIGH / MEDIUM / LOW
- Location: file:line or function name
- Issue: what's wrong
- Fix: concrete suggestion

### Positive Notes
1-3 things the code does well (skip if nothing notable).

### Verdict
One of: APPROVE / APPROVE WITH CHANGES / REQUEST CHANGES / REJECT
Brief justification (1 sentence).

## Iron Laws

1. NEVER fabricate issues that don't exist to appear thorough —
   if the code is clean, say so in Positive Notes and APPROVE
2. NEVER mark a security vulnerability below HIGH severity
3. NEVER suggest rewrites for stylistic preference — only flag issues
   that affect correctness, security, or maintainability

## Error Handling

- If no code is provided: ask "Please paste the code you'd like me to review"
- If code is incomplete/truncated: note "Review based on partial code — full
  file review may reveal additional issues"
- If language is unclear: state the assumed language before beginning

## Example

INPUT:
def get_user(user_id):
    query = "SELECT * FROM users WHERE id = " + user_id
    return db.execute(query)

OUTPUT:

### Summary
A database lookup function with a critical SQL injection vulnerability.

### Issues Found
- Severity: CRITICAL
- Location: get_user(), line 2
- Issue: String concatenation in SQL query allows SQL injection. Any
  unsanitized user_id is injected directly into the query.
- Fix: Use parameterized queries:
  db.execute("SELECT * FROM users WHERE id = ?", (user_id,))

### Positive Notes
(None notable for a 3-line function.)

### Verdict
REQUEST CHANGES — fix the SQL injection before any deployment.
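Because the six sections are fixed, a linter can mechanically verify that none is missing. A minimal completeness check, assuming the section headers used in the template above (the helper name is illustrative):

```python
REQUIRED_SECTIONS = [
    "## Trigger Conditions",
    "## Output Format",
    "## Iron Laws",
    "## Error Handling",
    "## Example",
]

def missing_sections(skill_text: str) -> list[str]:
    """Return the required headers absent from a SKILL.md body."""
    return [s for s in REQUIRED_SECTIONS if s not in skill_text]
```

An empty result means structurally complete, not good: the check catches omissions, not vague triggers or weak examples.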

Best Practices

Write trigger examples before trigger conditions

Start by writing 10 example prompts the skill should handle and 5 it shouldn't. Derive your trigger conditions from those examples. The negative examples are usually more clarifying than the positive ones — they force you to articulate the exact boundary between this skill and adjacent behaviors.

Vague trigger (will misfire)
ACTIVATE WHEN: user asks about code — this fires on "explain this code", "write me code", "what's the history of Python", and dozens of other patterns that aren't code review.
Specific trigger (fires correctly)
ACTIVATE WHEN: user pastes code and asks for review, feedback, issues, or audit. DO NOT ACTIVATE WHEN: user asks for explanation, asks you to write new code, or asks a question about a single line without requesting review.
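Those example prompts double as a regression suite: run any candidate trigger against both lists and flag every misclassification. A sketch, with a hypothetical `matcher` callable standing in for the skill's trigger logic:

```python
SHOULD_TRIGGER = [
    "Review this Python function for any issues",
    "Can you audit this authentication handler?",
]
SHOULD_NOT_TRIGGER = [
    "How does this sorting algorithm work?",
    "Write me a function that parses CSV",
]

def check_triggers(matcher) -> list[str]:
    """Return a description of every example the matcher misclassifies."""
    failures = []
    for prompt in SHOULD_TRIGGER:
        if not matcher(prompt):
            failures.append(f"missed: {prompt}")
    for prompt in SHOULD_NOT_TRIGGER:
        if matcher(prompt):
            failures.append(f"misfired: {prompt}")
    return failures
```

Rerun the suite whenever you tighten a trigger condition; a fix for one misfire often introduces a miss elsewhere.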

Make the output format a template, not a description

The output format section should show the structure, not describe it. "A section for issues found, each with severity and description" is a description. Showing the actual headers and fields — "### Issues Found" with "Severity: / Location: / Issue: / Fix:" — is a template. The model uses templates as direct patterns. Descriptions require interpretation.

Iron laws come from actual failures

Run the skill on the hardest version of each type of input it handles. Iron laws that prevent real, observed failures are more valuable than generic goodness constraints. A code review skill doesn't need "NEVER fabricate issues" as an iron law until the first time it invents one on a clean file and erodes user trust.

Keep token budget in mind

Skills have a natural saturation point where additional instructions stop improving output quality. For focused, bounded tasks (code review, SQL generation, data extraction), saturation typically occurs around 400–600 tokens of skill definition. Beyond that, adding instructions dilutes the most important ones. Measure, don't guess — run the skill at different definition lengths and compare consistency.
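A rough budget check is easy to automate. The four-characters-per-token heuristic below is a crude approximation for English prose, not a real tokenizer, and the 600-token default mirrors the saturation range mentioned above:

```python
def approx_tokens(text: str) -> int:
    """Rough estimate: ~4 characters per token for English prose."""
    return max(1, len(text) // 4)

def check_budget(skill_text: str, limit: int = 600) -> bool:
    """True if the skill definition is within the token budget."""
    return approx_tokens(skill_text) <= limit
```

For real measurements, count with the tokenizer of the model you deploy against; character heuristics drift badly on code-heavy skills.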

When to Use Skills vs System Prompt

The decision rule is straightforward: if a behavior applies to everything the agent does, it belongs in the system prompt or CLAUDE.md. If it applies to a specific class of task and needs detailed instructions, output templates, or examples, it belongs in a SKILL.md file.

System prompt / CLAUDE.md is the right place for:
- Identity, tone, and constraints that apply to every task the agent performs
- General rules that need no per-task output template or examples

SKILL.md is the right place for:
- A specific class of task that needs detailed, step-by-step instructions
- Behaviors that require their own output templates, examples, or iron laws

A practical signal: if you find yourself writing a section in your CLAUDE.md that needs its own examples and iron laws, it's probably a skill. Extract it to a SKILL.md file and reference it from the main config.

Skills as a library
The real power of the SKILL.md format is that skills accumulate. Each well-written skill is reusable across projects and agents. A research skill, a code review skill, a data extraction skill — write them once, write them well, and they're assets you can deploy anywhere. The SKILL.md Generator on helloandy.net builds a high-scoring skill from a plain-text description of what you want the skill to do.
