SKILL.md · AI Agents · Claude Code · March 2026 · Andy

What Is a SKILL.md File? (And How to Write One That Actually Works)

A SKILL.md file is a plain-text instruction document that gives an AI agent a specific, reusable capability. It defines what the skill does, when it activates, what output it produces, and what constraints it follows. This guide covers the full anatomy, trigger types, common mistakes, and best practices — with a complete annotated example you can adapt.

Contents
  1. What is a SKILL.md file?
  2. How skills are triggered
  3. The anatomy of a SKILL.md file
  4. Writing effective trigger conditions
  5. Best practices: examples before conditions
  6. Skill vs. system prompt vs. CLAUDE.md
  7. Generate your SKILL.md free

What Is a SKILL.md File?

A SKILL.md file is a structured Markdown document that encodes a specific capability for an AI agent. Where a system prompt defines who the agent is, a SKILL.md defines what the agent can do in a particular context. Skills are modular: you write one, test it, and reuse it across projects without touching the base agent configuration.

The format originates from Claude Code's skill system, but the pattern is useful anywhere you run an LLM agent with file-based configuration. A skill file typically contains four to six sections: a name and description, trigger conditions that tell the model when to activate the skill, an output template that defines the structure of the response, iron laws that enumerate what the skill must never do, and optionally a few canonical examples.

The key difference between SKILL.md and the other configuration files:

| File | Scope | Purpose | Loaded when |
|---|---|---|---|
| CLAUDE.md | Global / project | Agent identity, permissions, memory rules, cross-cutting behavior | Always — every session |
| System prompt | Per-deployment | Runtime persona, tool grants, session constraints | Injected by the host application |
| SKILL.md | Per-capability | One specific task: its triggers, output format, and constraints | On demand, when the skill is invoked |

A useful analogy: CLAUDE.md is the employee handbook. The system prompt is the shift briefing. A SKILL.md is the procedure manual for a specific job — code review, report generation, data extraction — that an employee picks up when that job needs doing.

How Skills Are Triggered

Trigger conditions are the most consequential part of a skill file. They determine when the model applies the skill's output format and constraints instead of defaulting to its general behavior. There are three trigger patterns in practice.

Explicit triggers

The user or system directly invokes the skill by name or by a slash command. This is the most reliable pattern because there is no ambiguity. A user types /code-review or the host application prepends SKILL: code-review to the prompt. The skill activates unconditionally. Explicit triggers are appropriate for long-form, structured outputs — audit reports, generated documents, data extractions — where the user clearly knows they want a specific format.

Intent-based triggers

The model infers from the request that the skill applies. The trigger condition in the file describes the intent pattern: "user pastes code and asks for feedback, review, issues, or audit." This requires precise language in the trigger section. Vague intent conditions — "user seems to want a review" — produce inconsistent activation. Precise ones — "user pastes a code block AND asks for issues, feedback, or review" — are stable. The connector word AND matters: it forces the model to check both parts.

Context-based triggers

The skill activates because of the surrounding context rather than the explicit request. A research synthesis skill might trigger when the conversation contains more than three web search results and the user asks for a summary. A data extraction skill might trigger when the user pastes a table or CSV and asks any follow-up question. Context-based triggers are the most powerful and the most error-prone. They work best when the context signal is unambiguous — structured data, a specific file format, a named artifact — rather than inferred from tone or topic.
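The research-synthesis trigger described above might be written out as follows (the thresholds and exact wording are illustrative):

```text
ACTIVATE WHEN:
  - Conversation contains more than three web search results
    AND user asks for a summary, synthesis, or comparison

DO NOT ACTIVATE WHEN:
  - User asks a follow-up about a single search result
  - Search results are present but the user asks an unrelated question
```

Note that the context signal (search results present) and the intent signal (asks for a summary) are separate conditions joined by AND, and the NOT block names the adjacent cases.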

Key principle
Every trigger condition needs a NOT counterpart. A code review skill that fires on "user pastes code" will also fire when the user pastes code to ask a question about it. Add: DO NOT ACTIVATE WHEN: user asks a question about code without requesting feedback or review. The NOT condition is where most skills fail.

The Anatomy of a SKILL.md File

Below is a complete, annotated SKILL.md for a SQL query review skill. Each section is labeled with its purpose. This is a realistic example — 400 tokens, tight scope, explicit iron laws.

# SKILL: SQL Query Reviewer
# ── What this skill does ──────────────────────────────────────────
DESCRIPTION: Review a SQL query for correctness, performance risk,
and style issues. Produce a structured report, not inline edits.

# ── When to activate ─────────────────────────────────────────────
ACTIVATE WHEN:
  - User pastes a SQL query (SELECT, INSERT, UPDATE, DELETE, or DDL)
    AND asks for review, feedback, issues, audit, or "what's wrong"
  - User says "check this SQL" or "review my query"

DO NOT ACTIVATE WHEN:
  - User asks what a SQL statement does (explanation, not review)
  - User asks you to write or fix SQL (generation, not review)
  - User asks about a single clause without requesting a full review

# ── Output format ─────────────────────────────────────────────────
OUTPUT FORMAT:
## SQL Review

**Query summary:** [one sentence — what the query does]

### Correctness
[List issues that would cause errors or wrong results.
  If none: "No correctness issues found."]
- Issue: [description]
  Severity: CRITICAL | HIGH | MEDIUM
  Fix: [specific fix]

### Performance
[List potential performance risks.]
- Risk: [description]
  Severity: HIGH | MEDIUM | LOW
  Note: [context or suggested index/rewrite]

### Style
[Naming, formatting, or convention issues. Skip if clean.]
- [issue]: [suggestion]

**Verdict:** PASS | NEEDS CHANGES | CRITICAL ISSUES

# ── Iron laws ─────────────────────────────────────────────────────
IRON LAWS:
- NEVER fabricate issues. If the query is clean, say so.
- NEVER rewrite the query unless the user asks after seeing the review.
- NEVER omit Severity labels — every issue must be classified.
- NEVER give a PASS verdict if any CRITICAL or HIGH issue exists.

# ── Example ───────────────────────────────────────────────────────
EXAMPLE INPUT:
  "Review this: SELECT * FROM orders WHERE customer_id = 123"

EXAMPLE OUTPUT:
  ## SQL Review
  **Query summary:** Retrieves all columns for a single customer's orders.

  ### Correctness
  No correctness issues found.

  ### Performance
  - Risk: SELECT * fetches all columns including large/unused fields
    Severity: MEDIUM
    Note: Specify needed columns to reduce row size and improve index coverage
  - Risk: No index on customer_id confirmed
    Severity: LOW
    Note: Verify index exists; this query will full-scan without one

  ### Style
  - Use explicit column list instead of SELECT *

  **Verdict:** NEEDS CHANGES

Notice what each section does:

- DESCRIPTION states the task and its boundary in two lines — a report, not inline edits.
- ACTIVATE WHEN pairs a required artifact (a SQL query) with a required intent (review, feedback, audit).
- DO NOT ACTIVATE WHEN names the nearest cases that look similar but aren't reviews: explanation, generation, single-clause questions.
- OUTPUT FORMAT is a literal template with real headers and placeholders, not a prose description of the structure.
- IRON LAWS forbid the specific failure modes: fabricated issues, unrequested rewrites, missing severity labels, false PASS verdicts.
- EXAMPLE gives one complete input/output pair the model can pattern-match against.

Writing Effective Trigger Conditions — The Most Common Mistakes

Trigger conditions fail in predictable ways. These are the four patterns that account for most broken skills:

Missing the NOT condition
ACTIVATE WHEN: "user asks about code" — this fires on every code question, not just review requests. Every trigger needs a DO NOT ACTIVATE counterpart that names the adjacent cases the skill should not handle.
Using OR where AND is required
"user pastes code OR asks for review" — the OR makes this fire on any code paste, even without a review request. Use AND when the trigger requires multiple simultaneous signals.
Synonym gaps
"asks for review" misses: feedback, audit, issues, what's wrong, check this, look at this. Trigger conditions need the full synonym set for the user intent they're targeting. When in doubt, list five ways a user might phrase the request.
Over-broad scope in the description
"help with code" as the DESCRIPTION leaks scope. The model uses the description to decide what counts as "within the skill." A description that matches too broadly causes the skill to activate in unintended situations.
The pattern that works
State the required artifact (code paste, SQL query, CSV data) AND the required intent (review, audit, issues) as two separate conditions connected by AND. Then write the NOT condition to name the three nearest cases that look similar but shouldn't trigger the skill.
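The model applies this artifact-AND-intent check implicitly, but the logic can be sketched deterministically. Everything here — the intent word list, the fenced-code regex — is an illustrative assumption, not how any runtime actually evaluates triggers:

```python
import re

# Illustrative sketch of the artifact-AND-intent pattern.
# The intent list and the "artifact = fenced code block" regex are assumptions.
INTENT_WORDS = {"review", "feedback", "audit", "issues", "check", "what's wrong"}

def should_activate(message: str) -> bool:
    # Artifact signal: the message contains a pasted code block.
    has_artifact = bool(re.search(r"```.+?```", message, re.DOTALL))
    # Intent signal: the message contains a review-intent phrase.
    has_intent = any(word in message.lower() for word in INTENT_WORDS)
    # Both signals required: AND, not OR.
    return has_artifact and has_intent
```

With OR instead of AND, a pasted code block alone would activate the skill — exactly the failure described above.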

Best Practices: Write Examples Before Conditions

The most effective technique for writing trigger conditions is to draft the examples section first. Write two or three concrete input/output pairs for the skill you want to build before you touch the ACTIVATE WHEN section. The act of writing examples forces you to make implicit decisions explicit: what artifact the input must contain, which phrasings count as the target intent, where the output's sections begin and end, and which nearby requests fall outside the skill.

Once you have three examples, the trigger conditions almost write themselves — you're describing the pattern you already demonstrated rather than speculating about it.

Calibrate iron laws from real failures

Don't write iron laws preemptively from general principles. Run the skill draft on ten varied inputs and note every output that's wrong. The iron laws that prevent actual failures are worth ten times the generic goodness constraints. "NEVER fabricate issues" comes from running the skill on a clean input and watching it invent problems. "NEVER give a PASS verdict if any HIGH issue exists" comes from the opposite failure — finding a critical issue and still outputting PASS because the summary section was positive.
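Some iron laws can even be checked mechanically when you rerun the skill on your test inputs. A minimal sketch of a check for the false-PASS failure described above, assuming the report follows this article's SQL Review template:

```python
def violates_verdict_law(report: str) -> bool:
    """True if the report gives a PASS verdict despite a CRITICAL/HIGH issue.

    Assumes the output follows the SQL Review template from this article
    (literal "Severity:" labels and a "**Verdict:**" line).
    """
    has_blocker = "Severity: CRITICAL" in report or "Severity: HIGH" in report
    passed = "**Verdict:** PASS" in report
    return passed and has_blocker
```

Running a check like this over ten varied outputs tells you which iron laws are actually earning their place in the file.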

Keep the output format as a template, not a description

The output format section should show the structure with real headers and placeholder syntax, not describe it in prose. "A section for issues, each with a severity label" requires the model to interpret what that means. Showing ### Correctness followed by - Issue: / Severity: / Fix: gives it a direct pattern to copy. The model fills templates more reliably than it follows descriptions.

Token budget

Skills have a saturation point. For focused, bounded tasks, adding instructions beyond 500–600 tokens typically stops improving output quality and can dilute the most important constraints by burying them. Measure: run the skill at 300, 500, and 700 tokens and compare consistency. Stop adding instructions when the consistency plateau is reached.
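For the comparison runs, a rough words-to-tokens heuristic is enough to bucket drafts into the 300/500/700 bands. The 0.75 words-per-token ratio below is a common ballpark for English prose, not an exact figure; use a real tokenizer for precise budgets:

```python
def rough_token_count(text: str) -> int:
    # Ballpark heuristic: English prose averages ~0.75 words per token,
    # so tokens ~= words / 0.75. Real tokenizers will differ.
    return round(len(text.split()) / 0.75)
```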

When to Use a Skill vs. a System Prompt vs. a CLAUDE.md

The decision is mechanical once you know the rule: if a behavior applies to everything the agent does, it belongs in CLAUDE.md or the system prompt. If it applies only to a specific class of task and needs its own output format, examples, or iron laws, it belongs in a SKILL.md file.

| Belongs in CLAUDE.md / system prompt | Belongs in SKILL.md |
|---|---|
| Agent identity and persona | Specialized output formats |
| General communication style | Domain-specific workflows |
| Memory and persistence rules | Tasks with edge-case-heavy logic |
| Cross-cutting iron laws (all tasks) | Capabilities shared across agents |
| Tool permissions | Behaviors that may evolve independently |

A practical signal: if you find yourself writing a section in CLAUDE.md that needs its own examples, iron laws, or output template, it's a skill. Extract it to a SKILL.md file and reference it from the main config. This keeps CLAUDE.md from growing into an unmaintainable monolith and lets you version each skill independently.
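One way the extraction and reference might look (the directory layout and file paths here are hypothetical):

```text
# CLAUDE.md (excerpt)
## Skills
When a task matches one of these, load the skill file and follow it:
- Code review        → skills/code-review/SKILL.md
- SQL query review   → skills/sql-review/SKILL.md
- Research synthesis → skills/research-synthesis/SKILL.md
```

Each skill file can now be edited, linted, and versioned on its own without touching the main config.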

Reuse is the real payoff
Skills accumulate. A code review skill, a SQL review skill, a research synthesis skill — write them once to a high standard and they're reusable assets across every project that needs them. The format is portable: any agent that reads Markdown can use a well-written SKILL.md.

Generate Your SKILL.md Free

The SKILL.md Generator on helloandy.net builds a complete, structured skill file from a plain-text description of what you want the skill to do. Describe the task, the inputs it handles, and the output you want — the generator produces a full SKILL.md with trigger conditions, output template, and iron laws. No account required.

Once you have a draft, run it through the SKILL.md Linter to get a quality score on the eight-point rubric: trigger precision, output format completeness, iron law coverage, example quality, scope clarity, synonym coverage, NOT-condition presence, and token efficiency. The linter flags the specific sections that are pulling the score down.
