A SKILL.md file is a plain-text instruction document that gives an AI agent a specific, reusable capability. It defines what the skill does, when it activates, what output it produces, and what constraints it follows. This guide covers the full anatomy, trigger types, common mistakes, and best practices — with a complete annotated example you can adapt.
A SKILL.md file is a structured Markdown document that encodes a specific capability for an AI agent. Where a system prompt defines who the agent is, a SKILL.md defines what the agent can do in a particular context. Skills are modular: you write one, test it, and reuse it across projects without touching the base agent configuration.
The format originates from Claude Code's skill system, but the pattern is useful anywhere you run an LLM agent with file-based configuration. A skill file typically contains four to six sections: a name and description, trigger conditions that tell the model when to activate the skill, an output template that defines the structure of the response, iron laws that enumerate what the skill must never do, and optionally a few canonical examples.
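A minimal skeleton showing those sections side by side (the banner style and placeholder names here are illustrative conventions, not a fixed schema):

```markdown
# SKILL: <name>

DESCRIPTION: One or two sentences on what the skill does.

ACTIVATE WHEN:
- <positive trigger condition>
DO NOT ACTIVATE WHEN:
- <negative condition>

OUTPUT FORMAT:
<literal template the model should copy>

IRON LAWS:
- NEVER <constraint>

EXAMPLE INPUT / EXAMPLE OUTPUT:
<one canonical pair>
```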
The key difference between SKILL.md and the other configuration files:
| File | Scope | Purpose | Loaded when |
|---|---|---|---|
| CLAUDE.md | Global / project | Agent identity, permissions, memory rules, cross-cutting behavior | Always (every session) |
| System prompt | Per-deployment | Runtime persona, tool grants, session constraints | Injected by the host application |
| SKILL.md | Per-capability | One specific task: its triggers, output format, and constraints | On demand, when the skill is invoked |
A useful analogy: CLAUDE.md is the employee handbook. The system prompt is the shift briefing. A SKILL.md is the procedure manual for a specific job — code review, report generation, data extraction — that an employee picks up when that job needs doing.
Trigger conditions are the most consequential part of a skill file. They determine when the model applies the skill's output format and constraints instead of defaulting to its general behavior. There are three trigger patterns in practice.
**Explicit invocation.** The user or system directly invokes the skill by name or by a slash command. This is the most reliable pattern because there is no ambiguity. A user types `/code-review` or the host application prepends `SKILL: code-review` to the prompt. The skill activates unconditionally. Explicit triggers are appropriate for long-form, structured outputs — audit reports, generated documents, data extractions — where the user clearly knows they want a specific format.
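A sketch of how a host application might resolve an explicit trigger, assuming a simple prefix convention (`resolve_skill` and the `SKILL:` prefix handling are illustrative, not part of any real skill runtime):

```python
def resolve_skill(user_message: str):
    """Return the skill name if the message explicitly invokes one,
    else None. Checks two explicit-trigger conventions: a slash
    command (/code-review) or a host-injected SKILL: prefix."""
    text = user_message.strip()
    if text.startswith("/"):
        # "/code-review please check this" -> "code-review"
        parts = text[1:].split()
        return parts[0] if parts else None
    if text.startswith("SKILL:"):
        # "SKILL: code-review\n<prompt>" -> "code-review"
        parts = text[len("SKILL:"):].split()
        return parts[0] if parts else None
    return None  # no explicit trigger; fall back to intent/context
```

Because the check is purely syntactic, activation is unconditional and unambiguous, which is exactly why this pattern is the most reliable of the three.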
**Intent-based activation.** The model infers from the request that the skill applies. The trigger condition in the file describes the intent pattern: "user pastes code and asks for feedback, review, issues, or audit." This requires precise language in the trigger section. Vague intent conditions — "user seems to want a review" — produce inconsistent activation. Precise ones — "user pastes a code block AND asks for issues, feedback, or review" — are stable. The connector word AND matters: it forces the model to check both parts.
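The AND requirement can be made concrete. A rough sketch of the two-part check the trigger language describes (the keyword list and the code-block heuristic are assumptions for illustration):

```python
import re

REVIEW_WORDS = {"review", "feedback", "issues", "audit"}

def intent_trigger(message: str) -> bool:
    """True only if BOTH conditions hold: a pasted code block
    AND review-seeking language. Either one alone is not enough."""
    has_code = bool(re.search(r"```.+?```", message, re.DOTALL))
    has_ask = any(word in message.lower() for word in REVIEW_WORDS)
    return has_code and has_ask
```

Dropping either conjunct reproduces the vague-trigger failure: code without an ask is probably a "what does this do?" question, and an ask without code is a request to generate, not review.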
**Context-based activation.** The skill activates because of the surrounding context rather than the explicit request. A research synthesis skill might trigger when the conversation contains more than three web search results and the user asks for a summary. A data extraction skill might trigger when the user pastes a table or CSV and asks any follow-up question. Context-based triggers are the most powerful and the most error-prone. They work best when the context signal is unambiguous — structured data, a specific file format, a named artifact — rather than inferred from tone or topic.
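An unambiguous context signal can be checked mechanically. A minimal sketch for the pasted-table case (the two-delimited-lines heuristic is an assumption, not a standard):

```python
def looks_like_table(pasted: str) -> bool:
    """Heuristic context signal: two or more lines sharing the same
    count of a common delimiter reads as tabular data. Tone or topic
    never qualifies as a signal here."""
    for delim in (",", "\t", "|"):
        lines = [ln for ln in pasted.splitlines() if delim in ln]
        if len(lines) >= 2 and len({ln.count(delim) for ln in lines}) == 1:
            return True
    return False
```

A signal like this is stable because it is a property of the artifact itself, not an inference about what the user "seems" to want.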
Below is a complete, annotated SKILL.md for a SQL query review skill. Each section is labeled with its purpose. This is a realistic example — 400 tokens, tight scope, explicit iron laws.
```markdown
# SKILL: SQL Query Reviewer

# ── What this skill does ──────────────────────────────────────────
DESCRIPTION: Review a SQL query for correctness, performance risk,
and style issues. Produce a structured report, not inline edits.

# ── When to activate ─────────────────────────────────────────────
ACTIVATE WHEN:
- User pastes a SQL query (SELECT, INSERT, UPDATE, DELETE, or DDL)
  AND asks for review, feedback, issues, audit, or "what's wrong"
- User says "check this SQL" or "review my query"

DO NOT ACTIVATE WHEN:
- User asks what a SQL statement does (explanation, not review)
- User asks you to write or fix SQL (generation, not review)
- User asks about a single clause without requesting a full review

# ── Output format ─────────────────────────────────────────────────
OUTPUT FORMAT:

## SQL Review
**Query summary:** [one sentence — what the query does]

### Correctness
[List issues that would cause errors or wrong results.
If none: "No correctness issues found."]
- Issue: [description]
  Severity: CRITICAL | HIGH | MEDIUM
  Fix: [specific fix]

### Performance
[List potential performance risks.]
- Risk: [description]
  Severity: HIGH | MEDIUM | LOW
  Note: [context or suggested index/rewrite]

### Style
[Naming, formatting, or convention issues. Skip if clean.]
- [issue]: [suggestion]

**Verdict:** PASS | NEEDS CHANGES | CRITICAL ISSUES

# ── Iron laws ─────────────────────────────────────────────────────
IRON LAWS:
- NEVER fabricate issues. If the query is clean, say so.
- NEVER rewrite the query unless the user asks after seeing the review.
- NEVER omit Severity labels — every issue must be classified.
- NEVER give a PASS verdict if any CRITICAL or HIGH issue exists.

# ── Example ───────────────────────────────────────────────────────
EXAMPLE INPUT:
"Review this: SELECT * FROM orders WHERE customer_id = 123"

EXAMPLE OUTPUT:
## SQL Review
**Query summary:** Retrieves all columns for a single customer's orders.

### Correctness
No correctness issues found.

### Performance
- Risk: SELECT * fetches all columns including large/unused fields
  Severity: MEDIUM
  Note: Specify needed columns to reduce row size and improve index coverage
- Risk: No index on customer_id confirmed
  Severity: LOW
  Note: Verify index exists; this query will full-scan without one

### Style
- Use explicit column list instead of SELECT *

**Verdict:** NEEDS CHANGES
```
Notice what each section does: the description bounds the scope to review rather than generation, the activation block pairs positive triggers with explicit NOT-conditions, the output format is a literal template rather than a prose description, each iron law targets a specific failure mode, and the example demonstrates the complete input-to-output path.
Trigger conditions fail in predictable ways. Four patterns account for most broken skills: intent language too vague to check ("user seems to want a review"); missing NOT-conditions, so the skill fires on explanation or generation requests it was never meant to handle; narrow phrasing without synonyms, so "review" activates but "feedback" or "audit" does not; and context triggers inferred from tone or topic rather than an unambiguous signal like structured data or a named artifact.
The most effective technique for writing trigger conditions is to draft the examples section first. Write two or three concrete input/output pairs for the skill you want to build before you touch the ACTIVATE WHEN section. The act of writing examples forces you to make implicit decisions explicit: which inputs qualify, which phrasings should activate the skill, and what falls outside its scope.
Once you have three examples, the trigger conditions almost write themselves — you're describing the pattern you already demonstrated rather than speculating about it.
Don't write iron laws preemptively from general principles. Run the skill draft on ten varied inputs and note every output that's wrong. The iron laws that prevent actual failures are worth ten times the generic goodness constraints. "NEVER fabricate issues" comes from running the skill on a clean input and watching it invent problems. "NEVER give a PASS verdict if any HIGH issue exists" comes from the opposite failure — finding a critical issue and still outputting PASS because the summary section was positive.
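This failure-harvesting loop can be sketched generically. A minimal sketch, assuming the caller supplies the model call as a function (`run_skill` and `check` are illustrative parameters, not a real API):

```python
def collect_failures(run_skill, inputs, check):
    """Run a skill draft over varied inputs and record every output
    that violates a check. Each recurring failure in the result is a
    candidate for a new iron law."""
    failures = []
    for text in inputs:
        output = run_skill(text)
        if not check(output):
            failures.append((text, output))
    return failures
```

The point of making this a loop over ten varied inputs, rather than a one-off spot check, is that iron laws should come from observed failures, not imagined ones.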
The output format section should show the structure with real headers and placeholder syntax, not describe it in prose. "A section for issues, each with a severity label" requires the model to interpret what that means. Showing `### Correctness` followed by `- Issue:` / `Severity:` / `Fix:` gives it a direct pattern to copy. The model fills templates more reliably than it follows descriptions.
Skills have a saturation point. For focused, bounded tasks, adding instructions beyond 500–600 tokens typically stops improving output quality and can dilute the most important constraints by burying them. Measure: run the skill at 300, 500, and 700 tokens and compare consistency. Stop adding instructions when the consistency plateau is reached.
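Exact token counts depend on the tokenizer, but a rough word-based estimate is enough to tell whether a draft sits near the 300, 500, or 700 mark (the 4/3 tokens-per-word ratio below is a common English-text approximation, not a measurement):

```python
def estimate_tokens(skill_text: str) -> int:
    """Approximate token count for an English skill file:
    roughly 4/3 tokens per whitespace-separated word."""
    words = len(skill_text.split())
    return round(words * 4 / 3)
```

Run the consistency comparison against drafts whose estimates bracket the 500–600 range, and trim once the larger draft stops outperforming the smaller one.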
The decision is mechanical once you know the rule: if a behavior applies to everything the agent does, it belongs in CLAUDE.md or the system prompt. If it applies only to a specific class of task and needs its own output format, examples, or iron laws, it belongs in a SKILL.md file.
| Belongs in CLAUDE.md / system prompt | Belongs in SKILL.md |
|---|---|
| Agent identity and persona | Specialized output formats |
| General communication style | Domain-specific workflows |
| Memory and persistence rules | Tasks with edge-case-heavy logic |
| Cross-cutting iron laws (all tasks) | Capabilities shared across agents |
| Tool permissions | Behaviors that may evolve independently |
A practical signal: if you find yourself writing a section in CLAUDE.md that needs its own examples, iron laws, or output template, it's a skill. Extract it to a SKILL.md file and reference it from the main config. This keeps CLAUDE.md from growing into an unmaintainable monolith and lets you version each skill independently.
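The extraction might look like this in the main config (the paths and wording are illustrative):

```markdown
## Skills

Specialized tasks live in their own files. Load the relevant one on demand:

- Code review: skills/code-review/SKILL.md
- Report generation: skills/report-gen/SKILL.md
```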
The SKILL.md Generator on helloandy.net builds a complete, structured skill file from a plain-text description of what you want the skill to do. Describe the task, the inputs it handles, and the output you want — the generator produces a full SKILL.md with trigger conditions, output template, and iron laws. No account required.
Once you have a draft, run it through the SKILL.md Linter to get a quality score on the eight-point rubric: trigger precision, output format completeness, iron law coverage, example quality, scope clarity, synonym coverage, NOT-condition presence, and token efficiency. The linter flags the specific sections that are pulling the score down.