# 28 - Skills
Skills are instruction files that teach agents how to perform specific tasks. They don't call APIs, don't maintain state, don't run as processes. They're folders of markdown, scripts, and reference material that the agent loads on-demand when relevant to the current task.
If MCP is a USB port (connecting to external tools), skills are the employee handbook (teaching the agent your specific procedures).
This is the open specification: https://agentskills.io/specification
As of early 2026, the Agent Skills format is adopted by Claude Code, OpenAI Codex, GitHub Copilot, VS Code, Cursor, Gemini CLI, Amp, Goose, and 20+ other platforms. Write once, use everywhere.
## Why Skills Exist
Without skills, you repeat yourself. Every conversation, you re-explain:
- "Use pdfplumber, not PyPDF2"
- "Our API uses snake_case, not camelCase"
- "Always run the linter after editing"
- "The test database is on port 5433, not 5432"
Skills package these instructions into a reusable, version-controlled file that the agent loads automatically when the task matches. You write the instruction once. Every conversation benefits.
## Skills vs Prompts vs MCP

| | System Prompt | Skill | MCP |
|---|---|---|---|
| Scope | This conversation | Any matching conversation | Any matching conversation |
| What it provides | Context, personality, constraints | Procedures, workflows, domain knowledge | Tool execution, data access |
| Token cost | Loaded every turn | ~100 tokens discovery, ~5K on activation | 550-1,400 tokens per tool, every turn |
| Persistence | Per-conversation | Permanent, version-controlled | Permanent (server process) |
| Statefulness | No | No | Yes (can maintain connections, sessions) |
Use system prompts for conversation-level instructions. Use skills for reusable procedures. Use MCP for external capabilities the agent can't achieve with instructions alone.
## Skills vs Project Instructions (CLAUDE.md, .cursorrules, etc.)
Most coding agents support project-level instruction files: CLAUDE.md for Claude Code, .cursorrules for Cursor, .github/copilot-instructions.md for Copilot. These are loaded every conversation in that project.
Skills are different: they load on-demand when the task matches. The distinction:
- Project instructions → "always true" facts about this project (coding conventions, test commands, architecture decisions)
- Skills → reusable procedures for task types that work across projects (PDF processing, data pipeline, thesis grading)
If you find yourself putting step-by-step workflows into CLAUDE.md, extract them into skills instead. Keep CLAUDE.md for project context; use skills for procedures.
## Level 1: The Simplest Possible Skill

A skill is a directory containing one file: SKILL.md.

```
my-first-skill/
└── SKILL.md
```
SKILL.md has two parts: YAML frontmatter (metadata) and markdown body (instructions).
````markdown
---
name: my-first-skill
description: Formats code output with line numbers and file paths. Use when generating code files or showing code to the user.
---

When showing code to the user, always include:

1. The full file path as a comment at the top
2. Line numbers for any code block longer than 10 lines

Example:

```python
# src/utils/helpers.py
def calculate_total(items):
    return sum(item.price for item in items)
```
````
That's it. The agent reads the frontmatter at startup (~100 tokens), decides whether this skill is relevant to the current task, and if so, loads the full body.
### Frontmatter Fields

| Field | Required | What it does |
|---|---|---|
| `name` | Yes | Identifier. Max 64 chars. Lowercase, hyphens only. Must match directory name. |
| `description` | Yes | Tells the agent when to use this skill. Max 1024 chars. This is the trigger. |
| `license` | No | License name or file reference. |
| `compatibility` | No | Environment requirements. Max 500 chars. |
| `metadata` | No | Arbitrary key-value pairs (author, version, tags). |
| `allowed-tools` | No | Pre-approved tools the skill may use. Experimental. |
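To make the two-part layout concrete, here is a minimal sketch of reading these frontmatter fields from a SKILL.md string. It handles only flat `key: value` pairs — folded scalars like `description: >-` need a real YAML parser — and the `read_frontmatter` helper is illustrative, not part of the spec.

```python
import re

def read_frontmatter(skill_md: str) -> dict:
    """Extract flat key: value fields from SKILL.md YAML frontmatter.

    Sketch only: does not handle folded (>-) blocks or nested mappings;
    production tooling should use a real YAML parser.
    """
    match = re.match(r"^---\n(.*?)\n---\n", skill_md, re.DOTALL)
    if not match:
        raise ValueError("SKILL.md must start with a YAML frontmatter block")
    fields = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

doc = """---
name: my-first-skill
description: Formats code output with line numbers.
---
Body instructions here.
"""
print(read_frontmatter(doc)["name"])  # my-first-skill
```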
### The `name` Field Rules

```yaml
name: pdf-processing      # Valid
name: code-review         # Valid
name: data-analysis       # Valid
name: PDF-Processing      # INVALID: uppercase
name: -pdf                # INVALID: starts with hyphen
name: pdf--processing     # INVALID: consecutive hyphens
name: my cool skill       # INVALID: spaces
```

Must match the parent directory name exactly.
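These rules compress into one regular expression. A sketch — it assumes digits are also allowed inside names, which is worth verifying against the spec:

```python
import re

# Lowercase runs separated by single hyphens: no leading/trailing
# or consecutive hyphens, no uppercase, no spaces.
NAME_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")

def is_valid_skill_name(name: str) -> bool:
    return len(name) <= 64 and bool(NAME_RE.fullmatch(name))

for candidate in ["pdf-processing", "PDF-Processing", "-pdf",
                  "pdf--processing", "my cool skill"]:
    print(f"{candidate!r}: {is_valid_skill_name(candidate)}")
```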
### The `description` Field Is Everything
The description determines whether the skill activates. The agent scans all available skill descriptions at startup (~100 tokens each) and loads the full instructions only for relevant ones.
Bad — will never trigger:
```yaml
description: Helps with PDFs.
```
Good — clear trigger conditions:
```yaml
description: Extract text and tables from PDF files, fill PDF forms, and merge multiple PDFs. Use when working with PDF documents or when the user mentions PDFs, forms, or document extraction.
```
Aggressive — recommended by Anthropic because agents undertrigger:
```yaml
description: >-
  Apply TalTech thesis grading criteria to academic documents.
  Use this skill whenever the user mentions thesis, grading,
  academic evaluation, defense, rubric, or assessment criteria,
  even if they don't explicitly ask for grading.
```
Include:
- What the skill does
- When to use it (positive triggers)
- Keywords the user might say
- Optionally: when NOT to use it (if similar skills exist)
## Level 2: Adding Reference Material

When SKILL.md needs to stay under 500 lines but the domain knowledge is larger, split into referenced files.

```
thesis-grader/
├── SKILL.md
└── references/
    ├── grading-matrix.md
    └── bloom-taxonomy.md
```
SKILL.md:
```markdown
---
name: thesis-grader
description: >-
  Evaluate master's thesis documents against TalTech grading criteria.
  Use when reviewing thesis structure, methodology, or academic quality.
---

## Workflow

1. Read the thesis document
2. Load the grading matrix: see [grading-matrix.md](references/grading-matrix.md)
3. Evaluate each criterion on a 0-5 scale
4. If assessing learning outcomes, consult [bloom-taxonomy.md](references/bloom-taxonomy.md)
5. Produce a structured evaluation report

## Output Format

# Thesis Evaluation: [Title]

| Criterion | Score (0-5) | Justification |
|---|---|---|
| Research question clarity | X | ... |
| Methodology | X | ... |
| Literature review | X | ... |
| Results and analysis | X | ... |
| Writing quality | X | ... |

**Overall recommendation:** [Pass / Revise / Fail]

## Gotchas

- Level 0 (fail) means the criterion was not addressed at all
- Level 5+ (publishable quality) is a separate category, not just "really good level 5"
- Check citation format consistency — mixed APA/IEEE is an automatic deduction
```
The key: tell the agent when to load each reference file. "Load grading-matrix.md" is better than "see references/ for details." The agent may not recognize when it needs a file if you don't specify the trigger.
### Progressive Disclosure in Action
This is why skills are cheap: a workspace with 20 installed skills costs only ~2,000 tokens at startup. Only the relevant skill loads its full instructions. Only specific references load when needed. Compare to MCP where 20 tools cost 11,000-28,000 tokens every single turn.
## Level 3: Adding Scripts

Scripts make skills deterministic where it matters. Instead of hoping the agent writes correct parsing code, you provide a tested script.

```
pdf-processor/
├── SKILL.md
├── scripts/
│   ├── analyze_form.py
│   ├── validate_fields.py
│   └── fill_form.py
└── references/
    └── REFERENCE.md
```
SKILL.md:
````markdown
---
name: pdf-processor
description: >-
  Extract text and tables from PDF files, fill PDF forms, merge documents.
  Use when working with PDF files or when the user mentions PDFs, forms,
  or document extraction.
---

## Text Extraction

Use pdfplumber for text extraction. For scanned documents, fall back to
pdf2image with pytesseract.

## Form Filling Workflow

1. Analyze the form:

   ```bash
   uv run scripts/analyze_form.py input.pdf
   ```

   This produces `form_fields.json` listing every field name, type, and whether it's required.

2. Create `field_values.json` mapping each field name to its intended value.

3. Validate the mapping:

   ```bash
   uv run scripts/validate_fields.py form_fields.json field_values.json
   ```

   Fix any errors before proceeding.

4. Fill the form:

   ```bash
   uv run scripts/fill_form.py input.pdf field_values.json output.pdf
   ```

5. Verify the output visually or with:

   ```bash
   uv run scripts/analyze_form.py output.pdf
   ```
````
### Script Design Principles
Scripts run in a non-interactive shell. The agent reads stdout/stderr to decide what to do next. Design accordingly:
**1. No interactive prompts. Ever.**

```python
# BAD: hangs forever
target = input("Target environment: ")

# GOOD: use flags
parser.add_argument("--env", required=True, choices=["dev", "staging", "prod"])
```
**2. Implement `--help`**

This is how the agent discovers the interface. Keep it concise — the output enters the context window.

```
Usage: scripts/process.py [OPTIONS] INPUT_FILE

Options:
  --format FORMAT   Output format: json, csv, table (default: json)
  --output FILE     Write to FILE instead of stdout
  --verbose         Print progress to stderr

Examples:
  scripts/process.py data.csv
  scripts/process.py --format csv --output report.csv data.csv
```
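A sketch of the `argparse` setup behind a help screen like that one. The script name and options mirror the illustrative `process.py` above, not a real tool:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Mirrors the --help contract shown above; all names are illustrative.
    parser = argparse.ArgumentParser(
        prog="scripts/process.py",
        description="Process INPUT_FILE and emit structured output.",
    )
    parser.add_argument("input_file", metavar="INPUT_FILE")
    parser.add_argument("--format", default="json",
                        choices=["json", "csv", "table"],
                        help="Output format (default: json)")
    parser.add_argument("--output", metavar="FILE",
                        help="Write to FILE instead of stdout")
    parser.add_argument("--verbose", action="store_true",
                        help="Print progress to stderr")
    return parser

args = build_parser().parse_args(["data.csv", "--format", "csv"])
print(args.format)  # csv
```

A nice side effect: `argparse` generates the `--help` text from these declarations, so the interface the agent discovers always matches the flags the script actually accepts.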
**3. Informative error messages**

The error message shapes the agent's next attempt. Vague errors waste a tool-use turn.

```python
# BAD
print("Error: invalid input")
sys.exit(1)

# GOOD
print(f"Error: --format must be one of: json, csv, table. Received: '{args.format}'", file=sys.stderr)
sys.exit(1)
```
**4. Structured output (JSON to stdout, diagnostics to stderr)**

```python
import json, sys

# Data goes to stdout (agent parses it)
json.dump({"status": "ok", "fields": field_list}, sys.stdout)

# Progress/warnings go to stderr (agent reads but doesn't parse)
print("Processing page 3/10...", file=sys.stderr)
```
**5. Idempotent by default**

Agents retry. "Create if not exists" is safer than "create and fail on duplicate."
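A sketch of the principle using SQLite, where idempotency comes almost for free from `IF NOT EXISTS` and `INSERT OR IGNORE`; the table and run names are invented for the example:

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "pipeline.db")

def ensure_schema(path: str) -> None:
    # Every statement tolerates already-existing state, so retries are no-ops.
    with sqlite3.connect(path) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS runs (id TEXT PRIMARY KEY, status TEXT)"
        )
        conn.execute("INSERT OR IGNORE INTO runs VALUES ('run-001', 'pending')")

ensure_schema(db_path)
ensure_schema(db_path)  # a retry: no error, no duplicate row
with sqlite3.connect(db_path) as conn:
    print(conn.execute("SELECT COUNT(*) FROM runs").fetchone()[0])  # 1
```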
**6. Predictable output size**

Many agent harnesses truncate tool output at 10-30K characters. If your script might produce large output, default to a summary or support `--limit` / `--offset` pagination.
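A sketch of that pagination contract. The field names (`next_offset`, etc.) are invented, but the idea is to cap the window and tell the agent explicitly how to fetch the next page:

```python
import json

def paginate(rows: list, limit: int, offset: int) -> dict:
    window = rows[offset:offset + limit]
    return {
        "total": len(rows),
        "offset": offset,
        "returned": len(window),
        "rows": window,
        # None signals "you have everything"; otherwise pass this back as --offset
        "next_offset": offset + limit if offset + limit < len(rows) else None,
    }

rows = [{"id": i} for i in range(250)]
page = paginate(rows, limit=100, offset=200)
print(json.dumps({"returned": page["returned"], "next_offset": page["next_offset"]}))
# {"returned": 50, "next_offset": null}
```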
### Self-Contained Scripts (Inline Dependencies)
The agent shouldn't need to run pip install first. Use inline dependency declarations:
**Python (PEP 723 + uv):**

```python
# /// script
# dependencies = [
#     "pdfplumber>=0.10",
#     "beautifulsoup4>=4.12,<5",
# ]
# requires-python = ">=3.10"
# ///
import pdfplumber
# ... rest of script
```
Run with: `uv run scripts/extract.py`
uv creates an isolated environment, installs dependencies, runs the script. First run downloads; subsequent runs use cache.
**Node.js (npx):**

```bash
npx -y eslint@9 --fix .
```

**Deno (npm: imports):**

```typescript
import * as cheerio from "npm:cheerio@1.0.0";
```

**Go:**

```bash
go run golang.org/x/tools/cmd/goimports@v0.28.0 .
```
Always pin versions: unpinned dependencies make a skill non-reproducible.
## Level 4: The Skill as a CLI Toolkit

Complex skills can bundle a full CLI tool that the agent calls for various sub-commands. This is the pattern used by Anthropic's own production skills (docx, xlsx, pptx, pdf).

```
data-pipeline/
├── SKILL.md
├── scripts/
│   ├── pipeline.py            # Main CLI entry point
│   ├── validators/
│   │   ├── schema.py
│   │   └── quality.py
│   └── transforms/
│       ├── normalize.py
│       └── aggregate.py
├── references/
│   ├── schema-format.md
│   └── error-codes.md
└── assets/
    └── default-config.yaml
```
SKILL.md:
````markdown
---
name: data-pipeline
description: >-
  Build, validate, and run ETL data pipelines with quality checks.
  Use when the user wants to process, transform, validate, or load data,
  or mentions ETL, data quality, schema validation, or data ingestion.
compatibility: Requires Python 3.10+ and uv
allowed-tools: Bash(uv:*) Read Write
---

## Quick Reference

All operations go through the pipeline CLI:

```bash
# Validate a schema
uv run scripts/pipeline.py validate-schema data/input.csv

# Run quality checks
uv run scripts/pipeline.py check-quality data/input.csv --rules references/quality-rules.yaml

# Transform data
uv run scripts/pipeline.py transform data/input.csv --config assets/default-config.yaml --output data/output.parquet

# Full pipeline (validate → check → transform → load)
uv run scripts/pipeline.py run --config pipeline.yaml
```

Run `uv run scripts/pipeline.py --help` for full documentation.
Run `uv run scripts/pipeline.py <command> --help` for command-specific help.

## Workflow

1. **Start with schema validation.** Always validate input schema before processing.
2. **Run quality checks.** Review the quality report before transforming.
3. **Transform with the default config** unless the user specifies otherwise.
4. If quality checks fail, consult [error-codes.md](references/error-codes.md) for resolution steps.
5. If the schema format is unfamiliar, consult [schema-format.md](references/schema-format.md).

## Gotchas

- The `transform` command writes Parquet by default. Use `--format csv` for CSV output.
- Column names are normalized to snake_case automatically. To preserve original names: `--preserve-names`
- The `--dry-run` flag on any command shows what would happen without executing.
- Large files (>100MB): use `--streaming` mode to avoid memory issues.
````
### The Pattern: Thin SKILL.md, Fat CLI
The SKILL.md is a routing document: it tells the agent which sub-command to use for each task type. The real logic lives in the Python scripts, where you get:
- Proper argument parsing with `argparse` or `click`
- Unit tests for the pipeline logic
- Type safety, error handling, retry logic
- Dependency management via PEP 723
The agent calls the CLI. The CLI handles execution deterministically. The agent interprets the output and decides what to do next. This is the sweet spot: LLM reasoning for high-level decisions, deterministic code for execution.
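The routing half of that pattern can be sketched with `argparse` subparsers. The command names echo the example above, but the handlers here are stubs, not the real pipeline logic:

```python
import argparse
import json

def cmd_validate_schema(args):
    # Real validation logic would go here; stubbed for the sketch.
    return {"command": "validate-schema", "file": args.input, "ok": True}

def cmd_check_quality(args):
    return {"command": "check-quality", "file": args.input, "issues": []}

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pipeline.py")
    sub = parser.add_subparsers(dest="command", required=True)

    p = sub.add_parser("validate-schema", help="Validate input schema")
    p.add_argument("input")
    p.set_defaults(func=cmd_validate_schema)

    p = sub.add_parser("check-quality", help="Run quality checks")
    p.add_argument("input")
    p.set_defaults(func=cmd_check_quality)
    return parser

args = build_parser().parse_args(["validate-schema", "data.csv"])
print(json.dumps(args.func(args)))  # machine-readable result on stdout
```

Each sub-command gets `--help` for free, and every handler returns JSON on stdout — the contract the agent needs to interpret results.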
## Level 5: Skills with Evaluation and Iteration
Anthropic's skill-creator skill is a meta-skill that builds other skills. It includes a structured evaluation loop:
- Write the skill (SKILL.md + scripts + references)
- Define test cases (input prompts + expected outputs)
- Run the skill against test cases
- Grade results (automated assertions + human review)
- Iterate (fix instructions, re-test)
The evaluation framework uses:
- `evals.json` — test case definitions
- `grading.json` — assertion criteria
- Subagents for grading (comparator, analyzer)
- An eval viewer HTML report for human review
A minimal evals.json entry looks like this:
```json
[
  {
    "name": "basic_pdf_extraction",
    "prompt": "Extract all text from tests/fixtures/sample.pdf and return it as markdown",
    "expected": {
      "contains": ["Chapter 1", "Introduction"],
      "format": "markdown",
      "script_exits_zero": "scripts/validate_output.py"
    }
  },
  {
    "name": "handles_scanned_pdf",
    "prompt": "Extract text from tests/fixtures/scanned.pdf",
    "expected": {
      "contains": ["OCR"],
      "uses_tool": "scripts/analyze_form.py"
    }
  }
]
```
The pattern: define input prompts, specify what the output must contain or which scripts must succeed, then run the skill against each case and compare. This is test-driven development applied to agent instructions.
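A minimal sketch of running the `contains` assertion from such a file. The agent run itself is stubbed with a fixed output string, since the harness that actually invokes the agent is out of scope here:

```python
import json

def grade(case: dict, output: str) -> dict:
    # Check only the "contains" assertions; real graders add format checks,
    # script exit codes, and subagent review on top.
    missing = [s for s in case["expected"].get("contains", []) if s not in output]
    return {"name": case["name"], "passed": not missing, "missing": missing}

cases = json.loads("""
[{"name": "basic_pdf_extraction",
  "prompt": "Extract all text from sample.pdf as markdown",
  "expected": {"contains": ["Chapter 1", "Introduction"]}}]
""")

agent_output = "# Chapter 1\n\nIntroduction\n..."  # stand-in for a real agent run
for case in cases:
    print(json.dumps(grade(case, agent_output)))
```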
You don't need this complexity for every skill. But for production skills deployed across an organization, structured evaluation is the difference between "usually works" and "reliably works."
Source: https://github.com/anthropics/skills/blob/main/skills/skill-creator/SKILL.md
## Best Practices

### Write What the Agent Doesn't Know
Don't explain what a PDF is. Don't explain how HTTP works. Focus on what's specific to your project, team, or domain that the agent would get wrong without instructions.
```markdown
<!-- Wasted tokens — the agent knows this -->
PDF (Portable Document Format) is a common file format that contains
text, images, and other content.

<!-- High value — the agent doesn't know this -->
Use pdfplumber for text extraction. For scanned documents, fall back to
pdf2image with pytesseract.
```
Test: "Would the agent get this wrong without this instruction?" If no, cut it.
### Gotchas Are the Highest-Value Content
Every time an agent makes a mistake you have to correct, add that correction to the gotchas section. This is the fastest path to improving a skill.
```markdown
## Gotchas

- The `users` table uses soft deletes. Always include `WHERE deleted_at IS NULL`.
- User ID is `user_id` in the DB, `uid` in auth, `accountId` in billing.
  All three are the same value.
- The `/health` endpoint returns 200 even if the database is down. Use `/ready`.
- When using Estonian locale, date format is DD.MM.YYYY, not MM/DD/YYYY.
```
### Provide Defaults, Not Menus

```markdown
<!-- BAD: forces the agent to choose -->
You can use pypdf, pdfplumber, PyMuPDF, or pdf2image for extraction...

<!-- GOOD: clear default, escape hatch for edge case -->
Use pdfplumber for text extraction.
For scanned PDFs requiring OCR, use pdf2image with pytesseract instead.
```
### Procedures Over Declarations

```markdown
<!-- BAD: specific answer, only works for this exact task -->
Join `orders` to `customers` on `customer_id`, filter `region = 'EMEA'`.

<!-- GOOD: reusable method -->
1. Read schema from `references/schema.yaml` to find relevant tables
2. Join using the `_id` foreign key convention
3. Apply user's filters as WHERE clauses
4. Aggregate numeric columns as needed
```
### Validate Before Proceeding
The plan-validate-execute pattern prevents cascading errors:
1. Generate the migration script → save to `migration.sql`
2. Run `scripts/validate_migration.py migration.sql` to check for:
   - Missing rollback statements
   - References to non-existent tables
   - Data loss risks
3. If validation fails, fix and re-validate
4. Only after validation passes: execute the migration
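A hypothetical sketch of what such a `validate_migration.py` might check. It is regex-based for brevity; a production validator should actually parse the SQL:

```python
import re

def validate_migration(sql: str) -> list[str]:
    errors = []
    # Check 1: every migration must carry its rollback
    # (the "-- rollback:" convention is invented for this sketch)
    if not re.search(r"--\s*rollback:", sql, re.IGNORECASE):
        errors.append("Missing rollback statements")
    # Check 2: scan only executable lines, not the rollback comments
    live = "\n".join(
        line for line in sql.splitlines() if not line.lstrip().startswith("--")
    )
    for risky in ("DROP TABLE", "DROP COLUMN", "TRUNCATE"):
        if risky in live.upper():
            errors.append(f"Data loss risk: {risky}")
    return errors

migration = """ALTER TABLE users ADD COLUMN locale TEXT;
-- rollback: ALTER TABLE users DROP COLUMN locale;
"""
print(validate_migration(migration))  # []
print(validate_migration("DROP TABLE users;"))
```

Returning a list of errors (empty means "safe to proceed") gives the agent a clear, parseable signal for step 3.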
## Installing Skills

### Claude Code

```bash
# From a local directory:
# skills in .claude/skills/ are auto-discovered

# Upload as zip in claude.ai:
# Settings > Capabilities > Skills > Upload

# From the skill directory (used as slash command):
# place SKILL.md in .claude/skills/my-skill/SKILL.md
```
### VS Code / GitHub Copilot

Place the skill at `.github/skills/my-skill/SKILL.md`, or configure locations via the `chat.skillsLocations` setting.
### Cursor

Place in `.cursor/skills/` or configure in settings.
### OpenAI Codex

```bash
codex skills add ./my-skill
```

Or place in `.codex/skills/`.
### Cross-Platform
The Agent Skills format is the same everywhere. The same SKILL.md file works across all platforms. Distribution mechanisms differ (zip upload, directory placement, marketplace), but the file format is identical.
Validate your skill against the spec:
```bash
# Using the reference validator
npx skills-ref validate ./my-skill

# Checks: frontmatter validity, name conventions, line count, token budget
```
## Skills vs MCP: Decision Framework

- **Do I need the agent to CALL something external?**
  → MCP (API, database, browser, file system)
- **Do I need the agent to KNOW something specific?**
  → Skill (procedures, conventions, domain knowledge)
- **Do I need the agent to RUN deterministic code?**
  → Skill with scripts (validation, formatting, analysis)
- **Do I need both knowledge AND external access?**
  → Skill (instructions) + MCP (tools), used together
- **Am I repeating the same prompt instructions across conversations?**
  → Extract into a skill
- **Is this a one-off task with unique context?**
  → System prompt or just tell the agent directly
Skills and MCP compose naturally. A skill can instruct the agent to use MCP tools in a specific sequence:
```markdown
## Deployment Workflow

1. Run tests: `scripts/run_tests.sh`
2. Check GitHub CI status using the GitHub MCP server
3. If CI passes, deploy using: `scripts/deploy.sh --env staging`
4. Verify deployment using the Playwright MCP to check the health endpoint
5. If health check fails, rollback: `scripts/deploy.sh --rollback`
```
The skill provides the procedure. MCP provides the capabilities. The agent orchestrates.
## Example: A Course-Relevant Skill

Here's a skill you might build for the homework:

```
api-provider-adapter/
├── SKILL.md
├── scripts/
│   ├── test_provider.py
│   └── generate_adapter.py
├── references/
│   ├── openai-format.md
│   └── anthropic-format.md
└── assets/
    └── adapter-template.ts
```
SKILL.md:
````markdown
---
name: api-provider-adapter
description: >-
  Generate dual-provider API adapters for OpenAI and Anthropic.
  Use when implementing tool calling, message formatting, or
  streaming across multiple LLM providers. Use when the user
  mentions dual provider, multi-provider, OpenAI/Anthropic
  compatibility, or adapter pattern.
compatibility: Requires Node.js 18+ or Python 3.10+
---

## Workflow

1. Identify which API features need adaptation (messages, tools, streaming)
2. For tool calling differences, consult [openai-format.md](references/openai-format.md)
   and [anthropic-format.md](references/anthropic-format.md)
3. Generate adapter code using the template:

   ```bash
   uv run scripts/generate_adapter.py --features tools,streaming --lang typescript
   ```

4. Test against both providers:

   ```bash
   uv run scripts/test_provider.py --provider openai --adapter ./adapter.ts
   uv run scripts/test_provider.py --provider anthropic --adapter ./adapter.ts
   ```

## Key Differences to Handle

- OpenAI: arguments as JSON string (must parse), role "tool" for results
- Anthropic: arguments as parsed object, tool_result inside "user" message
- OpenAI: `finish_reason: "tool_calls"`, Anthropic: `stop_reason: "tool_use"`
- See lecture 26 for the complete comparison table

## Gotchas

- OpenAI's Responses API uses different item types than Chat Completions.
  The adapter must handle both if supporting legacy code.
- Anthropic returns `input` (parsed JSON), OpenAI returns `arguments` (JSON string).
  Forgetting to JSON.parse() the OpenAI side is the #1 student bug.
- Streaming format differs significantly. Don't try to unify streaming
  into a common format on the first pass — get non-streaming working first.
````
## References

### Specification
- Agent Skills Specification: https://agentskills.io/specification
- Best Practices: https://agentskills.io/skill-creation/best-practices
- Optimizing Descriptions: https://agentskills.io/skill-creation/optimizing-descriptions
- Evaluating Skills: https://agentskills.io/skill-creation/evaluating-skills
- Using Scripts: https://agentskills.io/skill-creation/using-scripts
- GitHub spec repo: https://github.com/agentskills/agentskills
### Examples and Registries
- Anthropic's official skills repo: https://github.com/anthropics/skills
- Skill-creator skill (meta): https://github.com/anthropics/skills/blob/main/skills/skill-creator/SKILL.md
- Awesome Claude Skills: https://github.com/travisvn/awesome-claude-skills
- Awesome Claude Code (skills + hooks + subagents): https://github.com/hesreallyhim/awesome-claude-code
### Platform-Specific Documentation
- Claude Code skills: https://code.claude.com/docs/en/skills
- VS Code / Copilot skills: https://code.visualstudio.com/docs/copilot/customization/agent-skills
- OpenAI Codex skills: https://developers.openai.com/codex/skills
- Claude.ai custom skills: https://support.claude.com/en/articles/12512198-how-to-create-custom-skills
### Deep Dives
- Architecture deep dive: https://leehanchung.github.io/blogs/2025/10/26/claude-skills-deep-dive/
- Anthropic's production skills (docx/pdf/pptx/xlsx): see `skills/` in https://github.com/anthropics/skills
- Codecademy tutorial: https://www.codecademy.com/article/how-to-build-claude-skills