29 - Misc
Lectures 26–28 gave you the building blocks: tool calling, MCP, skills. Lecture 30 covers the orchestration layer — subagents. This lecture covers everything in between: the configuration, automation, and intelligence layers that make the whole system work well in practice.
You will learn how to tell agents what to do before they start (project instructions), how to automate their behavior (hooks), how to give them structural code understanding (LSP, codebase indexing), how they remember across sessions (memory), and how to manage the finite context window that everything competes for (context management). These are the topics that separate a working agent from a productive one.
Project Instructions
Every coding agent supports a way to inject persistent instructions into every conversation. These are not prompts you type each time — they are files checked into your repo (or placed in your home directory) that the agent reads automatically at session start. They tell the agent things it cannot learn from the code alone: which test runner to use, which conventions to follow, which directories are off-limits, what the deployment process looks like.
The problem: every tool invented its own format. If your team uses multiple tools, you need to know which file each one reads.
The Instruction File Ecosystem
| File | Read by | Scope |
|---|---|---|
| CLAUDE.md | Claude Code | Anthropic's agents |
| AGENTS.md | OpenAI Codex, forgecode | OpenAI ecosystem + compatible tools |
| AGENTS.override.md | OpenAI Codex | Per-folder overrides (replaces, not extends) |
| .cursorrules | Cursor | Cursor IDE only |
| .github/copilot-instructions.md | GitHub Copilot | Copilot in VS Code and GitHub |
| .kilocode/rules/*.md | Kilo Code | Kilo Code modes (code, architect, etc.) |
| opencode.json (instructions field) | opencode | opencode sessions |
Most projects that want broad tool support maintain both CLAUDE.md and AGENTS.md. The content is largely the same — project facts are tool-agnostic — but the files are read by different harnesses.
Layering and Precedence
Instructions cascade from general to specific. More specific files override or extend more general ones.
Claude Code reads three layers:
- `~/.claude/CLAUDE.md` — user-level, applies to all projects (your personal preferences)
- `CLAUDE.md` at the project root — project-level, shared with the team
- `CLAUDE.md` in any subdirectory — folder-level, scoped to that subtree
All three are concatenated. Folder-level instructions add to project-level, they do not replace. If there is a conflict, the most specific file wins in practice because the model sees it last.
```markdown
# ~/.claude/CLAUDE.md (user-level)
- I prefer concise responses without trailing summaries.
- Default to TypeScript for new files unless the project uses another language.
- Run tests before committing.
```

```markdown
# CLAUDE.md (project root)
- This is a monorepo: packages/api (Express + TypeScript), packages/web (Next.js), packages/shared (common types).
- Use pnpm, not npm. The lock file is pnpm-lock.yaml.
- Test command: pnpm test. Lint: pnpm lint.
- Database is PostgreSQL. Migrations are in packages/api/prisma/migrations.
- Never modify migration files after they have been applied.
```

```markdown
# packages/api/CLAUDE.md (folder-level)
- Use snake_case for database column names, camelCase for TypeScript properties.
- All endpoints require authentication middleware except /health and /auth/*.
- Use Zod for request validation, not manual type guards.
```
OpenAI Codex reads a similar chain:
- `~/.codex/AGENTS.md` — user-level
- `AGENTS.md` at project root — project-level
- `AGENTS.md` in subdirectories — folder-level (extends parent)
- `AGENTS.override.md` in subdirectories — folder-level (replaces parent, not extends)
The AGENTS.override.md distinction matters. If a subfolder has fundamentally different conventions (e.g., a Python data pipeline inside a TypeScript monorepo), you want to replace the parent instructions, not stack on top of them.
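A sketch of what such an override might look like — the package layout and conventions here are invented for illustration:

```markdown
# packages/pipeline/AGENTS.override.md
# Replaces the root AGENTS.md entirely for everything under packages/pipeline/.

- This package is Python, not TypeScript. The root pnpm instructions do not apply here.
- Run tests with pytest, not pnpm test.
- Follow PEP 8 naming (snake_case), not the repo's TypeScript style guide.
```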
opencode reads opencode.json at the project root. Instructions are set via the instructions field (a string or array of strings). User-level config lives in ~/.config/opencode/config.json. There is no folder-level override — it is flat.
forgecode reads AGENTS.md at the project root (following the Codex convention) plus .forge.toml for configuration. No folder-level cascading.
What to Put Where
Not everything belongs in the same file. The layering exists for a reason:
User-level (your personal ~/.claude/CLAUDE.md or ~/.codex/AGENTS.md):
- Response style preferences ("be concise", "no emojis", "no trailing summaries")
- Default language and tooling choices when the project doesn't specify
- Personal workflow habits ("always run tests before committing")
Project-level (root CLAUDE.md / AGENTS.md):
- Project architecture and directory layout
- Build, test, lint, and deploy commands
- Coding conventions (naming, patterns, libraries to use or avoid)
- Database and infrastructure facts
- Links to relevant documentation or dashboards
Folder-level (subdirectory CLAUDE.md / AGENTS.md):
- Conventions specific to that package or module
- Exceptions to project-wide rules
- Module-specific gotchas
Anti-pattern: putting workflows in project instructions. If you find yourself writing step-by-step procedures ("when deploying, first run X, then Y, then check Z"), extract those into skills instead. Project instructions are for facts about the project. Skills are for reusable procedures. Lecture 28 covers this distinction.
Writing Effective Instructions
The agent reads your instructions on every turn. Every unnecessary sentence costs tokens across the entire session. Be direct:
- Be imperative. "Use pnpm" not "We generally prefer to use pnpm when possible."
- Be specific. "Test command: `pnpm test`" not "Make sure to run the appropriate tests."
- Include gotchas. "The dev database runs on port 5433, not the default 5432" saves a debugging session.
- Skip what the model knows. Don't explain what TypeScript is. Don't describe how git works. Describe what is unique to this project.
- Reference lecture 28's description advice. The same principles apply: concrete triggers, explicit scope, no filler words.
What to Commit vs Gitignore
| File | Commit? | Why |
|---|---|---|
| CLAUDE.md | Yes | Team-shared project facts |
| AGENTS.md | Yes | Team-shared project facts |
| .claude/settings.json | Yes | Team-shared tool config (MCP servers, allowed tools) |
| .claude/settings.local.json | No | Personal overrides, may contain paths or API keys |
| .claude/memory/ | No | Personal memory, session-specific |
| .cursorrules | Yes | Team-shared Cursor conventions |
| opencode.json | Yes | Team-shared opencode config |
| .forge.toml | Yes | Team-shared forgecode config |
The rule: if it encodes team decisions, commit it. If it encodes personal preferences or ephemeral state, gitignore it.
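For the Claude Code layout above, that works out to a few `.gitignore` lines — a minimal sketch:

```gitignore
# Personal agent state — never commit
.claude/settings.local.json
.claude/memory/
.claude/memory.local/
```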
Configuration and Settings
Beyond instruction files, each tool has a configuration system that controls permissions, MCP servers, model selection, and behavior. Understanding where configuration lives prevents the "works on my machine" problem.
Claude Code Settings
Claude Code reads settings from three locations, merged in order (later overrides earlier):
- User settings: `~/.claude/settings.json` — personal defaults across all projects
- Project settings: `.claude/settings.json` — shared team configuration, committed to git
- Local settings: `.claude/settings.local.json` — personal overrides, gitignored
```json
{
"permissions": {
"allow": [
"Read",
"Glob",
"Grep",
"Bash(npm test)",
"Bash(npm run lint)",
"Bash(npm run build)"
],
"deny": [
"Bash(rm -rf *)",
"Bash(git push --force)"
]
},
"mcpServers": {
"context7": {
"command": "npx",
"args": ["-y", "@upstash/context7-mcp"]
},
"playwright": {
"command": "npx",
"args": ["-y", "@playwright/mcp@latest"]
}
},
"hooks": {}
}
```
The permissions.allow array uses exact match or glob-style patterns. Bash(npm test) allows exactly that command. Bash(npm *) allows any npm command. The deny array takes precedence — a command matching both allow and deny is denied.
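A minimal illustration of that precedence rule (the command patterns are examples):

```json
{
  "permissions": {
    "allow": ["Bash(git *)"],
    "deny": ["Bash(git push --force)"]
  }
}
```

Here `git status` matches only the allow pattern and runs without prompting; `git push --force` matches both lists, and the deny wins.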
opencode Configuration
opencode uses opencode.json at the project root:
```json
{
"instructions": [
"Use pnpm for package management.",
"Run pnpm test before committing."
],
"provider": {
"default": "anthropic"
},
"model": {
"default": "claude-sonnet-4-6"
},
"permissions": {
"Bash": "ask",
"Write": "allow",
"Read": "allow"
},
"mcpServers": {
"context7": {
"command": "npx",
"args": ["-y", "@upstash/context7-mcp"]
}
}
}
```
Permissions are per-tool with three modes: "allow" (always run), "ask" (prompt the user), "deny" (never run). You can also set per-command permissions for Bash using regex patterns.
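A sketch of per-command Bash permissions, extrapolating from the schema above — the exact nesting and key names vary by opencode version, so treat this as illustrative and check the configuration docs:

```json
{
  "permissions": {
    "Bash": {
      "^pnpm (test|lint)$": "allow",
      "^git push": "ask",
      ".*": "ask"
    },
    "Write": "allow",
    "Read": "allow"
  }
}
```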
User-level config lives in ~/.config/opencode/config.json and follows the same schema.
forgecode Configuration
forgecode uses TOML for its primary configuration:
```toml
# .forge.toml
[model]
provider = "anthropic"
name = "claude-sonnet-4-6"
[permissions]
allow_bash = true
auto_approve = ["Read", "Glob", "Grep"]
require_approval = ["Write", "Bash"]
[mcp]
config_path = ".mcp.json"
```
MCP server configuration is typically in a separate .mcp.json file (shared JSON format compatible with Claude Code and other tools).
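A minimal `.mcp.json` in that shared format, reusing the context7 server from the earlier examples:

```json
{
  "mcpServers": {
    "context7": {
      "command": "npx",
      "args": ["-y", "@upstash/context7-mcp"]
    }
  }
}
```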
Permission Models Compared
| Aspect | Claude Code | opencode | forgecode |
|---|---|---|---|
| Permission granularity | Per-tool + per-command regex | Per-tool, per-command regex | Per-tool categories |
| Default behavior | Ask for everything | Ask for Bash, allow reads | Configurable default |
| Deny list | Explicit deny array | Per-tool "deny" mode | deny list in TOML |
| Allow patterns | Glob: Bash(npm *) | Regex on command string | Category-based |
| MCP tool permissions | Via allowedTools in agent frontmatter | Per-server tool filtering | Per-server config |
| User vs project override | settings.local.json overrides | User config extends project | User config extends project |
The practical takeaway: all three let you pre-approve safe operations (read, search, test) and require confirmation for destructive ones (write, delete, bash). The syntax differs but the model is the same.
Hooks
Hooks are shell commands that the harness executes automatically when specific events occur — before a tool call, after a file edit, when the agent stops. They are not agent code. The model does not decide to run them. The harness fires them deterministically based on event matching rules.
Think of them as git hooks, but for agent actions instead of git operations.
What Hooks Are
A hook has three parts:
- Event — when it fires (before a tool call, after a tool call, on notification, on stop)
- Matcher — which tool calls it applies to (optional — omit to match all)
- Command — the shell command to execute
The harness runs the command, captures its output, and optionally blocks the agent action if the hook exits with a non-zero status code. The agent sees hook feedback as system messages — it can read "hook blocked this action because..." and adjust its behavior.
Claude Code Hooks in Detail
Hooks are configured in settings.json (any of the three layers: user, project, local):
```json
{
"hooks": {
"PreToolCall": [
{
"matcher": {
"toolName": "Bash",
"toolInput": {
"command": "(rm -rf|DROP TABLE|TRUNCATE|git push --force)"
}
},
"hooks": [
{
"type": "command",
"command": "echo 'BLOCKED: Dangerous command detected' && exit 1"
}
]
}
],
"PostToolCall": [
{
"matcher": {
"toolName": "(Edit|Write)"
},
"hooks": [
{
"type": "command",
"command": "npx eslint --fix \"$CLAUDE_TOOL_INPUT_FILE_PATH\" 2>/dev/null || true"
}
]
}
],
"Notification": [
{
"hooks": [
{
"type": "command",
"command": "osascript -e 'display notification \"$CLAUDE_NOTIFICATION\" with title \"Claude Code\"'"
}
]
}
],
"Stop": [
{
"hooks": [
{
"type": "command",
"command": "say 'Task complete'"
}
]
}
]
}
}
```
Event types:
| Event | When it fires | Can block? |
|---|---|---|
| PreToolCall | Before a tool call executes | Yes — exit 1 cancels the call |
| PostToolCall | After a tool call completes | No — action already happened |
| Notification | When the agent emits a notification | No |
| Stop | When the agent finishes its turn | No |
| SubagentStop | When a subagent completes | No |
Matcher fields:
- `toolName` — regex matched against the tool name (`"Bash"`, `"Edit"`, `"Write"`, `"(Edit|Write)"`)
- `toolInput` — object whose keys are tool input field names and values are regex patterns. For Bash, match against `command`. For Edit/Write, match against `file_path`.
Environment variables available inside hook commands:
| Variable | Content |
|---|---|
| CLAUDE_TOOL_NAME | Name of the tool being called |
| CLAUDE_TOOL_INPUT_* | Tool input fields, uppercased (e.g., CLAUDE_TOOL_INPUT_COMMAND for Bash) |
| CLAUDE_TOOL_RESULT | Tool output (PostToolCall only) |
| CLAUDE_NOTIFICATION | Notification text (Notification event only) |
| CLAUDE_SESSION_ID | Current session identifier |
Practical Hook Examples
1. Auto-format after file edits
```json
{
"PostToolCall": [
{
"matcher": { "toolName": "(Edit|Write)" },
"hooks": [{
"type": "command",
"command": "prettier --write \"$CLAUDE_TOOL_INPUT_FILE_PATH\" 2>/dev/null || true"
}]
}
]
}
```
The || true ensures the hook does not fail if prettier is not installed or the file is not a supported type.
2. Block dangerous bash commands
```json
{
"PreToolCall": [
{
"matcher": {
"toolName": "Bash",
"toolInput": {
"command": "(rm -rf /|rm -rf \\.|DROP TABLE|TRUNCATE TABLE|git push.*--force|git reset --hard)"
}
},
"hooks": [{
"type": "command",
"command": "echo 'BLOCKED: This command matches a dangerous pattern. Ask the user for confirmation.' && exit 1"
}]
}
]
}
```
The agent sees the "BLOCKED" message and adjusts — it will typically ask the user to run the command manually.
3. Log all tool calls to a file
```json
{
"PostToolCall": [
{
"hooks": [{
"type": "command",
"command": "echo \"$(date -Iseconds) $CLAUDE_TOOL_NAME\" >> /tmp/claude-tool-log.txt"
}]
}
]
}
```
No matcher — fires on every tool call. Useful for auditing and understanding agent behavior patterns.
4. Run tests after code changes
```json
{
"PostToolCall": [
{
"matcher": {
"toolName": "(Edit|Write)",
"toolInput": {
"file_path": "\\.(ts|tsx|js|jsx)$"
}
},
"hooks": [{
"type": "command",
"command": "npm test --silent 2>&1 | tail -5"
}]
}
]
}
```
Only fires when the agent edits TypeScript or JavaScript files. The tail -5 keeps the output short — the agent sees just the summary line ("Tests: 47 passed, 2 failed").
5. macOS notification on task completion
```json
{
"Stop": [
{
"hooks": [{
"type": "command",
"command": "osascript -e 'display notification \"Agent finished\" with title \"Claude Code\" sound name \"Glass\"'"
}]
}
]
}
```
Hooks in opencode and forgecode
opencode supports lifecycle hooks via its configuration, but with a simpler model. Hooks are defined in opencode.json:
```json
{
"hooks": {
"post_edit": "npx prettier --write $FILE",
"pre_bash": "echo 'Running: $COMMAND'"
}
}
```
The event vocabulary is smaller (no regex matchers on tool inputs), but it covers the common use cases.
forgecode supports hooks through .forge.toml event handlers:
```toml
[hooks]
post_edit = "prettier --write $FILE_PATH"
on_complete = "say 'Done'"
```
Simpler than Claude Code's system but functional for basic automation.
Comparison:
| Feature | Claude Code | opencode | forgecode |
|---|---|---|---|
| Config location | settings.json hooks object | opencode.json hooks | .forge.toml [hooks] |
| Event types | 5 (Pre/PostToolCall, Notification, Stop, SubagentStop) | ~3 (pre/post edit, pre bash) | ~3 (post_edit, on_complete, pre_bash) |
| Regex matchers | Yes (toolName, toolInput fields) | No | No |
| Can block actions | Yes (PreToolCall exit 1) | Limited | Limited |
| Environment variables | Full set (CLAUDE_TOOL_*) | Basic ($FILE, $COMMAND) | Basic ($FILE_PATH) |
| Per-subagent hooks | Yes (in agent frontmatter) | No | No |
Claude Code has the most mature hook system. If you need fine-grained lifecycle automation, it is the best option currently. For simple "run formatter after edit" use cases, all three work.
Hooks vs Skills vs MCP
- Hooks: fire automatically, no model involvement, deterministic. Use for formatting, validation, logging, notifications.
- Skills: loaded by the model, require judgment. Use for procedures that need reasoning.
- MCP: provide capabilities the model cannot achieve with local tools. Use for external systems.
Memory and Persistence
LLMs are stateless. Every API call starts with no memory of previous calls. The conversation history you see in a session is maintained by the harness, not the model — it re-sends the entire conversation on each turn. When a session ends, everything the model "learned" during that session is gone.
Memory systems solve this. They give agents a way to persist knowledge across sessions so you do not have to re-explain the same things every time.
Claude Code's Memory System
Claude Code has a built-in, file-based memory system with three scopes:
| Scope | Location | Persists across | Shared with team |
|---|---|---|---|
| User memory | ~/.claude/projects/<project>/memory/ | All sessions in this project | No |
| Project memory | .claude/memory/ | All sessions | If committed (usually no) |
| Local memory | .claude/memory.local/ | Current machine only | No |
Each memory is a Markdown file with YAML frontmatter:
```markdown
---
name: testing-conventions
description: How tests are organized and run in this project
type: project
---
Test files live next to source files with .test.ts suffix.
Integration tests are in packages/api/tests/integration/.
Use `pnpm test` for unit tests, `pnpm test:integration` for integration tests.
The CI pipeline runs both — never skip integration tests locally before pushing.
```
The MEMORY.md file in the memory directory acts as an index. It is loaded into every conversation. Each entry is a one-line pointer to a memory file:
```markdown
- [Testing conventions](testing-conventions.md) — test organization, commands, CI requirements
- [Database gotchas](database-gotchas.md) — port 5433, migration rules, naming conventions
- [User preferences](user-preferences.md) — concise style, no emojis, TypeScript default
```
Memory types serve different purposes:
| Type | What it stores | Example |
|---|---|---|
| user | Who you are, your preferences, your expertise | "Senior backend engineer, new to React frontend" |
| feedback | Corrections and confirmed approaches | "Don't mock the database in integration tests" |
| project | Ongoing work, decisions, deadlines | "Auth rewrite is compliance-driven, not tech debt" |
| reference | Pointers to external resources | "Pipeline bugs tracked in Linear project INGEST" |
The agent reads memories at session start and can create, update, or delete memories during a session. You can explicitly ask it to remember something ("remember that the staging environment uses port 8443") or it may save memories automatically when it learns important project context.
opencode Persistence
opencode persists session history in ~/.config/opencode/sessions/. When you start a new session, the previous session's context is not automatically loaded — but you can reference past sessions.
opencode also supports instructions persistence through opencode.json. Configuration-level facts (instructions, model preferences, permissions) carry across sessions automatically. But there is no equivalent of Claude Code's structured memory system — no types, no MEMORY.md index, no automatic save/retrieve.
For cross-session knowledge, opencode users typically rely on project instructions in opencode.json or AGENTS.md files.
forgecode Memory
forgecode uses AGENTS.md as its primary persistence mechanism. There is no separate memory system. Session state is not persisted beyond the session.
For cross-session knowledge, forgecode users put everything in AGENTS.md and .forge.toml. This is simpler but means there is no distinction between "project facts" and "things learned during previous sessions."
Comparison
| Feature | Claude Code | opencode | forgecode |
|---|---|---|---|
| Structured memory system | Yes (types, index, frontmatter) | No | No |
| Cross-session persistence | Automatic via memory files | Manual (instructions file) | Manual (AGENTS.md) |
| Memory types | user, feedback, project, reference | N/A | N/A |
| Auto-save | Yes (agent decides when to save) | No | No |
| Memory search/retrieval | Index loaded at session start | N/A | N/A |
| Pruning/update | Agent can update or delete memories | Manual file editing | Manual file editing |
The Memory Lifecycle
Effective memory follows a cycle: save → retrieve → update → prune.
Save when the agent learns something that will be useful in future sessions:
- Project conventions not written in docs ("the team prefers functional components over class components")
- User preferences discovered through feedback ("this user wants concise responses")
- Important decisions and their rationale ("we chose PostgreSQL over MongoDB because of transaction requirements")
Retrieve at the start of each session. The agent reads MEMORY.md and relevant memory files. It uses memories as context, not as absolute truth — memories can become stale.
Update when reality changes. If the test command changes from pnpm test to vitest, update the memory. If a convention is abandoned, update or remove the memory.
Prune regularly. Memories accumulate. Old project context ("sprint 12 deadline is March 5") becomes noise. The agent can prune on its own, but periodic manual review helps.
Anti-patterns
- Storing everything. Memory is not a log. Every memory costs tokens at session start. Store only things the agent cannot derive from the code or git history.
- Never pruning. Stale memories are worse than no memories — the agent trusts them and makes wrong decisions based on outdated facts.
- Trusting memory without verification. A memory that says "function `normalizeUser` is in `src/utils/users.ts`" may be wrong if the file was renamed. The agent should verify memory claims against current code before acting.
- Using memory instead of project instructions. If something is always true about the project (test command, coding conventions), put it in `CLAUDE.md` or `AGENTS.md`. Memory is for things that change or that are personal. Project instructions are for permanent team-shared facts.
- Storing code patterns. "We use the repository pattern for database access" is better written as a project instruction. The agent can see the pattern by reading the code. Memory should store the why — "we use the repository pattern because the team decided in Q3 to decouple business logic from the ORM for testability."
LSP Integration
When an agent needs to answer "what calls this function?" or "what type does this variable have?", it typically resorts to grep-and-read: search for the function name, read surrounding lines, infer context. This works but is expensive — each query can consume thousands of tokens in file reads, and the results are imprecise. Grep finds text patterns. It cannot distinguish a function call from a comment, a type annotation from a variable name.
Language Server Protocol (LSP) gives agents the same structural code intelligence that IDEs provide: go-to-definition, find-references, rename-symbol, type information, diagnostics. One LSP query replaces five rounds of grep-and-read.
Language Server Protocol Basics
LSP is a protocol between an editor (the "client") and a language-specific server. The server parses and analyzes code, maintaining a semantic model of the codebase. The client sends requests ("where is this symbol defined?") and receives structured responses.
Every major language has an LSP server: typescript-language-server for TypeScript/JavaScript, pyright for Python, rust-analyzer for Rust, gopls for Go. These are the same servers that power IDE features in VS Code, JetBrains, and Neovim.
The connection between LSP and agent tooling is an MCP bridge: the mcp-language-server project wraps any LSP server as an MCP server, exposing LSP capabilities as tools the agent can call.
The MCP-Language-Server Bridge
The bridge exposes these tools:
| MCP Tool | LSP Capability | What it replaces |
|---|---|---|
| get_definition | Go to definition | Agent grepping for class/function declarations |
| find_references | Find all references | Agent grepping for function name across all files |
| get_diagnostics | Type errors, lint issues | Agent running compiler and parsing output |
| rename_symbol | Safe rename across files | Agent doing find-and-replace (misses type-aware renames) |
| get_hover | Type information, docs | Agent reading source to infer types |
| get_completions | Code completions | Agent guessing based on context |
Token cost comparison. Consider the query "find all callers of normalizeUser()":
Without LSP (grep-and-read):
- `grep -r "normalizeUser" --include="*.ts"` → reads ~20 matching lines with file paths (~1,500 tokens)
- Agent reads 3–5 files to understand call context (~8,000 tokens)
- Total: ~10,000 tokens, multiple tool calls, imprecise (may include comments, string literals, type annotations)
With LSP (one tool call):
- `find_references("normalizeUser", "src/utils/users.ts", line 42)` → returns structured list of call sites (~500 tokens)
- Total: ~500 tokens, one tool call, precise (only actual function calls, not comments or strings)
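For concreteness, the structured result might look something like this — the exact response shape depends on the bridge version, and the call sites here are invented:

```json
{
  "symbol": "normalizeUser",
  "references": [
    { "file": "src/api/handlers/users.ts", "line": 88, "context": "const user = normalizeUser(raw)" },
    { "file": "src/jobs/import.ts", "line": 142, "context": "rows.map(normalizeUser)" }
  ]
}
```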
Setup by Language
Add the MCP-language-server bridge to your project's .claude/settings.json (or equivalent):
TypeScript / JavaScript:
```json
{
"mcpServers": {
"typescript-lsp": {
"command": "mcp-language-server",
"args": [
"--workspace", ".",
"--lsp", "typescript-language-server", "--", "--stdio"
]
}
}
}
```
Python (pyright):
```json
{
"mcpServers": {
"python-lsp": {
"command": "mcp-language-server",
"args": [
"--workspace", ".",
"--lsp", "pyright-langserver", "--", "--stdio"
]
}
}
}
```
Rust (rust-analyzer):
```json
{
"mcpServers": {
"rust-lsp": {
"command": "mcp-language-server",
"args": [
"--workspace", ".",
"--lsp", "rust-analyzer"
]
}
}
}
```
Prerequisite: the language server must be installed on the system (npm install -g typescript-language-server, pip install pyright, rustup component add rust-analyzer).
Platform Access
Claude Code accesses LSP through the MCP server configuration shown above. The agent sees the LSP tools alongside other MCP tools and uses them when relevant.
opencode can use the same MCP-language-server bridge via its mcpServers configuration in opencode.json. The setup is identical — opencode speaks MCP natively.
forgecode supports MCP servers through .mcp.json, using the same configuration format. The bridge works the same way.
All three tools treat LSP as "just another MCP server." The agent does not need to know it is talking to a language server — it sees tools like get_definition and find_references and uses them when they are more efficient than grep.
Limitations
- Setup overhead. You need the language server installed and the MCP bridge configured per-language. This is a one-time cost but it is not zero.
- Startup time. Language servers need to index the codebase before they can answer queries. For large projects, the first query may take several seconds.
- Dynamic languages. Python and JavaScript have weaker type inference than TypeScript, Rust, or Go. `find_references` on a Python function may miss dynamically-dispatched calls. `get_hover` may show `Any` instead of a concrete type.
- Memory usage. Language servers keep a semantic model of the entire codebase in memory. For very large monorepos, this can consume significant RAM.
- Not all languages have mature servers. Shell scripts, configuration files, and DSLs typically do not have LSP servers.
Despite these limitations, LSP integration is one of the highest-leverage improvements you can make to an agent's code understanding. The token savings compound across every turn that would otherwise require grep-and-read exploration.
Codebase Indexing
LSP gives you symbol-level intelligence within a language. Codebase indexing goes further: it builds a structural model of the entire codebase — across files, across languages — and answers architectural queries that no single language server can.
Structural Understanding vs Text Search
Questions grep cannot answer reliably:
- "What modules import
userService?" — grep finds the string, but cannot distinguishimport userServicefrom// removed userServiceorconst userServiceMock. - "If I change this function's signature, what breaks?" — grep cannot compute a blast radius.
- "Is this function dead code?" — grep can check for references, but misses dynamic dispatch, reflection, and framework-specific patterns.
- "What is the class hierarchy for
BaseRepository?" — grep can findextends BaseRepositorybut not build the full tree, especially with multiple inheritance levels.
AST-based indexing solves this by parsing source code into syntax trees and building a queryable graph of symbols, imports, call sites, and type relationships.
Tree-sitter AST Parsing
Tree-sitter is an incremental parser generator used by most codebase indexing tools. It provides:
- Grammar support for 66+ languages — one parser handles TypeScript, Python, Rust, Go, Java, C#, Ruby, and dozens more
- Incremental parsing — re-parses only changed regions, making it fast for large codebases
- Concrete syntax trees — every token is represented, enabling precise structural queries
When an indexing server starts, it runs tree-sitter over the codebase, builds syntax trees, extracts symbols (functions, classes, imports, exports), and stores the relationships in an index. Queries run against the index, not the source files — which is why they are fast and token-efficient.
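A minimal sketch of what that extraction step looks like, using the `tree-sitter` and `tree-sitter-typescript` npm packages (the surrounding index storage is omitted):

```typescript
import Parser from "tree-sitter";
import TypeScript from "tree-sitter-typescript";

const parser = new Parser();
parser.setLanguage(TypeScript.typescript);

const source = `export function normalizeUser(raw: unknown) { return raw; }`;
const tree = parser.parse(source);

// Walk the concrete syntax tree and collect function names —
// a real indexer would also record imports, classes, and call sites.
const functions: string[] = [];
function walk(node: Parser.SyntaxNode): void {
  if (node.type === "function_declaration") {
    const name = node.childForFieldName("name");
    if (name) functions.push(name.text);
  }
  for (const child of node.children) walk(child);
}
walk(tree.rootNode);

console.log(functions); // ["normalizeUser"]
```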
Available Indexing Servers
Two MCP servers are commonly used for codebase indexing:
codebase-memory-mcp (DeusData) — builds a persistent knowledge graph:
- Indexes symbols, dependencies, and call chains using tree-sitter
- Stores the graph persistently (survives server restarts)
- Key tools: `search_symbols`, `get_dependencies`, `get_call_chain`, `get_file_summary`
- Claim: 83% answer quality at 10x fewer tokens compared to file exploration
jcodemunch-mcp — focused on structural queries:
- Key tools: `find_importers` (who imports this module), `get_blast_radius` (what breaks if this changes), `get_class_hierarchy` (inheritance tree), `find_dead_code` (unreferenced exports)
- Lighter weight than codebase-memory-mcp — no persistent graph, re-indexes on demand
Both are configured as MCP servers in .claude/settings.json or equivalent.
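Configuration follows the same MCP server pattern seen earlier — a sketch, with the package name assumed from the repository name (check each project's README for the real command):

```json
{
  "mcpServers": {
    "codebase-memory": {
      "command": "npx",
      "args": ["-y", "codebase-memory-mcp"]
    }
  }
}
```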
Token Cost Comparison
Worked example: "Find all callers of normalizeUser() and determine if changing its return type would break anything."
Approach 1: Grep-based exploration
| Step | Action | Token cost |
|---|---|---|
| 1 | grep -r "normalizeUser" --include="*.ts" | ~1,500 (results) |
| 2 | Read 5 files containing matches | ~10,000 (file content) |
| 3 | Agent reasons about each call site | ~2,000 (model output) |
| 4 | Repeat for indirect callers | ~8,000 (more reads) |
| Total | | ~21,500 tokens, 8+ tool calls |
Approach 2: Codebase indexing
| Step | Action | Token cost |
|---|---|---|
| 1 | get_blast_radius("normalizeUser", "src/utils/users.ts") | ~800 (structured result) |
| 2 | Agent reviews the structured dependency list | ~500 (model output) |
| Total | | ~1,300 tokens, 1 tool call |
The indexing approach uses 16x fewer tokens and gives a more reliable answer because it understands imports and call chains, not just string matches.
When to Use Indexing vs LSP vs Grep
| Need | Best tool | Why |
|---|---|---|
| Find a string in files | Grep | Fastest, simplest, no setup |
| Go to definition / find references for a symbol | LSP | Type-aware, precise, language-specific |
| "Who imports this module?" | Indexing | Cross-file structural query |
| "What breaks if I change this?" | Indexing | Blast radius requires dependency graph |
| "Show me the class hierarchy" | Indexing | Multi-level inheritance traversal |
| Quick filename search | Glob | Pattern matching, no parsing needed |
| Type information for a variable | LSP | Semantic analysis, not text search |
The three approaches are complementary. Grep is always available and costs nothing to set up. LSP adds type-aware intelligence for one language at a time. Indexing adds cross-file structural understanding across the entire codebase. Use the simplest tool that answers your question.
Progressive Discovery
Every tool definition costs tokens. Every skill description costs tokens. Every file the agent reads costs tokens. The context window is finite. If you dump everything upfront — all tool schemas, all skill bodies, all project documentation — you exhaust the budget before the agent starts working.
Progressive discovery is the pattern that solves this: provide minimal metadata upfront, load full details only when needed.
The Information Overload Problem
A typical well-configured coding environment might have:
- 15 built-in tools (~500 tokens each = 7,500 tokens)
- 20 MCP tools from 3 servers (~1,000 tokens each = 20,000 tokens)
- 10 skills (~100 tokens discovery each = 1,000 tokens)
- Project instructions in CLAUDE.md (~2,000 tokens)
That is 30,500 tokens of overhead before the first user message. On a 200K context window, 15% of the budget is spent on tool definitions alone. On a 128K window, it is 24%.
Now scale it. An enterprise environment with 50 MCP tools, 30 skills, and detailed project instructions can consume 80,000+ tokens at startup. That leaves barely enough room for a meaningful conversation.
The Progressive Discovery Pattern
The solution is a three-level hierarchy:
Level 1 — Existence. The agent knows something exists and roughly what it does. Cost: ~50–100 tokens per item.
Level 2 — Schema. The agent loads the full interface: parameter names, types, constraints, examples. Cost: ~500–1,400 tokens per item. Only loaded when the agent considers using the item.
Level 3 — Content. The actual data: a file's full contents, a skill's complete instructions, a resource's data. Cost: varies widely. Only loaded when the agent commits to using the item.
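The three levels side by side for one tool — `get_blast_radius` from the indexing section; token counts are rough and the payloads are illustrative:

```jsonc
// Level 1 — existence (~50 tokens): always in context
{ "name": "get_blast_radius",
  "description": "List everything that breaks if a symbol's signature changes." }

// Level 2 — schema (~300 tokens): loaded when the agent considers the tool
{ "name": "get_blast_radius",
  "inputSchema": {
    "type": "object",
    "properties": { "symbol": { "type": "string" }, "file": { "type": "string" } },
    "required": ["symbol", "file"] } }

// Level 3 — content (varies): produced only when the tool actually runs
{ "direct_callers": 4, "transitive_dependents": 17, "files": ["src/api/handlers/users.ts", "..."] }
```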
Instances Across the Stack
This pattern appears everywhere in the agent tooling ecosystem:
MCP tools/list — returns tool names and descriptions (Level 1). The full JSON Schema for each tool loads when the model selects it (Level 2). Tool execution returns actual data (Level 3). In practice, most clients load all schemas at startup, defeating the pattern. Anthropic's server-side tool_search is a fix: it keeps schemas out of the prompt and dynamically selects relevant tools per query.
Claude Code's ToolSearch — deferred tools are registered by name only. Their schemas are not loaded until the agent calls ToolSearch to fetch them. This is pure Level 1 → Level 2 progressive loading.
Skills — the description field loads at session start (~100 tokens, Level 1). The full SKILL.md body loads only when the agent decides the skill is relevant (~2–5K tokens, Level 2). Reference files and scripts load only during execution (Level 3). This is why lecture 28 emphasizes writing precise descriptions — they are the Level 1 filter.
MCP Resources — resources/list returns URIs and metadata (Level 1). resources/read fetches actual content (Level 3). The agent can browse what is available without loading everything into context.
Codebase indexing — symbol index provides names and locations (Level 1). Querying a specific symbol returns its relationships and context (Level 2). Reading the actual source file loads full content (Level 3).
Design Principles
If you build MCP servers, skills, or other agent-facing systems, design for progressive discovery:
- Expose metadata cheaply. Names and descriptions should be short and precise. The model reads all of them — keep each under 100 tokens.
- Make discovery queries cheap. A list operation should return just enough to decide, not everything. Paginate large result sets.
- Defer expensive content. Full file contents, detailed schemas, large data sets — load them only when the model commits to using them.
- Design descriptions for selection. The description is a filter. If it is vague, the model either loads everything (wasting tokens) or loads nothing (missing relevant content). Same principle as skill descriptions and subagent descriptions.
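A minimal TypeScript sketch of these principles, for a hypothetical resource catalog (the interface and token estimates are illustrative, not any particular SDK's API):

```typescript
// Hypothetical catalog that separates cheap metadata from expensive content.
interface ResourceMeta {
  uri: string;
  description: string; // keep under ~100 tokens — this is the Level 1 filter
}

interface Catalog {
  // Level 1: listing is cheap and paginated — enough to decide, nothing more.
  list(cursor?: string): Promise<{ items: ResourceMeta[]; nextCursor?: string }>;
  // Level 3: full content is fetched only after the model commits to a URI.
  read(uri: string): Promise<string>;
}

// Usage sketch: the agent scans descriptions, then loads exactly one resource.
async function pickAndLoad(catalog: Catalog, need: RegExp): Promise<string | undefined> {
  let cursor: string | undefined;
  do {
    const page = await catalog.list(cursor);
    const hit = page.items.find((r) => need.test(r.description));
    if (hit) return catalog.read(hit.uri); // only this content enters the context
    cursor = page.nextCursor;
  } while (cursor);
  return undefined;
}
```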
Context Window Management
Everything in this lecture competes for the same resource: the context window. Tool definitions, skill descriptions, project instructions, memories, conversation history, tool results — all of it occupies tokens in a finite budget. When the budget runs out, the harness must choose what to keep and what to discard. Understanding this budget and managing it deliberately is what separates productive sessions from sessions that degrade after 15 turns.
The Context Budget
Think of the context window as a container with a fixed capacity. Every turn adds to it (model output, tool results, file contents) and the capacity never grows.
Concrete budget for a typical Claude Code session on Claude Opus 4.6 (1M context):
| Component | Tokens | % of 200K effective window |
|---|---|---|
| System prompt (built-in) | ~8,000 | 4% |
| Tool definitions (15 built-in) | ~7,500 | 4% |
| MCP tool definitions (20 tools from 3 servers) | ~20,000 | 10% |
| Project instructions (CLAUDE.md) | ~2,000 | 1% |
| Skill descriptions (10 skills) | ~1,000 | 0.5% |
| Memories (MEMORY.md + active memories) | ~1,500 | 0.75% |
| Static overhead | ~40,000 | 20% |
| Available for conversation | ~160,000 | 80% |
Note: even with a 1M token model, the effective working window is often smaller because prompt caching works best within the first 200K tokens. The static overhead (40K tokens) is re-sent on every turn.
After 30 turns of active coding (reading files, running commands, discussing changes), the conversation history can easily reach 150K+ tokens. At that point, 95% of the effective window is consumed.
Compaction
When the context window fills, the harness must compress older turns to make room. This process is called compaction (Claude Code) or summarization (opencode, Codex).
How it works:
- The harness identifies old turns that are unlikely to be needed
- It sends them to a model for summarization
- The detailed turns are replaced with a condensed summary
- The conversation continues with the summary in place of the original turns
This is lossy. Details from early in the session — specific error messages, exact file contents, nuanced explanations — may be lost. The agent may "forget" things you discussed 20 turns ago.
Claude Code triggers compaction automatically when the context reaches ~80% capacity. You can also trigger it manually with the /compact command to proactively summarize before the window fills involuntarily.
opencode has a hidden Compaction agent (a built-in subagent) that handles summarization automatically. It runs on a smaller model to minimize cost.
forgecode handles compaction through its conversation management layer, similar in principle to Claude Code's approach.
Prompt Caching
Anthropic's API supports prompt caching: the static prefix of the prompt (system message, tool definitions, project instructions) is cached server-side for 5 minutes. Subsequent turns that share the same prefix get a cache hit — 90% cheaper input tokens and significantly faster response times.
Implications for your workflow:
- Keep the static prefix stable. Don't add/remove MCP servers mid-session. Don't edit CLAUDE.md during a session. Changes invalidate the cache.
- The 5-minute TTL matters. If you pause for more than 5 minutes between turns, the cache expires and the next turn pays full price for the entire prefix.
- Fewer tools = faster cache warmup. A 40K-token static prefix caches just as easily as an 8K one, but the cache miss penalty is 5x higher.
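A sketch of how a harness marks the static prefix cacheable, using the Anthropic TypeScript SDK (`@anthropic-ai/sdk`). The `cache_control` block is the actual API mechanism; everything else here is simplified:

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// The static prefix: system instructions, tool docs, project instructions.
// It must be byte-identical across turns for the cache to hit.
const staticPrefix = "You are a coding agent. [tool definitions, CLAUDE.md, ...]";

const response = await client.messages.create({
  model: "claude-sonnet-4-6", // model name taken from the config examples above
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: staticPrefix,
      // Cache everything up to this point server-side (~5 minute TTL).
      // Later calls that share this exact prefix read it at a steep discount.
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [{ role: "user", content: "Fix the failing test in auth.test.ts" }],
});

// usage reports cache_creation_input_tokens on a miss,
// cache_read_input_tokens on a hit.
console.log(response.usage);
```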
Practical Techniques
Seven strategies for extending the useful life of a session:
1. Start fresh sessions frequently. The cheapest form of context management is a new session. If the task shifts — different feature, different part of the codebase — start a new conversation. The context is clean and the cache warms immediately.
2. Use `/compact` proactively. Don't wait for automatic compaction. After a major milestone (feature implemented, bug fixed), run `/compact` to summarize the history. You control the timing instead of the harness choosing a possibly worse moment.
3. Minimize tool count. Disable MCP servers you are not using. Each unused tool definition wastes ~1,000 tokens per turn. If you configured a Playwright server for testing but are now writing backend code, disable it for this session.
4. Write concise project instructions. Every word in `CLAUDE.md` is paid for on every turn. A 5,000-token instruction file costs 200K tokens across a 40-turn session. Cut aggressively.
5. Use skills instead of MCP where possible. A skill costs ~100 tokens at discovery. An MCP tool costs ~1,000 tokens per turn. If the capability can be expressed as instructions + a script (no runtime state needed), a skill is 10x cheaper.
6. Have the agent write to files instead of returning large outputs inline. "Write the analysis to `analysis.md` and tell me the summary" keeps the large output out of the context window. "Analyze and show me everything" puts it all in the conversation.
7. Front-load the most important information. If you know what you need, say it upfront. "Fix the failing test in `auth.test.ts` — the error is `TypeError: Cannot read property 'token' of undefined` on line 47" gives the agent everything it needs in one turn. Drip-feeding context over many turns wastes window space on back-and-forth.
Token Budgeting Across a Session
Worked example: 200K effective context, 15 MCP tools, typical coding session.
| Turn | Activity | Cumulative tokens | Remaining |
|---|---|---|---|
| 0 | Session start (static overhead) | 40,000 | 160,000 |
| 1–5 | Exploration (file reads, grep results) | 75,000 | 125,000 |
| 6–10 | Implementation (edits, test runs) | 110,000 | 90,000 |
| 11–15 | Debugging (error logs, more reads) | 145,000 | 55,000 |
| 16–20 | More implementation | 170,000 | 30,000 |
| ~20 | Compaction triggers | ~100,000 | 100,000 |
| 21–30 | Continue with summarized history | 160,000 | 40,000 |
| ~30 | Second compaction | ~100,000 | 100,000 |
Each compaction recovers ~60–70K tokens but loses detail from earlier turns. After two compactions, the agent has a high-level summary of the session but may not remember specific error messages or exact code snippets from the first 10 turns.
This is why lecture 30's subagent pattern matters: by delegating verbose exploration to subagents, the parent's context stays lean and compaction happens later (or not at all).
References
Hooks
- Claude Code hooks documentation: https://code.claude.com/docs/en/hooks
- Awesome Claude Code (hooks collection): https://github.com/hesreallyhim/awesome-claude-code
- Inside Claude Code architecture (hooks section): https://www.penligent.ai/hackinglabs/inside-claude-code-the-architecture-behind-tools-memory-hooks-and-mcp/
LSP and Code Intelligence
- mcp-language-server: https://github.com/isaacphi/mcp-language-server
- Language Server Protocol specification: https://microsoft.github.io/language-server-protocol/
- typescript-language-server: https://github.com/typescript-language-server/typescript-language-server
- pyright: https://github.com/microsoft/pyright
- rust-analyzer: https://rust-analyzer.github.io/
Codebase Indexing
- codebase-memory-mcp: https://github.com/DeusData/codebase-memory-mcp
- jcodemunch-mcp: https://github.com/jgravelle/jcodemunch-mcp
- tree-sitter: https://tree-sitter.github.io/tree-sitter/
Memory
- Claude Code memory documentation: https://code.claude.com/docs/en/memory
- Claude Code context window: https://code.claude.com/docs/en/context-window
Project Instructions
- CLAUDE.md documentation: https://code.claude.com/docs/en/claude-md
- AGENTS.md (OpenAI Codex): https://developers.openai.com/codex/guides/agents-md
- opencode configuration: https://opencode.ai/docs/config/
- forgecode documentation: https://forgecode.dev/docs/
Context Management
- Anthropic prompt caching: https://platform.claude.com/docs/en/build-with-claude/prompt-caching
- Claude Code context window visualization: https://code.claude.com/docs/en/context-window
- Claude Code compaction: https://code.claude.com/docs/en/agent-sdk/agent-loop#automatic-compaction
- Anthropic compaction API: https://platform.claude.com/docs/en/build-with-claude/compaction
Progressive Discovery
- MCP specification: https://modelcontextprotocol.io
- Agent Skills specification: https://agentskills.io/specification
- Anthropic tool_search: https://platform.claude.com/docs/en/agents-and-tools/tool-use/overview