29 - Misc

Lectures 26–28 gave you the building blocks: tool calling, MCP, skills. Lecture 30 covers the orchestration layer — subagents. This lecture covers everything in between: the configuration, automation, and intelligence layers that make the whole system work well in practice.

You will learn how to tell agents what to do before they start (project instructions), how to automate their behavior (hooks), how to give them structural code understanding (LSP, codebase indexing), how they remember across sessions (memory), and how to manage the finite context window that everything competes for (context management). These are the topics that separate a working agent from a productive one.

Project Instructions

Every coding agent supports a way to inject persistent instructions into every conversation. These are not prompts you type each time — they are files checked into your repo (or placed in your home directory) that the agent reads automatically at session start. They tell the agent things it cannot learn from the code alone: which test runner to use, which conventions to follow, which directories are off-limits, what the deployment process looks like.

The problem: every tool invented its own format. If your team uses multiple tools, you need to know which file each one reads.

The Instruction File Ecosystem

| File | Read by | Scope |
| --- | --- | --- |
| CLAUDE.md | Claude Code | Anthropic's agents |
| AGENTS.md | OpenAI Codex, forgecode | OpenAI ecosystem + compatible tools |
| AGENTS.override.md | OpenAI Codex | Per-folder overrides (replaces, not extends) |
| .cursorrules | Cursor | Cursor IDE only |
| .github/copilot-instructions.md | GitHub Copilot | Copilot in VS Code and GitHub |
| .kilocode/rules/*.md | Kilo Code | Kilo Code modes (code, architect, etc.) |
| opencode.json (instructions field) | opencode | opencode sessions |

Most projects that want broad tool support maintain both CLAUDE.md and AGENTS.md. The content is largely the same — project facts are tool-agnostic — but the files are read by different harnesses.
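One common way to keep the two files from drifting is a symlink, so there is a single source of truth. This assumes the tools you use resolve symlinks (most do, but verify with yours):

ln -s AGENTS.md CLAUDE.md   # CLAUDE.md now mirrors AGENTS.md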

Layering and Precedence

Instructions cascade from general to specific. More specific files override or extend more general ones.

Claude Code reads three layers:

  1. ~/.claude/CLAUDE.md — user-level, applies to all projects (your personal preferences)
  2. CLAUDE.md at the project root — project-level, shared with the team
  3. CLAUDE.md in any subdirectory — folder-level, scoped to that subtree

All three are concatenated. Folder-level instructions add to project-level, they do not replace. If there is a conflict, the most specific file wins in practice because the model sees it last.

# ~/.claude/CLAUDE.md (user-level)
- I prefer concise responses without trailing summaries.
- Default to TypeScript for new files unless the project uses another language.
- Run tests before committing.

# CLAUDE.md (project root)
- This is a monorepo: packages/api (Express + TypeScript), packages/web (Next.js), packages/shared (common types).
- Use pnpm, not npm. The lock file is pnpm-lock.yaml.
- Test command: pnpm test. Lint: pnpm lint.
- Database is PostgreSQL. Migrations are in packages/api/prisma/migrations.
- Never modify migration files after they have been applied.

# packages/api/CLAUDE.md (folder-level)
- Use snake_case for database column names, camelCase for TypeScript properties.
- All endpoints require authentication middleware except /health and /auth/*.
- Use Zod for request validation, not manual type guards.

OpenAI Codex reads a similar chain:

  1. ~/.codex/AGENTS.md — user-level
  2. AGENTS.md at project root — project-level
  3. AGENTS.md in subdirectories — folder-level (extends parent)
  4. AGENTS.override.md in subdirectories — folder-level (replaces parent, not extends)

The AGENTS.override.md distinction matters. If a subfolder has fundamentally different conventions (e.g., a Python data pipeline inside a TypeScript monorepo), you want to replace the parent instructions, not stack on top of them.

opencode reads opencode.json at the project root. Instructions are set via the instructions field (a string or array of strings). User-level config lives in ~/.config/opencode/config.json. There is no folder-level override — it is flat.

forgecode reads AGENTS.md at the project root (following the Codex convention) plus .forge.toml for configuration. No folder-level cascading.

What to Put Where

Not everything belongs in the same file. The layering exists for a reason:

User-level (your personal ~/.claude/CLAUDE.md or ~/.codex/AGENTS.md):

  • Response style preferences ("be concise", "no emojis", "no trailing summaries")
  • Default language and tooling choices when the project doesn't specify
  • Personal workflow habits ("always run tests before committing")

Project-level (root CLAUDE.md / AGENTS.md):

  • Project architecture and directory layout
  • Build, test, lint, and deploy commands
  • Coding conventions (naming, patterns, libraries to use or avoid)
  • Database and infrastructure facts
  • Links to relevant documentation or dashboards

Folder-level (subdirectory CLAUDE.md / AGENTS.md):

  • Conventions specific to that package or module
  • Exceptions to project-wide rules
  • Module-specific gotchas

Anti-pattern: putting workflows in project instructions. If you find yourself writing step-by-step procedures ("when deploying, first run X, then Y, then check Z"), extract those into skills instead. Project instructions are for facts about the project. Skills are for reusable procedures. Lecture 28 covers this distinction.

Writing Effective Instructions

The agent reads your instructions on every turn. Every unnecessary sentence costs tokens across the entire session. Be direct:

  • Be imperative. "Use pnpm" not "We generally prefer to use pnpm when possible."
  • Be specific. "Test command: pnpm test" not "Make sure to run the appropriate tests."
  • Include gotchas. "The dev database runs on port 5433, not the default 5432" saves a debugging session.
  • Skip what the model knows. Don't explain what TypeScript is. Don't describe how git works. Describe what is unique to this project.
  • Reference lecture 28's description advice. The same principles apply: concrete triggers, explicit scope, no filler words.

What to Commit vs Gitignore

| File | Commit? | Why |
| --- | --- | --- |
| CLAUDE.md | Yes | Team-shared project facts |
| AGENTS.md | Yes | Team-shared project facts |
| .claude/settings.json | Yes | Team-shared tool config (MCP servers, allowed tools) |
| .claude/settings.local.json | No | Personal overrides, may contain paths or API keys |
| .claude/memory/ | No | Personal memory, session-specific |
| .cursorrules | Yes | Team-shared Cursor conventions |
| opencode.json | Yes | Team-shared opencode config |
| .forge.toml | Yes | Team-shared forgecode config |

The rule: if it encodes team decisions, commit it. If it encodes personal preferences or ephemeral state, gitignore it.
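In .gitignore terms, using the paths from the table above:

# agent-related entries in .gitignore
.claude/settings.local.json
.claude/memory/
.claude/memory.local/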

Configuration and Settings

Beyond instruction files, each tool has a configuration system that controls permissions, MCP servers, model selection, and behavior. Understanding where configuration lives prevents the "works on my machine" problem.

Claude Code Settings

Claude Code reads settings from three locations, merged in order (later overrides earlier):

  1. User settings: ~/.claude/settings.json — personal defaults across all projects
  2. Project settings: .claude/settings.json — shared team configuration, committed to git
  3. Local settings: .claude/settings.local.json — personal overrides, gitignored
A typical project-level settings file:

{
  "permissions": {
    "allow": [
      "Read",
      "Glob",
      "Grep",
      "Bash(npm test)",
      "Bash(npm run lint)",
      "Bash(npm run build)"
    ],
    "deny": [
      "Bash(rm -rf *)",
      "Bash(git push --force)"
    ]
  },
  "mcpServers": {
    "context7": {
      "command": "npx",
      "args": ["-y", "@upstash/context7-mcp"]
    },
    "playwright": {
      "command": "npx",
      "args": ["-y", "@playwright/mcp@latest"]
    }
  },
  "hooks": {}
}

The permissions.allow array uses exact match or glob-style patterns. Bash(npm test) allows exactly that command. Bash(npm *) allows any npm command. The deny array takes precedence — a command matching both allow and deny is denied.
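A minimal illustration of that precedence rule:

{
  "permissions": {
    "allow": ["Bash(git *)"],
    "deny": ["Bash(git push --force*)"]
  }
}

With this config, git status and git commit run without prompting, but git push --force is denied even though it also matches the allow glob.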

opencode Configuration

opencode uses opencode.json at the project root:

{
  "instructions": [
    "Use pnpm for package management.",
    "Run pnpm test before committing."
  ],
  "provider": {
    "default": "anthropic"
  },
  "model": {
    "default": "claude-sonnet-4-6"
  },
  "permissions": {
    "Bash": "ask",
    "Write": "allow",
    "Read": "allow"
  },
  "mcpServers": {
    "context7": {
      "command": "npx",
      "args": ["-y", "@upstash/context7-mcp"]
    }
  }
}

Permissions are per-tool with three modes: "allow" (always run), "ask" (prompt the user), "deny" (never run). You can also set per-command permissions for Bash using regex patterns.
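A sketch of what per-command permissions can look like. The field layout here is an assumption for illustration, not verbatim from the opencode docs; check the current schema before copying:

{
  "permissions": {
    "Bash": {
      "^pnpm (test|lint)$": "allow",
      ".*": "ask"
    }
  }
}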

User-level config lives in ~/.config/opencode/config.json and follows the same schema.

forgecode Configuration

forgecode uses TOML for its primary configuration:

# .forge.toml
[model]
provider = "anthropic"
name = "claude-sonnet-4-6"

[permissions]
allow_bash = true
auto_approve = ["Read", "Glob", "Grep"]
require_approval = ["Write", "Bash"]

[mcp]
config_path = ".mcp.json"

MCP server configuration is typically in a separate .mcp.json file (shared JSON format compatible with Claude Code and other tools).

Permission Models Compared

| Aspect | Claude Code | opencode | forgecode |
| --- | --- | --- | --- |
| Permission granularity | Per-tool + per-command regex | Per-tool, per-command regex | Per-tool categories |
| Default behavior | Ask for everything | Ask for Bash, allow reads | Configurable default |
| Deny list | Explicit deny array | Per-tool "deny" mode | deny list in TOML |
| Allow patterns | Glob: Bash(npm *) | Regex on command string | Category-based |
| MCP tool permissions | Via allowedTools in agent frontmatter | Per-server tool filtering | Per-server config |
| User vs project override | settings.local.json overrides | User config extends project | User config extends project |

The practical takeaway: all three let you pre-approve safe operations (read, search, test) and require confirmation for destructive ones (write, delete, bash). The syntax differs but the model is the same.

Hooks

Hooks are shell commands that the harness executes automatically when specific events occur — before a tool call, after a file edit, when the agent stops. They are not agent code. The model does not decide to run them. The harness fires them deterministically based on event matching rules.

Think of them as git hooks, but for agent actions instead of git operations.

What Hooks Are

A hook has three parts:

  1. Event — when it fires (before a tool call, after a tool call, on notification, on stop)
  2. Matcher — which tool calls it applies to (optional — omit to match all)
  3. Command — the shell command to execute

The harness runs the command, captures its output, and optionally blocks the agent action if the hook exits with a non-zero status code. The agent sees hook feedback as system messages — it can read "hook blocked this action because..." and adjust its behavior.

Claude Code Hooks in Detail

Hooks are configured in settings.json (any of the three layers: user, project, local):

{
  "hooks": {
    "PreToolCall": [
      {
        "matcher": {
          "toolName": "Bash",
          "toolInput": {
            "command": "(rm -rf|DROP TABLE|TRUNCATE|git push --force)"
          }
        },
        "hooks": [
          {
            "type": "command",
            "command": "echo 'BLOCKED: Dangerous command detected' && exit 1"
          }
        ]
      }
    ],
    "PostToolCall": [
      {
        "matcher": {
          "toolName": "(Edit|Write)"
        },
        "hooks": [
          {
            "type": "command",
            "command": "npx eslint --fix \"$CLAUDE_TOOL_INPUT_FILE_PATH\" 2>/dev/null || true"
          }
        ]
      }
    ],
    "Notification": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "osascript -e 'display notification \"$CLAUDE_NOTIFICATION\" with title \"Claude Code\"'"
          }
        ]
      }
    ],
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "say 'Task complete'"
          }
        ]
      }
    ]
  }
}

Event types:

| Event | When it fires | Can block? |
| --- | --- | --- |
| PreToolCall | Before a tool call executes | Yes — exit 1 cancels the call |
| PostToolCall | After a tool call completes | No — action already happened |
| Notification | When the agent emits a notification | No |
| Stop | When the agent finishes its turn | No |
| SubagentStop | When a subagent completes | No |

Matcher fields:

  • toolName — regex matched against the tool name ("Bash", "Edit", "Write", "(Edit|Write)")
  • toolInput — object whose keys are tool input field names and values are regex patterns. For Bash, match against command. For Edit/Write, match against file_path.

Environment variables available inside hook commands:

| Variable | Content |
| --- | --- |
| CLAUDE_TOOL_NAME | Name of the tool being called |
| CLAUDE_TOOL_INPUT_* | Tool input fields, uppercased (e.g., CLAUDE_TOOL_INPUT_COMMAND for Bash) |
| CLAUDE_TOOL_RESULT | Tool output (PostToolCall only) |
| CLAUDE_NOTIFICATION | Notification text (Notification event only) |
| CLAUDE_SESSION_ID | Current session identifier |
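A hook command can also be a standalone script; the contract is just stdout plus the exit code. A minimal sketch of a PreToolCall guard (the filename and pattern are hypothetical, the environment variable is from the table above):

#!/usr/bin/env bash
# .claude/hooks/guard-prod-db.sh
# The harness sets CLAUDE_TOOL_INPUT_COMMAND for Bash tool calls.
if echo "$CLAUDE_TOOL_INPUT_COMMAND" | grep -qiE 'prod(uction)?[_-]?db'; then
  echo "BLOCKED: production database commands must be run manually."
  exit 1  # non-zero exit cancels the pending tool call
fi
exit 0    # zero exit lets the call proceed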

Practical Hook Examples

1. Auto-format after file edits

{
  "PostToolCall": [
    {
      "matcher": { "toolName": "(Edit|Write)" },
      "hooks": [{
        "type": "command",
        "command": "prettier --write \"$CLAUDE_TOOL_INPUT_FILE_PATH\" 2>/dev/null || true"
      }]
    }
  ]
}

The || true ensures the hook does not fail if prettier is not installed or the file is not a supported type.

2. Block dangerous bash commands

{
  "PreToolCall": [
    {
      "matcher": {
        "toolName": "Bash",
        "toolInput": {
          "command": "(rm -rf /|rm -rf \\.|DROP TABLE|TRUNCATE TABLE|git push.*--force|git reset --hard)"
        }
      },
      "hooks": [{
        "type": "command",
        "command": "echo 'BLOCKED: This command matches a dangerous pattern. Ask the user for confirmation.' && exit 1"
      }]
    }
  ]
}

The agent sees the "BLOCKED" message and adjusts — it will typically ask the user to run the command manually.

3. Log all tool calls to a file

{
  "PostToolCall": [
    {
      "hooks": [{
        "type": "command",
        "command": "echo \"$(date -Iseconds) $CLAUDE_TOOL_NAME\" >> /tmp/claude-tool-log.txt"
      }]
    }
  ]
}

No matcher — fires on every tool call. Useful for auditing and understanding agent behavior patterns.

4. Run tests after code changes

{
  "PostToolCall": [
    {
      "matcher": {
        "toolName": "(Edit|Write)",
        "toolInput": {
          "file_path": "\\.(ts|tsx|js|jsx)$"
        }
      },
      "hooks": [{
        "type": "command",
        "command": "npm test --silent 2>&1 | tail -5"
      }]
    }
  ]
}

Only fires when the agent edits TypeScript or JavaScript files. The tail -5 keeps the output short — the agent sees just the summary line ("Tests: 47 passed, 2 failed").

5. macOS notification on task completion

{
  "Stop": [
    {
      "hooks": [{
        "type": "command",
        "command": "osascript -e 'display notification \"Agent finished\" with title \"Claude Code\" sound name \"Glass\"'"
      }]
    }
  ]
}

Hooks in opencode and forgecode

opencode supports lifecycle hooks via its configuration, but with a simpler model. Hooks are defined in opencode.json:

{
  "hooks": {
    "post_edit": "npx prettier --write $FILE",
    "pre_bash": "echo 'Running: $COMMAND'"
  }
}

The event vocabulary is smaller (no regex matchers on tool inputs), but it covers the common use cases.

forgecode supports hooks through .forge.toml event handlers:

[hooks]
post_edit = "prettier --write $FILE_PATH"
on_complete = "say 'Done'"

Simpler than Claude Code's system but functional for basic automation.

Comparison:

| Feature | Claude Code | opencode | forgecode |
| --- | --- | --- | --- |
| Config location | settings.json hooks object | opencode.json hooks | .forge.toml [hooks] |
| Event types | 5 (Pre/PostToolCall, Notification, Stop, SubagentStop) | ~3 (pre/post edit, pre bash) | ~3 (post_edit, on_complete, pre_bash) |
| Regex matchers | Yes (toolName, toolInput fields) | No | No |
| Can block actions | Yes (PreToolCall exit 1) | Limited | Limited |
| Environment variables | Full set (CLAUDE_TOOL_*) | Basic ($FILE, $COMMAND) | Basic ($FILE_PATH) |
| Per-subagent hooks | Yes (in agent frontmatter) | No | No |

Claude Code has the most mature hook system. If you need fine-grained lifecycle automation, it is the best option currently. For simple "run formatter after edit" use cases, all three work.

Hooks vs Skills vs MCP

  • Hooks: fire automatically, no model involvement, deterministic. Use for formatting, validation, logging, notifications.
  • Skills: loaded by the model, require judgment. Use for procedures that need reasoning.
  • MCP: provide capabilities the model cannot achieve with local tools. Use for external systems.

Memory and Persistence

LLMs are stateless. Every API call starts with no memory of previous calls. The conversation history you see in a session is maintained by the harness, not the model — it re-sends the entire conversation on each turn. When a session ends, everything the model "learned" during that session is gone.

Memory systems solve this. They give agents a way to persist knowledge across sessions so you do not have to re-explain the same things every time.

Claude Code's Memory System

Claude Code has a built-in, file-based memory system with three scopes:

| Scope | Location | Persists across | Shared with team |
| --- | --- | --- | --- |
| User memory | ~/.claude/projects/<project>/memory/ | All sessions in this project | No |
| Project memory | .claude/memory/ | All sessions | If committed (usually no) |
| Local memory | .claude/memory.local/ | Current machine only | No |

Each memory is a Markdown file with YAML frontmatter:

---
name: testing-conventions
description: How tests are organized and run in this project
type: project
---

Test files live next to source files with .test.ts suffix.
Integration tests are in packages/api/tests/integration/.
Use `pnpm test` for unit tests, `pnpm test:integration` for integration tests.
The CI pipeline runs both — never skip integration tests locally before pushing.

The MEMORY.md file in the memory directory acts as an index. It is loaded into every conversation. Each entry is a one-line pointer to a memory file:

- [Testing conventions](testing-conventions.md) — test organization, commands, CI requirements
- [Database gotchas](database-gotchas.md) — port 5433, migration rules, naming conventions
- [User preferences](user-preferences.md) — concise style, no emojis, TypeScript default

Memory types serve different purposes:

| Type | What it stores | Example |
| --- | --- | --- |
| user | Who you are, your preferences, your expertise | "Senior backend engineer, new to React frontend" |
| feedback | Corrections and confirmed approaches | "Don't mock the database in integration tests" |
| project | Ongoing work, decisions, deadlines | "Auth rewrite is compliance-driven, not tech debt" |
| reference | Pointers to external resources | "Pipeline bugs tracked in Linear project INGEST" |

The agent reads memories at session start and can create, update, or delete memories during a session. You can explicitly ask it to remember something ("remember that the staging environment uses port 8443") or it may save memories automatically when it learns important project context.

opencode Persistence

opencode persists session history in ~/.config/opencode/sessions/. When you start a new session, the previous session's context is not automatically loaded — but you can reference past sessions.

opencode also supports instructions persistence through opencode.json. Configuration-level facts (instructions, model preferences, permissions) carry across sessions automatically. But there is no equivalent of Claude Code's structured memory system — no types, no MEMORY.md index, no automatic save/retrieve.

For cross-session knowledge, opencode users typically rely on project instructions in opencode.json or AGENTS.md files.

forgecode Memory

forgecode uses AGENTS.md as its primary persistence mechanism. There is no separate memory system. Session state is not persisted beyond the session.

For cross-session knowledge, forgecode users put everything in AGENTS.md and .forge.toml. This is simpler but means there is no distinction between "project facts" and "things learned during previous sessions."

Comparison

| Feature | Claude Code | opencode | forgecode |
| --- | --- | --- | --- |
| Structured memory system | Yes (types, index, frontmatter) | No | No |
| Cross-session persistence | Automatic via memory files | Manual (instructions file) | Manual (AGENTS.md) |
| Memory types | user, feedback, project, reference | N/A | N/A |
| Auto-save | Yes (agent decides when to save) | No | No |
| Memory search/retrieval | Index loaded at session start | N/A | N/A |
| Pruning/update | Agent can update or delete memories | Manual file editing | Manual file editing |

The Memory Lifecycle

Effective memory follows a cycle: save → retrieve → update → prune.

Save when the agent learns something that will be useful in future sessions:

  • Project conventions not written in docs ("the team prefers functional components over class components")
  • User preferences discovered through feedback ("this user wants concise responses")
  • Important decisions and their rationale ("we chose PostgreSQL over MongoDB because of transaction requirements")

Retrieve at the start of each session. The agent reads MEMORY.md and relevant memory files. It uses memories as context, not as absolute truth — memories can become stale.

Update when reality changes. If the test command changes from pnpm test to vitest, update the memory. If a convention is abandoned, update or remove the memory.

Prune regularly. Memories accumulate. Old project context ("sprint 12 deadline is March 5") becomes noise. The agent can prune on its own, but periodic manual review helps.

Anti-patterns

  • Storing everything. Memory is not a log. Every memory costs tokens at session start. Store only things the agent cannot derive from the code or git history.
  • Never pruning. Stale memories are worse than no memories — the agent trusts them and makes wrong decisions based on outdated facts.
  • Trusting memory without verification. A memory that says "function normalizeUser is in src/utils/users.ts" may be wrong if the file was renamed. The agent should verify memory claims against current code before acting.
  • Using memory instead of project instructions. If something is always true about the project (test command, coding conventions), put it in CLAUDE.md or AGENTS.md. Memory is for things that change or that are personal. Project instructions are for permanent team-shared facts.
  • Storing code patterns. "We use the repository pattern for database access" is better written as a project instruction. The agent can see the pattern by reading the code. Memory should store the why — "we use the repository pattern because the team decided in Q3 to decouple business logic from the ORM for testability."

LSP Integration

When an agent needs to answer "what calls this function?" or "what type does this variable have?", it typically resorts to grep-and-read: search for the function name, read surrounding lines, infer context. This works but is expensive — each query can consume thousands of tokens in file reads, and the results are imprecise. Grep finds text patterns. It cannot distinguish a function call from a comment, a type annotation from a variable name.

Language Server Protocol (LSP) gives agents the same structural code intelligence that IDEs provide: go-to-definition, find-references, rename-symbol, type information, diagnostics. One LSP query replaces five rounds of grep-and-read.

Language Server Protocol Basics

LSP is a protocol between an editor (the "client") and a language-specific server. The server parses and analyzes code, maintaining a semantic model of the codebase. The client sends requests ("where is this symbol defined?") and receives structured responses.

Every major language has an LSP server: typescript-language-server for TypeScript/JavaScript, pyright for Python, rust-analyzer for Rust, gopls for Go. These are the same servers that power IDE features in VS Code, JetBrains, and Neovim.

The connection between LSP and agent tooling is an MCP bridge: the mcp-language-server project wraps any LSP server as an MCP server, exposing LSP capabilities as tools the agent can call.

The MCP-Language-Server Bridge

The bridge exposes these tools:

| MCP Tool | LSP Capability | What it replaces |
| --- | --- | --- |
| get_definition | Go to definition | Agent grepping for class/function declarations |
| find_references | Find all references | Agent grepping for function name across all files |
| get_diagnostics | Type errors, lint issues | Agent running compiler and parsing output |
| rename_symbol | Safe rename across files | Agent doing find-and-replace (misses type-aware renames) |
| get_hover | Type information, docs | Agent reading source to infer types |
| get_completions | Code completions | Agent guessing based on context |

Token cost comparison. Consider the query "find all callers of normalizeUser()":

Without LSP (grep-and-read):

  1. grep -r "normalizeUser" --include="*.ts" → reads ~20 matching lines with file paths (~1,500 tokens)
  2. Agent reads 3–5 files to understand call context (~8,000 tokens)
  3. Total: ~10,000 tokens, multiple tool calls, imprecise (may include comments, string literals, type annotations)

With LSP (one tool call):

  1. find_references("normalizeUser", "src/utils/users.ts", line 42) → returns structured list of call sites (~500 tokens)
  2. Total: ~500 tokens, one tool call, precise (only actual function calls, not comments or strings)

Setup by Language

Add the MCP-language-server bridge to your project's .claude/settings.json (or equivalent):

TypeScript / JavaScript:

{
  "mcpServers": {
    "typescript-lsp": {
      "command": "mcp-language-server",
      "args": [
        "--workspace", ".",
        "--lsp", "typescript-language-server", "--", "--stdio"
      ]
    }
  }
}

Python (pyright):

{
  "mcpServers": {
    "python-lsp": {
      "command": "mcp-language-server",
      "args": [
        "--workspace", ".",
        "--lsp", "pyright-langserver", "--", "--stdio"
      ]
    }
  }
}

Rust (rust-analyzer):

{
  "mcpServers": {
    "rust-lsp": {
      "command": "mcp-language-server",
      "args": [
        "--workspace", ".",
        "--lsp", "rust-analyzer"
      ]
    }
  }
}

Prerequisite: the language server must be installed on the system (npm install -g typescript-language-server, pip install pyright, rustup component add rust-analyzer).

Platform Access

Claude Code accesses LSP through the MCP server configuration shown above. The agent sees the LSP tools alongside other MCP tools and uses them when relevant.

opencode can use the same MCP-language-server bridge via its mcpServers configuration in opencode.json. The setup is identical — opencode speaks MCP natively.

forgecode supports MCP servers through .mcp.json, using the same configuration format. The bridge works the same way.

All three tools treat LSP as "just another MCP server." The agent does not need to know it is talking to a language server — it sees tools like get_definition and find_references and uses them when they are more efficient than grep.

Limitations

  • Setup overhead. You need the language server installed and the MCP bridge configured per-language. This is a one-time cost but it is not zero.
  • Startup time. Language servers need to index the codebase before they can answer queries. For large projects, the first query may take several seconds.
  • Dynamic languages. Python and JavaScript have weaker type inference than TypeScript, Rust, or Go. find_references on a Python function may miss dynamically-dispatched calls. get_hover may show Any instead of a concrete type.
  • Memory usage. Language servers keep a semantic model of the entire codebase in memory. For very large monorepos, this can consume significant RAM.
  • Not all languages have mature servers. Shell scripts, configuration files, and DSLs typically do not have LSP servers.

Despite these limitations, LSP integration is one of the highest-leverage improvements you can make to an agent's code understanding. The token savings compound across every turn that would otherwise require grep-and-read exploration.

Codebase Indexing

LSP gives you symbol-level intelligence within a language. Codebase indexing goes further: it builds a structural model of the entire codebase — across files, across languages — and answers architectural queries that no single language server can.

Questions grep cannot answer reliably:

  • "What modules import userService?" — grep finds the string, but cannot distinguish import userService from // removed userService or const userServiceMock.
  • "If I change this function's signature, what breaks?" — grep cannot compute a blast radius.
  • "Is this function dead code?" — grep can check for references, but misses dynamic dispatch, reflection, and framework-specific patterns.
  • "What is the class hierarchy for BaseRepository?" — grep can find extends BaseRepository but not build the full tree, especially with multiple inheritance levels.

AST-based indexing solves this by parsing source code into syntax trees and building a queryable graph of symbols, imports, call sites, and type relationships.

Tree-sitter AST Parsing

Tree-sitter is an incremental parser generator used by most codebase indexing tools. It provides:

  • Grammar support for 66+ languages — one parser handles TypeScript, Python, Rust, Go, Java, C#, Ruby, and dozens more
  • Incremental parsing — re-parses only changed regions, making it fast for large codebases
  • Concrete syntax trees — every token is represented, enabling precise structural queries

When an indexing server starts, it runs tree-sitter over the codebase, builds syntax trees, extracts symbols (functions, classes, imports, exports), and stores the relationships in an index. Queries run against the index, not the source files — which is why they are fast and token-efficient.
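To give a flavor of what the extraction step looks like, here is roughly what tree-sitter queries (in its .scm pattern syntax) against a JavaScript/TypeScript grammar look like. Exact node names vary by grammar version, so treat these as a sketch:

; capture the name of every function declaration
(function_declaration
  name: (identifier) @function.name)

; capture the module path of every import
(import_statement
  source: (string) @import.source)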

Available Indexing Servers

Two MCP servers are commonly used for codebase indexing:

codebase-memory-mcp (DeusData) — builds a persistent knowledge graph:

  • Indexes symbols, dependencies, and call chains using tree-sitter
  • Stores the graph persistently (survives server restarts)
  • Key tools: search_symbols, get_dependencies, get_call_chain, get_file_summary
  • Claim: 83% answer quality at 10x fewer tokens compared to file exploration

jcodemunch-mcp — focused on structural queries:

  • Key tools: find_importers (who imports this module), get_blast_radius (what breaks if this changes), get_class_hierarchy (inheritance tree), find_dead_code (unreferenced exports)
  • Lighter weight than codebase-memory-mcp — no persistent graph, re-indexes on demand

Both are configured as MCP servers in .claude/settings.json or equivalent.
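The configuration shape is the same as any other MCP server. The package name below is illustrative; check each project's README for the real install command:

{
  "mcpServers": {
    "codebase-index": {
      "command": "npx",
      "args": ["-y", "codebase-memory-mcp"]
    }
  }
}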

Token Cost Comparison

Worked example: "Find all callers of normalizeUser() and determine if changing its return type would break anything."

Approach 1: Grep-based exploration

| Step | Action | Token cost |
| --- | --- | --- |
| 1 | grep -r "normalizeUser" --include="*.ts" | ~1,500 (results) |
| 2 | Read 5 files containing matches | ~10,000 (file content) |
| 3 | Agent reasons about each call site | ~2,000 (model output) |
| 4 | Repeat for indirect callers | ~8,000 (more reads) |
| Total | | ~21,500 tokens, 8+ tool calls |

Approach 2: Codebase indexing

| Step | Action | Token cost |
| --- | --- | --- |
| 1 | get_blast_radius("normalizeUser", "src/utils/users.ts") | ~800 (structured result) |
| 2 | Agent reviews the structured dependency list | ~500 (model output) |
| Total | | ~1,300 tokens, 1 tool call |

The indexing approach uses 16x fewer tokens and gives a more reliable answer because it understands imports and call chains, not just string matches.

When to Use Indexing vs LSP vs Grep

| Need | Best tool | Why |
| --- | --- | --- |
| Find a string in files | Grep | Fastest, simplest, no setup |
| Go to definition / find references for a symbol | LSP | Type-aware, precise, language-specific |
| "Who imports this module?" | Indexing | Cross-file structural query |
| "What breaks if I change this?" | Indexing | Blast radius requires dependency graph |
| "Show me the class hierarchy" | Indexing | Multi-level inheritance traversal |
| Quick filename search | Glob | Pattern matching, no parsing needed |
| Type information for a variable | LSP | Semantic analysis, not text search |

The three approaches are complementary. Grep is always available and costs nothing to set up. LSP adds type-aware intelligence for one language at a time. Indexing adds cross-file structural understanding across the entire codebase. Use the simplest tool that answers your question.

Progressive Discovery

Every tool definition costs tokens. Every skill description costs tokens. Every file the agent reads costs tokens. The context window is finite. If you dump everything upfront — all tool schemas, all skill bodies, all project documentation — you exhaust the budget before the agent starts working.

Progressive discovery is the pattern that solves this: provide minimal metadata upfront, load full details only when needed.

The Information Overload Problem

A typical well-configured coding environment might have:

  • 15 built-in tools (~500 tokens each = 7,500 tokens)
  • 20 MCP tools from 3 servers (~1,000 tokens each = 20,000 tokens)
  • 10 skills (~100 tokens discovery each = 1,000 tokens)
  • Project instructions in CLAUDE.md (~2,000 tokens)

That is 30,500 tokens of overhead before the first user message. On a 200K context window, 15% of the budget is spent on tool definitions alone. On a 128K window, it is 24%.

Now scale it. An enterprise environment with 50 MCP tools, 30 skills, and detailed project instructions can consume 80,000+ tokens at startup. That leaves barely enough room for a meaningful conversation.

The Progressive Discovery Pattern

The solution is a three-level hierarchy:

Level 1 — Existence. The agent knows something exists and roughly what it does. Cost: ~50–100 tokens per item.

Level 2 — Schema. The agent loads the full interface: parameter names, types, constraints, examples. Cost: ~500–1,400 tokens per item. Only loaded when the agent considers using the item.

Level 3 — Content. The actual data: a file's full contents, a skill's complete instructions, a resource's data. Cost: varies widely. Only loaded when the agent commits to using the item.
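Concretely, for a single hypothetical MCP tool, the three levels might look like this:

Level 1 — existence (a tools/list entry):

{ "name": "query_db", "description": "Run read-only SQL against the analytics database" }

Level 2 — schema (loaded only when the agent considers using the tool):

{
  "name": "query_db",
  "inputSchema": {
    "type": "object",
    "properties": {
      "sql": { "type": "string", "description": "A single SELECT statement" },
      "limit": { "type": "integer", "default": 100 }
    },
    "required": ["sql"]
  }
}

Level 3 — content: the rows the query actually returns, loaded only on execution.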

Instances Across the Stack

This pattern appears everywhere in the agent tooling ecosystem:

MCP tools/list — returns tool names and descriptions (Level 1). The full JSON Schema for each tool loads when the model selects it (Level 2). Tool execution returns actual data (Level 3). In practice, most clients load all schemas at startup, defeating the pattern. Anthropic's server-side tool_search is a fix: it keeps schemas out of the prompt and dynamically selects relevant tools per query.

Claude Code's ToolSearch — deferred tools are registered by name only. Their schemas are not loaded until the agent calls ToolSearch to fetch them. This is pure Level 1 → Level 2 progressive loading.

Skills — the description field loads at session start (~100 tokens, Level 1). The full SKILL.md body loads only when the agent decides the skill is relevant (~2–5K tokens, Level 2). Reference files and scripts load only during execution (Level 3). This is why lecture 28 emphasizes writing precise descriptions — they are the Level 1 filter.

MCP Resources — resources/list returns URIs and metadata (Level 1). resources/read fetches actual content (Level 3). The agent can browse what is available without loading everything into context.

Codebase indexing — symbol index provides names and locations (Level 1). Querying a specific symbol returns its relationships and context (Level 2). Reading the actual source file loads full content (Level 3).

Design Principles

If you build MCP servers, skills, or other agent-facing systems, design for progressive discovery:

  1. Expose metadata cheaply. Names and descriptions should be short and precise. The model reads all of them — keep each under 100 tokens.
  2. Make discovery queries cheap. A list operation should return just enough to decide, not everything. Paginate large result sets.
  3. Defer expensive content. Full file contents, detailed schemas, large data sets — load them only when the model commits to using them.
  4. Design descriptions for selection. The description is a filter. If it is vague, the model either loads everything (wasting tokens) or loads nothing (missing relevant content). Same principle as skill descriptions and subagent descriptions.
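For example, two Level 1 descriptions for the same hypothetical tool. The first forces the model to either load the schema speculatively or skip the tool; the second lets it filter:

"description": "Helps with databases"

"description": "Run read-only SQL against the analytics warehouse; use for questions about user metrics or event counts"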

Context Window Management

Everything in this lecture competes for the same resource: the context window. Tool definitions, skill descriptions, project instructions, memories, conversation history, tool results — all of it occupies tokens in a finite budget. When the budget runs out, the harness must choose what to keep and what to discard. Understanding this budget and managing it deliberately is what separates productive sessions from sessions that degrade after 15 turns.

The Context Budget

Think of the context window as a budget with a fixed ceiling. Every turn spends from it (model output, tool results) and the budget never grows.

Concrete budget for a typical Claude Code session on Claude Opus 4.6 (1M context):

| Component | Tokens | % of 200K effective window |
| --- | --- | --- |
| System prompt (built-in) | ~8,000 | 4% |
| Tool definitions (15 built-in) | ~7,500 | 4% |
| MCP tool definitions (20 tools from 3 servers) | ~20,000 | 10% |
| Project instructions (CLAUDE.md) | ~2,000 | 1% |
| Skill descriptions (10 skills) | ~1,000 | 0.5% |
| Memories (MEMORY.md + active memories) | ~1,500 | 0.75% |
| Static overhead | ~40,000 | 20% |
| Available for conversation | ~160,000 | 80% |

Note: even with a 1M token model, the effective working window is often smaller because prompt caching works best within the first 200K tokens. The static overhead (40K tokens) is re-sent on every turn.

After 30 turns of active coding (reading files, running commands, discussing changes), the conversation history can easily reach 150K+ tokens. At that point, 95% of the effective window is consumed.

Compaction

When the context window fills, the harness must compress older turns to make room. This process is called compaction (Claude Code) or summarization (opencode, Codex).

How it works:

  1. The harness identifies old turns that are unlikely to be needed
  2. It sends them to a model for summarization
  3. The detailed turns are replaced with a condensed summary
  4. The conversation continues with the summary in place of the original turns

This is lossy. Details from early in the session — specific error messages, exact file contents, nuanced explanations — may be lost. The agent may "forget" things you discussed 20 turns ago.

Claude Code triggers compaction automatically when the context reaches ~80% capacity. You can also trigger it manually with the /compact command to proactively summarize before the window fills involuntarily.

opencode has a hidden Compaction agent (a built-in subagent) that handles summarization automatically. It runs on a smaller model to minimize cost.

forgecode handles compaction through its conversation management layer, similar in principle to Claude Code's approach.

Prompt Caching

Anthropic's API supports prompt caching: the static prefix of the prompt (system message, tool definitions, project instructions) is cached server-side for 5 minutes. Subsequent turns that share the same prefix get a cache hit — 90% cheaper input tokens and significantly faster response times.

Implications for your workflow:

  • Keep the static prefix stable. Don't add/remove MCP servers mid-session. Don't edit CLAUDE.md during a session. Changes invalidate the cache.
  • The 5-minute TTL matters. If you pause for more than 5 minutes between turns, the cache expires and the next turn pays full price for the entire prefix.
  • Fewer tools = faster cache warmup. A 40K-token static prefix caches just as easily as an 8K one, but the cache miss penalty is 5x higher.
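Coding agents manage all of this for you, but if you call the API directly, the mechanism is a cache_control marker on the static blocks. A minimal TypeScript sketch (the model name and file path are placeholders):

import Anthropic from "@anthropic-ai/sdk";
import { readFileSync } from "node:fs";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
const projectInstructions = readFileSync("CLAUDE.md", "utf8");

const response = await client.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: projectInstructions,
      // Marks the end of the static prefix; everything up to here is
      // cached server-side for ~5 minutes after each use.
      cache_control: { type: "ephemeral" },
    },
  ],
  // On a cache hit, only the content after the cached prefix is charged
  // at the full input-token rate.
  messages: [{ role: "user", content: "Fix the failing test in auth.test.ts" }],
});

console.log(response.content);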

Practical Techniques

Seven strategies for extending the useful life of a session:

  1. Start fresh sessions frequently. The cheapest form of context management is a new session. If the task shifts — different feature, different part of the codebase — start a new conversation. The context is clean and the cache warms immediately.

  2. Use /compact proactively. Don't wait for automatic compaction. After a major milestone (feature implemented, bug fixed), run /compact to summarize the history. You control the timing instead of the harness choosing a possibly worse moment.

  3. Minimize tool count. Disable MCP servers you are not using. Each unused tool definition wastes ~1,000 tokens per turn. If you configured a Playwright server for testing but are now writing backend code, disable it for this session.

  4. Write concise project instructions. Every word in CLAUDE.md is paid for on every turn. A 5,000-token instruction file costs 200K tokens across a 40-turn session. Cut aggressively.

  5. Use skills instead of MCP where possible. A skill costs ~100 tokens at discovery. An MCP tool costs ~1,000 tokens per turn. If the capability can be expressed as instructions + a script (no runtime state needed), a skill is 10x cheaper.

  6. Have the agent write to files instead of returning large outputs inline. "Write the analysis to analysis.md and tell me the summary" keeps the large output out of the context window. "Analyze and show me everything" puts it all in the conversation.

  7. Front-load the most important information. If you know what you need, say it upfront. "Fix the failing test in auth.test.ts — the error is TypeError: Cannot read property 'token' of undefined on line 47" gives the agent everything it needs in one turn. Drip-feeding context over many turns wastes window space on back-and-forth.

Token Budgeting Across a Session

Worked example: 200K effective context, 15 MCP tools, typical coding session.

| Turn | Activity | Cumulative tokens | Remaining |
| --- | --- | --- | --- |
| 0 | Session start (static overhead) | 40,000 | 160,000 |
| 1–5 | Exploration (file reads, grep results) | 75,000 | 125,000 |
| 6–10 | Implementation (edits, test runs) | 110,000 | 90,000 |
| 11–15 | Debugging (error logs, more reads) | 145,000 | 55,000 |
| 16–20 | More implementation | 170,000 | 30,000 |
| ~20 | Compaction triggers | ~100,000 | 100,000 |
| 21–30 | Continue with summarized history | 160,000 | 40,000 |
| ~30 | Second compaction | ~100,000 | 100,000 |

Each compaction recovers ~60–70K tokens but loses detail from earlier turns. After two compactions, the agent has a high-level summary of the session but may not remember specific error messages or exact code snippets from the first 10 turns.

This is why lecture 30's subagent pattern matters: by delegating verbose exploration to subagents, the parent's context stays lean and compaction happens later (or not at all).
