27 - MCP
MCP (Model Context Protocol) is a standardized protocol for exposing tools, resources, and prompts to LLM agents. Anthropic released it in late 2024 and open-sourced the spec in December 2025. As of early 2026, it's adopted by 26+ platforms: Claude Code, Cursor, VS Code/Copilot, OpenAI Codex, Gemini CLI, Amp, Goose, and others. The spec is maintained at modelcontextprotocol.io under Linux Foundation governance.
This lecture covers what MCP actually is (and isn't), when to use it, what goes wrong at scale, and which servers matter for software development.
What MCP Is
MCP is a JSON-RPC 2.0 protocol over two transports:
- stdio — server is a local subprocess, communicates via stdin/stdout. Used for local tools (filesystem, language servers, code indexing).
- Streamable HTTP — server is a remote service, communicates via HTTP with optional SSE for streaming. Used for SaaS integrations (GitHub, Slack, databases).
A server exposes three primitive types:
- Tools — callable functions with JSON Schema inputs (same as regular tool calling, but discovered via protocol)
- Resources — read-only data the agent can pull into context (files, database records, config)
- Prompts — reusable prompt templates the agent can invoke
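Concretely, a tool is just a named JSON Schema by the time the model sees it. A sketch of the shape surfaced by `tools/list` (the `search_students` tool here is a hypothetical example, not part of any real server):

```python
import json

# The shape of one tool definition: a name, a description, and a JSON
# Schema describing the inputs. This is what costs context tokens on
# every API call once the server is connected.
tool_def = {
    "name": "search_students",
    "description": "Search the student database by name or student code.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Name or code to search for"},
            "search_type": {"type": "string", "enum": ["name", "code"]},
        },
        "required": ["query"],
    },
}

print(json.dumps(tool_def, indent=2))
```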
The lifecycle:
- Client connects to server
- Client calls `tools/list` → server returns tool schemas
- Client calls `resources/list` → server returns available resources
- During conversation, the model decides to call a tool → client sends `tools/call` to the server → server executes and returns the result
- Client feeds the result back to the model
That's it. MCP is a discovery + execution protocol for tools. Under the hood, the model still sees JSON Schema tool definitions and does regular tool calling. MCP standardizes how tools are found and invoked, not how the model interacts with them.
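On the wire, those lifecycle steps are ordinary JSON-RPC 2.0 frames. A minimal sketch of the two core requests (the tool name and arguments are hypothetical, and a real client would use an MCP SDK rather than hand-rolling frames):

```python
import json

# tools/list: ask the server what it offers.
list_req = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# tools/call: invoke one of the discovered tools by name.
call_req = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "search_students",                       # hypothetical tool
        "arguments": {"query": "Ada", "search_type": "name"},
    },
}

# Over stdio the client writes these to the server subprocess's stdin;
# over Streamable HTTP they are POSTed to the server's endpoint.
wire = json.dumps(call_req)
print(wire)
```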
What MCP Is NOT
MCP is not responsible for context bloat. Context bloat comes from function calling — the requirement to include tool schemas in every API call so the model can choose. You'd have this problem with or without MCP. MCP just makes it easier to add more tools, which makes the bloat problem worse in practice.
MCP is not an execution engine. It's a protocol. The server implements execution. A bad MCP server wrapping a bad API is still bad.
MCP is not a security boundary. Tool calls go through the protocol, but there's no built-in auth, sandboxing, or permission model in the base spec. OAuth support exists but is still maturing. The 2026 roadmap lists enterprise auth as a priority.
The Good
Write Once, Run Everywhere
Before MCP: you build a GitHub integration for Claude Code. It doesn't work in Cursor. You rebuild it. Doesn't work in Copilot. Rebuild again.
After MCP: you build one MCP server. Any MCP-compatible client can use it. Microsoft Playwright MCP works in Claude Code, Cursor, VS Code, Codex CLI, and Gemini CLI — same server, same config.
Composability
You can run multiple servers simultaneously. A coding session might have:
- Filesystem access (built-in)
- GitHub (issue tracking, PR management)
- Context7 (library documentation)
- Playwright (browser testing)
- A custom server for your internal API
Each server exposes its tools independently. The model sees all of them and can chain across servers.
Progressive Discovery
MCP servers don't dump everything at startup. Well-designed servers implement progressive disclosure:
- `tools/list` returns lightweight metadata (name + description)
- Detailed schemas load only when a tool is selected
- Resources are fetched on-demand, not pre-loaded
This is the theory. In practice, most clients load all tool schemas at startup. Anthropic's tool_search server-side tool is an attempt to fix this — it dynamically discovers which tools are relevant per query instead of injecting all schemas.
Ecosystem Momentum
The MCP server ecosystem is large and growing:
- Official servers from Anthropic, Microsoft, GitHub, Google
- Partner servers from Atlassian (Jira/Confluence), Figma, Stripe, Notion, Zapier
- Community servers numbering in thousands on registries like PulseMCP, Smithery, Docker MCP Catalog
The Bad
Context Window Bloat — The Real Problem
This is the #1 engineering challenge. Each MCP tool definition costs 550–1,400 tokens (name + description + JSON Schema + field descriptions + enums). Connect three servers with a total of 40 tools and, at the upper end of that range, roughly 55,000 tokens are consumed before the model reads a single user message.
That's 25-30% of Claude's 200K context window. Gone.
Concrete measurement from Apideck: checking a repo's language consumed 1,365 tokens via CLI and 44,026 tokens via MCP — the overhead was almost entirely schema injection for 43 tool definitions, of which the agent used one.
The problem compounds:
- Tool definitions are re-sent on every API call (they're part of the system prompt)
- Tool results accumulate in conversation history
- Multi-turn tool-use conversations grow rapidly
- Prompt caching helps (Anthropic caches static prefixes) but doesn't eliminate the cost
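The arithmetic behind those numbers, as a back-of-envelope sketch (per-tool figures taken from the range quoted above; turn count is an assumed example):

```python
# Per-tool schema cost range quoted above.
low_per_tool, high_per_tool = 550, 1400
n_tools = 40  # e.g. three typical servers

low = n_tools * low_per_tool      # 22,000 tokens
high = n_tools * high_per_tool    # 56,000 tokens
print(f"{low:,}-{high:,} tokens of schemas per API call")

# Definitions are re-sent on every turn, so a 20-turn session at the upper
# bound pays that cost 20 times over (caching reduces, not removes, it).
turns = 20
print(f"{high * turns:,} tokens across {turns} turns, uncached")
```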
Tool Selection Degrades With Scale
Models struggle when presented with too many similar tools:
- Misfiring: `get_status`, `fetch_status`, and `query_status` in the same prompt → model picks the wrong one
- Freezing: too many ambiguous options → model takes no action at all
- Hallucinating: model invents plausible tool names like `create_lead_entry` when the actual name is `add_sales_contact`
Microsoft Research confirms that LLMs decline to act when faced with ambiguous or excessive tool options. The practical limit is 15-30 tools before degradation becomes noticeable. Most MCP servers ship with 10-25 tools each. Three servers = you're already at the limit.
Most MCP Servers Are Bad
The honest truth: most MCP servers are thin wrappers around REST APIs. They convert each API endpoint into a separate tool, one-to-one. This is the worst possible design for an LLM:
- 50 endpoints → 50 tool definitions → massive context overhead
- Tool descriptions are auto-generated from OpenAPI specs → not optimized for LLM comprehension
- No thought given to which operations an agent actually needs
Good MCP server design requires agentic thinking: what tasks does the model need to accomplish? Which operations compose well? How can you expose fewer, higher-level tools that cover the common workflows?
Security Is Underspecified
The base MCP spec has no auth model. OAuth was added as an extension. The 2026 roadmap lists enterprise readiness (audit trails, SSO, gateway behavior) as a priority — meaning it doesn't exist yet.
Implications for your code:
- MCP servers can execute arbitrary code
- Prompt injection through tool results is a real risk (server returns data that contains instructions the model follows)
- No standard way to restrict which tools a server can expose
- No standard audit log of tool invocations
MCP vs API vs CLI
There's a hot debate in 2026 about whether MCP is even the right approach. Three alternatives exist:
Direct API Calls (via code execution)
The model writes code that calls REST APIs directly:
import requests
response = requests.get("https://api.github.com/repos/owner/repo/issues")
issues = response.json()
Pros: No schema overhead. Model already knows common APIs from training data. Full programmatic control (loops, conditionals, error handling in code).
Cons: No schema validation. No tool discovery. Security nightmare — the model can call anything. No standardized way to expose custom APIs.
CLI Tools (via bash)
The model runs command-line tools:
gh issue list --repo owner/repo --state open
Pros: Extremely token-efficient. Models know common CLIs (git, curl, gh, aws) from training data — zero schema overhead. Progressive discovery via --help. Structural safety baked into the binary (the CLI validates its own arguments).
Cons: Only works for tools the model already knows. Custom CLIs require the model to learn usage from --help output (costs tokens too, just deferred). No structured output guarantee. Error handling is primitive (exit codes + stderr). Useless for remote/SaaS integrations where no CLI exists.
Concrete comparison (Apideck benchmark):
- Checking a repo's language: CLI = 1,365 tokens, MCP = 44,026 tokens
- But on multi-step tasks (create invoice with line items): CLI = 19 LLM round trips, MCP = 12, Code Mode MCP = 4
MCP
Pros: Standardized discovery and execution. Works for both local and remote tools. Schema validation. Ecosystem compatibility. Telemetry and observability. Auth can be centralized.
Cons: Schema bloat. Extra infrastructure (server processes). Still maturing (auth, scaling, enterprise features).
When to Use What
| Scenario | Best approach |
|---|---|
| Well-known CLI tools (git, gh, docker, aws) | CLI via bash |
| Custom internal APIs | MCP server (or CLI if token budget is tight) |
| SaaS integrations (GitHub, Slack, Jira) | MCP server |
| Data processing over many records | Code execution calling APIs directly |
| Tight token budget, simple operations | CLI |
| Team-wide standardized tooling | MCP (observability, auth, consistency) |
| Solo vibe-coding | CLI usually wins on efficiency |
The right answer is usually a mix. The Playwright team explicitly says: use CLI+Skills for coding agents (token-efficient), use MCP for exploratory automation (persistent state, rich introspection).
MCP Servers That Matter for Development
Context7 — Library Documentation
Problem: LLM training data is stale. You ask about Next.js 15 and get Next.js 13 patterns.
Solution: Context7 fetches current, version-specific documentation from official sources and injects it into context.
Tools: `resolve-library-id` (find library), `get-library-docs` (fetch relevant docs)
Usage: Add use context7 to your prompt, or configure automatic invocation.
{
"mcpServers": {
"context7": {
"command": "npx",
"args": ["-y", "@upstash/context7-mcp"]
}
}
}
Why it's good: Only 2 tools (minimal schema overhead). Progressive — resolves library first, then fetches only relevant docs. Supports thousands of libraries. Works across all major coding agents. Currently the most widely adopted MCP server for coding.
Playwright — Headless Browser
Problem: Agents can't interact with web pages for testing, scraping, or verification.
Solution: Microsoft's Playwright MCP exposes browser automation via accessibility tree (not screenshots — no vision model needed).
{
"mcpServers": {
"playwright": {
"command": "npx",
"args": ["-y", "@playwright/mcp@latest"]
}
}
}
Why it matters: Deterministic element references (not pixel coordinates). Supports Chromium, Firefox, WebKit. Can run headed (you watch) or headless (CI).
The catch: 25+ tools = significant context overhead (~14K-20K tokens). Microsoft themselves now recommend CLI+Skills over MCP for coding agents where token efficiency matters. MCP is better for exploratory/interactive automation where persistent browser state is valuable.
Codebase Indexing — Structural Code Search
Problem: Agents explore codebases by grep and reading entire files. This is slow and token-expensive. "Brute-force reading" wastes context on irrelevant code.
Solutions (several competing approaches):
codebase-memory-mcp (DeusData) — Indexes code into a persistent knowledge graph using tree-sitter AST analysis. 66 languages. Queries return specific symbols, call chains, dependency graphs. Claims 83% answer quality at 10x fewer tokens than file exploration. Single static binary, zero dependencies.
jcodemunch-mcp — Tree-sitter based indexing with structural queries: find_importers, get_blast_radius, get_class_hierarchy, find_dead_code. These are questions grep literally cannot answer. Independent A/B test showed 15-25% token savings at the tool layer.
Why this category matters: The #1 token cost in coding agents is reading code. Any tool that lets the agent query for specific symbols instead of reading entire files saves significant tokens. This is the "better aim, not bigger context window" philosophy.
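The idea behind these servers can be sketched in a few lines: parse once, index structure, then answer "who calls X?" from the index instead of re-reading files. A toy version using Python's stdlib `ast` (real servers use tree-sitter, cover many languages, and persist the graph):

```python
import ast

# Toy structural index: map each function name to the functions that call it.
# This is a question grep cannot answer reliably — it needs parsed structure.
source = """
def f(): pass
def g(): return f()
def h(): return g()
"""

tree = ast.parse(source)
callers: dict[str, list[str]] = {}
for fn in ast.walk(tree):
    if isinstance(fn, ast.FunctionDef):
        for node in ast.walk(fn):
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                callers.setdefault(node.func.id, []).append(fn.name)

print(callers)  # {'f': ['g'], 'g': ['h']}
```

An agent querying this index retrieves two short entries instead of reading three function bodies; the savings scale with codebase size.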
Language Server MCP — LSP Bridge
Problem: Agents don't have IDE-grade code intelligence. They can't do go-to-definition, find-references, or get diagnostics without reading and parsing code themselves.
Solution: mcp-language-server bridges existing LSP servers (gopls, rust-analyzer, pyright, typescript-language-server) to MCP. The agent gets actual compiler-grade intelligence.
{
"mcpServers": {
"language-server": {
"command": "mcp-language-server",
"args": [
"--workspace", "/path/to/project",
"--lsp", "typescript-language-server", "--", "--stdio"
]
}
}
}
Tools: get_definition, find_references, get_diagnostics, rename_symbol, get_hover
Why it matters: Instead of the agent pattern-matching on imports and guessing where functions are defined, it gets the same information your IDE has. This is especially valuable for statically typed languages where the LSP provides complete, accurate navigation.
The catch: Requires the language server to be installed. Setup is per-language. Adds a running process per workspace.
GitHub MCP
Official server for GitHub operations: issue management, PR workflows, code search, repository administration. Used by GitHub Copilot natively.
Postgres/SQLite MCP
Database access. Be very careful with write permissions. Always use read-only mode unless you have a specific, audited reason.
Strategies for Managing Context Bloat
1. Limit Active Tools
Don't enable every MCP server for every conversation. Context7 is useful when writing code against external libraries. Playwright is useful when testing. You rarely need both simultaneously.
In Claude Code: `claude mcp add` / `claude mcp remove` per project.
2. Tool Search (Anthropic)
Anthropic's tool_search is a server-side tool that replaces loading all tool schemas upfront. Instead:
- Only `tool_search` is in the context (~100 tokens)
- The model describes what it needs
- `tool_search` returns only the relevant tool definitions
- The model calls the discovered tools
This moves from O(n) schema cost to O(1) + O(k) where k << n.
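A rough cost model makes the difference concrete (the per-schema figure is assumed for illustration, sitting inside the 550–1,400 range quoted earlier):

```python
SCHEMA_TOKENS = 1000   # assumed average cost per tool definition
n, k = 40, 3           # tools available vs. tools actually relevant

upfront = n * SCHEMA_TOKENS           # O(n): every schema, every call
searched = 100 + k * SCHEMA_TOKENS    # O(1) tool_search stub + O(k) hits

print(upfront, searched)  # 40000 3100
```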
3. Design Better Servers
Instead of exposing 50 endpoints:
- Group related operations into meta-tools
- One `github_manage_issues` tool with an `action` parameter instead of separate `create_issue`, `update_issue`, `list_issues`, `close_issue`
- Return minimal, relevant data, not entire API responses
4. Programmatic Tool Calling (Code Mode)
Let the model write code that orchestrates MCP tools:
# Instead of 5 separate tool calls through conversation:
issues = mcp.github.list_issues(state="open")
for issue in issues:
labels = mcp.github.get_labels(issue_id=issue["id"])
if "bug" in labels:
mcp.github.assign(issue_id=issue["id"], assignee="bot")
One inference pass, one code block, many tool calls. Anthropic introduced this pattern and reports 58% fewer tokens than raw MCP on multi-step tasks.
5. Use CLIs Where Possible
For tools the model already knows (git, gh, docker, npm, curl):
# 0 tokens of schema overhead — model knows git from training
git log --oneline -10
vs.
// ~800 tokens of schema + ~200 tokens of result framing
{"name": "git_log", "input": {"limit": 10, "format": "oneline"}}
Mix MCP for custom/SaaS tools with CLI for well-known tools.
Industry Trends (Early 2026)
The CLI Counter-Argument
A recurring argument: CLIs are better than MCP for coding agents. This is partially true — CLIs genuinely are more token-efficient for known tools. But CLIs don't solve:
- Custom API integration without a pre-existing CLI
- Team-wide observability (which tools are agents using? how often? what fails?)
- Standardized auth and permissions
- Remote tool execution (you can't `npx` a SaaS)
The pendulum will settle. MCP for standardized/remote/enterprise. CLI for well-known local tools. Both in the same agent.
The Spec Is Maturing
The 2026 MCP roadmap priorities:
- Streamable HTTP improvements — horizontal scaling, stateless sessions, load balancer compatibility
- Elicitation — servers can ask the user for input (not just the model)
- Task lifecycle — retry semantics, expiry policies for long-running operations
- Enterprise readiness — audit trails, SSO, gateway behavior (expected as extensions, not core spec changes)
Codebase Intelligence Is The Next Frontier
The explosion of code indexing MCP servers (codebase-memory, jcodemunch, Serena, CodeGraphContext, and dozens more) signals that the industry recognizes: agents reading files line-by-line is not going to scale. The future is pre-indexed, queryable code graphs that provide structural answers (call chains, dependency impacts, dead code) at a fraction of the token cost.
Agent Skills vs MCP
The Agent Skills specification (https://agentskills.io) exists alongside MCP. They solve different problems:
| | MCP | Agent Skills |
|---|---|---|
| What it does | Connects to external tools and data | Teaches the agent how to perform tasks |
| Analogy | USB port | Employee handbook |
| Runtime | Server process, JSON-RPC calls | Instruction file, prompt injection |
| Token cost | High (schemas loaded per-call) | Low (~100 tokens discovery, ~5K activation) |
| State | Can maintain state (browser sessions, DB connections) | Stateless (instructions only) |
You use both. MCP gives the agent capabilities. Skills tell the agent how to use them well.
Setting Up MCP in Claude Code
# Add a stdio server
claude mcp add context7 -- npx -y @upstash/context7-mcp
# Add a remote HTTP server
claude mcp add github --transport http https://api.github.com/mcp
# List active servers
claude mcp list
# Remove a server
claude mcp remove context7
# Check context usage
/context
Project-level config goes in `.mcp.json` at the project root:
{
"mcpServers": {
"context7": {
"command": "npx",
"args": ["-y", "@upstash/context7-mcp"]
},
"playwright": {
"command": "npx",
"args": ["-y", "@playwright/mcp@latest", "--headless"]
}
}
}
Writing a Simple MCP Server
You don't just consume MCP servers — you can write your own. A minimal stdio server in Python using the official SDK:
from mcp.server.fastmcp import FastMCP
mcp = FastMCP("student-lookup")
@mcp.tool()
def search_students(query: str, search_type: str) -> str:
"""Search the student database by name or student code.
Use when the user asks about a specific student."""
# Your actual database logic here
return f"Found: {query} ({search_type})"
if __name__ == "__main__":
mcp.run(transport="stdio")
Run with `pip install "mcp[cli]"` (quoted, so the shell doesn't expand the brackets) followed by `python server.py`, or configure directly in Claude Code:
{
"mcpServers": {
"student-lookup": {
"command": "python",
"args": ["server.py"]
}
}
}
The SDK handles JSON-RPC protocol, tool schema generation (from type hints and docstrings), and transport. You write Python functions; the SDK exposes them as MCP tools.
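To see roughly what "schema generation from type hints and docstrings" means, here is a simplified sketch of the derivation. `tool_schema` is a toy stand-in, not the SDK's actual implementation, which also handles defaults, optionals, and complex types:

```python
import inspect
from typing import get_type_hints

# Toy mapping from Python annotations to JSON Schema type names.
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def tool_schema(fn) -> dict:
    """Derive an MCP-style tool definition from a plain function."""
    hints = get_type_hints(fn)
    hints.pop("return", None)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "inputSchema": {
            "type": "object",
            "properties": {n: {"type": PY_TO_JSON[t]} for n, t in hints.items()},
            "required": list(hints),
        },
    }

def search_students(query: str, search_type: str) -> str:
    """Search the student database by name or student code."""
    return f"Found: {query} ({search_type})"

schema = tool_schema(search_students)
print(schema["inputSchema"]["properties"])  # both params typed "string"
```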
For TypeScript, the pattern is similar with @modelcontextprotocol/sdk. See the MCP quickstart for both languages.
The key design decision: expose a few high-level tools that map to agent tasks, not one tool per database query. A search_students tool is better than separate query_by_name, query_by_code, query_by_course tools.
References
Specification and Roadmap
- MCP Specification: https://modelcontextprotocol.io
- 2026 Roadmap: https://blog.modelcontextprotocol.io/posts/2026-mcp-roadmap/
- GitHub repo: https://github.com/modelcontextprotocol
Context Bloat Problem
- "Your MCP Server Is Eating Your Context Window" (Apideck): https://www.apideck.com/blog/mcp-server-eating-context-window-cli-alternative
- "10 Strategies to Reduce MCP Token Bloat" (The New Stack): https://thenewstack.io/how-to-reduce-mcp-token-bloat/
- "Ballooning Context in the MCP Era" (CodeRabbit): https://www.coderabbit.ai/blog/handling-ballooning-context-in-the-mcp-era-context-engineering-on-steroids
- "How to Prevent MCP Tool Overload" (Lunar): https://www.lunar.dev/post/why-is-there-mcp-tool-overload-and-how-to-solve-it-for-your-ai-agents
- "MCP is Dead; Long Live MCP" (balanced take on CLI vs MCP): https://chrlschn.dev/blog/2026/03/mcp-is-dead-long-live-mcp/
Specific Servers
- Context7: https://github.com/upstash/context7
- Playwright MCP: https://github.com/microsoft/playwright-mcp
- mcp-language-server (LSP bridge): https://github.com/isaacphi/mcp-language-server
- codebase-memory-mcp (code indexing): https://github.com/DeusData/codebase-memory-mcp
- jcodemunch-mcp (tree-sitter indexing): https://github.com/jgravelle/jcodemunch-mcp
Server Registries
- PulseMCP: https://pulsemcp.com
- Smithery: https://smithery.ai
- Docker MCP Catalog: https://hub.docker.com/mcp
Academic
- "Codebase-Memory: Tree-Sitter-Based Knowledge Graphs for LLM Code Exploration via MCP" (arXiv 2603.27277) — 83% quality at 10x fewer tokens vs file exploration
- "RAG-MCP" (Writer): https://writer.com/engineering/rag-mcp/ — vector-based tool retrieval triples selection accuracy, halves prompt tokens
Broader Context
- "Why the Model Context Protocol Does Not Work" (critical perspective): https://www.epicai.pro/why-the-model-context-protocol-does-not-work-hgsz5
- Anthropic Advanced Tool Use (programmatic calling): https://www.anthropic.com/engineering/advanced-tool-use