27 - MCP

MCP (Model Context Protocol) is a standardized protocol for exposing tools, resources, and prompts to LLM agents. Anthropic released it late 2024, open-sourced the spec in December 2025. As of early 2026, it's adopted by 26+ platforms: Claude Code, Cursor, VS Code/Copilot, OpenAI Codex, Gemini CLI, Amp, Goose, and others. The spec is maintained at modelcontextprotocol.io under Linux Foundation governance.

This lecture covers what MCP actually is (and isn't), when to use it, what goes wrong at scale, and which servers matter for software development.

What MCP Is

MCP is a JSON-RPC 2.0 protocol over two transports:

stdio — server is a local subprocess, communicates via stdin/stdout. Used for local tools (filesystem, language servers, code indexing).
Streamable HTTP — server is a remote service, communicates via HTTP with optional SSE for streaming. Used for SaaS integrations (GitHub, Slack, databases).

A server exposes three primitive types:

Tools — callable functions with JSON Schema inputs (same as regular tool calling, but discovered via protocol)
Resources — read-only data the agent can pull into context (files, database records, config)
Prompts — reusable prompt templates the agent can invoke

The lifecycle:

Client connects to server
Client calls tools/list → server returns tool schemas
Client calls resources/list → server returns available resources
During conversation, model decides to call a tool → client sends tools/call to server → server executes and returns result
Client feeds result back to model

That's it. MCP is a discovery + execution protocol for tools. Under the hood, the model still sees JSON Schema tool definitions and does regular tool calling. MCP standardizes how tools are found and invoked, not how the model interacts with them.

What MCP Is NOT

MCP is not responsible for context bloat. Context bloat comes from function calling — the requirement to include tool schemas in every API call so the model can choose. You'd have this problem with or without MCP. MCP just makes it easier to add more tools, which makes the bloat problem worse in practice.

MCP is not an execution engine. It's a protocol. The server implements execution. A bad MCP server wrapping a bad API is still bad.

MCP is not a security boundary. Tool calls go through the protocol, but there's no built-in auth, sandboxing, or permission model in the base spec. OAuth support exists but is still maturing. The 2026 roadmap lists enterprise auth as a priority.

The Good

Write Once, Run Everywhere

Before MCP: you build a GitHub integration for Claude Code. It doesn't work in Cursor. You rebuild it. Doesn't work in Copilot. Rebuild again.

After MCP: you build one MCP server. Any MCP-compatible client can use it. Microsoft Playwright MCP works in Claude Code, Cursor, VS Code, Codex CLI, and Gemini CLI — same server, same config.

Composability

You can run multiple servers simultaneously. A coding session might have:

Filesystem access (built-in)
GitHub (issue tracking, PR management)
Context7 (library documentation)
Playwright (browser testing)
A custom server for your internal API

Each server exposes its tools independently. The model sees all of them and can chain across servers.

Progressive Discovery

MCP servers don't dump everything at startup. Well-designed servers implement progressive disclosure:

tools/list returns lightweight metadata (name + description)
Detailed schemas load only when a tool is selected
Resources are fetched on-demand, not pre-loaded

This is the theory. In practice, most clients load all tool schemas at startup. Anthropic's tool_search server-side tool is an attempt to fix this — it dynamically discovers which tools are relevant per query instead of injecting all schemas.

Ecosystem Momentum

The MCP server ecosystem is large and growing:

Official servers from Anthropic, Microsoft, GitHub, Google
Partner servers from Atlassian (Jira/Confluence), Figma, Stripe, Notion, Zapier
Community servers numbering in thousands on registries like PulseMCP, Smithery, Docker MCP Catalog

The Bad

Context Window Bloat — The Real Problem

This is the #1 engineering challenge. Each MCP tool definition costs 550–1,400 tokens (name + description + JSON Schema + field descriptions + enums). Connect three servers with a total of 40 tools:

~55,000 tokens consumed before the model reads a single user message.

That's 25-30% of Claude's 200K context window. Gone.

Concrete measurement from Apideck: checking a repo's language consumed 1,365 tokens via CLI and 44,026 tokens via MCP — the overhead was almost entirely schema injection for 43 tool definitions, of which the agent used one.

The problem compounds:

Tool definitions are re-sent on every API call (they're part of the system prompt)
Tool results accumulate in conversation history
Multi-turn tool-use conversations grow rapidly
Prompt caching helps (Anthropic caches static prefixes) but doesn't eliminate the cost

Tool Selection Degrades With Scale

Models struggle when presented with too many similar tools:

Misfiring: get_status, fetch_status, and query_status in the same prompt → model picks wrong one
Freezing: too many ambiguous options → model takes no action at all
Hallucinating: model invents plausible tool names like create_lead_entry when the actual name is add_sales_contact

Microsoft Research confirms that LLMs decline to act when faced with ambiguous or excessive tool options. The practical limit is 15-30 tools before degradation becomes noticeable. Most MCP servers ship with 10-25 tools each. Three servers = you're already at the limit.

Most MCP Servers Are Bad

The honest truth: most MCP servers are thin wrappers around REST APIs. They convert each API endpoint into a separate tool, one-to-one. This is the worst possible design for an LLM:

50 endpoints → 50 tool definitions → massive context overhead
Tool descriptions are auto-generated from OpenAPI specs → not optimized for LLM comprehension
No thought given to which operations an agent actually needs

Good MCP server design requires agentic thinking: what tasks does the model need to accomplish? Which operations compose well? How can you expose fewer, higher-level tools that cover the common workflows?

Security Is Underspecified

The base MCP spec has no auth model. OAuth was added as an extension. The 2026 roadmap lists enterprise readiness (audit trails, SSO, gateway behavior) as a priority — meaning it doesn't exist yet.

Implications for your code:

MCP servers can execute arbitrary code
Prompt injection through tool results is a real risk (server returns data that contains instructions the model follows)
No standard way to restrict which tools a server can expose
No standard audit log of tool invocations

MCP vs API vs CLI

There's a hot debate in 2026 about whether MCP is even the right approach. Three alternatives exist:

Direct API Calls (via code execution)

The model writes code that calls REST APIs directly:

import requests
response = requests.get("https://api.github.com/repos/owner/repo/issues")
issues = response.json()

Pros: No schema overhead. Model already knows common APIs from training data. Full programmatic control (loops, conditionals, error handling in code).

Cons: No schema validation. No tool discovery. Security nightmare — the model can call anything. No standardized way to expose custom APIs.

CLI Tools (via bash)

The model runs command-line tools:

gh issue list --repo owner/repo --state open

Pros: Extremely token-efficient. Models know common CLIs (git, curl, gh, aws) from training data — zero schema overhead. Progressive discovery via --help. Structural safety baked into the binary (the CLI validates its own arguments).

Cons: Only works for tools the model already knows. Custom CLIs require the model to learn usage from --help output (costs tokens too, just deferred). No structured output guarantee. Error handling is primitive (exit codes + stderr). Useless for remote/SaaS integrations where no CLI exists.

Concrete comparison (Apideck benchmark):

Checking a repo's language: CLI = 1,365 tokens, MCP = 44,026 tokens
But on multi-step tasks (create invoice with line items): CLI = 19 LLM round trips, MCP = 12, Code Mode MCP = 4

MCP

Pros: Standardized discovery and execution. Works for both local and remote tools. Schema validation. Ecosystem compatibility. Telemetry and observability. Auth can be centralized.

Cons: Schema bloat. Extra infrastructure (server processes). Still maturing (auth, scaling, enterprise features).

When to Use What

Scenario	Best approach
Well-known CLI tools (`git`, `gh`, `docker`, `aws`)	CLI via bash
Custom internal APIs	MCP server (or CLI if token budget is tight)
SaaS integrations (GitHub, Slack, Jira)	MCP server
Data processing over many records	Code execution calling APIs directly
Tight token budget, simple operations	CLI
Team-wide standardized tooling	MCP (observability, auth, consistency)
Solo vibe-coding	CLI usually wins on efficiency

The right answer is usually a mix. The Playwright team explicitly says: use CLI+Skills for coding agents (token-efficient), use MCP for exploratory automation (persistent state, rich introspection).

MCP Servers That Matter for Development

Context7 — Library Documentation

Problem: LLM training data is stale. You ask about Next.js 15 and get Next.js 13 patterns.

Solution: Context7 fetches current, version-specific documentation from official sources and injects it into context.

Tools: resolve-library-id (find library), query-docs (fetch relevant docs)

Usage: Add use context7 to your prompt, or configure automatic invocation.

{
  "mcpServers": {
    "context7": {
      "command": "npx",
      "args": ["-y", "@upstash/context7-mcp"]
    }
  }
}

Why it's good: Only 2 tools (minimal schema overhead). Progressive — resolves library first, then fetches only relevant docs. Supports thousands of libraries. Works across all major coding agents. Currently the most widely adopted MCP server for coding.

Playwright — Headless Browser

Problem: Agents can't interact with web pages for testing, scraping, or verification.

Solution: Microsoft's Playwright MCP exposes browser automation via accessibility tree (not screenshots — no vision model needed).

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["-y", "@playwright/mcp@latest"]
    }
  }
}

Why it matters: Deterministic element references (not pixel coordinates). Supports Chromium, Firefox, WebKit. Can run headed (you watch) or headless (CI).

The catch: 25+ tools = significant context overhead (~14K-20K tokens). Microsoft themselves now recommend CLI+Skills over MCP for coding agents where token efficiency matters. MCP is better for exploratory/interactive automation where persistent browser state is valuable.

Codebase Indexing — Structural Code Search

Problem: Agents explore codebases by grep and reading entire files. This is slow and token-expensive. "Brute-force reading" wastes context on irrelevant code.

Solutions (several competing approaches):

codebase-memory-mcp (DeusData) — Indexes code into a persistent knowledge graph using tree-sitter AST analysis. 66 languages. Queries return specific symbols, call chains, dependency graphs. Claims 83% answer quality at 10x fewer tokens than file exploration. Single static binary, zero dependencies.

jcodemunch-mcp — Tree-sitter based indexing with structural queries: find_importers, get_blast_radius, get_class_hierarchy, find_dead_code. These are questions grep literally cannot answer. Independent A/B test showed 15-25% token savings at the tool layer.

Why this category matters: The #1 token cost in coding agents is reading code. Any tool that lets the agent query for specific symbols instead of reading entire files saves significant tokens. This is the "better aim, not bigger context window" philosophy.

Language Server MCP — LSP Bridge

Problem: Agents don't have IDE-grade code intelligence. They can't do go-to-definition, find-references, or get diagnostics without reading and parsing code themselves.

Solution: mcp-language-server bridges existing LSP servers (gopls, rust-analyzer, pyright, typescript-language-server) to MCP. The agent gets actual compiler-grade intelligence.

{
  "mcpServers": {
    "language-server": {
      "command": "mcp-language-server",
      "args": [
        "--workspace", "/path/to/project",
        "--lsp", "typescript-language-server", "--", "--stdio"
      ]
    }
  }
}

Tools: get_definition, find_references, get_diagnostics, rename_symbol, get_hover

Why it matters: Instead of the agent pattern-matching on imports and guessing where functions are defined, it gets the same information your IDE has. This is especially valuable for statically typed languages where the LSP provides complete, accurate navigation.

The catch: Requires the language server to be installed. Setup is per-language. Adds a running process per workspace.

GitHub MCP

Official server for GitHub operations: issue management, PR workflows, code search, repository administration. Used by GitHub Copilot natively.

Postgres/SQLite MCP

Database access. Be very careful with write permissions. Always use read-only mode unless you have a specific, audited reason.

Strategies for Managing Context Bloat

1. Limit Active Tools

Don't enable every MCP server for every conversation. Context7 is useful when writing code against external libraries. Playwright is useful when testing. You rarely need both simultaneously.

In Claude Code: claude mcp add / claude mcp remove per project.

2. Tool Search (Anthropic)

Anthropic's tool_search is a server-side tool that replaces loading all tool schemas upfront. Instead:

Only tool_search is in the context (~100 tokens)
Model describes what it needs
tool_search returns only the relevant tool definitions
Model calls the discovered tools

This moves from O(n) schema cost to O(1) + O(k) where k << n.

3. Design Better Servers

Instead of exposing 50 endpoints:

Group related operations into meta-tools
One github_manage_issues tool with a action parameter instead of separate create_issue, update_issue, list_issues, close_issue
Return minimal, relevant data — not entire API responses

4. Programmatic Tool Calling (Code Mode)

Let the model write code that orchestrates MCP tools:

# Instead of 5 separate tool calls through conversation:
issues = mcp.github.list_issues(state="open")
for issue in issues:
    labels = mcp.github.get_labels(issue_id=issue["id"])
    if "bug" in labels:
        mcp.github.assign(issue_id=issue["id"], assignee="bot")

One inference pass, one code block, many tool calls. Anthropic introduced this pattern and reports 58% fewer tokens than raw MCP on multi-step tasks.

5. Use CLIs Where Possible

For tools the model already knows (git, gh, docker, npm, curl):

# 0 tokens of schema overhead — model knows git from training
git log --oneline -10

vs.

// ~800 tokens of schema + ~200 tokens of result framing
{"name": "git_log", "input": {"limit": 10, "format": "oneline"}}

Mix MCP for custom/SaaS tools with CLI for well-known tools.

Industry Trends (Early 2026)

The CLI Counter-Argument

A recurring argument: CLIs are better than MCP for coding agents. This is partially true — CLIs genuinely are more token-efficient for known tools. But CLIs don't solve:

Custom API integration without a pre-existing CLI
Team-wide observability (which tools are agents using? how often? what fails?)
Standardized auth and permissions
Remote tool execution (you can't npx a SaaS)

The pendulum will settle. MCP for standardized/remote/enterprise. CLI for well-known local tools. Both in the same agent.

The Spec Is Maturing

The 2026 MCP roadmap priorities:

Streamable HTTP improvements — horizontal scaling, stateless sessions, load balancer compatibility
Elicitation — servers can ask the user for input (not just the model)
Task lifecycle — retry semantics, expiry policies for long-running operations
Enterprise readiness — audit trails, SSO, gateway behavior (expected as extensions, not core spec changes)

Codebase Intelligence Is The Next Frontier

The explosion of code indexing MCP servers (codebase-memory, jcodemunch, Serena, CodeGraphContext, and dozens more) signals that the industry recognizes: agents reading files line-by-line is not going to scale. The future is pre-indexed, queryable code graphs that provide structural answers (call chains, dependency impacts, dead code) at a fraction of the token cost.

Agent Skills vs MCP

The Agent Skills specification (https://agentskills.io) exists alongside MCP. They solve different problems:

	MCP	Agent Skills
What it does	Connects to external tools and data	Teaches the agent how to perform tasks
Analogy	USB port	Employee handbook
Runtime	Server process, JSON-RPC calls	Instruction file, prompt injection
Token cost	High (schemas loaded per-call)	Low (~100 tokens discovery, ~5K activation)
State	Can maintain state (browser sessions, DB connections)	Stateless (instructions only)

You use both. MCP gives the agent capabilities. Skills tell the agent how to use them well.

Setting Up MCP in Claude Code

# Add a stdio server
claude mcp add context7 -- npx -y @upstash/context7-mcp

# Add a remote HTTP server
claude mcp add github --transport http https://api.github.com/mcp

# List active servers
claude mcp list

# Remove a server
claude mcp remove context7

# Check context usage
/context

Project-level config goes in .claude/settings.json:

{
  "mcpServers": {
    "context7": {
      "command": "npx",
      "args": ["-y", "@upstash/context7-mcp"]
    },
    "playwright": {
      "command": "npx",
      "args": ["-y", "@playwright/mcp@latest", "--headless"]
    }
  }
}

Writing a Simple MCP Server

You don't just consume MCP servers — you can write your own. A minimal stdio server in Python using the official SDK:

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("student-lookup")

@mcp.tool()
def search_students(query: str, search_type: str) -> str:
    """Search the student database by name or student code.
    Use when the user asks about a specific student."""
    # Your actual database logic here
    return f"Found: {query} ({search_type})"

if __name__ == "__main__":
    mcp.run(transport="stdio")

Run with: pip install mcp[cli] then python server.py — or configure directly in Claude Code:

{
  "mcpServers": {
    "student-lookup": {
      "command": "python",
      "args": ["server.py"]
    }
  }
}

The SDK handles JSON-RPC protocol, tool schema generation (from type hints and docstrings), and transport. You write Python functions; the SDK exposes them as MCP tools.

For TypeScript, the pattern is similar with @modelcontextprotocol/sdk. See the MCP quickstart for both languages.

The key design decision: expose a few high-level tools that map to agent tasks, not one tool per database query. A search_students tool is better than separate query_by_name, query_by_code, query_by_course tools.

References

Academic

"Codebase-Memory: Tree-Sitter-Based Knowledge Graphs for LLM Code Exploration via MCP" (arXiv 2603.27277) — 83% quality at 10x fewer tokens vs file exploration
"RAG-MCP" (Writer): https://writer.com/engineering/rag-mcp/ — vector-based tool retrieval triples selection accuracy, halves prompt tokens

Broader Context

"Why the Model Context Protocol Does Not Work" (critical perspective): https://www.epicai.pro/why-the-model-context-protocol-does-not-work-hgsz5
Anthropic Advanced Tool Use (programmatic calling): https://www.anthropic.com/engineering/advanced-tool-use

What MCP Is​

What MCP Is NOT​

The Good​

Write Once, Run Everywhere​

Composability​

Progressive Discovery​

Ecosystem Momentum​

The Bad​

Context Window Bloat — The Real Problem​

Tool Selection Degrades With Scale​

Most MCP Servers Are Bad​

Security Is Underspecified​

MCP vs API vs CLI​

Direct API Calls (via code execution)​

CLI Tools (via bash)​

MCP​

When to Use What​

MCP Servers That Matter for Development​

Context7 — Library Documentation​

Playwright — Headless Browser​

Codebase Indexing — Structural Code Search​

Language Server MCP — LSP Bridge​

GitHub MCP​

Postgres/SQLite MCP​

Strategies for Managing Context Bloat​

1. Limit Active Tools​

2. Tool Search (Anthropic)​

3. Design Better Servers​

4. Programmatic Tool Calling (Code Mode)​

5. Use CLIs Where Possible​

Industry Trends (Early 2026)​

The CLI Counter-Argument​

The Spec Is Maturing​

Codebase Intelligence Is The Next Frontier​

Agent Skills vs MCP​

Setting Up MCP in Claude Code​

Writing a Simple MCP Server​

References​

Specification and Roadmap​

Context Bloat Problem​

Specific Servers​

Server Registries​

Academic​

Broader Context​