24 - Spec Kit
After exploring OpenSpec's lightweight, change-centric approach for existing codebases, we now look at Spec Kit — GitHub's more structured toolkit designed primarily for greenfield projects. Where OpenSpec gives you a quick propose-apply cycle, Spec Kit adds a full pipeline: constitution, specification, planning, task generation, and implementation.
Quick Start
Installation
uv tool install specify-cli --from git+https://github.com/github/spec-kit.git
Upgrade
uv tool install specify-cli --force --from git+https://github.com/github/spec-kit.git
Usage
# Create new project
specify init <PROJECT_NAME>
# Or initialize in existing project
specify init . --ai claude
# or
specify init --here --ai claude
# Check installed tools and dependencies
specify check
--ai <agent_name> — supports 20+ agents: claude, copilot, kilocode, cursor, windsurf, gemini, and others.
Core Principles
Specifications as the Lingua Franca: The specification becomes the primary artifact. Code becomes its expression in a particular language and framework. Maintaining software means evolving specifications.
Executable Specifications: Specifications must be precise, complete, and unambiguous enough to generate working systems. This eliminates the gap between intent and implementation.
Continuous Refinement: Consistency validation happens continuously, not as a one-time gate. AI analyzes specifications for ambiguity, contradictions, and gaps as an ongoing process.
Research-Driven Context: Research agents gather critical context throughout the specification process, investigating technical options, performance implications, and organizational constraints.
Bidirectional Feedback: Production reality informs specification evolution. Metrics, incidents, and operational learnings become inputs for specification refinement.
Branching for Exploration: Generate multiple implementation approaches from the same specification to explore different optimization targets — performance, maintainability, user experience, cost.
The Pipeline
The Spec Kit workflow follows a linear pipeline with optional quality checkpoints:
constitution → specify → [clarify] → plan → [analyze] → tasks → [analyze] → implement
(bracketed steps are the optional quality checkpoints)
Note: commands are slash commands in your IDE (e.g., /speckit.specify in Copilot Chat, Claude Code, Kilo Code). The CLI tool (specify init, specify check) is only used for project setup.
Essential Commands
| Command | Description |
|---|---|
| /speckit.constitution | Create or update project governing principles and development guidelines |
| /speckit.specify | Define what you want to build (requirements and user stories) |
| /speckit.plan | Create technical implementation plans with your chosen tech stack |
| /speckit.tasks | Generate actionable task lists for implementation |
| /speckit.implement | Execute all tasks to build the feature according to the plan |
Quality Gate Commands (Optional)
| Command | Description |
|---|---|
| /speckit.clarify | Interactive Q&A to surface underspecified areas. Recommended before /speckit.plan to catch ambiguity early. |
| /speckit.analyze | Cross-artifact consistency and coverage analysis. Checks that specs, plans, and tasks are aligned. Run after /speckit.tasks, before /speckit.implement. |
| /speckit.checklist | Generate custom quality checklists that validate requirements completeness, clarity, and consistency — like "unit tests for English." |
These optional commands act as quality gates at each stage. /speckit.clarify catches vague requirements before they become vague plans. /speckit.analyze catches inconsistencies between your spec, plan, and tasks before implementation begins. /speckit.checklist creates human-reviewable validation criteria.
Command Details
/speckit.constitution
The constitution establishes the governing principles for your entire project. It defines non-negotiable quality standards, testing requirements, UX patterns, and performance targets that every feature must comply with. Think of it as a project-wide contract that every specification and plan is checked against.
The constitution is stored at .specify/memory/constitution.md and is referenced by /speckit.plan during its "Constitution Check" gate.
Abbreviated example (from the greenfield demo):
# Project Constitution
## Core Principles
### I. Code Quality
All code MUST meet these non-negotiable quality standards:
- **Readability**: Code MUST be self-documenting with clear naming conventions.
- **Maintainability**: Functions MUST have a single responsibility.
Files MUST NOT exceed 300 lines without justification.
- **Consistency**: All code MUST follow the project's style guide.
Linting MUST pass with zero warnings before merge.
- **No Dead Code**: Unused imports, variables, and functions MUST be removed.
### II. Testing Standards
- **Coverage Threshold**: New code MUST have minimum 80% line coverage.
Critical paths MUST have 100% coverage.
- **Test Types Required**: Unit tests, integration tests, contract tests.
- **CI Gate**: All tests MUST pass before merge.
### III. User Experience Consistency
- **Error Handling**: All errors MUST provide clear, actionable messages.
- **Predictability**: Operations MUST produce consistent results given same inputs.
### IV. Performance Requirements
- **Response Time**: CLI commands MUST respond within 500ms.
- **Startup Time**: Application startup MUST complete within 200ms.
## Quality Gates
| Gate | Requirement | Enforcement |
|---------------|--------------------------------|--------------------|
| Linting | Zero errors, zero warnings | CI automated check |
| Test Coverage | Minimum 80% on new code | CI automated check |
| Test Pass | 100% tests passing | CI automated check |
| Documentation | Public APIs documented | Code review |
## Governance
This constitution is the authoritative guide for all technical decisions.
When technical decisions conflict:
1. This constitution takes precedence over personal preference
2. User experience takes precedence over implementation convenience
3. Simplicity takes precedence over premature optimization
/speckit.specify
This command transforms a simple feature description into a complete, structured specification with automatic repository management:
- Automatic Feature Numbering: Scans existing specs to determine the next feature number (e.g., 001, 002, 003)
- Branch Creation: Generates a semantic branch name from your description and creates it automatically
- Template-Based Generation: Copies and customizes the feature specification template with your requirements
- Directory Structure: Creates the proper specs/[branch-name]/ structure for all related documents
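The feature-numbering and branch-naming behavior above can be sketched in a few lines. This is a hypothetical Python illustration of the logic, not Spec Kit's actual implementation; the function names are assumptions:

```python
import re
from pathlib import Path

def next_feature_number(specs_dir: Path) -> int:
    """Scan specs/ for directories like '001-timezone-utility' and return the next number."""
    nums = [int(m.group(1))
            for d in specs_dir.iterdir() if d.is_dir()
            if (m := re.match(r"(\d{3})-", d.name))]
    return max(nums, default=0) + 1  # first feature in an empty repo is 001

def branch_name(description: str, number: int) -> str:
    """Derive a semantic branch slug from a free-text description, e.g. '001-timezone-utility'."""
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    short = "-".join(slug.split("-")[:4])  # keep the slug to a few words
    return f"{number:03d}-{short}"
```

For example, with `001-timezone-utility` and `002-meeting-planner` already present, the next feature gets number 3 and a branch like `003-<slug>`.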
Abbreviated spec output (from the greenfield demo — timezone utility):
# Feature Specification: Timezone Meeting Utility
**Feature Branch**: `001-timezone-utility`
**Status**: Draft
## User Scenarios & Testing *(mandatory)*
### User Story 1 - Get Current Time for Location (Priority: P1)
As a user working with international colleagues, I want to quickly check
the current date and time in any location using whatever identifier I have
available (timezone, city name, country, or US zip code).
**Acceptance Scenarios**:
1. **Given** I request time for timezone "America/New_York",
**Then** I see the current date and time with timezone abbreviation
2. **Given** I request time for US zip code "94102",
**Then** I see the current date and time for San Francisco's timezone
3. **Given** I request time for an invalid location,
**Then** I see a helpful error message suggesting how to format my input
### Edge Cases
- City name matches multiple locations → display all matches, prompt user
- Country has multiple timezones → list all applicable timezones
- DST transitions → use current DST rules automatically
## Requirements *(mandatory)*
- **FR-001**: System MUST accept IANA timezone identifiers
- **FR-002**: System MUST accept city names and resolve to timezones
- **FR-003**: System MUST accept US zip codes (5-digit)
- **FR-005**: System MUST handle ambiguous queries by presenting options
## Success Criteria *(mandatory)*
- Users can look up current time for any location in under 3 seconds
- 95% of common city names resolve correctly without additional user input
- All error messages provide actionable guidance
Note the structure: user stories with acceptance scenarios (Given/When/Then), functional requirements with RFC 2119 keywords (MUST, SHOULD, MAY), measurable success criteria, and explicit edge cases. Compare this with OpenSpec's spec.md — similar structure, but Spec Kit generates more detailed user stories and explicit success criteria.
/speckit.plan
Once a feature specification exists, this command creates a comprehensive implementation plan:
- Specification Analysis: Reads and understands the feature requirements, user stories, and acceptance criteria
- Constitutional Compliance: Runs a "Constitution Check" gate — ensures alignment with project constitution principles
- Technical Translation: Converts business requirements into technical architecture and implementation details
- Detailed Documentation: Generates supporting documents for data models, API contracts, and test scenarios
- Quickstart Validation: Produces a quickstart guide capturing key validation scenarios
The plan includes a Constitution Check table that verifies each constitutional principle is met:
| Principle | Gate Status | Notes |
|----------------------|-------------|----------------------------------------------------------|
| I. Code Quality | PASS | Single responsibility functions, <300 LOC files |
| II. Testing Standards| PASS | xUnit + Coverlet for 80%+ coverage, unit/integration |
| III. UX Consistency | PASS | Consistent CLI patterns, clear errors, --json flag |
| IV. Performance | PASS | <500ms response, <200ms startup via ReadyToRun |
/speckit.tasks
After a plan is created, this command analyzes the plan and related design documents to generate an executable task list:
- Inputs: Reads plan.md (required) and, if present, data-model.md, contracts/, and research.md
- Task Derivation: Converts contracts, entities, and scenarios into specific tasks with exact file paths
- Parallelization: Marks independent tasks [P] and outlines safe parallel groups
- Output: Writes tasks.md in the feature directory, ready for execution
Abbreviated tasks output:
## Phase 1: Setup (Shared Infrastructure)
- [ ] T001 Create solution and project structure
- [ ] T002 Configure .csproj with NodaTime, System.CommandLine
- [ ] T003 [P] Create test projects
- [ ] T004 [P] Configure Directory.Build.props
## Phase 2: Foundational (Blocking Prerequisites)
CRITICAL: No user story work can begin until this phase is complete
- [ ] T006 Create Program.cs with root command structure
- [ ] T007 [P] Create Location record model
- [ ] T008 [P] Create TimeSlot record model
## Phase 3: User Story 1 (Priority: P1) - MVP
- [ ] T020 [US1] Implement IanaTimezoneResolver
- [ ] T021 [US1] [P] Implement CityResolver
- [ ] T022 [US1] [P] Implement ZipCodeResolver
- [ ] T024 [US1] Implement CompositeLocationResolver (depends on T020-T023)
## Dependencies & Execution Order
- Phase 2 BLOCKS all user stories
- [P] tasks can run in parallel (different files)
- [US#] maps task to specific user story
Note the structure: phased with dependency ordering, parallel markers [P], user story mapping [US#], and critical path identification. This is significantly more structured than OpenSpec's tasks.md.
/speckit.implement
This command reads tasks.md and executes the implementation:
- Task Parsing: Reads the task list and understands dependencies and parallelization markers
- Sequential Execution: Works through tasks in order, respecting phase dependencies
- Code Generation: Writes actual source code files based on the plan's project structure and the spec's requirements
- Checkpoint Validation: Pauses at phase boundaries for validation before proceeding
- Progress Tracking: Marks tasks as completed in tasks.md as it goes
This is the step where specifications become code. The AI agent reads the constitution, spec, plan, and tasks — then generates implementation files. The quality of this output depends directly on the precision of the preceding artifacts.
Directory Structure
After running the pipeline, your project looks like this:
project-root/
├── .specify/
│ ├── memory/
│ │ └── constitution.md # Project-wide principles
│ └── templates/
│ ├── spec-template.md
│ ├── plan-template.md
│ └── tasks-template.md
├── specs/
│ └── 001-timezone-utility/ # Feature directory (per branch)
│ ├── spec.md # Feature specification
│ ├── plan.md # Implementation plan
│ ├── research.md # Technical research notes
│ ├── data-model.md # Entity/data definitions
│ ├── contracts/ # API contracts
│ ├── quickstart.md # Validation guide
│ └── tasks.md # Executable task list
├── src/ # Generated source code
└── tests/ # Generated tests
Each feature gets its own numbered directory under specs/. The .specify/ directory holds the constitution and templates that govern all features.
Demos
Greenfield — a .NET CLI timezone utility built end-to-end from spec to working code using all Spec Kit commands: https://github.com/mnriem/spec-kit-dotnet-cli-demo
Brownfield — adding features to an existing ASP.NET application, demonstrating that Spec Kit can work with existing codebases (though with more friction than OpenSpec): https://github.com/mnriem/spec-kit-aspnet-brownfield-demo
Spec Kit vs OpenSpec
| Aspect | OpenSpec | Spec Kit |
|---|---|---|
| Primary use case | Existing codebases (brownfield) | New projects (greenfield) |
| Setup | npm install + openspec init | uv tool install + specify init |
| Governing document | config.yaml (tech stack + rules) | constitution.md (principles + quality gates) |
| Workflow | propose → apply → archive | constitution → specify → plan → tasks → implement |
| Artifact depth | proposal, spec, design, tasks | spec, plan, data-model, contracts, tasks |
| Quality gates | Manual review | /speckit.clarify, /speckit.analyze, /speckit.checklist |
| Feature scoping | One logical change per proposal | Full features with user stories and acceptance criteria |
| Task structure | Simple checklist | Phased with dependencies, parallelization markers, user story mapping |
| Overhead | Low (minutes per change) | Medium (30min–3hrs per feature including review) |
| AI agent support | Via slash commands + context rules | 20+ agents via standardized slash commands |
OpenSpec is faster for incremental changes on existing code. Spec Kit produces richer artifacts and is better suited for features where the upfront planning investment pays off through reduced rework.
Toward Spec-as-Source
Recall from lecture 22 the three implementation levels:
- Spec-first: Write spec, generate code, delete spec. Code is source of truth.
- Spec-anchored: Keep spec alongside code for ongoing evolution.
- Spec-as-source: Spec is the only thing humans edit. Code is regenerated.
Spec Kit is designed for spec-anchored development — the specs live in specs/ and evolve alongside the code. Can it reach spec-as-source?
What works today: The greenfield demo shows that you can run the full pipeline (specify → plan → tasks → implement) and get a working application. The specs are detailed enough to produce functionally equivalent code on regeneration.
Where it breaks down:
- Non-determinism: LLMs produce different code each run. Variable names, library choices, and implementation patterns will vary. The result is functionally equivalent, not identical.
- The boring plumbing: Database migrations, external service configs, deployment manifests, CI/CD pipelines — these are rarely captured in specs and don't regenerate cleanly.
- Spec drift: Static specs don't update during implementation. If the AI makes adjustments during /speckit.implement, the spec no longer matches reality.
- Accumulated state: After multiple features, cross-feature interactions emerge that individual specs don't capture.
The gap to spec-as-source: For Spec Kit to fully support spec-as-source, it would need bidirectional sync (code changes update specs), deterministic-enough generation (or diffing tools), and infrastructure-as-spec (capturing the plumbing). This is an active area of exploration — see the IaC Spec Kit fork below.
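As a toy illustration of why even detecting drift is non-trivial, consider the most naive possible check: flag any source file modified after the spec was last touched. This is purely hypothetical (Spec Kit ships no such tool), and its weakness is the point — timestamps say nothing about whether a change actually contradicts the spec, which is why real bidirectional sync needs semantic comparison:

```python
from pathlib import Path

def drift_candidates(spec: Path, src_dir: Path) -> list[Path]:
    """Return source files changed after the spec's last modification.

    A crude timestamp heuristic: it flags suspects, but cannot tell
    whether a change contradicts the spec or merely implements it.
    """
    spec_mtime = spec.stat().st_mtime
    return sorted(p for p in src_dir.rglob("*")
                  if p.is_file() and p.stat().st_mtime > spec_mtime)
```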
Best Practices
Start with the constitution. A good constitution prevents the AI from making architectural decisions that contradict your project's standards. Invest time here — it pays off across every feature.
Use /speckit.clarify before planning. Ambiguous specs produce ambiguous plans. Let the AI ask you questions before it starts designing.
One feature per spec. Like OpenSpec's "one logical change per proposal" — keep features focused. Cross-cutting concerns should be separate specs.
Review generated artifacts at each stage. Don't run the full pipeline blindly. Review the spec before planning, the plan before tasking, the tasks before implementing. Each stage is a human checkpoint.
Run /speckit.analyze before implementing. This catches inconsistencies between your spec, plan, and tasks that are hard to spot manually.
Strengths and Limitations
Strengths:
- Rich, structured artifacts that serve as documentation
- Constitution provides project-wide governance
- Built-in quality gate commands (clarify, analyze, checklist)
- Supports 20+ AI coding agents through standardized commands
- Phased task generation with dependency and parallelization awareness
- Good fit for medium-to-large greenfield features
Limitations:
- Static specs do not update during implementation, creating drift on longer tasks
- Substantial time overhead (often 1–3+ hours per feature once review is included)
- More friction with legacy frameworks and complex existing codebases (OpenSpec is better here)
- Single-repo focus with no cross-repository awareness
- The pipeline is sequential — can feel heavyweight for small changes
Research Connections
For students interested in the theoretical foundations:
- Program Synthesis: Spec Kit's pipeline (natural language intent → structured spec → code) mirrors the program synthesis problem — generating programs from specifications. The difference: classical synthesis targets formal specs and correctness proofs; Spec Kit uses natural language specs and statistical generation.
- Design by Contract: The constitution and spec requirements (MUST, SHOULD, MAY) echo Bertrand Meyer's Design by Contract — preconditions, postconditions, and invariants. The constitution is a project-wide invariant; spec requirements are feature-level contracts.
- Model-Driven Engineering (MDE): The spec-plan-tasks-code pipeline resembles MDE's model→transformation→code flow. The spec is the model; the AI agent is the transformation engine. Unlike traditional MDE, the "model" is natural language rather than UML/EMF.
- Literate Programming: Donald Knuth's vision of programs as literature for humans, with code extracted mechanically. Spec-as-source inverts this: the specification is the human-readable primary artifact, and code is the mechanically-extracted derivative.
Forks
IaC Spec Kit — a cloud-agnostic infrastructure as code specification toolkit. Extends the spec-driven approach to infrastructure, addressing the "boring plumbing" gap where application specs typically underspecify deployment and infrastructure concerns.