Assignment 2 - Skills

DEADLINE 2026-04-14 23:59:59

Objective

Build two production-quality agent skills following the Agent Skills specification: a PDF-to-Markdown converter and a thesis grader that uses it. Then evaluate AI-assisted thesis grading against your own human review.


Requirements

1. Skill: PDF-to-Markdown Converter

Build a skill that converts a multi-page PDF (including images) to a single markdown file using the Mistral OCR API.

Skill structure:

pdf-to-markdown/
├── SKILL.md
├── scripts/
│   └── convert.py        # (or similar)
└── references/
    └── ...               # optional reference material
  • SKILL.md must have proper frontmatter (name, description at minimum)
  • The name field must match the directory name exactly
  • Store Mistral API key in an environment variable — never hardcode it
  • Handle multi-page PDFs with embedded images
  • Output: a single markdown file with extracted text and image descriptions
  • Scripts must follow the design principles from the lecture:
    • No interactive prompts — use CLI arguments
    • Implement --help
    • Structured output: data to stdout, diagnostics to stderr
    • Informative error messages (not just "Error: invalid input")
    • Inline dependencies (PEP 723 + uv run, or equivalent for your language)
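The principles above can be sketched as a `scripts/convert.py` skeleton. This is a minimal sketch, not a reference implementation: the actual Mistral OCR call is left as a stub (`ocr_pdf` is a hypothetical placeholder, not the official SDK), and the flags are illustrative.

```python
# /// script
# requires-python = ">=3.11"
# dependencies = ["mistralai"]
# ///
"""Convert a multi-page PDF to one Markdown file via the Mistral OCR API."""
import argparse
import os
import sys
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    # CLI arguments instead of interactive prompts; --help comes free with argparse
    p = argparse.ArgumentParser(
        description="Convert a multi-page PDF (with images) to a single Markdown file."
    )
    p.add_argument("pdf", type=Path, help="input PDF path")
    p.add_argument("-o", "--output", type=Path, default=None,
                   help="output .md path (default: write Markdown to stdout)")
    return p


def ocr_pdf(pdf: Path, api_key: str) -> str:
    """Hypothetical placeholder: call the Mistral OCR client here, then join
    per-page Markdown and image descriptions into one document."""
    raise NotImplementedError("wire up the Mistral OCR call")


def main() -> None:
    args = build_parser().parse_args()
    api_key = os.environ.get("MISTRAL_API_KEY")
    if not api_key:
        # Informative error to stderr, never a prompt, never a hardcoded key
        print("Error: MISTRAL_API_KEY is not set; export it before running.",
              file=sys.stderr)
        sys.exit(2)
    if not args.pdf.is_file():
        print(f"Error: input PDF not found: {args.pdf}", file=sys.stderr)
        sys.exit(2)
    markdown = ocr_pdf(args.pdf, api_key)
    if args.output:
        args.output.write_text(markdown, encoding="utf-8")
        print(f"wrote {args.output}", file=sys.stderr)  # diagnostics -> stderr
    else:
        print(markdown)  # data -> stdout
```

With a standard `if __name__ == "__main__": main()` guard at the bottom, this runs as `uv run scripts/convert.py thesis.pdf -o thesis.md` with no prior pip install, thanks to the PEP 723 header.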

2. Skill: Thesis Grader

Build a skill that takes a directory of thesis PDFs, converts them to markdown (using Skill 1), and runs a full grading and review on each thesis.

Skill structure:

thesis-grader/
├── SKILL.md
├── scripts/
│   └── ...
└── references/
    ├── grading-matrix.md
    ├── style-guide.md
    └── ...               # your university's grading materials
  • Collect grading materials from your university/institute (grading matrix, style guide, thesis writing instructions, defense evaluation criteria, etc.), convert them to Markdown, and store them in references/
  • Tell the agent when to load each reference file (not just "see references/")
  • Include at least 3 theses for testing (publicly available theses)
  • Per-thesis output: structured review with feedback, recommendations, and questions for the author
  • Summary output: overview across all theses (score distribution, common issues, overall quality)
  • The skill should use the PDF-to-Markdown skill/scripts for the conversion step
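The conversion step can be sketched as a small batch driver that shells out to the pdf-to-markdown skill's script for each thesis. This is a sketch under assumptions: the relative `CONVERTER` path and the `-o` flag are illustrative and must match whatever your converter actually accepts.

```python
"""Sketch of the thesis-grader conversion step: reuse the pdf-to-markdown
skill's script for every PDF in a directory."""
import subprocess
import sys
from pathlib import Path

# Assumed repository layout; adjust to where your converter actually lives
CONVERTER = Path("../pdf-to-markdown/scripts/convert.py")


def find_theses(thesis_dir: Path) -> list[Path]:
    """All PDFs in the directory, sorted for a reproducible run order."""
    return sorted(thesis_dir.glob("*.pdf"))


def convert_command(pdf: Path, out_md: Path) -> list[str]:
    """Command line for one conversion (flags are illustrative)."""
    return ["uv", "run", str(CONVERTER), str(pdf), "-o", str(out_md)]


def convert_all(thesis_dir: Path, work_dir: Path) -> list[Path]:
    """Convert every thesis; a single bad PDF must not sink the batch."""
    work_dir.mkdir(parents=True, exist_ok=True)
    outputs: list[Path] = []
    for pdf in find_theses(thesis_dir):
        out_md = work_dir / (pdf.stem + ".md")
        result = subprocess.run(convert_command(pdf, out_md))
        if result.returncode != 0:
            print(f"skipping {pdf.name}: converter exited {result.returncode}",
                  file=sys.stderr)
            continue
        outputs.append(out_md)
    return outputs
```

The per-thesis reviews and the cross-thesis summary would then be generated from the Markdown files this returns.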

3. Manual Evaluation & Comparison

  • Pick one new thesis from your field (not one of the 3 used for testing)
  • Write a manual human review using your university's grading criteria
  • Run the thesis grader skill on the same thesis
  • Compare the human review with the AI review
  • Write an Assignment Overview Report (report.md or similar) covering:
    • Differences between human and AI grading — where did they agree/disagree?
    • Where was AI stronger or weaker than human review?
    • Skill development process — what worked, what required iteration?
    • Gotchas discovered during development — were they added back into the skill's gotchas section?

Skill Format Requirements

Both skills must comply with the Agent Skills specification:

  • Frontmatter: name and description are required. compatibility and metadata are recommended.
  • Name field: lowercase, hyphens only, no leading/trailing hyphens, no consecutive hyphens. Must match the parent directory name.
  • Description field: must include clear trigger conditions — what the skill does, when to use it, keywords the user might say. A vague "Helps with PDFs" will not trigger reliably. Be aggressive — agents undertrigger (see lecture).
  • Scripts: must be self-contained with inline dependencies. Run with uv run (Python) or equivalent. No pip install prerequisites.
  • Reference files: explicitly referenced from SKILL.md with clear instructions on when the agent should load them.
  • Validation: run npx skills-ref validate ./my-skill on each skill before submitting.
  • Platform: skills must work in at least Claude Code or Kilo Code. Cross-agent compatibility is a bonus.
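To make the requirements above concrete, here is an illustrative SKILL.md excerpt for the thesis grader. The field values and reference-loading rules are examples, not prescriptions; check them against the Agent Skills specification and your own materials.

```markdown
---
name: thesis-grader          # must equal the parent directory name exactly
description: >
  Grades thesis PDFs against university criteria and produces structured
  reviews with feedback, recommendations, and questions for the author.
  Use when the user asks to grade, review, assess, or give feedback on a
  thesis, dissertation, or final paper, or mentions a grading matrix.
compatibility: Claude Code, Kilo Code
metadata:
  version: 0.1.0
---

## References

- Load `references/grading-matrix.md` before scoring any thesis.
- Load `references/style-guide.md` only when reviewing formatting,
  citations, or language; skip it for content-only feedback.
```

Note how the description states what the skill does, when to trigger, and likely user keywords, and how each reference file comes with an explicit condition for loading it.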

Deliverables

  1. Two working skills in proper Agent Skills directory format
  2. Reference materials for thesis grading collected in the thesis grader's references/
  3. At least 3 theses processed by the thesis grader, with generated reviews
  4. One thesis graded by both human and AI
  5. Assignment Overview Report comparing human vs AI review and reflecting on the development process
  6. Git repository with meaningful commit history reflecting incremental development

Defense

You will be asked to:

  • Walk through your SKILL.md frontmatter and explain each field's purpose
  • Explain why the description field is written the way it is — what triggers it?
  • Demonstrate running both skills on a new input (a PDF you haven't tested with)
  • Show how your scripts handle errors (corrupt PDF, missing API key, unsupported format)
  • Explain how progressive disclosure works in your skill — what loads at startup (~100 tokens), what loads on activation, what loads on-demand?
  • Modify a skill live (e.g., add a new grading criterion, change the output format)
  • Explain the difference between a skill and an MCP server — when would you use each?
  • Walk through your manual vs AI review comparison and discuss findings
  • Display all prompts used in coding

If you cannot explain it, you did not own it.


Git

All projects go here: gitlab.proxy.itcollege.ee

LLM Backend

For API access to the LLM backend, use: https://ai-proxy.cm.itcollege.ee/

Help with API: https://courses.taltech.akaver.com/agentic-software-development/lectures/ai-proxy