Assignment 2 - Skills
DEADLINE 2026-04-14 23:59:59
Objective
Build two production-quality agent skills following the Agent Skills specification: a PDF-to-Markdown converter and a thesis grader that uses it. Then evaluate AI-assisted thesis grading against your own human review.
Requirements
1. Skill: PDF-to-Markdown Converter
Build a skill that converts a multi-page PDF (including images) to a single markdown file using the Mistral OCR API.
Skill structure:
pdf-to-markdown/
├── SKILL.md
├── scripts/
│ └── convert.py # (or similar)
└── references/
└── ... # optional reference material
- `SKILL.md` must have proper frontmatter (`name`, `description` at minimum)
- The `name` field must match the directory name exactly
- Store the Mistral API key in an environment variable — never hardcode it
- Handle multi-page PDFs with embedded images
- Output: a single markdown file with extracted text and image descriptions
- Scripts must follow the design principles from the lecture:
- No interactive prompts — use CLI arguments
- Implement `--help`
- Structured output: data to stdout, diagnostics to stderr
- Informative error messages (not just "Error: invalid input")
- Inline dependencies (PEP 723 + `uv run`, or equivalent for your language)
2. Skill: Thesis Grader
Build a skill that takes a directory of thesis PDFs, converts them to markdown (using Skill 1), and runs a full grading and review on each thesis.
Skill structure:
thesis-grader/
├── SKILL.md
├── scripts/
│ └── ...
└── references/
├── grading-matrix.md
├── style-guide.md
└── ... # your university's grading materials
- Collect grading materials from your university/institute — grading matrix, style guide, thesis writing instructions, defense evaluation criteria, etc. Convert them to markdown and store them in `references/`
- Tell the agent when to load each reference file (not just "see `references/`")
- Include at least 3 theses for testing (publicly available theses)
- Per-thesis output: structured review with feedback, recommendations, and questions for the author
- Summary output: overview across all theses (score distribution, common issues, overall quality)
- The skill should use the PDF-to-Markdown skill/scripts for the conversion step
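One way for the grader to reuse Skill 1 is to shell out to its script. A sketch, assuming the two skill directories sit side by side and that `convert.py` writes markdown to stdout (the relative path and the stdout convention are assumptions about your own layout):

```python
import subprocess
import sys
from pathlib import Path


def build_convert_command(pdf_path: str, converter: str) -> list[str]:
    # uv run resolves the converter's inline (PEP 723) dependencies,
    # so the grader needs no pip install prerequisites of its own.
    return ["uv", "run", converter, pdf_path]


def convert_pdf(
    pdf_path: str,
    converter: str = "../pdf-to-markdown/scripts/convert.py",
) -> str:
    """Run the pdf-to-markdown skill's script and return its markdown output."""
    result = subprocess.run(
        build_convert_command(pdf_path, converter),
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        # Surface the converter's stderr diagnostics, then fail loudly.
        sys.stderr.write(result.stderr)
        raise RuntimeError(f"conversion failed for {pdf_path}")
    return result.stdout  # the converter writes markdown to stdout


def convert_directory(thesis_dir: str) -> dict[str, str]:
    # Convert every thesis PDF in the directory; grading happens downstream.
    return {p.name: convert_pdf(str(p))
            for p in sorted(Path(thesis_dir).glob("*.pdf"))}
```

Invoking the converter as a subprocess (rather than importing it) keeps the two skills independently runnable and lets each carry its own inline dependencies.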
3. Manual Evaluation & Comparison
- Pick one new thesis from your field (not one of the 3 used for testing)
- Write a manual human review using your university's grading criteria
- Run the thesis grader skill on the same thesis
- Compare the human review with the AI review
- Write an Assignment Overview Report (`report.md` or similar) covering:
  - Differences between human and AI grading — where did they agree/disagree?
  - Where was AI stronger or weaker than human review?
  - Skill development process — what worked, what required iteration?
  - Gotchas discovered during development — were they added back into the skill's gotchas section?
Skill Format Requirements
Both skills must comply with the Agent Skills specification:
- Frontmatter: `name` and `description` are required; `compatibility` and `metadata` are recommended.
- Name field: lowercase, hyphens only, no leading/trailing hyphens, no consecutive hyphens. Must match the parent directory name.
- Description field: must include clear trigger conditions — what the skill does, when to use it, keywords the user might say. A vague "Helps with PDFs" will not trigger reliably. Be aggressive — agents undertrigger (see lecture).
- Scripts: must be self-contained with inline dependencies. Run with `uv run` (Python) or equivalent. No `pip install` prerequisites.
- Reference files: explicitly referenced from `SKILL.md` with clear instructions on when the agent should load them.
- Validation: run `npx skills-ref validate ./my-skill` on each skill before submitting.
- Platform: skills must work in at least Claude Code or Kilo Code. Cross-agent compatibility is a bonus.
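A frontmatter sketch for the converter skill that satisfies the rules above — note the aggressive trigger keywords in the description. The field values are illustrative; check required and recommended fields against the Agent Skills specification:

```markdown
---
name: pdf-to-markdown
description: >
  Converts PDF files (including multi-page PDFs with embedded images) to a
  single markdown file using the Mistral OCR API. Use whenever the user asks
  to convert, extract, OCR, or "turn into markdown" a PDF, thesis, paper,
  report, or scanned document.
---
```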
Deliverables
- Two working skills in proper Agent Skills directory format
- Reference materials for thesis grading collected in the thesis grader's `references/`
- At least 3 theses processed by the thesis grader, with generated reviews
- One thesis graded by both human and AI
- Assignment Overview Report comparing human vs AI review and reflecting on the development process
- Git repository with meaningful commit history reflecting incremental development
Defense
You will be asked to:
- Walk through your `SKILL.md` frontmatter and explain each field's purpose
- Explain why the `description` field is written the way it is — what triggers it?
- Demonstrate running both skills on a new input (a PDF you haven't tested with)
- Show how your scripts handle errors (corrupt PDF, missing API key, unsupported format)
- Explain how progressive disclosure works in your skill — what loads at startup (~100 tokens), what loads on activation, what loads on-demand?
- Modify a skill live (e.g., add a new grading criterion, change the output format)
- Explain the difference between a skill and an MCP server — when would you use each?
- Walk through your manual vs AI review comparison and discuss findings
- Show all prompts you used during development
If you cannot explain it, you did not own it.
Git
All projects go here: gitlab.proxy.itcollege.ee
LLM Backend
For API access to the LLM backend, use: https://ai-proxy.cm.itcollege.ee/
Help with API: https://courses.taltech.akaver.com/agentic-software-development/lectures/ai-proxy