Evaluation Framework

The skill-judge rubric provides a consistent way to assess skill quality across all MAGIC suites.

skill-judge

skill-judge is a Claude Code skill that evaluates SKILL.md files against the universal skill anatomy and suite-specific best practices. It produces a scored report with P0–P3 severity ratings and actionable improvement suggestions.

Evaluation Dimensions

  • Trigger coverage (P0): Are the trigger keywords specific enough to avoid false positives and broad enough to catch real use cases?
  • Acceptance criteria (P0): Does each skill have verifiable, concrete acceptance criteria (not vague descriptions)?
  • Description quality (P1): Is the one-line description precise enough for an AI to correctly select this skill vs. alternatives?
  • Example completeness (P1): Do examples cover the golden path and at least one edge case?
  • Parameter documentation (P2): Are all parameters typed and described with valid ranges or constraints?
  • Domain alignment (P2): Does the skill fit within the domain metadata declared in frontmatter?
  • Naming conventions (P3): Does the skill name follow kebab-case and the suite naming pattern?
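
The naming-conventions dimension is the easiest one to check mechanically. The sketch below shows one way a kebab-case check could look. It is an illustration only: the helper name is_kebab_case is hypothetical and not part of skill-judge, and the suite naming pattern is not covered because this page does not define it.

```python
import re

# Lowercase alphanumeric words joined by single hyphens, e.g. "skill-judge".
# Assumption: digits are allowed inside words; the page does not say either way.
KEBAB_CASE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_kebab_case(name: str) -> bool:
    """Return True if `name` is kebab-case (hypothetical helper, not skill-judge's own check)."""
    return KEBAB_CASE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["skill-judge", "SkillJudge", "skill_judge", "skill--judge"]:
        print(candidate, is_kebab_case(candidate))
```

Using fullmatch rather than search means a name with a leading, trailing, or doubled hyphen is rejected instead of partially matched.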

Running skill-judge

Invoke skill-judge from Claude Code or any compatible AI tool with the /skill-judge command:

/skill-judge path/to/SKILL.md

Or audit an entire suite directory:

/skill-judge skills/
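
To preview which files a directory audit would likely cover, you can list every SKILL.md under the suite directory yourself. This is a rough sketch: how skill-judge actually walks the directory is not documented on this page, and find_skill_files is a hypothetical helper.

```python
from pathlib import Path


def find_skill_files(root: str) -> list[Path]:
    """Return every SKILL.md under `root`, sorted for stable output.

    Hypothetical helper: it only lists candidate files and does not
    reproduce skill-judge's own discovery logic.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    return sorted(root_path.rglob("SKILL.md"))


if __name__ == "__main__":
    # "skills" matches the directory used in the command above.
    for path in find_skill_files("skills"):
        print(path)
```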

Score Interpretation

  • P0 issues: block the skill from being published
  • P1 issues: should be resolved before merging
  • P2 issues: should be addressed in follow-up PRs
  • P3 issues: nice-to-have improvements
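
The severity levels above can be turned into a simple gate, for example in a CI step. The sketch below is a minimal illustration under stated assumptions: the Finding type and summarize function are hypothetical, the real report format of skill-judge is not shown on this page, and treating a P0 issue as also blocking a merge is an assumption.

```python
from dataclasses import dataclass

# Wording follows the severity list above.
ACTIONS = {
    "P0": "blocks the skill from being published",
    "P1": "should be resolved before merging",
    "P2": "should be addressed in a follow-up PR",
    "P3": "nice-to-have improvement",
}


@dataclass
class Finding:
    """One reported issue (hypothetical shape, not skill-judge's report format)."""
    severity: str  # one of "P0", "P1", "P2", "P3"
    message: str


def summarize(findings: list[Finding]) -> dict[str, object]:
    """Derive publish and merge gates from the severities that are present."""
    present = {f.severity for f in findings}
    unknown = present - ACTIONS.keys()
    if unknown:
        raise ValueError(f"unknown severities: {sorted(unknown)}")
    return {
        "publishable": "P0" not in present,
        # Assumption: a P0 issue blocks merging as well as publishing.
        "merge_ready": not (present & {"P0", "P1"}),
        # "P0" sorts before "P3", so min() yields the most severe level.
        "worst": min(present) if present else None,
    }


if __name__ == "__main__":
    report = [
        Finding("P1", "Examples cover no edge case"),
        Finding("P3", "Skill name is not kebab-case"),
    ]
    print(summarize(report))
```

Only P0 and P1 affect the gates here; P2 and P3 findings are reported through the worst field but do not block anything.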