Evaluation Framework

The skill-judge rubric provides a consistent way to assess skill quality across all MAGIC suites.

skill-judge

skill-judge is a Claude Code skill that evaluates SKILL.md files against the universal skill anatomy and suite-specific best practices. It produces a scored report with P0–P3 severity ratings and actionable improvement suggestions.

Evaluation Dimensions

Trigger coverage

Are the trigger keywords specific enough to avoid false positives and broad enough to catch real use cases?
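As a rough illustration of what this dimension measures, the sketch below checks a hypothetical trigger list against two hand-written prompt sets: prompts that should select the skill and prompts that should not. Plain case-insensitive substring matching is a crude stand-in chosen for the example. It is not how Claude Code actually selects skills, and the names `matches` and `trigger_coverage` are illustrative only.

```python
# Hypothetical sketch: triggers and prompts are illustrative, not from a real SKILL.md.
# Substring matching is a simplification of real skill selection.

def matches(prompt: str, triggers: list[str]) -> bool:
    """Return True if any trigger keyword appears in the prompt (case-insensitive)."""
    text = prompt.lower()
    return any(t.lower() in text for t in triggers)


def trigger_coverage(triggers, should_fire, should_not_fire):
    """Return (missed real use cases, false positives) for a trigger list."""
    missed = [p for p in should_fire if not matches(p, triggers)]
    false_positives = [p for p in should_not_fire if matches(p, triggers)]
    return missed, false_positives


triggers = ["review", "audit SKILL.md"]  # "review" is deliberately too broad
should_fire = [
    "Please review skill quality for the deploy suite",
    "Can you audit SKILL.md in this folder?",
    "Grade my skill file",  # real use case that no trigger covers
]
should_not_fire = ["Review this pull request"]

missed, fps = trigger_coverage(triggers, should_fire, should_not_fire)
print("missed:", missed)
print("false positives:", fps)
```

Here the broad keyword produces a false positive while the narrow one still misses a phrasing, which is exactly the trade-off this dimension asks about.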

Acceptance criteria

Does each skill have verifiable, concrete acceptance criteria (not vague descriptions)?

Description quality

Is the one-line description precise enough for an AI to correctly select this skill vs. alternatives?

Example completeness

Do examples cover the golden path and at least one edge case?

Parameter documentation

Are all parameters typed and described with valid ranges or constraints?
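A minimal sketch of a parameter-documentation check, assuming parameters have already been parsed into a simple record with type, description, and constraint fields. The `Param` structure and `undocumented` helper are hypothetical; they are not part of skill-judge or of any SKILL.md format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Param:
    """Hypothetical parsed view of one documented skill parameter."""
    name: str
    type: Optional[str] = None
    description: Optional[str] = None
    constraints: Optional[str] = None  # valid range, allowed values, format, etc.


def undocumented(params: list[Param]) -> list[tuple[str, str]]:
    """Return (parameter name, missing field) pairs for every gap found."""
    gaps = []
    for p in params:
        for attr in ("type", "description", "constraints"):
            if not getattr(p, attr):
                gaps.append((p.name, attr))
    return gaps


params = [
    Param("max_retries", "int", "Retry attempts before failing", "0-10"),
    Param("mode", "str", "Output mode"),  # allowed values are not listed
]
print(undocumented(params))
```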

Domain alignment

Does the skill fit within the domain metadata declared in its frontmatter?

Naming conventions

Is the skill name in kebab-case, and does it follow the suite naming pattern?
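The kebab-case half of this check can be expressed as a regular expression. The suite naming pattern is not specified here, so the sketch assumes, purely for illustration, that it is a shared prefix such as `docs-`; `check_name` and its `suite_prefix` parameter are hypothetical.

```python
import re
from typing import Optional

# Lowercase alphanumeric words joined by single hyphens, e.g. "skill-judge".
KEBAB_CASE = r"[a-z0-9]+(?:-[a-z0-9]+)*"


def check_name(name: str, suite_prefix: Optional[str] = None) -> list[str]:
    """Return a list of naming problems; an empty list means the name passes."""
    problems = []
    # fullmatch rejects trailing newlines and partial matches.
    if not re.fullmatch(KEBAB_CASE, name):
        problems.append("not kebab-case")
    # Assumed suite pattern: a shared prefix followed by a hyphen.
    if suite_prefix and not name.startswith(suite_prefix + "-"):
        problems.append(f"missing suite prefix '{suite_prefix}-'")
    return problems
```

For example, `check_name("skill-judge")` returns an empty list, while `check_name("SkillJudge")` reports that the name is not kebab-case.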

Running skill-judge

Invoke skill-judge from Claude Code or any compatible AI tool with the /skill-judge command:

/skill-judge path/to/SKILL.md

Or audit an entire suite directory:

/skill-judge skills/
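To preview which files a directory audit might cover, the sketch below lists every file named `SKILL.md` beneath a root directory. It assumes recursive discovery by exact filename, which is an assumption about skill-judge's behavior rather than documented fact; `find_skill_files` is a hypothetical helper.

```python
from pathlib import Path


def find_skill_files(root: str) -> list[Path]:
    """Return every SKILL.md under root (recursively), sorted for stable output."""
    return sorted(Path(root).rglob("SKILL.md"))
```

For example, `find_skill_files("skills/")` would list the candidate files for the `/skill-judge skills/` audit, under the same assumption.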

Score Interpretation

P0 issues: block the skill from being published
P1 issues: should be resolved before merging
P2 issues: should be addressed in follow-up PRs
P3 issues: nice-to-have improvements
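The severity rules above can be read as two gates. This sketch, using a hypothetical `gate` function over a list of severity labels, treats any P0 as blocking publication and any P0 or P1 as blocking a merge. Counting P0 against merging is an assumption (P0 being the more severe level), not something the rubric states.

```python
from collections import Counter


def gate(severities: list[str]) -> dict:
    """Summarize a report's severity labels ("P0".."P3") into gate decisions."""
    counts = Counter(severities)  # missing labels count as 0
    return {
        "can_publish": counts["P0"] == 0,
        # Assumption: P0 also blocks merging, since it outranks P1.
        "ready_to_merge": counts["P0"] == 0 and counts["P1"] == 0,
        "follow_ups": counts["P2"],
        "nice_to_have": counts["P3"],
    }


print(gate(["P1", "P2", "P2", "P3"]))
```

In this example the skill is publishable but not yet ready to merge, with two follow-up items and one optional improvement.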