Skill Anatomy
Every skill in MAGIC is defined by a single structured file, SKILL.md. Your AI assistant reads this document to understand a data domain. Understanding its structure tells you what the agent knows, and how to extend or customise skills for your own projects.
YAML Frontmatter
Each SKILL.md opens with a YAML block that declares metadata the CLI installer and agent discovery system use:
```yaml
---
name: magic-data-cleaning
description: Missing value imputation, normalization, and deduplication for tabular data
version: 1.2.0
tags: [cleaning, imputation, normalization, deduplication, quality]
scripts:
  - scripts/impute_nulls.py
  - scripts/normalize_columns.py
  - scripts/deduplicate.py
dependencies:
  python: ["pandas>=2.0", "scikit-learn>=1.3", "numpy"]
metadata:
  phase: quality
  complexity: medium
  quality_gate_output: 85
---
```

Frontmatter Fields
| Field | Purpose |
|---|---|
| `name` | Unique skill identifier used for routing and cross-references |
| `description` | One-sentence summary; this is what the agent uses for trigger matching |
| `version` | Semantic version (SemVer) string that lets the installer detect outdated skill files |
| `tags` | Keywords for fuzzy trigger matching when the description alone is ambiguous |
| `scripts` | Paths to reference scripts bundled with the skill (relative to the skill root) |
| `dependencies` | Python packages the agent should verify are available before running any code |
| `metadata.phase` | Pipeline phase: `setup`, `ingest`, `quality`, `analysis`, `processing`, `output`, or `orchestration` |
| `metadata.complexity` | Rough complexity (`low`, `medium`, or `high`), used by the lifecycle skill for scheduling |
| `metadata.quality_gate_output` | Minimum quality score this skill's output should reach; the agent self-checks against it |
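This page does not describe the installer's actual validation logic. The Python below is a rough sketch of how the fields above could be checked. It validates an already-parsed frontmatter dictionary, such as the result of loading the YAML block with a YAML library. It also compares versions numerically, which is how an outdated skill file could be detected. `validate_frontmatter` and `is_outdated` are illustrative assumptions, not MAGIC APIs, and so is the choice of required fields.

```python
import re

# Allowed values copied from the Frontmatter Fields table.
ALLOWED_PHASES = {"setup", "ingest", "quality", "analysis",
                  "processing", "output", "orchestration"}
ALLOWED_COMPLEXITY = {"low", "medium", "high"}
# Simplified MAJOR.MINOR.PATCH pattern; full SemVer also allows pre-release and build suffixes.
SEMVER_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def validate_frontmatter(meta: dict) -> list[str]:
    """Return a list of problems found in parsed SKILL.md frontmatter (empty if none)."""
    problems = []
    # Assumption: treat these three fields as required.
    for field in ("name", "description", "version"):
        if not meta.get(field):
            problems.append(f"missing field: {field}")
    version = str(meta.get("version") or "")
    if version and not SEMVER_RE.match(version):
        problems.append(f"version is not MAJOR.MINOR.PATCH: {version!r}")
    extra = meta.get("metadata") or {}
    if "phase" in extra and extra["phase"] not in ALLOWED_PHASES:
        problems.append(f"unknown phase: {extra['phase']!r}")
    if "complexity" in extra and extra["complexity"] not in ALLOWED_COMPLEXITY:
        problems.append(f"unknown complexity: {extra['complexity']!r}")
    return problems


def _version_tuple(version: str) -> tuple[int, ...]:
    """Turn '1.2.0' into (1, 2, 0) so versions compare numerically."""
    match = SEMVER_RE.match(version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_outdated(installed: str, available: str) -> bool:
    """True if the installed skill file is older than the available release."""
    return _version_tuple(installed) < _version_tuple(available)


meta = {
    "name": "magic-data-cleaning",
    "description": "Missing value imputation, normalization, and deduplication for tabular data",
    "version": "1.2.0",
    "metadata": {"phase": "quality", "complexity": "medium", "quality_gate_output": 85},
}
print(validate_frontmatter(meta))      # []
print(is_outdated("1.2.0", "1.10.0"))  # True
```

Comparing integer tuples rather than raw strings matters here. As plain strings, "1.10.0" sorts before "1.2.0".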
The 3-Layer Content Structure
After the frontmatter, a SKILL.md is organised into three conceptual layers. These are not literally labelled as "layers" in the file — they emerge from the section structure.
Layer 1 — Domain Knowledge
Sections: When to Use, Constraints, Anti-Patterns
This layer tells the agent what the domain is about and when to activate. It answers:
- What user requests should trigger this skill?
- What rules must always be followed (e.g. "never drop more than 5% of rows without user confirmation")?
- What mistakes does the agent commonly make that must be avoided?
Domain knowledge shapes the agent's reasoning before any code is written. It is the most important layer for getting correct behaviour.
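Constraints are prose for the agent, not code that MAGIC runs. Still, code the agent writes can honour a rule like the 5% row-drop limit with an explicit guard. The following is a minimal sketch. `check_row_drop` and `ConfirmationRequired` are hypothetical names, and the 5% figure is simply the example constraint above.

```python
class ConfirmationRequired(Exception):
    """Raised when an operation would break a SKILL.md constraint without user sign-off."""


# Limit taken from the example constraint; a real skill states its own value.
MAX_UNCONFIRMED_DROP_FRACTION = 0.05


def check_row_drop(rows_before: int, rows_after: int, confirmed: bool = False) -> float:
    """Return the fraction of rows dropped, refusing to exceed the limit unless confirmed."""
    if rows_before <= 0:
        return 0.0
    dropped_fraction = (rows_before - rows_after) / rows_before
    if dropped_fraction > MAX_UNCONFIRMED_DROP_FRACTION and not confirmed:
        raise ConfirmationRequired(
            f"Dropping {dropped_fraction:.1%} of rows exceeds the "
            f"{MAX_UNCONFIRMED_DROP_FRACTION:.0%} limit; ask the user first."
        )
    return dropped_fraction


print(check_row_drop(1000, 960))  # 0.04
```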
Example constraint from magic-data-cleaning:
"Never impute a column where more than 40% of values are null without surfacing this to the user first. High-null columns may indicate a structural data problem rather than missing values."
Layer 2 — Code Patterns
Sections: Reference Scripts, Seed Patterns
This layer provides concrete implementations the agent reads as starting points. Scripts are reference implementations — the agent reads them, understands the approach, and writes adapted code for your specific data.
Scripts are never executed directly by MAGIC. The agent reads them for pattern guidance and writes new code tailored to your data's actual column names, types, and shape. This is a critical distinction.
A reference script for null imputation might look like:
```python
# Reference: impute_nulls.py
# The agent reads this and adapts it to your actual DataFrame columns.
import pandas as pd
from sklearn.impute import SimpleImputer


def impute_numeric_columns(df: pd.DataFrame, strategy: str = "median") -> pd.DataFrame:
    """Fill nulls in every numeric column using one scikit-learn imputation strategy."""
    df = df.copy()  # avoid mutating the caller's DataFrame
    numeric_cols = df.select_dtypes(include="number").columns.tolist()
    if not numeric_cols:
        return df  # nothing numeric to impute
    imputer = SimpleImputer(strategy=strategy)
    df[numeric_cols] = imputer.fit_transform(df[numeric_cols])
    return df
```

When the agent sees that your data has the columns `revenue`, `units`, and `discount_pct`, it writes a version of this function that uses those column names. It also picks an appropriate strategy for each column's distribution and handles any special cases your data requires.
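To make that concrete, here is a hypothetical adapted version for those three columns. It is a sketch, not MAGIC output. The per-column choices are illustrative assumptions an agent might make after profiling: median for `revenue`, a rounded median for the count column `units`, and zero for `discount_pct`. `impute_sales_columns` is an invented name.

```python
import pandas as pd


def impute_sales_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical adapted imputation for revenue, units and discount_pct."""
    df = df.copy()
    # Median is less sensitive than the mean to a few very large orders.
    df["revenue"] = df["revenue"].fillna(df["revenue"].median())
    # Units are counts, so round the median and restore an integer dtype.
    df["units"] = df["units"].fillna(df["units"].median()).round().astype("int64")
    # Assumption: a missing discount means no discount was applied.
    df["discount_pct"] = df["discount_pct"].fillna(0.0)
    return df


sales = pd.DataFrame({
    "revenue": [100.0, None, 300.0, 500.0],
    "units": [1, None, 3, 5],
    "discount_pct": [0.1, None, None, 0.2],
})
print(impute_sales_columns(sales))
```

The reference script applies one strategy to every numeric column. Here, each column instead gets a rule suited to what it represents.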
Layer 3 — Procedures
Sections: Workflow, Checkpointing, Self-Healing, Handoff
This layer tells the agent how to do the work step by step. A procedure section looks like:
- Load the most recent checkpoint from `workspace/data/checkpoints/`
- Run `magic-data-profiling` if no profiling checkpoint exists
- Identify null columns above the threshold
- Select an imputation strategy per column type (median for numeric, mode for categorical)
- Apply imputation and record row-level changes
- Write the checkpoint as `ckpt_{NN}_nulls_imputed.{ext}`
- Update `workspace_state.md` with the new checkpoint path
- Surface a summary: columns imputed, strategy used, rows affected
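This page only partly specifies the checkpoint naming convention, so the sketch below rests on three assumptions. `{NN}` is a zero-padded two-digit sequence number, the most recent checkpoint is the one with the highest number, and numbering starts at 01. `latest_checkpoint` and `next_checkpoint_path` are hypothetical helpers, not MAGIC functions.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

# Assumed naming scheme: ckpt_<sequence>_<label>.<ext>, e.g. ckpt_03_nulls_imputed.parquet
CKPT_RE = re.compile(r"^ckpt_(\d+)_.+")


def _sequence(path: Path) -> Optional[int]:
    """Return the checkpoint sequence number encoded in the file name, if any."""
    match = CKPT_RE.match(path.name)
    return int(match.group(1)) if match else None


def latest_checkpoint(ckpt_dir: Path) -> Optional[Path]:
    """Return the checkpoint with the highest sequence number, or None if none exist."""
    numbered = [(seq, p) for p in ckpt_dir.glob("ckpt_*")
                if (seq := _sequence(p)) is not None]
    if not numbered:
        return None
    return max(numbered, key=lambda item: item[0])[1]


def next_checkpoint_path(ckpt_dir: Path, label: str, ext: str) -> Path:
    """Build the path for the next checkpoint, e.g. ckpt_03_nulls_imputed.parquet."""
    numbered = [seq for p in ckpt_dir.glob("ckpt_*") if (seq := _sequence(p)) is not None]
    next_seq = max(numbered, default=0) + 1
    return ckpt_dir / f"ckpt_{next_seq:02d}_{label}.{ext}"


with tempfile.TemporaryDirectory() as tmp:
    ckpt_dir = Path(tmp)
    (ckpt_dir / "ckpt_01_loaded.parquet").touch()
    (ckpt_dir / "ckpt_02_profiled.parquet").touch()
    print(latest_checkpoint(ckpt_dir).name)  # ckpt_02_profiled.parquet
    print(next_checkpoint_path(ckpt_dir, "nulls_imputed", "parquet").name)  # ckpt_03_nulls_imputed.parquet
```

Ordering by the encoded sequence number rather than by file modification time keeps the result stable if files are copied or touched later.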
Procedures make the agent's behaviour predictable and auditable — you can read the SKILL.md and know exactly what the agent will do.
Content Sections Reference
| Section | Layer | Purpose |
|---|---|---|
| `## When to Use` | Domain Knowledge | Trigger phrases and applicability criteria |
| `## Constraints` | Domain Knowledge | Hard rules the agent must not violate |
| `## Anti-Patterns` | Domain Knowledge | Common mistakes and why to avoid them |
| `## Workflow` | Procedures | Ordered steps the agent follows |
| `## Scripts` | Code Patterns | Table of reference scripts with descriptions |
| `## Self-Healing` | Procedures | Recovery instructions for common failures |
| `## Handoff` | Procedures | What to tell the next skill in the pipeline |
| `## Reference Guides` | Domain Knowledge | External references (library docs, standards) |
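The sections in this table are ordinary level-2 Markdown headings, so a SKILL.md body can be split into them with a few lines of Python. This sketch rests on assumptions of its own. It treats the eight headings above as the expected set, and it does not handle `##` lines inside fenced code blocks, which it would misread as headings. `split_sections` and `missing_sections` are hypothetical names. Because this page does not say which sections every skill must contain, treat the missing-section report as advisory.

```python
import re

# Section headings listed in the Content Sections Reference table.
EXPECTED_SECTIONS = ["When to Use", "Constraints", "Anti-Patterns", "Workflow",
                     "Scripts", "Self-Healing", "Handoff", "Reference Guides"]


def split_sections(skill_md: str) -> dict[str, str]:
    """Split a SKILL.md into {heading: text} using level-2 '## ' headings."""
    body = skill_md
    # Skip the YAML frontmatter if the file opens with a '---' block.
    if body.startswith("---"):
        end = body.find("\n---", 3)
        if end != -1:
            body = body[end + 4:]
    sections: dict[str, str] = {}
    current = None
    for line in body.splitlines():
        heading = re.match(r"^##\s+(.+?)\s*$", line)  # '### Sub' does not match
        if heading:
            current = heading.group(1)
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return {name: text.strip() for name, text in sections.items()}


def missing_sections(skill_md: str) -> list[str]:
    """List expected section headings that the file does not contain."""
    present = split_sections(skill_md)
    return [name for name in EXPECTED_SECTIONS if name not in present]


doc = "---\nname: demo\n---\n## When to Use\nCleaning requests.\n## Constraints\nNever drop >5% rows.\n"
print(list(split_sections(doc)))  # ['When to Use', 'Constraints']
```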
How the Agent Uses a Skill File
Reading a SKILL.md is not like executing a configuration file. The agent treats it as a briefing document:
- Skim the frontmatter — confirm this skill applies to the current task
- Read "When to Use" — verify the trigger conditions match
- Read "Constraints" — load hard rules into working context before generating any code
- Read the Workflow — build a mental model of the steps
- Read relevant Scripts — understand the code patterns for the operations this task requires
- Write adapted code — produce implementation tailored to the user's actual data
- Apply Self-Healing — if an error occurs, check self-healing instructions before asking the user
For simple tasks, the agent does not necessarily read every section of a large SKILL.md; it focuses on the sections most relevant to your specific request.
Extending Skills
You can customise any installed SKILL.md to encode project-specific knowledge:
- Add your project's column names to the Code Patterns layer (Reference Scripts or Seed Patterns) so the agent doesn't have to infer them
- Tighten constraints for your domain (e.g. lower the null-fraction threshold above which the agent must ask before imputing)
- Add reference scripts specific to your data format
- Update the "When to Use" triggers to include your internal terminology
Custom changes to SKILL.md files are not overwritten by `magic update` unless you pass the `--force` flag. Your additions are preserved across updates.