Skill Anatomy

Every skill in MAGIC is a single structured file: SKILL.md. Your AI assistant reads this file to understand a data domain. Knowing how the file is structured shows you what the agent knows and how to extend or customise skills for your own projects.

YAML Frontmatter

Each SKILL.md opens with a YAML block that declares metadata the CLI installer and agent discovery system use:

---
name: magic-data-cleaning
description: Missing value imputation, normalization, and deduplication for tabular data
version: 1.2.0
tags: [cleaning, imputation, normalization, deduplication, quality]
scripts:
  - scripts/impute_nulls.py
  - scripts/normalize_columns.py
  - scripts/deduplicate.py
dependencies:
  python: ["pandas>=2.0", "scikit-learn>=1.3", "numpy"]
metadata:
  phase: quality
  complexity: medium
  quality_gate_output: 85
---

Frontmatter Fields

Field	Purpose
`name`	Unique skill identifier used for routing and cross-references
`description`	One-sentence summary — what the agent uses for trigger matching
`version`	Semver string — lets the installer detect outdated skill files
`tags`	Keywords for fuzzy trigger matching when the description alone is ambiguous
`scripts`	Paths to reference scripts bundled with the skill (relative to skill root)
`dependencies`	Python packages the agent should verify are available before running
`metadata.phase`	Pipeline phase: `setup`, `ingest`, `quality`, `analysis`, `processing`, `output`, `orchestration`
`metadata.complexity`	Rough complexity: `low`, `medium`, `high` — used by the lifecycle skill for scheduling
`metadata.quality_gate_output`	Minimum quality score this skill should produce (agent self-checks against this)
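
As a rough illustration of how these fields could be checked, the sketch below validates an already-parsed frontmatter dictionary against the values listed in the table. The names `validate_frontmatter` and `is_outdated` are hypothetical and not part of MAGIC, and treating `name`, `description`, and `version` as required is an assumption; the actual installer may check different things.

```python
import re

# Allowed values taken from the field table above.
PHASES = {"setup", "ingest", "quality", "analysis", "processing", "output", "orchestration"}
COMPLEXITIES = {"low", "medium", "high"}
SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")  # plain MAJOR.MINOR.PATCH only


def validate_frontmatter(fm: dict) -> list[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    # Assumption: these three fields are mandatory.
    for field in ("name", "description", "version"):
        if not fm.get(field):
            problems.append(f"missing required field: {field}")
    version = str(fm.get("version") or "")
    if version and not SEMVER.match(version):
        problems.append(f"version is not MAJOR.MINOR.PATCH: {version!r}")
    meta = fm.get("metadata") or {}
    if "phase" in meta and meta["phase"] not in PHASES:
        problems.append(f"unknown phase: {meta['phase']!r}")
    if "complexity" in meta and meta["complexity"] not in COMPLEXITIES:
        problems.append(f"unknown complexity: {meta['complexity']!r}")
    return problems


def is_outdated(installed: str, available: str) -> bool:
    """Compare two MAJOR.MINOR.PATCH strings numerically, not as text."""
    def parse(v: str) -> tuple:
        return tuple(int(x) for x in SEMVER.match(v).groups())
    return parse(installed) < parse(available)
```

Comparing parsed tuples rather than raw strings means 1.10.0 is correctly treated as newer than 1.2.0.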

The 3-Layer Content Structure

After the frontmatter, a SKILL.md is organised into three conceptual layers. These are not literally labelled as "layers" in the file — they emerge from the section structure.

Layer 1 — Domain Knowledge

Sections: When to Use, Constraints, Anti-Patterns

This layer tells the agent what the domain is about and when to activate. It answers:

What user requests should trigger this skill?
What rules must always be followed (e.g. "never drop more than 5% of rows without user confirmation")?
What mistakes does the agent commonly make that must be avoided?

Domain knowledge shapes the agent's reasoning before any code is written. It is the most important layer for getting correct behaviour.

Example constraint from magic-data-cleaning:

"Never impute a column where more than 40% of values are null without surfacing this to the user first. High-null columns may indicate a structural data problem rather than missing values."

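A constraint like the one quoted above translates into a check that can run before any imputation. The sketch below is one possible form, not the skill's actual script: the function name `columns_needing_confirmation` is hypothetical, and the 0.40 default simply mirrors the 40% threshold in the quoted constraint.

```python
import pandas as pd


def columns_needing_confirmation(
    df: pd.DataFrame, max_null_fraction: float = 0.40
) -> dict[str, float]:
    """Map each column whose null fraction exceeds the threshold to that fraction."""
    null_fraction = df.isna().mean()  # per-column share of null values
    flagged = null_fraction[null_fraction > max_null_fraction]
    return {str(col): float(frac) for col, frac in flagged.items()}
```

Any column returned here would be surfaced to the user instead of being imputed silently.
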
Layer 2 — Code Patterns

Sections: Reference Scripts, Seed Patterns

This layer provides concrete implementations the agent reads as starting points. Scripts are reference implementations — the agent reads them, understands the approach, and writes adapted code for your specific data.

MAGIC never executes these scripts directly. The agent uses them only as pattern guidance and writes new code tailored to your data's actual column names, types, and shape, so a reference script has to show the approach, not match your schema.

A reference script for null imputation might look like:

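# Reference: impute_nulls.py
# The agent reads this and adapts it to your actual DataFrame columns.

import pandas as pd
from sklearn.impute import SimpleImputer

def impute_numeric_columns(df: pd.DataFrame, strategy: str = "median") -> pd.DataFrame:
    """Return a copy of df with nulls in numeric columns filled using `strategy`."""
    df = df.copy()  # leave the caller's DataFrame untouched
    numeric_cols = df.select_dtypes(include="number").columns
    # Skip all-null columns: SimpleImputer drops them by default,
    # which would make the assignment below fail on a shape mismatch.
    numeric_cols = [col for col in numeric_cols if df[col].notna().any()]
    if numeric_cols:
        imputer = SimpleImputer(strategy=strategy)
        df[numeric_cols] = imputer.fit_transform(df[numeric_cols])
    return df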

When the agent sees your data has columns revenue, units, and discount_pct, it writes a version of this function with those column names, appropriate strategies for each column's distribution, and any special cases your data requires.
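
For the three columns named above, an adapted version might look like the following sketch. The per-column choices (median for revenue and units, zero for discount_pct) are illustrative assumptions, not what the agent would necessarily pick; in practice it would base them on the profiled distributions. The function name `impute_sales_columns` is likewise hypothetical.

```python
import pandas as pd

# Hypothetical per-column strategies, chosen for illustration only.
STRATEGIES = {
    "revenue": "median",     # assumed right-skewed, so median rather than mean
    "units": "median",       # assumed count-like values
    "discount_pct": "zero",  # assumed: a missing discount means no discount
}


def impute_sales_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with nulls filled according to STRATEGIES."""
    out = df.copy()
    for col, strategy in STRATEGIES.items():
        if col not in out.columns:
            continue  # tolerate files that lack one of the columns
        fill_value = 0.0 if strategy == "zero" else out[col].median()
        out[col] = out[col].fillna(fill_value)
    return out
```

Keeping the strategy table separate from the loop makes the per-column decisions easy to review and change.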

Layer 3 — Procedures

Sections: Workflow, Checkpointing, Self-Healing, Handoff

This layer tells the agent how to do the work step by step. A procedure section looks like:

Load the most recent checkpoint from workspace/data/checkpoints/
Run magic-data-profiling if no profiling checkpoint exists
Identify null columns above the threshold
Select imputation strategy per column type (median for numeric, mode for categorical)
Apply imputation and record row-level changes
Write checkpoint as ckpt_{NN}_nulls_imputed.{ext}
Update workspace_state.md with the new checkpoint path
Surface a summary: columns imputed, strategy used, rows affected

Procedures make the agent's behaviour predictable and auditable — you can read the SKILL.md and know exactly what the agent will do.
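
The checkpoint-naming step in the procedure above can be sketched as follows. This assumes NN is a zero-padded sequence number, one higher than the largest existing checkpoint and starting at 01; the naming pattern suggests this but the procedure does not state it. `next_checkpoint_path` is a hypothetical helper, not a MAGIC function.

```python
import re
from pathlib import Path

CKPT_PATTERN = re.compile(r"^ckpt_(\d+)_")


def next_checkpoint_path(checkpoint_dir: Path, label: str, ext: str) -> Path:
    """Build ckpt_{NN}_{label}.{ext} using the next free sequence number."""
    numbers = []
    if checkpoint_dir.is_dir():
        for entry in checkpoint_dir.iterdir():
            match = CKPT_PATTERN.match(entry.name)
            if match:
                numbers.append(int(match.group(1)))
    # Assumption: numbering starts at 01 when no checkpoint exists yet.
    next_number = max(numbers, default=0) + 1
    return checkpoint_dir / f"ckpt_{next_number:02d}_{label}.{ext}"
```

The function only computes the path; writing the file and updating workspace_state.md remain separate steps.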

Content Sections Reference

Section	Layer	Purpose
`## When to Use`	Domain Knowledge	Trigger phrases and applicability criteria
`## Constraints`	Domain Knowledge	Hard rules the agent must not violate
`## Anti-Patterns`	Domain Knowledge	Common mistakes and why to avoid them
`## Workflow`	Procedures	Ordered steps the agent follows
`## Scripts`	Code Patterns	Table of reference scripts with descriptions
`## Self-Healing`	Procedures	Recovery instructions for common failures
`## Handoff`	Procedures	What to tell the next skill in the pipeline
`## Reference Guides`	Domain Knowledge	External references (library docs, standards)
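
To make the section-to-layer mapping concrete, the sketch below groups the `##` headings of a SKILL.md body by layer, using only the rows of the table above. It is an illustration, not part of MAGIC: `sections_by_layer` is a hypothetical helper, and headings that the table does not list are reported under "Unmapped" instead of being guessed.

```python
import re

# Section-to-layer mapping copied from the table above.
SECTION_LAYERS = {
    "When to Use": "Domain Knowledge",
    "Constraints": "Domain Knowledge",
    "Anti-Patterns": "Domain Knowledge",
    "Reference Guides": "Domain Knowledge",
    "Scripts": "Code Patterns",
    "Workflow": "Procedures",
    "Self-Healing": "Procedures",
    "Handoff": "Procedures",
}
# Matches level-2 headings only ("## Title"), not "###" or deeper.
HEADING = re.compile(r"^##[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def sections_by_layer(skill_body: str) -> dict[str, list[str]]:
    """Group the level-2 headings found in skill_body by conceptual layer."""
    grouped: dict[str, list[str]] = {}
    for title in HEADING.findall(skill_body):
        layer = SECTION_LAYERS.get(title, "Unmapped")
        grouped.setdefault(layer, []).append(title)
    return grouped
```

Running this over a customised skill file is a quick way to see which layer a newly added section falls into, or whether it falls outside the documented set.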

How the Agent Uses a Skill File

Reading a SKILL.md is not like executing a configuration file. The agent treats it as a briefing document:

Skim the frontmatter — confirm this skill applies to the current task
Read "When to Use" — verify the trigger conditions match
Read "Constraints" — load hard rules into working context before generating any code
Read the Workflow — build a mental model of the steps
Read relevant Scripts — understand the code patterns for the operations this task requires
Write adapted code — produce implementation tailored to the user's actual data
Apply Self-Healing — if an error occurs, check self-healing instructions before asking the user

The agent may not read every section of a large SKILL.md for simple tasks — it focuses on the sections most relevant to your specific request.

Extending Skills

You can customise any installed SKILL.md to encode project-specific knowledge:

Add project column names to the Code Patterns layer so the agent doesn't have to infer them
Tighten constraints (e.g. lower the null-drop threshold for your domain so the agent asks for confirmation sooner)
Add reference scripts specific to your data format
Update the "When to Use" triggers to include your internal terminology

Custom changes to SKILL.md files are preserved across updates: magic update does not overwrite them unless you pass the --force flag.
