Three-Layer Skill Model

Each MAGIC skill is a knowledge package organized into three layers. This model explains why skills behave the way they do and how to get the most out of them.

```
┌─────────────────────────────────────────────┐
│  Layer 1: SKILL.md                          │
│  Domain knowledge, procedures, rules        │
│  Thinking patterns, constraints, routing    │
├─────────────────────────────────────────────┤
│  Layer 2: references/*.md                   │
│  Detailed reference docs (loaded on demand) │
├─────────────────────────────────────────────┤
│  Layer 3: scripts/*.py                      │
│  Reference implementations                  │
│  Agent reads → writes adapted custom code   │
└─────────────────────────────────────────────┘
```
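To make the layout concrete, here is a minimal sketch that inventories a skill directory by layer. It assumes only the file layout shown in the diagram (`SKILL.md` at the package root, `references/*.md`, and `scripts/*.py`); the `inventory_skill` name and the shape of the returned dictionary are hypothetical, not part of MAGIC.

```python
from pathlib import Path


def inventory_skill(skill_dir: str) -> dict[str, list[str]]:
    """Group a skill package's files by layer (hypothetical helper).

    Assumes the layout from the diagram: SKILL.md at the root,
    references/*.md, and scripts/*.py. Missing directories yield empty lists.
    """
    root = Path(skill_dir)
    skill_md = root / "SKILL.md"
    return {
        # Layer 1: the primary file the agent always reads first
        "layer1": [skill_md.name] if skill_md.is_file() else [],
        # Layer 2: reference docs, listed here but only loaded on demand
        "layer2": sorted(p.name for p in (root / "references").glob("*.md")),
        # Layer 3: scripts the agent reads before writing adapted code
        "layer3": sorted(p.name for p in (root / "scripts").glob("*.py")),
    }
```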

Layer 1: SKILL.md

SKILL.md is the primary file the agent reads. It contains:

When to use — trigger phrases and activation conditions
Thinking patterns — questions the agent should ask before acting
Rules — what the agent must and must not do
Constraints — hard limits (e.g., never load >500MB without chunking)
Seed patterns — code skeletons the agent adapts to your specific data
Script documentation — what each script does and how to call it
Self-healing — how to recover from common failures
Routing — when to hand off to a different skill

The agent reads SKILL.md at the start of any task that matches the skill's domain. The document is structured so the most critical information (When to Use, Thinking, Rules) appears before the supporting detail.
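The ordering rule is simple enough to check mechanically. As a minimal sketch, assuming SKILL.md marks its sections with `## ` headings and that the priority sections are titled "When to Use", "Thinking", and "Rules" (both assumptions; real titles may differ), a lint-style check could look like this:

```python
import re

# Hypothetical "read first" section titles, taken from the ordering described above.
PRIORITY = ("When to Use", "Thinking", "Rules")


def section_titles(skill_md: str) -> list[str]:
    """Return level-2 ('## ') heading titles in document order."""
    return re.findall(r"^## +(.+?)[ \t]*$", skill_md, flags=re.MULTILINE)


def priority_sections_first(skill_md: str) -> bool:
    """True if no priority section appears after a supporting section."""
    seen_other = False
    for title in section_titles(skill_md):
        is_priority = any(title.startswith(p) for p in PRIORITY)
        if not is_priority:
            seen_other = True
        elif seen_other:
            # A priority section was buried under supporting detail.
            return False
    return True
```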

Layer 2: references/*.md

Each skill has a references/ directory with detailed guides loaded on demand. These documents cover topics too detailed for the main SKILL.md — advanced loading patterns, domain-specific validation rules, statistical interpretation guides, style guidelines.

SKILL.md explicitly tells the agent which reference file to load and when:

```markdown
## Reference Guides

| Topic | File | Load When |
|-------|------|-----------|
| Large files | references/large_file_patterns.md | File >100MB or MemoryError |
| Format detection | references/format_detection.md | Unusual format or detection fails |
```

Reference files are not pre-loaded — the agent loads them only when needed. This keeps context efficient: a simple CSV load doesn't need large-file streaming guidance.
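Here is a minimal sketch of that on-demand behavior, based on the two rows of the example table above. Each free-text "Load When" condition is encoded as a predicate over the input size and the last error raised. The `ReferenceGuide` type, the predicates, treating a failed detection as a `ValueError`, and counting 100MB as 100 MiB are illustrative assumptions, not MAGIC's actual loading logic.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MB = 1024 * 1024  # assumption: "100MB" read as 100 MiB


@dataclass(frozen=True)
class ReferenceGuide:
    topic: str
    path: str
    # Stands in for the free-text "Load When" column: (file_size, error) -> bool
    load_when: Callable[[int, Optional[BaseException]], bool]


# Mirrors the example table; the predicates are hypothetical encodings.
GUIDES = [
    ReferenceGuide(
        "Large files",
        "references/large_file_patterns.md",
        lambda size, err: size > 100 * MB or isinstance(err, MemoryError),
    ),
    ReferenceGuide(
        "Format detection",
        "references/format_detection.md",
        # Only "detection fails" is modeled; "unusual format" needs judgment.
        lambda size, err: isinstance(err, ValueError),
    ),
]


def guides_to_load(file_size: int, error: Optional[BaseException] = None) -> list[str]:
    """Return only the reference files whose load condition currently holds."""
    return [g.path for g in GUIDES if g.load_when(file_size, error)]
```

A small CSV with no errors loads nothing extra, which is the context saving described above.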

Layer 3: scripts/*.py

Scripts are, first and foremost, reference implementations rather than executables. The agent reads them to understand the approach, then writes its own code adapted to your specific task. Only scripts labeled as callable or scriptable tools (see Script Tiers) are meant to be run directly.

This is the most important and most misunderstood aspect of the model:

Scripts provide patterns. The agent may follow them closely, adapt them, or write entirely different code if your data requires it. The agent's job is to write correct code for your actual schema, format, and size, not simply to run a script.

Script Tiers

Not all scripts are equal. Each script is categorized by how it should be used:

| Tier | Label | Usage |
|------|-------|-------|
| Callable tool | `CALLABLE TOOL` | Call directly via CLI; standardized arguments, structured output |
| Scriptable tool | `SCRIPTABLE TOOL` | Call directly for standard use, or read and adapt for advanced customization |
| Reference implementation | `REFERENCE IMPLEMENTATION` | Always read and adapt; never call directly. May contain hardcoded paths or task-specific assumptions |

The Reference Scripts table in each skill's SKILL.md identifies every script's tier:

```markdown
**Callable tools** -- call directly via CLI:
| detect_format.py | Content-sniffing format detection | python3 detect_format.py input output.json |

**Scriptable tools** -- call directly or read + adapt:
| load_file.py | Load any supported format | python3 load_file.py input.csv output.parquet |

**Reference implementations** -- read patterns, write custom code:
| text_parser.py | State-machine text parsing | Two modes; markers are always data-specific |
```
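The tier rules reduce to a small decision. This sketch encodes the tier table above; the `Tier` enum and the `may_run_directly` and `tier_from_label` helpers are hypothetical names for illustration, not part of MAGIC.

```python
from enum import Enum


class Tier(Enum):
    CALLABLE_TOOL = "CALLABLE TOOL"
    SCRIPTABLE_TOOL = "SCRIPTABLE TOOL"
    REFERENCE_IMPLEMENTATION = "REFERENCE IMPLEMENTATION"


def tier_from_label(label: str) -> Tier:
    """Map a label such as 'CALLABLE TOOL' to its Tier (case-insensitive)."""
    return Tier(label.strip().upper())


def may_run_directly(tier: Tier, needs_customization: bool = False) -> bool:
    """Decide whether a script may be executed as-is, per the tier table."""
    if tier is Tier.CALLABLE_TOOL:
        return True
    if tier is Tier.SCRIPTABLE_TOOL:
        # Standard use: call it. Advanced customization: read and adapt instead.
        return not needs_customization
    # Reference implementations are always read and adapted, never run.
    return False
```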

Why "Read, Don't Run" for Reference Implementations

Reference implementations contain assumptions baked in for the example case — specific column names, known file paths, hardcoded delimiter choices. Running them directly on your data would fail or produce wrong output.

The agent reads the reference script to understand:

What algorithm or approach to use
What edge cases to handle
What the output structure should look like
What library calls produce the right result

Then it writes new code with your actual column names, your actual format, and the constraints specific to your task.
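The reference table describes `text_parser.py` only as state-machine text parsing whose markers are always data-specific. As a hypothetical illustration of "read, then adapt" (not the actual contents of `text_parser.py`), the function below keeps a simple two-state machine but takes the start and end markers as arguments instead of hardcoding them, which is the kind of change the agent makes when it adapts a reference implementation to your data.

```python
from collections.abc import Iterable


def extract_blocks(lines: Iterable[str], start_marker: str, end_marker: str) -> list[list[str]]:
    """Collect the lines between start/end markers with a two-state machine.

    The markers are parameters because they depend on the data being parsed.
    """
    blocks: list[list[str]] = []
    current: list[str] = []
    inside = False  # False: scanning for a start marker; True: inside a block
    for raw in lines:
        line = raw.rstrip("\n")
        if not inside and line.strip() == start_marker:
            inside, current = True, []
        elif inside and line.strip() == end_marker:
            blocks.append(current)
            inside = False
        elif inside:
            current.append(line)
    # An unterminated block at end of input is dropped; an adaptation might keep it.
    return blocks
```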

How the Agent Uses Each Layer

A typical task proceeds as follows:

Identify the skill — match the request against trigger phrases in SKILL.md
Read SKILL.md — absorb domain knowledge, constraints, and thinking patterns
Read relevant reference scripts — understand the implementation approach
Write adapted code — new code for your specific data, not a copy of the script
Load reference docs if needed — only when SKILL.md directs it (e.g., "load references/large_file_patterns.md for files >100MB")
Execute the adapted code — using the agent's Python or bash tools
Validate results — using the patterns from SKILL.md self-healing section
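Step 1 is the easiest to sketch in isolation. Assuming each skill's trigger phrases have been collected into a dictionary (a hypothetical structure; SKILL.md states them in prose), a naive matcher could score skills by how many of their phrases appear in the request. The skill names and phrases in the test are made up, and plain substring matching is deliberately simplistic (for example, "download" contains "load").

```python
from typing import Optional


def match_skill(request: str, triggers: dict[str, list[str]]) -> Optional[str]:
    """Return the skill whose trigger phrases best match the request.

    Each skill scores one point per trigger phrase found in the request
    (case-insensitive substring match). Ties go to the skill listed first;
    no match returns None.
    """
    text = request.lower()
    best, best_score = None, 0
    for skill, phrases in triggers.items():
        score = sum(1 for phrase in phrases if phrase.lower() in text)
        if score > best_score:
            best, best_score = skill, score
    return best
```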

Cross-Cutting Script Flags

Several flags appear across multiple scripts and are worth understanding as a system:

| Flag | Effect | Available In |
|------|--------|--------------|
| `--auto-checkpoint` | Creates a numbered snapshot (`ckpt_NN_*.csv`) after the operation | transformation, cleaning, statistical analysis |
| `--explain` | Outputs a JSON execution plan without writing any files | transformation, cleaning, statistical analysis |
| `--flatten-depth N` | Flattens nested JSON/JSONL fields to depth N | loading (`load_file.py`) |
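The `ckpt_NN_*.csv` pattern implies that each new checkpoint takes the next number in sequence. A minimal sketch of choosing the next file name, assuming numbering starts at `01` and that the `*` part is a free-form label (both are assumptions; the table only gives the pattern). `next_checkpoint_path` is a hypothetical helper, not a MAGIC function.

```python
import re
from pathlib import Path

# Matches ckpt_NN_<anything>.csv and captures the number
CKPT_RE = re.compile(r"^ckpt_(\d{2,})_.*\.csv$")


def next_checkpoint_path(directory: str, label: str) -> Path:
    """Return the next ckpt_NN_<label>.csv path inside `directory`.

    NN is one more than the highest existing checkpoint number,
    zero-padded to two digits; files not matching the pattern are ignored.
    """
    d = Path(directory)
    numbers = [
        int(m.group(1))
        for p in d.glob("ckpt_*.csv")
        if (m := CKPT_RE.match(p.name))
    ]
    n = max(numbers, default=0) + 1
    return d / f"ckpt_{n:02d}_{label}.csv"
```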

`--explain` is particularly useful for previewing what a complex operation will do before you run it for real:

```bash
# See what the cleaning plan will do without modifying any files
python3 execute_cleaning_plan.py data.csv cleaned.csv --plan plan.json --explain

# See what columns the aggregation will produce
python3 aggregate.py data.csv agg.csv --group_cols region --agg_cols revenue --explain
```
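`--explain` behaves like a dry-run switch. The sketch below shows that general pattern with `argparse`: build the plan first, then either print it as JSON and stop, or go on to execute it. It is not the source of `aggregate.py`. The argument names mirror the example command above, while `build_plan`, the plan's keys, accepting several columns per flag, and the `<col>_sum` output naming are hypothetical.

```python
import argparse
import json


def build_plan(args: argparse.Namespace) -> dict:
    """Describe the operation without performing it (hypothetical plan shape)."""
    return {
        "input": args.input,
        "output": args.output,
        "group_cols": args.group_cols,
        "agg_cols": args.agg_cols,
        # Assumed naming: one summed column per aggregated column
        "output_columns": args.group_cols + [f"{c}_sum" for c in args.agg_cols],
    }


def main(argv=None) -> str:
    parser = argparse.ArgumentParser(description="Explain-mode sketch")
    parser.add_argument("input")
    parser.add_argument("output")
    parser.add_argument("--group_cols", nargs="+", default=[])
    parser.add_argument("--agg_cols", nargs="+", default=[])
    parser.add_argument("--explain", action="store_true",
                        help="print the JSON plan and exit without writing files")
    args = parser.parse_args(argv)
    plan = build_plan(args)
    if args.explain:
        text = json.dumps(plan, indent=2)
        print(text)  # nothing is written to args.output
        return text
    # A real script would execute the plan and write args.output here.
    raise NotImplementedError("execution is out of scope for this sketch")
```

Because the plan is computed before any work starts, explain mode and real execution cannot disagree about what the operation will do.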

JSONL Support

All loading, cleaning, and transformation scripts accept `.jsonl` (newline-delimited JSON) input natively. The output format is determined by the output file's extension:

```bash
# Load JSONL and save as Parquet
python3 load_file.py records.jsonl output.parquet

# Load JSONL with nested field flattening
python3 load_file.py records.jsonl output.parquet --flatten-depth 2
```
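To show what depth-limited flattening and extension-based output could mean in practice, here is a standard-library sketch. It is not `load_file.py`'s implementation: the dotted key names, the reading of depth as the number of nested levels merged into column names, and the restriction to `.csv` and `.jsonl` output (Parquet needs a third-party library) are all assumptions.

```python
import csv
import json
from pathlib import Path


def flatten(record: dict, depth: int, prefix: str = "") -> dict:
    """Merge nested dicts into dotted keys, descending at most `depth` levels.

    depth=0 leaves the record unchanged; dicts nested deeper than `depth`
    are kept as values (assumed semantics).
    """
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict) and depth > 0:
            flat.update(flatten(value, depth - 1, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def load_jsonl(path: str, depth: int = 0) -> list[dict]:
    """Read newline-delimited JSON, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [flatten(json.loads(line), depth) for line in fh if line.strip()]


def save(rows: list[dict], path: str) -> None:
    """Choose the output format from the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".jsonl":
        with open(path, "w", encoding="utf-8") as fh:
            for row in rows:
                fh.write(json.dumps(row) + "\n")
    elif suffix == ".csv":
        # Union of keys in first-seen order, so ragged records still line up.
        fields = list(dict.fromkeys(k for row in rows for k in row))
        with open(path, "w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=fields)
            writer.writeheader()
            writer.writerows(rows)
    else:
        # .parquet would need a third-party library such as pyarrow.
        raise ValueError(f"unsupported output extension: {suffix}")
```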
