Architecture
MAGIC Data Agent Skills is not a library, runtime, or API. It is a knowledge delivery system — a collection of structured documents that teach your AI coding assistant how to approach data tasks correctly, consistently, and safely.
System Components
The system has four components that work together:
Skills (SKILL.md files)
Structured knowledge packages the agent reads to understand a domain
Reference Scripts
Seed implementations the agent adapts to your specific data
CLI Installer
Copies skill files into the correct locations for your agent
Workspace
Shared directory structure for checkpoints, logs, and state
How Agents Read Skills
When you ask your AI assistant to perform a data task, it searches for relevant SKILL.md files in the locations the CLI installer populated. The agent reads the skill file and uses it to:
- Determine if the skill applies — by matching your request against the "When to Use" section
- Understand domain rules — what operations are safe, what anti-patterns to avoid, what constraints apply
- Adapt code patterns — reference implementations in the skill are read as templates, not executed directly. The agent writes new code tailored to your data's actual schema, format, and size
- Follow the workflow — the step-by-step procedure guides the agent's sequence of actions
Skills guide the agent's reasoning — they do not limit it. The agent can deviate from the skill's procedure when your situation calls for it, but the domain knowledge and constraints remain in effect.
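Here is a rough illustration of how an agent might pull the "When to Use" section out of a skill file. It is a minimal Python sketch that assumes SKILL.md delimits its sections with markdown `#` headings. The real layout of MAGIC skill files may differ, and the function name `split_sections` is hypothetical.

```python
import re

# Matches markdown ATX headings such as "## When to Use".
HEADING = re.compile(r"^#{1,6}\s+(.*\S)\s*$")


def split_sections(skill_md: str) -> dict[str, str]:
    """Split a SKILL.md body into {heading: body text}.

    Text before the first heading is stored under "preamble".
    """
    sections: dict[str, str] = {}
    current = "preamble"
    buffer: list[str] = []
    for line in skill_md.splitlines():
        match = HEADING.match(line)
        if match:
            # Close the previous section and start a new one.
            sections[current] = "\n".join(buffer).strip()
            current = match.group(1)
            buffer = []
        else:
            buffer.append(line)
    sections[current] = "\n".join(buffer).strip()
    return sections
```

After splitting, `sections["When to Use"]` holds the text the agent matches your request against. This sketch does not skip `#` lines that appear inside fenced code examples, which a more careful parser would need to do.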
The 3 Installation Paths
The CLI installer supports three target environments. Each path installs the same SKILL.md files but places them where the target agent can discover them.
Path 1: Claude Code (.claude/skills/)
Installs skills into the .claude/skills/ directory at your project root. Claude Code discovers skill files in this location automatically and loads them when relevant.
your-project/
└── .claude/
    └── skills/
        ├── magic-data-loading/
        │   ├── SKILL.md
        │   └── scripts/
        └── magic-data-cleaning/
            ├── SKILL.md
            └── scripts/
Path 2: Cursor (.cursorrules injection)
Injects skill summaries into your .cursorrules file. Cursor reads this file as persistent context for all AI interactions in the project.
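This page does not describe the installer's actual injection format. The minimal sketch below assumes the summaries are written between two hypothetical marker comments. Under that assumption the injection is idempotent: re-running it replaces the earlier block instead of appending a duplicate, and any rules you wrote yourself stay in place.

```python
from pathlib import Path

# Hypothetical delimiters so re-running the installer replaces its own block.
BEGIN = "# >>> magic-data-agent-skills >>>"
END = "# <<< magic-data-agent-skills <<<"


def inject_summaries(cursorrules: Path, summaries: list[str]) -> None:
    """Insert or replace a delimited block of skill summaries in .cursorrules."""
    existing = cursorrules.read_text(encoding="utf-8") if cursorrules.exists() else ""
    block = "\n".join([BEGIN, *summaries, END])
    if BEGIN in existing and END in existing:
        # Swap out the previously injected block; user-written rules are untouched.
        head, _, rest = existing.partition(BEGIN)
        _, _, tail = rest.partition(END)
        new_text = head + block + tail
    else:
        # Append after any existing rules, separated by a blank line.
        if not existing:
            sep = ""
        elif existing.endswith("\n"):
            sep = "\n"
        else:
            sep = "\n\n"
        new_text = existing + sep + block + "\n"
    cursorrules.write_text(new_text, encoding="utf-8")
```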
Path 3: Global User Directory (~/.magic/skills/)
Installs skills to your home directory so they are available across all projects. Useful when you work with data tasks in many repositories and want skills available everywhere without per-project setup.
Global installation means skills are always in context, which can increase token usage for non-data tasks. Prefer project-level installation unless you work with data in most of your projects.
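The three paths differ only in where the files end up. The sketch below encodes the locations named above and shows how a lookup could enumerate the installed skills. The `kind` labels and both function names are hypothetical and do not correspond to the CLI installer's actual options.

```python
from pathlib import Path


def install_target(kind: str, project_root: Path, home: Path) -> Path:
    """Where each installation path puts its files (kind labels are hypothetical)."""
    targets = {
        "claude-code": project_root / ".claude" / "skills",
        "cursor": project_root / ".cursorrules",
        "global": home / ".magic" / "skills",
    }
    return targets[kind]


def find_skill_files(skills_dir: Path) -> list[Path]:
    """Return every <skill-name>/SKILL.md directly under a skills directory, sorted."""
    return sorted(skills_dir.glob("*/SKILL.md"))
```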
Skill Discovery and Activation
The agent uses a two-step process to activate the right skill:
Step 1 — Trigger matching. Each SKILL.md lists trigger phrases in its "When to Use" section. The agent scans your prompt for signals that match a skill's domain (e.g. "load", "CSV", "missing values", "distribution").
Step 2 — Confidence scoring. When multiple skills could apply, the agent scores each by relevance and activates the highest-confidence match. For lifecycle tasks, it activates magic-data-lifecycle, which then routes to the specialist skills.
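In practice the agent performs both steps through its own reasoning, not a fixed numeric algorithm. The Python sketch below is only a rough approximation. It assumes keyword-count scoring, hypothetical trigger lists, and a hypothetical rule that a prompt touching three or more skill domains counts as a lifecycle task.

```python
from typing import Optional

# Hypothetical trigger lists; real ones come from each SKILL.md "When to Use" section.
TRIGGERS = {
    "magic-data-loading": ["load", "csv", "parquet"],
    "magic-data-cleaning": ["missing values", "nulls", "duplicates", "clean"],
    "magic-data-exploration": ["distribution", "correlation", "explore"],
}


def score_skills(prompt: str) -> dict[str, int]:
    """Step 1: count how many trigger phrases of each skill appear in the prompt."""
    text = prompt.lower()
    return {skill: sum(p in text for p in phrases) for skill, phrases in TRIGGERS.items()}


def activate(prompt: str, lifecycle_min: int = 3) -> Optional[str]:
    """Step 2: pick the best-scoring skill, or the lifecycle router for broad tasks."""
    scores = {s: n for s, n in score_skills(prompt).items() if n > 0}
    if not scores:
        return None
    if len(scores) >= lifecycle_min:
        # Many domains matched: hand off to magic-data-lifecycle to build a route.
        return "magic-data-lifecycle"
    return max(scores, key=scores.get)
```

For example, "Clean the nulls and duplicates in sales.csv" scores highest for magic-data-cleaning. "Load the CSV, clean missing values, and plot the distribution" touches all three domains and is routed to magic-data-lifecycle.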
Data Flow
User Prompt
│
▼
Agent reads relevant SKILL.md(s)
│
▼
Lifecycle skill builds route (if multi-skill task)
│
▼
Specialist skill reads reference scripts → writes adapted code
│
▼
Adapted code runs against your actual data
│
▼
Checkpoint written to workspace/data/checkpoints/
│
▼
Journal entry appended to workspace/logs/analysis_journal.md
│
▼
Next skill in route activated (or Deliver phase)
At no point does MAGIC execute code directly. The agent reads the skills, writes code, and executes that code using its own tools (Python interpreter, bash, etc.). Skills shape what code gets written; they do not run anything themselves.
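The checkpoint and journal steps at the end of the flow are ordinary file operations in the code the agent writes. The minimal sketch below assumes CSV checkpoints, a hypothetical file-naming scheme, and a hypothetical journal-entry format. Only the two directory paths come from the flow above.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path


def record_step(workspace: Path, step: int, skill: str, rows: list[dict], note: str) -> Path:
    """Write one route step's checkpoint, then append a journal entry describing it."""
    if not rows:
        raise ValueError("refusing to write an empty checkpoint")
    ckpt_dir = workspace / "data" / "checkpoints"
    log_dir = workspace / "logs"
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    log_dir.mkdir(parents=True, exist_ok=True)

    # Hypothetical naming scheme: zero-padded step number plus skill name.
    ckpt = ckpt_dir / f"{step:02d}_{skill}.csv"
    with ckpt.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)

    # Open in append mode so earlier journal entries are never rewritten.
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with (log_dir / "analysis_journal.md").open("a", encoding="utf-8") as fh:
        fh.write(f"## Step {step}: {skill} ({stamp})\n- {note}\n- checkpoint: {ckpt.name}\n\n")
    return ckpt
```

Writing a separate checkpoint per step means a later skill can always be re-run from the previous step's output without repeating the whole route.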
5-Phase Workflow with PAUSE Gates
The data lifecycle follows a structured 5-phase workflow. PAUSE gates require explicit user approval before advancing — the agent never moves to the next phase without confirmation.
Discover → [PAUSE: user reviews findings]
→ Plan → [PAUSE: user approves spec]
→ Execute → [PAUSE: user verifies output]
→ Validate → [PAUSE: user reviews compliance]
→ Deliver
| Phase | Primary Skills | Output |
|---|---|---|
| Discover | magic-data-loading, magic-data-profiling, magic-data-exploration | Quality score, issue report, patterns |
| Plan | magic-data-lifecycle (routing) | data-spec.md, processing plan |
| Execute | magic-data-cleaning, magic-data-transformation, magic-data-synthesis | Cleaned/transformed checkpoints |
| Validate | magic-data-validation, magic-statistical-analysis | Validation reports, sanity checks |
| Deliver | magic-data-visualization, magic-report-generation | Charts, structured report, exported data |
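The PAUSE-gate rule reduces to a simple control loop. The sketch below uses two hypothetical callbacks, `run_phase` and `approve`, to stand in for the agent's work and your confirmation. As in the diagram, there is no gate after Deliver.

```python
from typing import Callable

PHASES = ["Discover", "Plan", "Execute", "Validate", "Deliver"]


def run_lifecycle(
    run_phase: Callable[[str], str],
    approve: Callable[[str, str], bool],
) -> list[str]:
    """Run phases in order, stopping at any PAUSE gate the user does not approve."""
    completed: list[str] = []
    for i, phase in enumerate(PHASES):
        output = run_phase(phase)
        completed.append(phase)
        if i == len(PHASES) - 1:
            break  # Deliver is the final phase; there is no gate after it
        if not approve(phase, output):
            # PAUSE gate: no explicit approval means the agent does not advance.
            break
    return completed
```

A rejected gate stops the loop instead of retrying, which returns control to you at exactly the phase whose output you declined.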
Tiered Infrastructure
The amount of workspace scaffolding created scales with task complexity:
| Tier | When | What Gets Created |
|---|---|---|
| Tier 1 | Single operation (e.g., "clean these nulls") | Just the result — no workspace files |
| Tier 2 | Multi-step pipeline | workspace_state.md, data-spec.md, analysis_journal.md, checkpoints/ |
| Tier 3 | Multi-dataset projects | Everything in Tier 2 + per-dataset subdirectories, cross-dataset references |
Tier 1 tasks run immediately without setup. Tier 2+ tasks use the full workspace patterns — persistent state, decision logging, and versioned checkpoints.
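Here is one way the tier table could be turned into code. The sketch assumes a simple heuristic based on dataset and step counts. It places workspace_state.md and data-spec.md at the workspace root and puts per-dataset directories under data/; all three locations are assumptions. Only the journal and checkpoint paths appear elsewhere on this page.

```python
from pathlib import Path


def choose_tier(n_datasets: int, n_steps: int) -> int:
    """Hypothetical heuristic: multi-dataset -> Tier 3, multi-step -> Tier 2, else Tier 1."""
    if n_datasets > 1:
        return 3
    if n_steps > 1:
        return 2
    return 1


def scaffold(workspace: Path, tier: int, datasets: list[str]) -> list[Path]:
    """Create the workspace files for a tier and return the paths created."""
    if tier == 1:
        return []  # Tier 1: just the result, no workspace files
    files = [
        workspace / "workspace_state.md",  # assumed location
        workspace / "data-spec.md",  # assumed location
        workspace / "logs" / "analysis_journal.md",
    ]
    dirs = [workspace / "data" / "checkpoints"]
    if tier >= 3:
        # Hypothetical per-dataset layout for multi-dataset projects.
        dirs += [workspace / "data" / name for name in datasets]
    for d in dirs:
        d.mkdir(parents=True, exist_ok=True)
    for f in files:
        f.parent.mkdir(parents=True, exist_ok=True)
        f.touch()
    return files + dirs
```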
Quality Gates
The architecture includes a built-in quality gate mechanism. Quality gates are thresholds defined in SKILL.md files (and overridable in the workspace state file) that the agent checks before advancing between pipeline phases:
| Gate | Default Threshold | Checked By |
|---|---|---|
| Profiling score | ≥ 70/100 before cleaning | magic-data-lifecycle |
| Cleaning score | ≥ 85/100 before analysis | magic-data-lifecycle |
| Validation pass rate | 100% for critical constraints | magic-data-validation |
When a gate fails, the agent halts, explains which checks failed, and waits for your instruction. This prevents low-quality data from silently flowing into analysis or reports.
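Here is a minimal sketch of the gate check. It assumes metrics arrive as numbers on the same scales as the table (scores out of 100, pass rate in percent). The metric keys and the shape of the override dictionary are hypothetical, because this page does not specify how the workspace state file stores overrides. Only the metrics supplied are checked, since each gate applies at a different phase.

```python
from typing import Optional

# Defaults from the table; the key names are hypothetical.
DEFAULT_GATES = {
    "profiling_score": 70,  # checked before cleaning
    "cleaning_score": 85,  # checked before analysis
    "critical_pass_rate": 100,  # percent of critical constraints passing
}


def check_gates(metrics: dict[str, float], overrides: Optional[dict[str, float]] = None) -> list[str]:
    """Return a description of every failed gate; an empty list means all passed."""
    thresholds = {**DEFAULT_GATES, **(overrides or {})}
    failures = []
    for gate, value in metrics.items():
        threshold = thresholds[gate]
        if value < threshold:
            failures.append(f"{gate}: {value} < {threshold}")
    return failures
```

A non-empty result corresponds to the halt-and-explain behavior described above: the messages say which checks failed, and nothing advances until you respond.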
Self-Healing
Each skill includes a self-healing section that tells the agent how to respond to common failures, recovering without user intervention where the fix is well understood:
- Import errors — install missing packages via pip/uv before re-running
- Encoding errors — retry with detected encoding or fall back to latin-1
- Memory errors on large files — switch to chunked reading automatically
- Schema mismatches — surface the mismatch to the user with a suggested fix
Self-healing keeps pipelines running smoothly for well-understood failure modes while escalating genuinely unexpected situations to you.
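Here is a standard-library sketch of two of the recovery rules above: the encoding fallback and chunked reading. It "detects" the encoding by trial-decoding against a hypothetical candidate list (UTF-8, then cp1252). A real skill might use a dedicated detection library instead. It then falls back to latin-1, which can decode any byte sequence. With pandas, the same ideas map to the `encoding` and `chunksize` parameters of `read_csv`.

```python
import codecs
import csv
from itertools import islice
from pathlib import Path
from typing import Iterator


def pick_encoding(path: Path, candidates=("utf-8", "cp1252"), block: int = 1 << 20) -> str:
    """Return the first candidate encoding that decodes the whole file, else latin-1."""
    for enc in candidates:
        decoder = codecs.getincrementaldecoder(enc)()
        try:
            with path.open("rb") as fh:
                # Decode in blocks so large files are never held in memory at once.
                while chunk := fh.read(block):
                    decoder.decode(chunk)
                decoder.decode(b"", final=True)
            return enc
        except UnicodeDecodeError:
            continue
    return "latin-1"  # maps every byte to a character, so it cannot fail


def read_csv_in_chunks(path: Path, chunk_size: int = 50_000) -> Iterator[list[list[str]]]:
    """Yield CSV rows in fixed-size chunks instead of loading the file whole."""
    encoding = pick_encoding(path)
    with path.open(newline="", encoding=encoding) as fh:
        reader = csv.reader(fh)
        while chunk := list(islice(reader, chunk_size)):
            yield chunk
```

Choosing the encoding before yielding any rows avoids a subtle failure mode. If a decode error surfaced halfway through the stream, a naive retry would re-emit chunks that had already been processed.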