Architecture

MAGIC Data Agent Skills is not a library, runtime, or API. It is a knowledge delivery system — a collection of structured documents that teach your AI coding assistant how to approach data tasks correctly, consistently, and safely.

System Components

The system has four components that work together:

Skills (SKILL.md files): structured knowledge packages the agent reads to understand a domain
Reference Scripts: seed implementations the agent adapts to your specific data
CLI Installer: copies skill files into the correct locations for your agent
Workspace: shared directory structure for checkpoints, logs, and state

How Agents Read Skills

When you ask your AI assistant to perform a data task, it searches for relevant SKILL.md files in the locations the CLI installer populated. The agent reads the skill file and uses it to:

Determine if the skill applies — by matching your request against the "When to Use" section
Understand domain rules — what operations are safe, what anti-patterns to avoid, what constraints apply
Adapt code patterns — reference implementations in the skill are read as templates, not executed directly. The agent writes new code tailored to your data's actual schema, format, and size
Follow the workflow — the step-by-step procedure guides the agent's sequence of actions

Skills guide the agent's reasoning — they do not limit it. The agent can deviate from the skill's procedure when your situation calls for it, but the domain knowledge and constraints remain in effect.

The 3 Installation Paths

The CLI installer supports three target environments. Each path installs the same SKILL.md files but places them where the target agent can discover them.

Path 1: Claude Code (`.claude/skills/`)

Installs skills into the .claude/skills/ directory at your project root. Claude Code discovers skill files in this location automatically and loads them when relevant.

your-project/
└── .claude/
    └── skills/
        ├── magic-data-loading/
        │   ├── SKILL.md
        │   └── scripts/
        └── magic-data-cleaning/
            ├── SKILL.md
            └── scripts/
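
The layout above amounts to copying each skill package into `.claude/skills/`. The following is a minimal sketch of that copy step, not the CLI installer's actual implementation: the function name `install_skills` and the `skills-source` directory are hypothetical, and the demo runs entirely inside a temporary directory.

```python
import shutil
import tempfile
from pathlib import Path


def install_skills(source_dir: Path, project_root: Path) -> list[Path]:
    """Copy every directory that contains a SKILL.md into <project_root>/.claude/skills/."""
    target_root = project_root / ".claude" / "skills"
    target_root.mkdir(parents=True, exist_ok=True)
    installed = []
    for skill_dir in sorted(source_dir.iterdir()):
        if not (skill_dir / "SKILL.md").is_file():
            continue  # skip anything that is not a skill package
        dest = target_root / skill_dir.name
        # dirs_exist_ok lets a re-install overwrite an earlier copy in place
        shutil.copytree(skill_dir, dest, dirs_exist_ok=True)
        installed.append(dest)
    return installed


# Demonstration in throwaway directories so no real project is modified.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "skills-source"  # hypothetical location of unpacked skills
    project = Path(tmp) / "your-project"
    for name in ("magic-data-loading", "magic-data-cleaning"):
        (source / name / "scripts").mkdir(parents=True)
        (source / name / "SKILL.md").write_text("# placeholder\n", encoding="utf-8")
    project.mkdir()
    installed_names = [p.name for p in install_skills(source, project)]
    skill_file_copied = (
        project / ".claude" / "skills" / "magic-data-loading" / "SKILL.md"
    ).is_file()
    print(installed_names)
```

Keying on the presence of `SKILL.md` rather than on directory names keeps the sketch independent of how many skills ship in a release.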

Path 2: Cursor (`.cursorrules` injection)

Injects skill summaries into your .cursorrules file. Cursor reads this file as persistent context for all AI interactions in the project.

Path 3: Global User Directory (`~/.magic/skills/`)

Installs skills to your home directory so they are available across all projects. Useful when you work with data tasks in many repositories and want skills available everywhere without per-project setup.

Global installation means skills are always in context, which can increase token usage for non-data tasks. Prefer project-level installation unless you work with data in most of your projects.

Skill Discovery and Activation

The agent uses a two-step process to activate the right skill:

Step 1: Trigger matching. Each SKILL.md lists trigger phrases in its "When to Use" section. The agent scans your prompt for signals that match a skill's domain (e.g. "load", "CSV", "missing values", "distribution").

Step 2: Confidence scoring. When multiple skills could apply, the agent scores each by relevance and activates the highest-confidence match. For lifecycle tasks (those that span several skills), it activates magic-data-lifecycle, which then routes to specialist skills.
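
The two steps can be pictured with a small model. In practice the agent does this through its own reasoning rather than through code, so the sketch below is only an illustration: the `Skill` class, the scoring formula (fraction of trigger phrases found), and the assignment of trigger phrases to skills are all assumptions.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Skill:
    name: str
    triggers: list[str]  # phrases taken from the skill's "When to Use" section


def score_skill(prompt: str, skill: Skill) -> float:
    """Return the fraction of the skill's trigger phrases that occur in the prompt."""
    text = prompt.lower()
    hits = sum(
        1
        for trigger in skill.triggers
        if re.search(r"\b" + re.escape(trigger.lower()) + r"\b", text)
    )
    return hits / len(skill.triggers) if skill.triggers else 0.0


def pick_skill(prompt: str, skills: list[Skill]) -> Optional[Skill]:
    """Step 1: keep skills with at least one trigger hit. Step 2: take the top score."""
    matched = [(score_skill(prompt, s), s) for s in skills]
    matched = [(score, s) for score, s in matched if score > 0]
    if not matched:
        return None
    return max(matched, key=lambda pair: pair[0])[1]


# Illustrative trigger lists; real skills define their own.
skills = [
    Skill("magic-data-loading", ["load", "csv"]),
    Skill("magic-data-cleaning", ["missing values", "duplicates"]),
    Skill("magic-data-profiling", ["distribution", "profile"]),
]

best = pick_skill("Please load this CSV and check it", skills)
print(best.name if best else "no skill matched")
```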

Data Flow

User Prompt
    │
    ▼
Agent reads relevant SKILL.md(s)
    │
    ▼
Lifecycle skill builds route (if multi-skill task)
    │
    ▼
Specialist skill reads reference scripts → writes adapted code
    │
    ▼
Adapted code runs against your actual data
    │
    ▼
Checkpoint written to workspace/data/checkpoints/
    │
    ▼
Journal entry appended to workspace/logs/analysis_journal.md
    │
    ▼
Next skill in route activated (or Deliver phase)

At no point does MAGIC execute code directly. The agent reads the skills, writes code, and executes that code using its own tools (Python interpreter, bash, etc.). Skills shape what code gets written — they do not run anything themselves.
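
The last two persistence steps in the flow (checkpoint, then journal entry) might look like the following in code an agent writes. This is a sketch under assumptions: only the two paths come from the diagram, while the helper names, the CSV checkpoint format, the file naming, and the journal line format are hypothetical.

```python
import csv
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_checkpoint(workspace: Path, step: str, rows: list[dict]) -> Path:
    """Save rows as CSV under <workspace>/data/checkpoints/ and return the file path."""
    if not rows:
        raise ValueError("refusing to write an empty checkpoint")
    checkpoint_dir = workspace / "data" / "checkpoints"
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    path = checkpoint_dir / f"{step}.csv"  # naming scheme is an assumption
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path


def append_journal(workspace: Path, step: str, note: str) -> Path:
    """Append one timestamped line to <workspace>/logs/analysis_journal.md."""
    journal = workspace / "logs" / "analysis_journal.md"
    journal.parent.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with journal.open("a", encoding="utf-8") as f:
        f.write(f"- {stamp} [{step}] {note}\n")  # entry format is an assumption
    return journal


# Demonstration inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp) / "workspace"
    checkpoint = write_checkpoint(workspace, "cleaned", [{"id": 1, "value": "a"}])
    journal = append_journal(workspace, "cleaning", "example note one")
    append_journal(workspace, "cleaning", "example note two")
    checkpoint_relpath = checkpoint.relative_to(workspace).as_posix()
    checkpoint_lines = checkpoint.read_text(encoding="utf-8").splitlines()
    journal_entries = journal.read_text(encoding="utf-8").splitlines()
    print(checkpoint_relpath, len(journal_entries))
```

Opening the journal in append mode means earlier entries are never rewritten, which matches the "appended" wording in the flow.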

5-Phase Workflow with PAUSE Gates

The data lifecycle follows a structured 5-phase workflow. PAUSE gates require explicit user approval before advancing — the agent never moves to the next phase without confirmation.

Discover → [PAUSE: user reviews findings]
  → Plan → [PAUSE: user approves spec]
    → Execute → [PAUSE: user verifies output]
      → Validate → [PAUSE: user reviews compliance]
        → Deliver

Phase	Primary Skills	Output
Discover	magic-data-loading, magic-data-profiling, magic-data-exploration	Quality score, issue report, patterns
Plan	magic-data-lifecycle (routing)	`data-spec.md`, processing plan
Execute	magic-data-cleaning, magic-data-transformation, magic-data-synthesis	Cleaned/transformed checkpoints
Validate	magic-data-validation, magic-statistical-analysis	Validation reports, sanity checks
Deliver	magic-data-visualization, magic-report-generation	Charts, structured report, exported data
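
One way to picture the gating logic is a loop that refuses to advance without approval. The sketch below is a simplified model, not code shipped with MAGIC: `run_lifecycle`, `run_phase`, and `approve` are hypothetical names, and a real agent would ask you in conversation instead of calling a callback.

```python
from typing import Callable

PHASES = ["Discover", "Plan", "Execute", "Validate", "Deliver"]


def run_lifecycle(
    run_phase: Callable[[str], str],
    approve: Callable[[str, str], bool],
) -> list[str]:
    """Run phases in order, stopping at the first PAUSE gate that is not approved."""
    completed = []
    for index, phase in enumerate(PHASES):
        summary = run_phase(phase)
        completed.append(phase)
        is_last = index == len(PHASES) - 1
        # A PAUSE gate follows every phase except Deliver.
        if not is_last and not approve(phase, summary):
            break  # no approval, so the next phase never starts
    return completed


# Example: the user approves Discover and Plan but declines after Execute.
done = run_lifecycle(
    run_phase=lambda phase: f"{phase} finished",
    approve=lambda phase, summary: phase != "Execute",
)
print(done)  # ['Discover', 'Plan', 'Execute']
```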

Tiered Infrastructure

The amount of workspace scaffolding created scales with task complexity:

Tier	When	What Gets Created
Tier 1	Single operation (e.g., "clean these nulls")	Just the result — no workspace files
Tier 2	Multi-step pipeline	`workspace_state.md`, `data-spec.md`, `analysis_journal.md`, `checkpoints/`
Tier 3	Multi-dataset projects	Everything in Tier 2 + per-dataset subdirectories, cross-dataset references

Tier 1 tasks run immediately without setup. Tier 2+ tasks use the full workspace patterns — persistent state, decision logging, and versioned checkpoints.
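
The tier table can be read as a simple decision rule. The sketch below encodes one plausible reading; the exact cutoffs (more than one dataset for Tier 3, more than one step for Tier 2) and the function name `choose_tier` are assumptions, since the source describes the tiers only qualitatively.

```python
# Artifacts per tier, copied from the table above.
TIER_2_ARTIFACTS = ["workspace_state.md", "data-spec.md", "analysis_journal.md", "checkpoints/"]
TIER_ARTIFACTS = {
    1: [],  # just the result, no workspace files
    2: TIER_2_ARTIFACTS,
    3: TIER_2_ARTIFACTS + ["per-dataset subdirectories", "cross-dataset references"],
}


def choose_tier(step_count: int, dataset_count: int) -> int:
    """Pick a tier from task size. The cutoffs are assumed, not documented."""
    if dataset_count > 1:
        return 3
    if step_count > 1:
        return 2
    return 1


print(choose_tier(step_count=1, dataset_count=1))  # 1
print(choose_tier(step_count=4, dataset_count=1))  # 2
print(choose_tier(step_count=4, dataset_count=3))  # 3
```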

Quality Gates

The architecture includes a built-in quality gate mechanism. Quality gates are thresholds defined in SKILL.md files (and overridable in the workspace state file) that the agent checks before advancing between pipeline phases:

Gate	Default Threshold	Checked By
Profiling score	≥ 70/100 before cleaning	`magic-data-lifecycle`
Cleaning score	≥ 85/100 before analysis	`magic-data-lifecycle`
Validation pass rate	100% for critical constraints	`magic-data-validation`

When a gate fails, the agent halts, explains which checks failed, and waits for your instruction. This prevents low-quality data from silently flowing into analysis or reports.
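
A gate check of this kind reduces to comparing measured scores against thresholds, with any overrides applied on top of the defaults. The following is a minimal sketch: the threshold values come from the table, but the key names, the `failed_gates` function, and the shape of the overrides mapping are hypothetical.

```python
from typing import Optional

# Default thresholds from the table above; key names are illustrative.
DEFAULT_GATES = {
    "profiling_score": 70.0,
    "cleaning_score": 85.0,
    "validation_pass_rate": 100.0,
}


def failed_gates(
    metrics: dict[str, float],
    overrides: Optional[dict[str, float]] = None,
) -> dict[str, tuple[float, float]]:
    """Return {gate: (measured, required)} for every measured gate below its threshold."""
    thresholds = {**DEFAULT_GATES, **(overrides or {})}
    return {
        name: (metrics[name], required)
        for name, required in thresholds.items()
        if name in metrics and metrics[name] < required
    }


failures = failed_gates({"profiling_score": 64, "cleaning_score": 90})
print(failures)  # {'profiling_score': (64, 70.0)}

# A workspace-level override lowers the bar for this project.
relaxed = failed_gates({"profiling_score": 64}, overrides={"profiling_score": 60.0})
print(relaxed)  # {}
```

Returning the measured and required values together gives the agent what it needs to explain which checks failed before it waits for instruction.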

Self-Healing

Each skill includes a self-healing section that tells the agent how to recover from common failures, in most cases without user intervention:

Import errors — install missing packages via pip/uv before re-running
Encoding errors — retry with detected encoding or fall back to latin-1
Memory errors on large files — switch to chunked reading automatically
Schema mismatches — surface the mismatch to the user with a suggested fix

Self-healing keeps pipelines running smoothly for well-understood failure modes while escalating genuinely unexpected situations to you.
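
Two of these recoveries, the encoding fallback and chunked reading, can be sketched with the standard library alone. This is an illustration of the pattern rather than code taken from a skill: the helper names are hypothetical, the candidate list contains only UTF-8, and the probe inspects just the first part of the file, so invalid bytes later on would go unnoticed.

```python
import codecs
import csv
import tempfile
from pathlib import Path
from typing import Iterator


def pick_encoding(path: Path, candidates=("utf-8",), sample_bytes: int = 65536) -> str:
    """Try each candidate on a leading sample, then fall back to latin-1."""
    with path.open("rb") as f:
        sample = f.read(sample_bytes)
    for encoding in candidates:
        try:
            # final=False tolerates a multi-byte character cut off at the sample edge
            codecs.getincrementaldecoder(encoding)().decode(sample, final=False)
            return encoding
        except UnicodeDecodeError:
            continue
    return "latin-1"  # maps every byte to a character, so decoding cannot fail


def iter_csv_chunks(path: Path, encoding: str, chunk_size: int = 1000) -> Iterator[list[dict]]:
    """Yield rows in lists of at most chunk_size so the whole file is never in memory."""
    with path.open(newline="", encoding=encoding) as f:
        chunk = []
        for row in csv.DictReader(f):
            chunk.append(row)
            if len(chunk) >= chunk_size:
                yield chunk
                chunk = []
        if chunk:
            yield chunk


# Demonstration: a small latin-1 file that is not valid UTF-8.
with tempfile.TemporaryDirectory() as tmp:
    sample_file = Path(tmp) / "sample.csv"
    sample_file.write_bytes(b"name\ncaf\xe9\nna\xefve\nzo\xeb\n")
    used_encoding = pick_encoding(sample_file)
    chunks = list(iter_csv_chunks(sample_file, used_encoding, chunk_size=2))
    chunk_sizes = [len(chunk) for chunk in chunks]
    first_name = chunks[0][0]["name"]
    print(used_encoding, chunk_sizes)
```

Because latin-1 never raises a decode error, it works as a last resort but can silently produce wrong characters, which is why it is tried only after the stricter candidates.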
