
Architecture

MAGIC Data Agent Skills is not a library, runtime, or API. It is a knowledge delivery system — a collection of structured documents that teach your AI coding assistant how to approach data tasks correctly, consistently, and safely.

System Components

The system has four components that work together:


How Agents Read Skills

When you ask your AI assistant to perform a data task, it searches for relevant SKILL.md files in the locations the CLI installer populated. The agent reads the skill file and uses it to:

  1. Determine if the skill applies — by matching your request against the "When to Use" section
  2. Understand domain rules — what operations are safe, what anti-patterns to avoid, what constraints apply
  3. Adapt code patterns — reference implementations in the skill are read as templates, not executed directly. The agent writes new code tailored to your data's actual schema, format, and size
  4. Follow the workflow — the step-by-step procedure guides the agent's sequence of actions

Skills guide the agent's reasoning — they do not limit it. The agent can deviate from the skill's procedure when your situation calls for it, but the domain knowledge and constraints remain in effect.
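To make steps 1 and 2 concrete, here is a minimal Python sketch of how a tool could split a SKILL.md into sections and pull out the "When to Use" triggers. It assumes, hypothetically, that the file uses `## ` headings and lists triggers as `- ` bullets; the real MAGIC file layout may differ, and an actual agent simply reads the file as text rather than running a parser like this.

```python
import re

# Hypothetical SKILL.md content; real skill files may be structured differently.
SAMPLE_SKILL = """# magic-data-loading

## When to Use
- load a CSV
- read a Parquet file
- import data

## Anti-patterns
- reading a multi-GB file into memory in one call
"""


def split_sections(text: str) -> dict[str, str]:
    """Split Markdown text into {heading: body} using '## ' headings (an assumed convention)."""
    sections: dict[str, str] = {}
    current = None
    lines: list[str] = []
    for line in text.splitlines():
        match = re.match(r"^##\s+(.*)$", line)
        if match:
            if current is not None:
                sections[current] = "\n".join(lines).strip()
            current = match.group(1).strip()
            lines = []
        elif current is not None:
            lines.append(line)
    if current is not None:
        sections[current] = "\n".join(lines).strip()
    return sections


def triggers(text: str) -> list[str]:
    """Return the bullet items under the 'When to Use' heading, lowercased."""
    body = split_sections(text).get("When to Use", "")
    return [line[2:].strip().lower() for line in body.splitlines() if line.startswith("- ")]


print(triggers(SAMPLE_SKILL))  # ['load a csv', 'read a parquet file', 'import data']
```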


The 3 Installation Paths

The CLI installer supports three target environments. Each path draws on the same SKILL.md files but delivers them where the target agent can discover them (Cursor receives summaries rather than full files).

Path 1: Claude Code (.claude/skills/)

Installs skills into the .claude/skills/ directory at your project root. Claude Code discovers skill files in this location automatically and loads them when relevant.

your-project/
└── .claude/
    └── skills/
        ├── magic-data-loading/
        │   ├── SKILL.md
        │   └── scripts/
        └── magic-data-cleaning/
            ├── SKILL.md
            └── scripts/
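As a rough illustration of what Path 1 produces, the Python sketch below copies one skill folder into `.claude/skills/<skill-name>/` with `shutil.copytree`. The source package layout and the `install_project_skill` helper are hypothetical; this is not the MAGIC CLI's actual implementation.

```python
import shutil
import tempfile
from pathlib import Path


def install_project_skill(skill_src: Path, project_root: Path) -> Path:
    """Copy one skill folder (SKILL.md + scripts/) into <project>/.claude/skills/<name>/.

    Hypothetical helper: an existing copy is replaced, so re-running acts as an update.
    """
    target = project_root / ".claude" / "skills" / skill_src.name
    if target.exists():
        shutil.rmtree(target)
    shutil.copytree(skill_src, target)  # also creates .claude/skills/ if missing
    return target


# Demo in a throwaway directory with a fake skill package.
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    src = tmp_path / "package" / "magic-data-loading"
    (src / "scripts").mkdir(parents=True)
    (src / "SKILL.md").write_text("# magic-data-loading\n")
    project = tmp_path / "your-project"
    project.mkdir()
    installed = install_project_skill(src, project)
    print(sorted(p.relative_to(project).as_posix() for p in installed.rglob("*")))
```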

Path 2: Cursor (.cursorrules injection)

Injects skill summaries into your .cursorrules file. Cursor reads this file as persistent context for all AI interactions in the project.
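One way to make such an injection safe to repeat is to wrap the injected text in marker comments and replace that block on each run. The Python sketch below assumes that approach; the marker strings and `inject_summaries` are hypothetical, not the installer's documented format.

```python
import re

# Hypothetical markers delimiting the managed block inside .cursorrules.
BEGIN = "# >>> magic-skills >>>"
END = "# <<< magic-skills <<<"


def inject_summaries(rules_text: str, summaries: dict[str, str]) -> str:
    """Return rules_text with exactly one managed block of skill summaries.

    A previously injected block is replaced; user-written rules outside
    the markers are left untouched.
    """
    body = "\n".join(f"- {name}: {summary}" for name, summary in sorted(summaries.items()))
    block = f"{BEGIN}\n{body}\n{END}"
    pattern = re.compile(re.escape(BEGIN) + r".*?" + re.escape(END), re.DOTALL)
    if pattern.search(rules_text):
        # A function replacement avoids backslash escapes in `block` being interpreted.
        return pattern.sub(lambda _: block, rules_text, count=1)
    separator = "\n\n" if rules_text.strip() else ""
    return rules_text.rstrip() + separator + block + "\n"


existing = "Always use type hints.\n"
once = inject_summaries(existing, {"magic-data-loading": "Load CSV/Parquet safely."})
twice = inject_summaries(once, {"magic-data-loading": "Load CSV/Parquet safely."})
print(once == twice)  # True: re-running does not duplicate the block
```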

Path 3: Global User Directory (~/.magic/skills/)

Installs skills to your home directory so they are available across all projects. Useful when you work with data tasks in many repositories and want skills available everywhere without per-project setup.

Global installation means skills are always in context, which can increase token usage for non-data tasks. Prefer project-level installation unless you work with data in most of your projects.
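The project and global locations can be combined in one lookup, as in the Python sketch below, which scans `~/.magic/skills/` and `<project>/.claude/skills/` for `*/SKILL.md`. Letting a project-level copy win over a global copy of the same skill is an assumption made for illustration; this page does not say how such conflicts are resolved.

```python
import tempfile
from pathlib import Path
from typing import Optional


def discover_skills(project_root: Path, home: Optional[Path] = None) -> dict[str, Path]:
    """Map skill name -> SKILL.md path across the global and project locations.

    Global skills are collected first and project skills second, so a project
    copy replaces a global copy with the same folder name (assumed precedence).
    """
    home = home if home is not None else Path.home()
    locations = [home / ".magic" / "skills", project_root / ".claude" / "skills"]
    found: dict[str, Path] = {}
    for base in locations:
        if not base.is_dir():
            continue  # location not installed; skip it
        for skill_md in sorted(base.glob("*/SKILL.md")):
            found[skill_md.parent.name] = skill_md
    return found


# Demo: one skill installed in both places, one only globally.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    layout = [
        (root / "home" / ".magic" / "skills", "magic-data-loading"),
        (root / "home" / ".magic" / "skills", "magic-data-profiling"),
        (root / "proj" / ".claude" / "skills", "magic-data-loading"),
    ]
    for base, name in layout:
        (base / name).mkdir(parents=True, exist_ok=True)
        (base / name / "SKILL.md").write_text(f"# {name}\n")
    skills = discover_skills(root / "proj", home=root / "home")
    for name, path in skills.items():
        print(name, "->", path.relative_to(root).as_posix())
```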


Skill Discovery and Activation

The agent uses a two-step process to activate the right skill:

Step 1 — Trigger matching. Each SKILL.md lists trigger phrases in its "When to Use" section. The agent scans your prompt for signals that match a skill's domain (e.g. "load", "CSV", "missing values", "distribution").

Step 2 — Confidence scoring. When multiple skills could apply, the agent scores each by relevance and activates the highest-confidence match. For lifecycle tasks, it activates magic-data-lifecycle, which then routes to specialist skills.
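The page does not say how the confidence score is computed. As one hypothetical stand-in, the Python sketch below scores each skill by the fraction of its trigger phrases that appear in the prompt and sends explicitly multi-stage prompts to `magic-data-lifecycle`. The trigger lists, the lifecycle cues, and the scoring rule are all assumptions for illustration.

```python
from typing import Optional

# Hypothetical trigger phrases; real skills declare theirs in "When to Use".
TRIGGERS: dict[str, list[str]] = {
    "magic-data-loading": ["load", "csv", "parquet", "read"],
    "magic-data-cleaning": ["missing values", "nulls", "duplicates", "clean"],
    "magic-data-exploration": ["distribution", "correlation", "explore"],
}
# Hypothetical cues for multi-stage work that should go through the lifecycle skill.
LIFECYCLE_CUES = ["pipeline", "end-to-end", "full analysis"]


def score(prompt: str, phrases: list[str]) -> float:
    """Fraction of a skill's trigger phrases found in the prompt (plain substring match)."""
    text = prompt.lower()
    hits = sum(1 for phrase in phrases if phrase in text)
    return hits / len(phrases) if phrases else 0.0


def select_skill(prompt: str) -> Optional[str]:
    """Return the skill to activate, or None if nothing matches."""
    text = prompt.lower()
    if any(cue in text for cue in LIFECYCLE_CUES):
        return "magic-data-lifecycle"  # lifecycle then routes to specialists
    # Step 1: trigger matching keeps skills with at least one hit.
    candidates = {name: score(text, phrases) for name, phrases in TRIGGERS.items()}
    candidates = {name: s for name, s in candidates.items() if s > 0}
    if not candidates:
        return None
    # Step 2: confidence scoring activates the highest-scoring skill.
    return max(candidates, key=candidates.get)


print(select_skill("Load this CSV and check the distribution"))  # magic-data-loading
```

Substring matching is deliberately crude here ("read" also matches "already"); an agent weighs the prompt's meaning rather than counting keywords.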


Data Flow

User Prompt
  ↓
Agent reads relevant SKILL.md(s)
  ↓
Lifecycle skill builds route (if multi-skill task)
  ↓
Specialist skill reads reference scripts → writes adapted code
  ↓
Adapted code runs against your actual data
  ↓
Checkpoint written to workspace/data/checkpoints/
  ↓
Journal entry appended to workspace/logs/analysis_journal.md
  ↓
Next skill in route activated (or Deliver phase)

At no point does MAGIC execute code directly. The agent reads the skills, writes code, and executes that code using its own tools (Python interpreter, bash, etc.). Skills shape what code gets written — they do not run anything themselves.
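For the checkpoint and journal steps in the flow, here is a Python sketch of the kind of code an agent might write after a step runs: it saves the step's output under `workspace/data/checkpoints/` and appends a timestamped entry to `workspace/logs/analysis_journal.md`. The numbered file naming (`<nn>_<step>.csv`), the CSV format, and the journal line format are assumptions; the skills define the actual conventions.

```python
import csv
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def record_step(workspace: Path, step: str, rows: list[dict], note: str) -> Path:
    """Write a CSV checkpoint for one pipeline step and append a journal entry.

    Checkpoints are numbered so earlier versions are never overwritten
    (hypothetical naming scheme).
    """
    if not rows:
        raise ValueError("nothing to checkpoint")
    checkpoints = workspace / "data" / "checkpoints"
    logs = workspace / "logs"
    checkpoints.mkdir(parents=True, exist_ok=True)
    logs.mkdir(parents=True, exist_ok=True)

    index = len(list(checkpoints.glob("*.csv"))) + 1
    path = checkpoints / f"{index:02d}_{step}.csv"
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)

    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    with (logs / "analysis_journal.md").open("a") as journal:
        journal.write(f"- {stamp} [{step}] {note} -> {path.name}\n")
    return path


with tempfile.TemporaryDirectory() as tmp:
    ws = Path(tmp) / "workspace"
    record_step(ws, "load", [{"id": 1, "amount": None}], "Loaded 1 row")
    record_step(ws, "clean", [{"id": 1, "amount": 0}], "Filled 1 null amount with 0")
    print(sorted(p.name for p in (ws / "data" / "checkpoints").iterdir()))
    # ['01_load.csv', '02_clean.csv']
```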


5-Phase Workflow with PAUSE Gates

The data lifecycle follows a structured 5-phase workflow. PAUSE gates require explicit user approval before advancing — the agent never moves to the next phase without confirmation.

Discover → [PAUSE: user reviews findings]
  → Plan → [PAUSE: user approves spec]
    → Execute → [PAUSE: user verifies output]
      → Validate → [PAUSE: user reviews compliance]
        → Deliver

| Phase | Primary Skills | Output |
|---|---|---|
| Discover | magic-data-loading, magic-data-profiling, magic-data-exploration | Quality score, issue report, patterns |
| Plan | magic-data-lifecycle (routing) | data-spec.md, processing plan |
| Execute | magic-data-cleaning, magic-data-transformation, magic-data-synthesis | Cleaned/transformed checkpoints |
| Validate | magic-data-validation, magic-statistical-analysis | Validation reports, sanity checks |
| Deliver | magic-data-visualization, magic-report-generation | Charts, structured report, exported data |
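The gate rule behaves like a small state machine. In the Python sketch below, the pipeline moves to the next phase only when an approval callback returns True for the phase just finished. In practice the approval is your reply in the conversation; the `approve` callback and the `approvals` dict are hypothetical stand-ins for it.

```python
from typing import Callable, Optional

PHASES = ["Discover", "Plan", "Execute", "Validate", "Deliver"]


def next_phase(current: str, approve: Callable[[str], bool]) -> Optional[str]:
    """Return the next phase, or the current one if its PAUSE gate is not approved.

    Deliver is the last phase, so it returns None.
    """
    i = PHASES.index(current)
    if i == len(PHASES) - 1:
        return None
    # PAUSE gate: never advance without explicit confirmation.
    return PHASES[i + 1] if approve(current) else current


def run_until_paused(approvals: dict[str, bool]) -> list[str]:
    """Walk the phases, stopping at the first gate the user has not approved."""
    phase: Optional[str] = PHASES[0]
    visited: list[str] = []
    while phase is not None:
        visited.append(phase)
        following = next_phase(phase, lambda p: approvals.get(p, False))
        if following == phase:
            break  # halted at a PAUSE gate, waiting for the user
        phase = following
    return visited


print(run_until_paused({"Discover": True, "Plan": True}))  # ['Discover', 'Plan', 'Execute']
```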

Tiered Infrastructure

The amount of workspace scaffolding created scales with task complexity:

| Tier | When | What Gets Created |
|---|---|---|
| Tier 1 | Single operation (e.g., "clean these nulls") | Just the result, no workspace files |
| Tier 2 | Multi-step pipeline | workspace_state.md, data-spec.md, analysis_journal.md, checkpoints/ |
| Tier 3 | Multi-dataset projects | Everything in Tier 2, plus per-dataset subdirectories and cross-dataset references |

Tier 1 tasks run immediately without setup. Tier 2+ tasks use the full workspace patterns — persistent state, decision logging, and versioned checkpoints.
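The tier decision can be summarized as a small function. This Python sketch takes the number of operations and datasets in a request and returns the tier plus the workspace files it implies. The numeric cut-offs and the `dataset_<n>/` subdirectory names are assumptions, since the page defines the tiers only by example.

```python
from dataclasses import dataclass, field

TIER2_FILES = [
    "workspace_state.md",
    "data-spec.md",
    "analysis_journal.md",
    "checkpoints/",
]


@dataclass
class TierPlan:
    tier: int
    files: list[str] = field(default_factory=list)


def plan_tier(operations: int, datasets: int) -> TierPlan:
    """Pick the lightest tier that fits the task (illustrative cut-offs)."""
    if datasets > 1:
        # Tier 3: everything in Tier 2 plus one hypothetical subdirectory per dataset.
        return TierPlan(3, TIER2_FILES + [f"dataset_{i}/" for i in range(1, datasets + 1)])
    if operations > 1:
        return TierPlan(2, list(TIER2_FILES))
    return TierPlan(1)  # single operation: no workspace files


print(plan_tier(operations=1, datasets=1))  # TierPlan(tier=1, files=[])
```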


Quality Gates

The architecture includes a built-in quality gate mechanism. Quality gates are thresholds defined in SKILL.md files (and overridable in the workspace state file) that the agent checks before advancing between pipeline phases:

| Gate | Default Threshold | Checked By |
|---|---|---|
| Profiling score | ≥ 70/100 before cleaning | magic-data-lifecycle |
| Cleaning score | ≥ 85/100 before analysis | magic-data-lifecycle |
| Validation pass rate | 100% for critical constraints | magic-data-validation |

When a gate fails, the agent halts, explains which checks failed, and waits for your instruction. This prevents low-quality data from silently flowing into analysis or reports.
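A minimal Python sketch of a gate check that uses the default thresholds from the table and lets workspace values override them. How overrides are read from the workspace state file is not described here, so they are passed in as a plain dict; the gate keys, `GateResult`, and `check_gate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Default thresholds from the Quality Gates table.
DEFAULT_GATES: dict[str, float] = {
    "profiling_score": 70.0,      # out of 100, checked before cleaning
    "cleaning_score": 85.0,       # out of 100, checked before analysis
    "critical_pass_rate": 100.0,  # percent of critical constraints passing
}


@dataclass
class GateResult:
    passed: bool
    reason: str


def check_gate(gate: str, value: float, overrides: Optional[dict[str, float]] = None) -> GateResult:
    """Compare a measured value with its threshold; workspace overrides win over defaults."""
    thresholds = {**DEFAULT_GATES, **(overrides or {})}
    minimum = thresholds[gate]
    if value >= minimum:
        return GateResult(True, f"{gate} {value:g} meets {minimum:g}")
    # Failing gate: report what failed and stop instead of advancing.
    return GateResult(False, f"{gate} is {value:g}, below the required {minimum:g}; waiting for your instruction")


print(check_gate("cleaning_score", 80).reason)
print(check_gate("cleaning_score", 80, overrides={"cleaning_score": 75}).passed)  # True
```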


Self-Healing

Each skill includes a self-healing section that tells the agent how to handle common failures, recovering without user intervention where it safely can:

  • Import errors — install missing packages via pip/uv before re-running
  • Encoding errors — retry with detected encoding or fall back to latin-1
  • Memory errors on large files — switch to chunked reading automatically
  • Schema mismatches — surface the mismatch to the user with a suggested fix

Self-healing keeps pipelines running smoothly for well-understood failure modes while escalating genuinely unexpected situations to you.
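Two of these recoveries can be sketched with the Python standard library alone: retrying a decode with a fallback encoding, and reading CSV rows in fixed-size batches instead of all at once. The encoding order, batch size, and helper names are assumptions for illustration; the skills themselves may use pandas chunked reading or an encoding detector instead.

```python
import csv
import io
from typing import Iterator, TextIO

# Hypothetical retry order; latin-1 maps every byte to a character, so it never fails.
FALLBACK_ENCODINGS = ["utf-8", "latin-1"]


def decode_with_fallback(raw: bytes) -> tuple[str, str]:
    """Return (text, encoding_used), retrying with the next encoding on UnicodeDecodeError."""
    for encoding in FALLBACK_ENCODINGS:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding could decode the data")


def read_csv_in_chunks(stream: TextIO, chunk_size: int = 10_000) -> Iterator[list[dict[str, str]]]:
    """Yield CSV rows in batches so memory use depends on chunk_size, not file size."""
    batch: list[dict[str, str]] = []
    for row in csv.DictReader(stream):
        batch.append(row)
        if len(batch) == chunk_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch


raw = "name,city\nJosé,Zürich\nAna,Lyon\nBo,Oslo\n".encode("latin-1")
text, used = decode_with_fallback(raw)
print(used)  # latin-1 (the bytes are not valid UTF-8)
print([len(batch) for batch in read_csv_in_chunks(io.StringIO(text), chunk_size=2)])  # [2, 1]
```

For a file on disk, passing `open(path, encoding=used, newline="")` as the stream lets `csv.DictReader` pull rows lazily, so only one batch is held in memory at a time.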
