Workspace Patterns

For Tier 2 (multi-step pipeline) and Tier 3 (multi-dataset project) tasks, MAGIC creates a structured workspace with persistent state files. These files let the agent resume sessions, track decisions, and maintain data provenance across a full pipeline.

Full Directory Structure

workspace/
├── workspace_state.md          # Phase tracker, quality score, task list, last checkpoint
├── specs/
│   └── data-spec.md            # Single source of truth for dataset properties
├── logs/
│   └── analysis_journal.md     # Decision log (timestamp, context, options, chosen, rationale)
├── data/
│   ├── input/                  # Original source files (never modified)
│   ├── processed/              # Current working version
│   └── checkpoints/
│       ├── ckpt_01_loaded.csv
│       ├── ckpt_02_cleaned.csv
│       └── ckpt_03_validated.csv
└── output/
    ├── reports/                # Generated Markdown reports
    └── visualizations/         # Charts and plots (PNG, SVG)

Initialize the workspace by running /magic:init-workspace in your AI assistant, or by asking the agent to set up a workspace.

The workspace root defaults to ./workspace/ relative to your project. Tier 1 tasks (single operations) don't create this structure — they just produce their output directly.
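
For orientation, the layout above could be reproduced by hand with a few lines of Python. This is only a sketch of the directory tree, not what /magic:init-workspace actually runs; the helper name `create_workspace_skeleton` is hypothetical, and the state files are created empty.

```python
from pathlib import Path

# Directories and files taken from the tree shown above.
WORKSPACE_DIRS = [
    "specs",
    "logs",
    "data/input",
    "data/processed",
    "data/checkpoints",
    "output/reports",
    "output/visualizations",
]
WORKSPACE_FILES = [
    "workspace_state.md",
    "specs/data-spec.md",
    "logs/analysis_journal.md",
]


def create_workspace_skeleton(root: str = "workspace") -> Path:
    """Create the directory tree (hypothetical helper, not MAGIC's own code)."""
    root_path = Path(root)
    for rel in WORKSPACE_DIRS:
        (root_path / rel).mkdir(parents=True, exist_ok=True)
    for rel in WORKSPACE_FILES:
        # touch() creates an empty file and leaves existing contents untouched.
        (root_path / rel).touch(exist_ok=True)
    return root_path
```

Calling `create_workspace_skeleton()` twice is harmless, since existing directories and files are left in place.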


workspace_state.md — Phase Tracker

workspace_state.md is the agent's persistent memory. It is read at the start of every session and updated at the end of each phase.

What It Contains

  • Current phase — which of the 5 lifecycle phases is active (Discover, Plan, Execute, Validate, Deliver)
  • Last checkpoint — path to the most recent data checkpoint
  • Quality score — current dataset quality score (updated after each profiling or cleaning pass)
  • Task list — remaining steps in the current pipeline
  • User preferences — target schema, preferred output format, quality thresholds

Resuming a Session

When you return to a project, the agent reads workspace_state.md and resumes from where you left off:

"Resume my data pipeline"

The agent identifies the last completed step, loads the most recent checkpoint, and continues with the next task in the list.

State files are plain Markdown — you can read and edit them directly. To force a fresh start, delete or clear the state file.
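
The resume logic described above can be pictured with a small in-memory model. This is a sketch under assumptions: the class and field names are illustrative, and the real state file is free-form Markdown whose exact layout is not specified here.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The five lifecycle phases named in this section.
PHASES = ["Discover", "Plan", "Execute", "Validate", "Deliver"]


@dataclass
class Task:
    description: str
    done: bool = False


@dataclass
class WorkspaceState:
    # Illustrative fields mirroring the bullet list above (assumed names).
    current_phase: str
    last_checkpoint: Optional[str] = None
    quality_score: Optional[float] = None
    tasks: List[Task] = field(default_factory=list)

    def next_task(self) -> Optional[Task]:
        """Return the first unfinished task, or None when all tasks are done."""
        for task in self.tasks:
            if not task.done:
                return task
        return None


state = WorkspaceState(
    current_phase="Execute",
    last_checkpoint="data/checkpoints/ckpt_02_cleaned.csv",
    tasks=[Task("load", done=True), Task("clean", done=True), Task("validate")],
)
```

Here `state.next_task()` returns the `validate` task, which is the step a resumed session would continue with, starting from the data in `last_checkpoint`.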


specs/data-spec.md — Single Source of Truth

data-spec.md is the authoritative description of the dataset. It is created during the Plan phase and updated whenever the dataset schema or properties change.

What It Contains

  • Schema — column names, types, nullability, expected ranges
  • Quality targets — minimum quality score, acceptable null rates per column
  • Business rules — cross-column constraints, domain-specific validation rules
  • Source provenance — original file, row/column counts, loading parameters used
  • Processing history — which operations have been applied and in what order

Why It Matters

data-spec.md prevents drift. Without it, each phase may make different assumptions about what the data looks like. With it, the agent checks every operation against the spec and flags deviations.
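
To make the idea of checking operations against the spec concrete, here is a minimal sketch that validates one row against a schema. The dictionary layout of `SPEC`, the column rules, and the function `check_row` are assumptions for illustration; the actual spec is a Markdown file and the agent's checks may work differently.

```python
from typing import Any, Dict, List

# Hypothetical in-memory form of the schema section of data-spec.md.
SPEC: Dict[str, Dict[str, Any]] = {
    "revenue": {"type": float, "nullable": True, "min": 0.0, "max": None},
    "customer_id": {"type": int, "nullable": False, "min": 1, "max": None},
}


def check_row(row: Dict[str, Any], spec: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return human-readable deviations from the spec (empty list if none)."""
    problems: List[str] = []
    for column, rules in spec.items():
        if column not in row:
            problems.append(f"{column}: missing column")
            continue
        value = row[column]
        if value is None:
            if not rules["nullable"]:
                problems.append(f"{column}: null not allowed")
            continue
        if not isinstance(value, rules["type"]):
            problems.append(f"{column}: expected {rules['type'].__name__}")
            continue
        # Range checks only run for non-null values of the right type.
        if rules["min"] is not None and value < rules["min"]:
            problems.append(f"{column}: {value} below minimum {rules['min']}")
        if rules["max"] is not None and value > rules["max"]:
            problems.append(f"{column}: {value} above maximum {rules['max']}")
    return problems
```

Returning a list of deviations rather than raising on the first one lets a caller report every problem in a single pass.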


logs/analysis_journal.md — Decision Log

analysis_journal.md is an append-only log of every significant decision made during the pipeline. It is the first place to look when a pipeline produces unexpected results.

Entry Format

Each entry records:

| Field | Description |
| --- | --- |
| Timestamp | When the decision was made |
| Context | What situation prompted this decision |
| Options considered | What alternatives were available |
| Chosen | What was decided |
| Rationale | Why this option was chosen |

Sample Entry

## 2026-05-20 14:32 — magic-data-cleaning / null imputation

**Context:** `revenue` column has 342 nulls (2.7%). Missing rate is below 5% threshold.

**Options considered:**
- A. Mean imputation — fast but sensitive to outliers
- B. Median imputation — robust to skew; distribution is right-skewed
- C. KNN imputation — preserves correlations; data has no strong correlated columns
- D. Drop rows — 2.7% loss acceptable but unnecessary

**Chosen:** B — median imputation

**Rationale:** Distribution is right-skewed (skewness 2.4). Median is more representative than mean. No strongly correlated columns to exploit with KNN.
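
An entry in the shape of the sample above could be produced and appended with a sketch like the following. The function names `format_entry` and `append_entry` are hypothetical, and the layout is inferred from the sample entry only.

```python
from datetime import datetime
from pathlib import Path
from typing import List


def format_entry(when: datetime, skill: str, topic: str, context: str,
                 options: List[str], chosen: str, rationale: str) -> str:
    """Render one journal entry in the layout of the sample entry."""
    option_lines = "\n".join(f"- {opt}" for opt in options)
    return (
        # \u2014 is the dash used in the sample heading.
        f"## {when:%Y-%m-%d %H:%M} \u2014 {skill} / {topic}\n\n"
        f"**Context:** {context}\n\n"
        f"**Options considered:**\n{option_lines}\n\n"
        f"**Chosen:** {chosen}\n\n"
        f"**Rationale:** {rationale}\n\n"
    )


def append_entry(journal: Path, entry: str) -> None:
    """Add an entry to the end of the journal without touching earlier ones."""
    journal.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" only ever writes at the end of the file.
    with journal.open("a", encoding="utf-8") as fh:
        fh.write(entry)
```

Opening the file in append mode is what keeps the log append-only in this sketch: earlier entries are never rewritten.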

data/checkpoints/ — Versioned Snapshots

Checkpoints are immutable snapshots of the data at each pipeline step. They are never overwritten — each step creates a new file with an incrementing number. This enables rollback to any prior state using /magic:rollback.
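
One way to enforce the "never overwritten" rule in your own scripts is exclusive file creation. The helper below is a sketch, not MAGIC's implementation; `write_checkpoint` is a hypothetical name.

```python
from pathlib import Path


def write_checkpoint(path: Path, data: bytes) -> None:
    """Write a snapshot, refusing to replace an existing file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "xb" is exclusive creation: it raises FileExistsError
    # if a file already exists at this path.
    with path.open("xb") as fh:
        fh.write(data)
```

A second call with the same path fails with `FileExistsError` instead of silently replacing the earlier snapshot.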

Naming Convention

ckpt_{NN}_{operation}.{extension}

| Part | Description | Example |
| --- | --- | --- |
| NN | Two-digit step number (zero-padded) | 01, 02, 12 |
| operation | Snake-case description of what was done | loaded, cleaned, validated |
| extension | Format matching the data | csv, parquet, jsonl |

Examples

ckpt_01_loaded.csv
ckpt_02_profiled.csv          # Data unchanged; metadata written separately
ckpt_03_nulls_imputed.csv
ckpt_04_duplicates_removed.csv
ckpt_05_normalized.parquet    # Format changed for efficiency
ckpt_06_validated.parquet     # Passed validation gate
ckpt_07_report.md             # Final report
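
The naming convention is regular enough to build and parse mechanically. The following sketch assumes lowercase snake-case operation names and two-digit step numbers, as in the examples above; the function names are hypothetical.

```python
import re
from typing import Iterable, Optional, Tuple

# ckpt_{NN}_{operation}.{extension}
CKPT_PATTERN = re.compile(r"^ckpt_(\d{2})_([a-z0-9_]+)\.([a-z0-9]+)$")


def checkpoint_name(step: int, operation: str, extension: str) -> str:
    """Build a file name with a zero-padded, two-digit step number."""
    return f"ckpt_{step:02d}_{operation}.{extension}"


def parse_checkpoint(name: str) -> Optional[Tuple[int, str, str]]:
    """Split a name into (step, operation, extension), or None if it doesn't match."""
    match = CKPT_PATTERN.match(name)
    if match is None:
        return None
    return int(match.group(1)), match.group(2), match.group(3)


def next_step(existing_names: Iterable[str]) -> int:
    """Return the step number that follows the highest existing checkpoint."""
    parsed = (parse_checkpoint(name) for name in existing_names)
    steps = [p[0] for p in parsed if p is not None]
    return max(steps, default=0) + 1
```

For example, `checkpoint_name(3, "nulls_imputed", "csv")` gives `ckpt_03_nulls_imputed.csv`, and `next_step` ignores any file that does not follow the convention.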

--auto-checkpoint Flag

Scripts that support --auto-checkpoint create numbered snapshots automatically after each successful operation — no manual checkpoint management needed:

python3 execute_cleaning_plan.py data.csv cleaned.csv --plan plan.json --auto-checkpoint
# Creates: ckpt_01_cleaned.csv, ckpt_02_cleaned.csv, ... for each step in the plan

Do not rename or move checkpoint files manually. The agent tracks their paths in workspace_state.md, so a renamed or moved file breaks the rollback chain.


data/input/ — Original Source Files

The input/ directory holds original source files and is never modified. If loading fails or produces unexpected results, the original is always available for re-loading with different parameters.

For Tier 3 projects with multiple source datasets, use subdirectories:

data/input/
├── customers/
│   └── customers_2026.csv
├── transactions/
│   ├── jan.parquet
│   └── feb.parquet
└── hf/
    └── org/dataset-name/   # HuggingFace download output
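
If you want to confirm that nothing under data/input/ has changed between sessions, a checksum manifest is one option. This is a sketch of that idea and not a MAGIC feature; `hash_inputs` and `changed_files` are hypothetical helpers, and files are read fully into memory, which may not suit very large inputs.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def hash_inputs(input_dir: Path) -> Dict[str, str]:
    """Map each file's path (relative to input_dir) to its SHA-256 digest."""
    manifest: Dict[str, str] = {}
    for path in sorted(input_dir.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(input_dir).as_posix()] = digest
    return manifest


def changed_files(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """List files that were added, removed, or modified between two manifests."""
    keys = sorted(set(before) | set(after))
    return [key for key in keys if before.get(key) != after.get(key)]
```

Taking a manifest after loading and comparing it at the start of a later session gives an empty list when the originals are intact.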

Multi-Session Workflows

Long pipelines spanning multiple sessions use the workspace naturally:

  1. Session 1: Load and profile data, write ckpt_01 and ckpt_02, update workspace_state.md
  2. Session 2: Agent reads state, picks up at the Discover→Plan PAUSE gate, continues with cleaning and transformation, writes ckpt_03 through ckpt_05
  3. Session 3: Validate, generate visualizations and report, write ckpt_06 and ckpt_07 to output/

Each session appends entries to analysis_journal.md, so the full decision history is always in one place.
