Workspace Patterns
For Tier 2 (multi-step pipeline) and Tier 3 (multi-dataset project) tasks, MAGIC creates a structured workspace with persistent state files. These files let the agent resume sessions, track decisions, and maintain data provenance across a full pipeline.
Full Directory Structure
workspace/
├── workspace_state.md          # Phase tracker, quality score, task list, last checkpoint
├── specs/
│   └── data-spec.md            # Single source of truth for dataset properties
├── logs/
│   └── analysis_journal.md     # Decision log (timestamp, context, options, chosen, rationale)
├── data/
│   ├── input/                  # Original source files (never modified)
│   ├── processed/              # Current working version
│   └── checkpoints/
│       ├── ckpt_01_loaded.csv
│       ├── ckpt_02_cleaned.csv
│       └── ckpt_03_validated.csv
└── output/
    ├── reports/                # Generated Markdown reports
    └── visualizations/         # Charts and plots (PNG, SVG)

Initialize the workspace by running /magic:init-workspace in your AI assistant, or by asking the agent to set up a workspace.
The workspace root defaults to ./workspace/ relative to your project. Tier 1 tasks (single operations) don't create this structure — they just produce their output directly.
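The layout above can also be reproduced by hand, for example in a test fixture. The following is a minimal sketch and not what /magic:init-workspace actually runs: the function name `create_workspace_skeleton` and the decision to create empty state files are assumptions made for illustration.

```python
from pathlib import Path

# Directory layout copied from the tree above.
WORKSPACE_DIRS = [
    "specs",
    "logs",
    "data/input",
    "data/processed",
    "data/checkpoints",
    "output/reports",
    "output/visualizations",
]
# Creating these as empty placeholders is an assumption of this sketch.
STATE_FILES = [
    "workspace_state.md",
    "specs/data-spec.md",
    "logs/analysis_journal.md",
]


def create_workspace_skeleton(root: str = "./workspace") -> Path:
    """Create the workspace directories and empty state files if missing."""
    root_path = Path(root)
    for rel in WORKSPACE_DIRS:
        (root_path / rel).mkdir(parents=True, exist_ok=True)
    for rel in STATE_FILES:
        # touch() does not alter the contents of a file that already exists.
        (root_path / rel).touch(exist_ok=True)
    return root_path
```

Calling `create_workspace_skeleton()` a second time is harmless, because existing directories and file contents are left in place.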
workspace_state.md — Phase Tracker
workspace_state.md is the agent's persistent memory. It is read at the start of every session and updated at the end of each phase.
What It Contains
- Current phase — which of the 5 lifecycle phases is active (Discover, Plan, Execute, Validate, Deliver)
- Last checkpoint — path to the most recent data checkpoint
- Quality score — current dataset quality score (updated after each profiling or cleaning pass)
- Task list — remaining steps in the current pipeline
- User preferences — target schema, preferred output format, quality thresholds
Resuming a Session
When you return to a project, the agent reads workspace_state.md and resumes from where you left off:
"Resume my data pipeline"
The agent identifies the last completed step, loads the most recent checkpoint, and continues with the next task in the list.
State files are plain Markdown — you can read and edit them directly. To force a fresh start, delete or clear the state file.
specs/data-spec.md — Single Source of Truth
data-spec.md is the authoritative description of the dataset. It is created during the Plan phase and updated whenever the dataset schema or properties change.
What It Contains
- Schema — column names, types, nullability, expected ranges
- Quality targets — minimum quality score, acceptable null rates per column
- Business rules — cross-column constraints, domain-specific validation rules
- Source provenance — original file, row/column counts, loading parameters used
- Processing history — which operations have been applied and in what order
Why It Matters
data-spec.md prevents drift. Without it, each phase may make different assumptions about what the data looks like. With it, the agent checks every operation against the spec and flags deviations.
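To make the idea of checking data against the spec concrete, here is a rough sketch. It assumes the schema portion of data-spec.md has already been parsed into Python objects; the `ColumnSpec` class and `find_deviations` function are hypothetical names, not part of MAGIC, and the real checks may differ.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ColumnSpec:
    # Hypothetical in-memory form of one schema row from data-spec.md.
    name: str
    dtype: type
    nullable: bool = True
    min_value: Optional[float] = None
    max_value: Optional[float] = None


def find_deviations(rows: list[dict[str, Any]], spec: list[ColumnSpec]) -> list[str]:
    """Return one human-readable message per value that violates the spec."""
    deviations = []
    for i, row in enumerate(rows):
        for col in spec:
            if col.name not in row:
                deviations.append(f"row {i}: missing column '{col.name}'")
                continue
            value = row[col.name]
            if value is None:
                if not col.nullable:
                    deviations.append(f"row {i}: '{col.name}' is null")
                continue
            if not isinstance(value, col.dtype):
                deviations.append(
                    f"row {i}: '{col.name}' is not {col.dtype.__name__}"
                )
                continue
            if col.min_value is not None and value < col.min_value:
                deviations.append(
                    f"row {i}: '{col.name}' = {value} is below minimum {col.min_value}"
                )
            if col.max_value is not None and value > col.max_value:
                deviations.append(
                    f"row {i}: '{col.name}' = {value} is above maximum {col.max_value}"
                )
    return deviations
```

For example, with `spec = [ColumnSpec("revenue", float, nullable=False, min_value=0.0)]`, a row containing `{"revenue": None}` yields a single "is null" message, and a row that satisfies the spec yields none.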
logs/analysis_journal.md — Decision Log
analysis_journal.md is an append-only log of every significant decision made during the pipeline. It is the first place to look when a pipeline produces unexpected results.
Entry Format
Each entry records:
| Field | Description |
|---|---|
| Timestamp | When the decision was made |
| Context | What situation prompted this decision |
| Options considered | What alternatives were available |
| Chosen | What was decided |
| Rationale | Why this option was chosen |
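As an illustration of how the five fields map onto a Markdown entry, the sketch below formats an entry and appends it to the journal. It assumes the heading and bold-label layout used in this page's sample entry; the function names are hypothetical and the agent's own writer may format entries differently.

```python
from datetime import datetime
from pathlib import Path

# The dash character used between timestamp and title in the sample entry.
HEADING_DASH = "\u2014"


def format_journal_entry(
    timestamp: datetime,
    title: str,
    context: str,
    options: list[str],
    chosen: str,
    rationale: str,
) -> str:
    """Render one decision as a Markdown journal entry."""
    lines = [
        f"## {timestamp:%Y-%m-%d %H:%M} {HEADING_DASH} {title}",
        f"**Context:** {context}",
        "**Options considered:**",
    ]
    lines += [f"- {option}" for option in options]
    lines += [f"**Chosen:** {chosen}", f"**Rationale:** {rationale}"]
    return "\n".join(lines) + "\n"


def append_journal_entry(journal: Path, entry: str) -> None:
    """Add an entry to the end of the journal without touching earlier ones."""
    # Mode "a" only writes at the end of the file, keeping the log append-only.
    with journal.open("a", encoding="utf-8") as fh:
        fh.write("\n" + entry)
```

Opening the file in append mode is the design choice that matters here: earlier entries cannot be overwritten by accident, which matches the append-only intent of the journal.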
Sample Entry
## 2026-05-20 14:32 — magic-data-cleaning / null imputation
**Context:** `revenue` column has 342 nulls (2.7%). Missing rate is below 5% threshold.
**Options considered:**
- A. Mean imputation — fast but sensitive to outliers
- B. Median imputation — robust to skew; distribution is right-skewed
- C. KNN imputation — preserves correlations; data has no strong correlated columns
- D. Drop rows — 2.7% loss acceptable but unnecessary
**Chosen:** B — median imputation
**Rationale:** Distribution is right-skewed (skewness 2.4). Median is more representative than mean. No strongly correlated columns to exploit with KNN.

data/checkpoints/ — Versioned Snapshots
Checkpoints are immutable snapshots of the data at each pipeline step. They are never overwritten — each step creates a new file with an incrementing number. This enables rollback to any prior state using /magic:rollback.
Naming Convention
ckpt_{NN}_{operation}.{extension}

| Part | Description | Example |
|---|---|---|
| NN | Two-digit step number (zero-padded) | 01, 02, 12 |
| operation | Snake-case description of what was done | loaded, cleaned, validated |
| extension | Format matching the data | csv, parquet, jsonl |
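The naming convention is regular enough to parse and generate mechanically. The sketch below is one possible way to do that; the regular expression and the helper names `parse_checkpoint_name` and `next_checkpoint_path` are assumptions for illustration, not MAGIC's own implementation.

```python
import re
from pathlib import Path
from typing import Optional

# ckpt_{NN}_{operation}.{extension}, with NN zero-padded to at least two digits.
CKPT_PATTERN = re.compile(r"^ckpt_(\d{2,})_([a-z0-9_]+)\.([a-z0-9]+)$")


def parse_checkpoint_name(name: str) -> Optional[tuple[int, str, str]]:
    """Split a checkpoint file name into (step, operation, extension)."""
    match = CKPT_PATTERN.match(name)
    if match is None:
        return None
    return int(match.group(1)), match.group(2), match.group(3)


def next_checkpoint_path(checkpoint_dir: Path, operation: str, extension: str) -> Path:
    """Build the path for the next checkpoint without overwriting earlier ones."""
    highest_step = 0
    for path in checkpoint_dir.iterdir():
        parsed = parse_checkpoint_name(path.name)
        if parsed is not None:
            highest_step = max(highest_step, parsed[0])
    return checkpoint_dir / f"ckpt_{highest_step + 1:02d}_{operation}.{extension}"
```

Because the next step number is derived from the highest existing one, a new checkpoint never reuses the name of an earlier snapshot, which is what keeps the sequence immutable.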
Examples
ckpt_01_loaded.csv
ckpt_02_profiled.csv # Data unchanged; metadata written separately
ckpt_03_nulls_imputed.csv
ckpt_04_duplicates_removed.csv
ckpt_05_normalized.parquet # Format changed for efficiency
ckpt_06_validated.parquet # Passed validation gate
ckpt_07_report.md # Final report

--auto-checkpoint Flag
Scripts that support --auto-checkpoint create numbered snapshots automatically after each successful operation — no manual checkpoint management needed:
python3 execute_cleaning_plan.py data.csv cleaned.csv --plan plan.json --auto-checkpoint
# Creates: ckpt_01_cleaned.csv, ckpt_02_cleaned.csv, ... for each step in the plan

Do not rename or move checkpoint files manually. The agent tracks their paths in workspace_state.md. Renaming breaks the rollback chain.
data/input/ — Original Source Files
The input/ directory holds original source files and is never modified. If loading fails or produces unexpected results, the original is always available for re-loading with different parameters.
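If you want to confirm that nothing in input/ has changed between sessions, one option is to record a hash of each file and compare later. This is a sketch of that idea only; it is not a MAGIC feature, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def hash_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot_inputs(input_dir: Path) -> dict[str, str]:
    """Map each file under input_dir (by relative path) to its digest."""
    return {
        path.relative_to(input_dir).as_posix(): hash_file(path)
        for path in sorted(input_dir.rglob("*"))
        if path.is_file()
    }


def changed_inputs(input_dir: Path, baseline: dict[str, str]) -> list[str]:
    """List files that were added, removed, or modified since the baseline."""
    current = snapshot_inputs(input_dir)
    names = baseline.keys() | current.keys()
    return sorted(n for n in names if baseline.get(n) != current.get(n))
```

An empty list from `changed_inputs` means the source files still match the recorded baseline.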
For Tier 3 projects with multiple source datasets, use subdirectories:
data/input/
├── customers/
│   └── customers_2026.csv
├── transactions/
│   ├── jan.parquet
│   └── feb.parquet
└── hf/
    └── org/dataset-name/       # HuggingFace download output

Multi-Session Workflows
Long pipelines spanning multiple sessions use the workspace naturally:
- Session 1 — Load and profile data, write ckpt_01 and ckpt_02, update workspace_state.md
- Session 2 — Agent reads state, picks up at the Discover→Plan PAUSE gate, continues with cleaning and transformation, writes ckpt_03–ckpt_05
- Session 3 — Validate, generate visualizations and report, write ckpt_06–ckpt_07 to output/
Each session appends entries to analysis_journal.md, so the full decision history is always in one place.