Load and Profile a CSV
In this tutorial you'll watch the MAGIC Data Agent load a CSV file and run a full data-quality profile on it. The agent automatically detects the file format, loads it into memory, and then routes to the profiling skill to score quality and surface critical issues — all from a single natural-language prompt.
This is the recommended starting point if you're new to MAGIC Data Agent Skills. It uses two skills (magic-data-loading and magic-data-profiling) and takes under two minutes end-to-end.
Setup
Follow these steps to set up your environment. If you've already installed the skills, skip to the sample data.
Install the skills
```bash
npx @votee-ai/magic-data-agent-skills init
```
Create a workspace
```bash
mkdir magic-tutorial && cd magic-tutorial
```
Download sample data
```bash
mkdir -p data && curl -o data/sales_data.csv https://docs.votee.ai/tutorials/data/sales_data.csv
```
Sample Data
This tutorial uses a 515-row sales dataset with intentional quality issues: mixed date formats, duplicate rows, missing values, and extreme outliers. Download it using the setup step above and make sure it ends up at `data/sales_data.csv`, the path used in the prompt below.
The Agent Trace
The trace below shows every step the agent took, from receiving your prompt to delivering the quality report.
User prompt: "Load data/sales_data.csv and tell me about the data quality issues"
Skill: magic-data-loading · tags: load, csv · 00:03
The agent detected the 'Load' keyword together with a file path reference and invoked the skill, loading 21,739 characters of domain knowledge about format detection, encoding handling, and loading strategies.
Execution: file inspection + CSV load · 00:06
Input:
```json
{
  "file_path": "data/sales_data.csv"
}
```
Output:
```json
{
  "rows": 515,
  "columns": 12,
  "encoding": "UTF-8",
  "format": "CSV"
}
```
Command:
```bash
wc -l data/sales_data.csv && file data/sales_data.csv
```
Skill: magic-data-profiling · tags: quality, issues · 00:13
Transition: magic-data-loading → magic-data-profiling. The user asked about "quality issues" and the data loaded successfully (515 rows × 12 columns), so the agent activated the profiling skill for a comprehensive analysis.
The agent loaded the profiling skill's guidance on quality scoring patterns, distribution analysis methods, and outlier detection strategies.
Execution: comprehensive quality profiling · 00:18
Input:
```json
{
  "dataframe_shape": [515, 12]
}
```
Output:
```json
{
  "quality_score": 86.9,
  "grade": "B",
  "duplicates": 15,
  "date_formats": 3,
  "quantity_outliers": 5,
  "missing_email_pct": "13.0%",
  "missing_phone_pct": "8.2%",
  "missing_notes_pct": "89.7%",
  "revenue_consistency": "100%",
  "email_shared_across_reps": true
}
```
Code:
```python
import pandas as pd
import numpy as np
import re

df = pd.read_csv('data/sales_data.csv')
# Quality dimensions: completeness, uniqueness,
# consistency (date formats, email mapping),
# validity (outliers, revenue check)
```
Execution: date format & outlier deep-dive · 00:25
Input:
```json
{
  "columns": ["date", "quantity"]
}
```
Output:
```json
{
  "date_formats": {
    "YYYY-MM-DD": 410,
    "MM/DD/YYYY": 69,
    "DD-Mon-YYYY": 36
  },
  "quantity_outliers": [
    {"order_id": "ORD-1389", "quantity": 15862, "revenue": "$1.6M"},
    {"order_id": "ORD-1444", "quantity": 12353, "revenue": "$1.7M"},
    {"order_id": "ORD-1256", "quantity": 11581},
    {"order_id": "ORD-1042", "quantity": 9262},
    {"order_id": "ORD-1137", "quantity": 5741}
  ]
}
```
Code:
```python
# Date format detection
mask_iso = df['date'].str.match(r'^\d{4}-\d{2}-\d{2}$')
mask_us = df['date'].str.match(r'^\d{2}/\d{2}/\d{4}$')
mask_dmy = df['date'].str.match(r'^\d{2}-[A-Z][a-z]{2}-\d{4}$')
```
Output: Results delivered · 00:32
Data Quality Report: `sales_data.csv`
Loaded: 515 rows × 12 columns | Overall quality: 86.9/100 (Grade B) — but the aggregate masks serious column-level issues.
Critical issues
- 15 exact duplicate rows (including ORD-1072, ORD-1066). Breaks joins and aggregations if `order_id` is a primary key.
- 3 date formats: 410 rows YYYY-MM-DD, 69 rows MM/DD/YYYY, 36 rows DD-Mon-YYYY. All parseable, but downstream sort/filter will misfire without normalization.
Moderate issues
Clean dimensions
- region (5), channel (4), product (10), sales_rep (10): all clean
- revenue = quantity × unit_price holds for all 515 rows
Artifacts
- `sales_data.csv`: 515 rows × 12 columns, sales transactions with quality issues
Data Flow
| Step | Skill | Input | Output | Key Config |
|---|---|---|---|---|
| 1 | magic-data-loading | sales_data.csv (raw file) | DataFrame (515×12) | encoding: auto-detect |
| 2 | magic-data-profiling | DataFrame (515×12) | Quality report (86.9/100, Grade B) | method: inline (small dataset) |
What Happened
Skill Activation
The agent detected "Load" and "CSV" in the prompt and activated magic-data-loading. It read the skill's SKILL.md to understand format detection, encoding handling, and checkpoint patterns.
Data Loading
The agent loaded sales_data.csv into pandas — 515 rows × 12 columns. It immediately inspected shape, dtypes, and null counts to plan the next step.
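To reproduce this first look by hand, here is a minimal pandas sketch of the same checks (shape, dtypes, null counts). It is not the skill's actual code. `load_and_inspect` is a hypothetical helper, and it relies on `pd.read_csv` defaults rather than on the skill's encoding auto-detection.

```python
import io

import pandas as pd


def load_and_inspect(source) -> pd.DataFrame:
    """Hypothetical helper: load a CSV, then print shape, dtypes and null counts."""
    df = pd.read_csv(source)  # pandas defaults; no encoding auto-detection here
    print(f"shape: {df.shape[0]} rows x {df.shape[1]} columns")
    print("dtypes:")
    print(df.dtypes.to_string())
    print("null counts:")
    print(df.isna().sum().to_string())
    return df


if __name__ == "__main__":
    # Inline stand-in so the sketch runs anywhere; with the tutorial data,
    # call load_and_inspect("data/sales_data.csv") instead.
    demo = io.StringIO(
        "order_id,date,quantity,email\n"
        "ORD-1,2024-01-05,3,a@example.com\n"
        "ORD-2,01/06/2024,5,\n"
    )
    load_and_inspect(demo)
```

With the tutorial file in place, `load_and_inspect("data/sales_data.csv")` should report the same 515 × 12 shape the agent saw.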
Automatic Transition to Profiling
Because the prompt said "quality issues", the agent activated magic-data-profiling, transitioning from the loading skill without any extra instruction. It read the profiling skill's SKILL.md to configure the inline analysis method appropriate for this dataset size.
Comprehensive Profiling
The agent ran five analysis passes:
- Date format detection — found 3 formats via regex matching
- Missing value analysis — email (13%), phone (8.2%), notes (89.7%)
- Email cross-mapping — discovered emails shared across 6–9 reps
- Duplicate detection — found 15 exact duplicate rows
- Outlier detection — 5 extreme quantity values using the IQR method
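Most of these passes can be approximated in plain pandas. The sketch below illustrates the ideas under stated assumptions and is not the profiling skill's implementation:

- The column names (`email`, `sales_rep`, `quantity`, `unit_price`, `revenue`) are assumed from the report.
- `revenue` is assumed to be numeric.
- The IQR multiplier of 1.5 is the conventional Tukey value, since the agent's threshold isn't stated.
- The revenue check uses a hypothetical 0.01 tolerance.

Date-format detection is left out because the trace already shows those regexes. The demo rows are made up so the block is self-contained.

```python
import pandas as pd


def profile(df: pd.DataFrame) -> dict:
    """Rough stand-ins for the profiling passes (column names are assumed)."""
    report = {}

    # Completeness: percentage of nulls per column, rounded to 1 decimal.
    report["missing_pct"] = (df.isna().mean() * 100).round(1).to_dict()

    # Uniqueness: rows identical across every column (first occurrence not counted).
    report["duplicates"] = int(df.duplicated().sum())

    # Email cross-mapping: does any email address belong to more than one rep?
    reps_per_email = df.dropna(subset=["email"]).groupby("email")["sales_rep"].nunique()
    report["email_shared_across_reps"] = bool((reps_per_email > 1).any())

    # Outliers: Tukey's IQR rule with an assumed multiplier of 1.5.
    q1, q3 = df["quantity"].quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    mask = (df["quantity"] < lower) | (df["quantity"] > upper)
    report["quantity_outliers"] = df.loc[mask, "quantity"].tolist()

    # Consistency: share of rows where revenue == quantity * unit_price
    # (within a hypothetical 0.01 tolerance).
    expected = df["quantity"] * df["unit_price"]
    consistent = (df["revenue"] - expected).abs() < 0.01
    report["revenue_consistency_pct"] = round(consistent.mean() * 100, 1)

    return report


# Made-up demo rows (not from sales_data.csv): one exact duplicate, one missing
# email, one email used by two reps, and one extreme quantity.
SAMPLE = pd.DataFrame(
    {
        "sales_rep": ["Ana", "Ben", "Ben", "Cy", "Ana", "Ben"],
        "email": [
            "x@example.com", "x@example.com", "x@example.com",
            None, "y@example.com", "z@example.com",
        ],
        "quantity": [2, 3, 3, 4, 5, 500],
        "unit_price": [10.0] * 6,
        "revenue": [20.0, 30.0, 30.0, 40.0, 50.0, 5000.0],
    }
)

if __name__ == "__main__":
    print(profile(SAMPLE))
```

Returning a plain dict keeps the output close in shape to the JSON the agent reported, so the results are easy to compare side by side.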
Quality Report
The agent delivered a severity-rated report with 7 issues (3 High, 3 Medium, 1 Low) and recommended cleaning steps, scoring the dataset at 86.9/100 (Grade B).
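A sketch of one way to act on the two highest-impact findings (duplicates and mixed dates) follows. It is an illustrative example under assumptions, not the agent's recommended cleaning code. It parses the `date` column with the three formats found in the trace, trying each in turn, and then drops exact duplicate rows. `normalize_dates` and `clean` are hypothetical names, and the demo rows are made up.

```python
import pandas as pd

# The three layouts reported in the trace: YYYY-MM-DD, MM/DD/YYYY, DD-Mon-YYYY.
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d-%b-%Y"]


def normalize_dates(dates: pd.Series) -> pd.Series:
    """Try each format in turn; a row keeps the first format that parses it.
    Rows matching none of the formats end up as NaT."""
    parsed = None
    for fmt in DATE_FORMATS:
        attempt = pd.to_datetime(dates, format=fmt, errors="coerce")
        parsed = attempt if parsed is None else parsed.fillna(attempt)
    return parsed


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning step: normalize `date`, then drop exact duplicates."""
    out = df.copy()
    out["date"] = normalize_dates(out["date"])
    return out.drop_duplicates()


if __name__ == "__main__":
    raw = pd.DataFrame(
        {
            "order_id": ["ORD-1", "ORD-2", "ORD-2", "ORD-3"],
            "date": ["2024-01-05", "01/06/2024", "06-Jan-2024", "07-Jan-2024"],
        }
    )
    print(clean(raw))
```

The sketch normalizes dates before deduplicating on purpose. Two copies of the same order written in different date formats only become exact duplicates once their dates agree, as with the two ORD-2 rows in the demo.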
Key Issues Found
| # | Severity | Issue | Impact |
|---|---|---|---|
| 1 | High | 15 exact duplicate rows | Inflates aggregations by 3% |
| 2 | High | 3 inconsistent date formats | Breaks time-series analysis |
| 3 | High | Emails shared across multiple sales reps | Cannot use as identifier |
| 4 | Medium | 5 extreme quantity outliers (5,741–15,862) | Skews averages by 10x |
| 5 | Medium | 67 missing emails (13%) | Gaps in contact data |
| 6 | Medium | 42 missing phones (8.2%) | Gaps in contact data |
| 7 | Low | Notes column 90% empty; the remaining values are placeholders | No analytical value |
Try It Yourself
Copy this prompt and paste it into your MAGIC session:
Load data/sales_data.csv and tell me about the data quality issues
The agent will:
- Detect the file format and encoding
- Load the data into memory
- Transition automatically to the profiling skill
- Return a quality score with a prioritized list of issues
What You Learned
- The agent automatically selects skills based on keywords in your prompt — you don't need to name skills explicitly.
- Skill transitions happen seamlessly: magic-data-loading hands off to magic-data-profiling without any extra instruction from you.
- The agent runs multiple analysis passes (dates, nulls, duplicates, outliers, cross-column validation) in a single profiling session.
- Missing values, duplicates, and outliers are all surfaced together in one report, with severity ratings to help you prioritize.
Next Steps
Generate a Quick Report
Turn those profiling results into a structured 10-section markdown report
Create Basic Visualizations
Generate charts to visualize distributions and category breakdowns