MAGIC Agent Skills

magic-data-profiling

Profile datasets — run quality scoring, distribution analysis, outlier detection, and issue detection. Use when assessing data quality, getting a quality overview, or profiling before cleaning.

When It Activates

Use this skill when the user wants to understand data quality, distributions, or characteristics. Trigger phrases: profile, quality, check quality, assess, what types, categorize, classify, outliers, distributions, summarize data.

  • Need to understand data characteristics before cleaning or analysis
  • Need distribution analysis (skewness, normality tests)
  • Need to detect outliers or assess data quality
  • Need correlation analysis with significance testing
  • Need to discover categorical groupings or classify value types

When NOT to Use: Use magic-statistical-analysis for hypothesis testing and magic-data-exploration for pattern discovery. If the data has already been profiled, re-profile only after transformations.
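
As a rough illustration of the kind of distribution check listed above, the sketch below computes moment-based skewness using only the standard library. It is not the code in distribution_analysis.py: the function names and the 0.5 / 1.0 cut-offs are assumptions (a common rule of thumb) chosen for illustration.

```python
import statistics


def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # constant column: no spread, treat as symmetric
    return m3 / m2 ** 1.5


def describe_shape(values, mild=0.5, strong=1.0):
    # Hypothetical labels; the thresholds are a rule of thumb, not the skill's own.
    g1 = sample_skewness(values)
    if abs(g1) < mild:
        return "roughly symmetric"
    side = "right" if g1 > 0 else "left"
    strength = "strongly" if abs(g1) >= strong else "moderately"
    return f"{strength} {side}-skewed"
```

A strongly skewed column is a hint that a z-score threshold, which assumes roughly normal data, may flag misleading outliers.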

Quick Facts

| Property | Value |
|---|---|
| Version | 2.0.0 |
| Complexity | medium |
| Phase | 1 |
| Scripts | 8 |

Tags

data-science profiling statistics quality eda

Scripts

Scriptable Tools (call directly or read + adapt)

| Script | Standard CLI Usage | When to Customize |
|---|---|---|
| quality_score.py | python3 quality_score.py data.parquet logs/quality.json | Custom dimension weights, additional dimensions, domain-specific thresholds |
| detect_all_issues.py | python3 detect_all_issues.py data.parquet report.json | --include-content-validation for sentinel checks; --sentinel-patterns for custom list |
| distribution_analysis.py | python3 distribution_analysis.py data.csv dist.json | --columns col1,col2 to limit scope on wide datasets |
| outlier_detection.py | python3 outlier_detection.py data.csv outliers.json | --method zscore --threshold 3.0 for normal data; --method both for dual detection |
| correlation_matrix.py | python3 correlation_matrix.py data.csv corr.json | --method pearson or --method spearman to override auto; --columns to limit scope. Outputs JSON + CSV matrix + PNG heatmap |
| deep_quality_analysis.py | python3 deep_quality_analysis.py data.csv analysis.json | --depth deep for full investigation; --columns for targeted analysis; --sample N for large datasets |
| detect_categories.py | python3 detect_categories.py --input data.csv --output cats.json | --column name to override auto-selection; --method tfidf_kmeans to force clustering |
| classify_answers.py | python3 classify_answers.py --input data.csv --output classify.json | --column col when auto-selection picks wrong column; --sample N for large datasets |
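
To make the outlier options in the table more concrete, here is a minimal sketch of z-score detection with the 3.0 threshold shown above, paired with a second method. The table does not say which method "both" combines with z-score, so the IQR (Tukey fence) rule here is an assumption, as are all function names. This is not the code in outlier_detection.py.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Indices whose absolute z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    stdev = statistics.stdev(values)  # sample standard deviation
    if stdev == 0:
        return []  # constant data has no z-score outliers
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


def iqr_outliers(values, k=1.5):
    """Indices outside [Q1 - k*IQR, Q3 + k*IQR] (assumed second method)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [i for i, v in enumerate(values) if v < low or v > high]


def dual_detection(values, threshold=3.0, k=1.5):
    """Run both rules and report each result plus their intersection."""
    z = set(zscore_outliers(values, threshold))
    iqr = set(iqr_outliers(values, k))
    return {"zscore": sorted(z), "iqr": sorted(iqr), "both": sorted(z & iqr)}
```

Reporting the intersection separately is one possible design: points flagged by both rules are the least likely to be artifacts of a single method's assumptions.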

New in v2.0.0

detect_all_issues.py — Combined Meta-Profiler

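detect_all_issues.py runs the individual analyses (quality, distribution, outlier, correlation, category detection, and answer classification) one after another in a single invocation, producing one JSON report with nested sub-analyses: {sentinels, quality, distributions, outliers, correlations, categories, answer_classification, errors}. Sentinel checks are enabled with --include-content-validation.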

Use this instead of running each analysis script individually.

python3 detect_all_issues.py data.parquet report.json

# Include sentinel/placeholder detection
python3 detect_all_issues.py data.parquet report.json --include-content-validation
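
Once the report exists, a quick check of which sections it contains can be sketched as below. The top-level key names come from the report shape described above; the structure inside each section is not documented here, so this sketch only tests for presence. The function names are hypothetical, and treating a non-empty errors value as "something failed" is an assumption.

```python
import json

# Top-level keys listed for the combined report.
EXPECTED_SECTIONS = (
    "sentinels",
    "quality",
    "distributions",
    "outliers",
    "correlations",
    "categories",
    "answer_classification",
)


def load_report(path):
    """Read the JSON report written by detect_all_issues.py."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def summarize_report(report):
    """List which expected sections are present and whether errors were recorded."""
    present = [key for key in EXPECTED_SECTIONS if key in report]
    missing = [key for key in EXPECTED_SECTIONS if key not in report]
    # Assumption: an empty or absent "errors" value means every sub-analysis succeeded.
    return {"present": present, "missing": missing, "has_errors": bool(report.get("errors"))}
```

For example, `summarize_report(load_report("report.json"))` gives a quick view of what ran before you dig into any one section.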

Do not run detect_all_issues.py on datasets over 1M rows without sampling first — it runs 6 sub-analyses sequentially, which can take 30+ minutes and risk OOM on correlation heatmaps.
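
One way to sample first is sketched below with pandas, which is already a dependency. The helper name, the default seed, and the file name sample.parquet are hypothetical, and the 1,000,000-row cap simply mirrors the limit stated above. Simple random sampling is an assumption; a stratified sample may suit some datasets better.

```python
import pandas as pd

MAX_ROWS = 1_000_000  # mirrors the row limit stated above


def sample_if_large(df, max_rows=MAX_ROWS, seed=0):
    """Return df unchanged if small enough, else a reproducible random sample."""
    if len(df) <= max_rows:
        return df
    return df.sample(n=max_rows, random_state=seed)


# Usage sketch (file names are illustrative):
#   df = pd.read_parquet("data.parquet")
#   sample_if_large(df).to_parquet("sample.parquet")
#   then: python3 detect_all_issues.py sample.parquet report.json
```

Fixing the seed keeps the sample stable across runs, so reports produced before and after a transformation stay comparable.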

Dependencies

pandas numpy scipy matplotlib seaborn
