magic-data-exploration
Explore data interactively and detect patterns systematically. Use when investigating a dataset — freely exploring quality issues, comparing segments, discovering correlations, or running automated pattern detection. Covers both interactive investigation (asking questions, following threads) and scripted analysis (pattern detection, segment comparison, relationship exploration).
When It Activates
Use this skill when investigating data patterns or comparing segments. Trigger phrases: explore, investigate, patterns, what patterns, look into, understand data, compare groups, segment analysis, find templates, similarity.
- User wants to investigate data interactively before committing to a processing plan
- User wants to understand quality issues, patterns, or structure
- Need to discover patterns and insights using automated scripts
- Need to compare statistics across groups/segments
- Need to explore pairwise relationships between columns
- After magic-data-profiling, for deeper systematic investigation
When NOT to Use: Use magic-data-profiling for initial quality scoring and distribution overview. Use magic-data-cleaning for applying fixes. Use magic-statistical-analysis for formal hypothesis testing. Use magic-data-lifecycle for full multi-step processing.
Quick Facts
| Property | Value |
|---|---|
| Version | 2.0.0 |
| Complexity | medium |
| Phase | 1 |
| Scripts | 4 |
Tags
data-science exploration patterns segments relationships interactive discovery
Scripts
Scriptable Tools (call directly or read + adapt)
| Script | Standard CLI Usage | When to Customize |
|---|---|---|
| `detect_patterns.py` | `python3 detect_patterns.py data.csv patterns.csv` | `--max-findings 20` for broader coverage. 6 detectors: temporal cycle, categorical imbalance, numeric cluster, text pattern, outlier presence, correlation |
| `prepare_for_exploration.py` | `python3 prepare_for_exploration.py data.csv prepared.csv` | `--columns col1,col2` to restrict; `--derive '{"new_col": "src:expression"}'` for custom derives |
| `relationship_explorer.py` | `python3 relationship_explorer.py data.csv relationships.csv` | `--columns col1,col2` to restrict; `--max-pairs 20` for wider coverage. Produces PNG charts in `{stem}_charts/` |
| `segment_analysis.py` | `python3 segment_analysis.py data.csv segments.csv` | `--group_col col` when auto-detect picks the wrong column; `--value_cols col1,col2` to narrow metrics |
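The four scripts can be chained from a small driver. The sketch below only assembles the command lines from the table above; it is an illustration, not part of the skill. The helper names (`build_command`, `plan_exploration`, `run_all`), the output file names, and the assumption that the scripts sit in the current working directory are all hypothetical. Flags are passed with their exact spelling because the scripts mix hyphenated (`--max-findings`) and underscored (`--group_col`) names.

```python
import shlex
import subprocess
import sys
from pathlib import Path


def build_command(script, input_csv, output_csv, flags=None):
    """Return an argv list: interpreter, script, input, output, then flags.

    `flags` maps the exact flag spelling (e.g. "--max-findings") to its value.
    """
    cmd = [sys.executable, script, input_csv, output_csv]
    for name, value in (flags or {}).items():
        cmd += [name, str(value)]
    return cmd


def plan_exploration(input_csv, prepare=False):
    """List the commands for one exploration pass over `input_csv`.

    Output file names are hypothetical; choose whatever suits the project.
    Set prepare=True for text-only data so later scripts see numeric columns.
    """
    stem = Path(input_csv).stem
    commands = []
    source = input_csv
    if prepare:
        prepared = f"{stem}_prepared.csv"
        commands.append(
            build_command("prepare_for_exploration.py", input_csv, prepared)
        )
        source = prepared
    commands.append(
        build_command("detect_patterns.py", source, f"{stem}_patterns.csv",
                      {"--max-findings": 20})
    )
    commands.append(
        build_command("relationship_explorer.py", source,
                      f"{stem}_relationships.csv", {"--max-pairs": 20})
    )
    commands.append(
        build_command("segment_analysis.py", source, f"{stem}_segments.csv")
    )
    return commands


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the plan instead of executing it, so this is safe to try anywhere.
    for cmd in plan_exploration("data.csv", prepare=True):
        print(shlex.join(cmd))
```

Calling `run_all(plan_exploration("data.csv", prepare=True))` would execute the plan, provided the scripts are actually present at those paths.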
New in v2.0.0
prepare_for_exploration.py — Text Column Enrichment
prepare_for_exploration.py enriches a CSV with numeric representations of text columns, enabling exploration scripts to operate on text-heavy datasets. For each text column it automatically derives {col}_length, {col}_word_count, and {col}_is_present.
Run this before detect_patterns.py or relationship_explorer.py on text-only datasets — exploration scripts require at least some numeric or categorical columns to function.
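To make the three derived columns concrete, here is a minimal pandas sketch of what such an enrichment could look like. It is an approximation under stated assumptions, not the script's actual implementation: the function name `enrich_text_columns` is hypothetical, lengths are counted in characters, words are split on whitespace, and a value counts as present only if it is neither missing nor blank.

```python
import pandas as pd


def enrich_text_columns(df, columns):
    """Add {col}_length, {col}_word_count and {col}_is_present per column.

    Assumptions (not confirmed by the script itself): missing values are
    treated as empty strings, and whitespace-only values count as absent.
    """
    out = df.copy()
    for col in columns:
        # Replace missing values with "" so the string methods never see NaN.
        filled = out[col].fillna("").astype(str)
        out[f"{col}_length"] = filled.str.len()
        # str.split() with no arguments splits on runs of whitespace.
        out[f"{col}_word_count"] = filled.str.split().str.len()
        out[f"{col}_is_present"] = (filled.str.strip() != "").astype(int)
    return out


if __name__ == "__main__":
    sample = pd.DataFrame({"title": ["Hello world", None, "  "], "n": [1, 2, 3]})
    print(enrich_text_columns(sample, ["title"]))
```

The derived columns are plain integers, which is what lets correlation and cluster detectors run on data that started out as text only.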
python3 prepare_for_exploration.py data.csv prepared.csv
# Restrict to specific columns
python3 prepare_for_exploration.py data.csv prepared.csv --columns title,body
Dependencies
pandas numpy scipy matplotlib seaborn