magic-data-transformation

Transform data by reshaping, aggregating, merging, deriving columns, and delivering to external destinations (database, HuggingFace Hub). Use when: (1) pivoting, melting, or unpivoting tables, (2) grouping and aggregating data, (3) joining or merging multiple datasets, (4) creating calculated or derived columns, (5) uploading/delivering/pushing data to HuggingFace Hub or database. Trigger keywords: pivot, melt, reshape, groupby, aggregate, merge, join, vlookup, deliver, upload, HuggingFace, push to Hub.

When It Activates

Use this skill when reshaping, joining, or deriving data. Trigger phrases: transform, reshape, pivot, melt, merge, join, aggregate, group by, derive column, split dataset, convert format, instruction tuning.

  • Need to pivot, melt, stack, or unstack data
  • Need group-by aggregations
  • Need to merge/join multiple datasets
  • Need to create calculated or derived columns
  • After magic-data-cleaning, before analysis

When NOT to Use: Use magic-data-cleaning for quality fixes; use magic-data-exploration for analysis.

Quick Facts

Property | Value
Version | 2.0.0
Complexity | medium
Phase | 1
Scripts | 7

Tags

data-science transformation reshape aggregate merge join

Scripts

Callable Tools (call directly via CLI)

Script | Purpose | Example
deliver_to_db.py | Write transformed data to a database table | python3 deliver_to_db.py --input data.parquet --table target_table
deliver_to_hf.py | Publish dataset to HuggingFace Hub | python3 deliver_to_hf.py --input dataset_folder/ --repo org/repo-name

Scriptable Tools (call directly or read + adapt)

Script | Standard CLI Usage | When to Customize
validate_transform.py | python3 validate_transform.py original.csv transformed.csv report.csv | --expected-shape rows,cols for dimensional assertion; --key-columns id,date to verify key preservation
aggregate.py | python3 aggregate.py data.csv agg.csv --group_cols region --agg_cols revenue --functions mean,sum,count | --explain for dry-run
merge_datasets.py | python3 merge_datasets.py left.csv right.csv merged.csv --on customer_id --how left | --left-on/--right-on when key names differ
reshape.py | python3 reshape.py data.csv reshaped.csv --operation pivot --index_col date --columns_col region --values_col revenue | --operation stack or unstack needs only input/output (no column params)
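
For orientation, the aggregate, merge, and pivot commands in the table correspond roughly to the pandas calls below. This is a minimal sketch, not the scripts' actual implementation: the sample frames `sales` and `customers` are hypothetical, and only the column names (`region`, `revenue`, `customer_id`, `date`) are taken from the CLI examples.

```python
import pandas as pd

# Hypothetical sample data; column names mirror the CLI examples in the table.
sales = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"],
    "region": ["east", "west", "east", "west"],
    "customer_id": [1, 2, 1, 3],
    "revenue": [100.0, 150.0, 120.0, 90.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2],
    "segment": ["retail", "wholesale"],
})

# aggregate: group by region and compute mean, sum, and count of revenue.
agg = sales.groupby("region")["revenue"].agg(["mean", "sum", "count"]).reset_index()

# merge: a left join keeps every row of `sales`, even without a matching customer.
merged = sales.merge(customers, on="customer_id", how="left")

# reshape (pivot): one row per date, one column per region, revenue as values.
wide = sales.pivot(index="date", columns="region", values="revenue")
```

A left join never drops rows from the left table, so unmatched keys (here `customer_id` 3) show up as missing values in the joined columns. That is the kind of effect a key-preservation check on the transformed output is meant to surface.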

Reference Implementations (read patterns, write custom code)

Script | Demonstrates | Key Pattern
derive_columns.py | Safe expression evaluation for computed columns | Safe pd.eval() sandbox; blocked unsafe patterns; --expressions is effectively required
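
The "blocked unsafe patterns" idea can be sketched as follows. This is an illustration under assumptions, not the code in derive_columns.py: the `BLOCKED` pattern list and the `derive_column` helper are hypothetical, and the real script's blocklist is not documented on this page.

```python
import re

import pandas as pd

# Assumed blocklist for illustration; the real script's patterns may differ.
BLOCKED = re.compile(r"__|\bimport\b|\bexec\b|\beval\b|\bopen\b|\bos\.|\bsys\.|@")


def derive_column(df, name, expression):
    """Return a copy of df with column `name` computed from `expression`."""
    if BLOCKED.search(expression):
        raise ValueError(f"Blocked unsafe pattern in expression: {expression!r}")
    out = df.copy()
    # DataFrame.eval resolves bare names against the frame's columns.
    out[name] = out.eval(expression)
    return out
```

For example, `derive_column(df, "margin", "revenue - cost")` adds a `margin` column, while an expression containing `__import__` is rejected before it reaches the evaluator. Rejecting `@` also prevents expressions from reading local Python variables.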

New in v2.0.0

--auto-checkpoint and --explain Flags

aggregate.py, merge_datasets.py, and derive_columns.py support:

  • --explain: prints a JSON execution plan without writing any files. Use it to preview what the operation will do before committing.
  • --auto-checkpoint: creates a numbered snapshot (ckpt_NN_*.csv) after each successful operation.

# Preview an aggregation without writing output
python3 aggregate.py data.csv agg.csv --group_cols region --agg_cols revenue --functions sum --explain

# Run with automatic versioned checkpoints
python3 merge_datasets.py left.csv right.csv merged.csv --on id --auto-checkpoint
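
When adapting these scripts, a dry-run flag and numbered checkpoints can be wired up along these lines. This is a sketch under assumptions: `build_plan`, `next_checkpoint_path`, and the plan's field names are hypothetical, and only the flag names and the ckpt_NN_*.csv naming come from this page.

```python
import argparse
import json
import re
from pathlib import Path


def build_plan(args):
    """Describe the operation as a plain dict so it can be printed as JSON."""
    return {
        "operation": "aggregate",
        "input": args.input,
        "output": args.output,
        "group_cols": args.group_cols.split(","),
        "dry_run": args.explain,
    }


def next_checkpoint_path(directory, label):
    """Return the next ckpt_NN_<label>.csv path in `directory`."""
    numbers = []
    for path in Path(directory).glob("ckpt_*.csv"):
        match = re.match(r"ckpt_(\d+)_", path.name)
        if match:
            numbers.append(int(match.group(1)))
    next_number = max(numbers, default=0) + 1
    return Path(directory) / f"ckpt_{next_number:02d}_{label}.csv"


def parse_args(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("input")
    parser.add_argument("output")
    parser.add_argument("--group_cols", required=True)
    parser.add_argument("--explain", action="store_true")
    parser.add_argument("--auto-checkpoint", action="store_true")
    return parser.parse_args(argv)


def main(argv):
    args = parse_args(argv)
    plan = build_plan(args)
    if args.explain:
        # Preview only: print the plan and return before any file is written.
        print(json.dumps(plan, indent=2))
        return plan
    # The real transformation and file write would happen here.
    return plan
```

Returning early under `--explain` keeps the preview path free of side effects, which is what makes it safe to run before committing to an operation.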

JSONL Support

All transformation scripts now accept .jsonl input natively. Output format is determined by the output file extension.
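
Extension-based format selection can be sketched with pandas as below. The helpers `read_table` and `write_table` are hypothetical names for illustration, and the sketch covers only .csv and .jsonl; the formats the actual scripts support may differ.

```python
from pathlib import Path

import pandas as pd


def read_table(path):
    """Load a table, choosing the reader from the file extension."""
    path = Path(path)
    if path.suffix == ".jsonl":
        # lines=True parses one JSON object per line.
        return pd.read_json(path, lines=True)
    if path.suffix == ".csv":
        return pd.read_csv(path)
    raise ValueError(f"Unsupported input extension: {path.suffix}")


def write_table(df, path):
    """Write a table, choosing the writer from the file extension."""
    path = Path(path)
    if path.suffix == ".jsonl":
        df.to_json(path, orient="records", lines=True)
    elif path.suffix == ".csv":
        df.to_csv(path, index=False)
    else:
        raise ValueError(f"Unsupported output extension: {path.suffix}")
```

With helpers like these, converting between formats is a read followed by a write, for example `write_table(read_table("data.jsonl"), "data.csv")`.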

Dependencies

pandas numpy
