MAGIC Agent Skills
How It Works

The Linguistic Agent Skills suite is organized around a five-phase pipeline (Phase 0 through Phase 4) that reflects how experienced computational linguists approach low-resource language work. The linguistic-orchestrator skill routes requests to the specialist skills and tracks workspace state in workspace_state.md.

The 5-Phase Pipeline

Scope → Acquire → Analyze → Evaluate → Release
  ↑        ↑         ↑          ↑          ↑
  └────────┴─────────┴──────────┴──────────┘
               (refinement loops)

Phases overlap and loop back. The orchestrator provides the skeleton; specialists own the content.

Phase 0 — Scope

Goal: Identify the target language precisely and set the strategic direction before touching any data.

| Step | Specialist | What It Does |
|---|---|---|
| Language disambiguation | linguistic-scope | ISO 639-3 + Glottolog resolution; macrolanguage disambiguation |
| Resource classification | linguistic-scope | Joshi 0–5 classification; data availability scan |
| Typological profiling | linguistic-scope | WALS/Grambank/URIEL features; transfer-source recommendation |
| Script policy | linguistic-scripts | Unicode block(s), NFC/NFKC decision, diacritic preservation |
| Ethics seed | linguistic-ethics | FPIC awareness, vitality-driven community engagement depth |
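
The NFC/NFKC decision in the script policy row can be illustrated with Python's standard unicodedata module. This is a minimal sketch of the trade-off, not the linguistic-scripts skill itself; the `normalize` helper and the sample strings are illustrative assumptions.

```python
import unicodedata


def normalize(text: str, form: str = "NFC") -> str:
    """Apply a Unicode normalization form (NFC, NFD, NFKC, or NFKD)."""
    return unicodedata.normalize(form, text)


# A tone-marked vowel typed as base letter + combining dot below + combining grave.
decomposed = "e\u0323\u0300"

# NFC composes what it can (e + dot below becomes U+1EB9) and keeps the tone mark,
# so canonically equivalent spellings end up byte-identical.
composed = normalize(decomposed, "NFC")

# NFKC additionally folds compatibility characters, which is lossy:
# the "fi" ligature (U+FB01) becomes the two letters "f" and "i".
ligature = "\ufb01"
folded = normalize(ligature, "NFKC")
```

NFC is the conservative choice when diacritics carry meaning, because it only unifies canonically equivalent sequences. NFKC is worth considering only when compatibility variants (ligatures, full-width forms) are noise for the task at hand.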

Phase exit: workspace_state.md has ISO code, Joshi class, typology vector, and script policy.
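
The exit condition above amounts to a completeness check over four fields. The sketch below assumes the workspace state has already been loaded into a dictionary; the field names and sample values are hypothetical, since the actual layout of workspace_state.md is not specified here.

```python
# Hypothetical field names for the four items the Phase 0 exit requires.
REQUIRED_SCOPE_FIELDS = ("iso_code", "joshi_class", "typology_vector", "script_policy")


def missing_scope_fields(state: dict) -> list[str]:
    """Return the required Phase 0 fields that are absent or unset."""
    # Compare against None rather than truthiness: Joshi class 0 is a valid value.
    return [name for name in REQUIRED_SCOPE_FIELDS if state.get(name) is None]


def scope_phase_complete(state: dict) -> bool:
    """True when every required Phase 0 field has a value."""
    return not missing_scope_fields(state)


# Placeholder values for illustration only, not a real classification.
example_state = {
    "iso_code": "yor",
    "joshi_class": 0,
    "typology_vector": None,
    "script_policy": "NFC",
}
remaining = missing_scope_fields(example_state)
```

Here `remaining` lists only the typology vector, so the phase would not yet be ready to exit.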

Phase 1 — Acquire

Goal: Gather monolingual and parallel data ethically and reproducibly.

| Step | Specialist | What It Does |
|---|---|---|
| Monolingual corpora | linguistic-corpus | OLDI/CulturaX/MADLAD-400/Glot500/Wikipedia catalog; LID; MinHash dedup |
| Parallel data | linguistic-bitext | LASER3/SONAR mining; Vecalign alignment; synthetic bitext |
| Tokenizer audit | linguistic-tokenize | Fertility ratio; vocab extension method (FOCUS/OFA/HyperOfa) |
| Adapter strategy | linguistic-transfer | LoRA rank by URIEL distance; MAD-X; catastrophic-forgetting plan |
| Per-dataset ethics | linguistic-ethics | License audit; attribution registry; sacred-text gating |
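
The fertility ratio in the tokenizer audit row is commonly defined as the average number of subword tokens produced per whitespace-delimited word; that definition is assumed here. The sketch below takes any tokenizer as a callable, and `toy_tokenize` is a hypothetical stand-in for a real subword tokenizer.

```python
from typing import Callable


def fertility(texts: list[str], tokenize: Callable[[str], list[str]]) -> float:
    """Average number of tokens per whitespace-delimited word."""
    words = sum(len(text.split()) for text in texts)
    tokens = sum(len(tokenize(text)) for text in texts)
    if words == 0:
        raise ValueError("no words to measure")
    return tokens / words


def toy_tokenize(text: str, chunk: int = 3) -> list[str]:
    """Hypothetical tokenizer: split each word into pieces of up to `chunk` characters."""
    pieces = []
    for word in text.split():
        pieces.extend(word[i:i + chunk] for i in range(0, len(word), chunk))
    return pieces


# "abcdef" yields 2 pieces and "gh" yields 1, so 3 tokens over 2 words.
ratio = fertility(["abcdef gh"], toy_tokenize)
```

A ratio well above that of a high-resource language under the same tokenizer suggests the vocabulary fragments the target language heavily, which is the usual motivation for vocabulary extension.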

Phase exit: Reproducible data manifest (sources, licenses, sizes, dedup stats) + tokenizer plan.

Phase 2 — Analyze

Goal: Run linguistic analysis layers needed for evaluation, augmentation, or downstream training.

| Step | Specialist | What It Does |
|---|---|---|
| Morphology | linguistic-morph | UniMorph paradigms; SIGMORPHON segmenters; FST/HFST |
| Syntax | linguistic-syntax | UD treebank ingestion; cross-lingual parser transfer; agreement probes |
| Semantics | linguistic-semantics | WordNet/OMW; FrameNet; PropBank SRL; MWE/PARSEME |
| Discourse | linguistic-discourse | RST/PDTB/GUM; coreference; coherence-aware eval |
| Speech | linguistic-speech | ELAN/Praat/FLEx → Lhotse; G2P/IPA; MMS/Whisper ASR |
| Annotation | linguistic-annotate | IAA metric selection; guideline authoring; adjudication |
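
As one concrete instance of an inter-annotator agreement (IAA) metric from the annotation row, the sketch below computes Cohen's kappa for two annotators over the same items. It is an illustration of a single metric and makes no claim about which metrics linguistic-annotate actually selects; the labels are made up.

```python
from collections import Counter


def cohen_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labeled at random with their own label rates.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


annotator_a = ["N", "N", "V", "V"]
annotator_b = ["N", "V", "V", "V"]
kappa = cohen_kappa(annotator_a, annotator_b)
```

In this toy case the annotators agree on 3 of 4 items (0.75 observed) against 0.5 expected by chance, giving a kappa of 0.5. Kappa suits two annotators with categorical labels; other setups (more annotators, missing labels, span annotation) call for different metrics.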

Phase exit: Required analysis artifacts produced.

Phase 3 — Evaluate

Goal: Honestly measure performance with metrics fit for the language.

The linguistic-eval skill is A-tier because eval results drive release decisions. It enforces:

  • chrF++/COMET/GEMBA-MQM over BLEU for morphologically-rich languages
  • Per-dialect and per-register breakdowns — aggregate scores hide systematic failures
  • Contamination-aware reporting: FLORES-200 appears in many pretraining mixes, so treat scores on it as likely inflated (an upper bound on true performance) and label them as such
  • BLiMP-style grammatical-knowledge probes per language
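
The point about aggregate scores hiding systematic failures can be shown with a small grouping sketch. The dialect names and scores below are made up for illustration, and the helper is not part of linguistic-eval.

```python
from collections import defaultdict
from statistics import mean


def per_group_means(rows: list[tuple[str, float]]) -> dict[str, float]:
    """Average a segment-level score separately for each group (dialect or register)."""
    groups: dict[str, list[float]] = defaultdict(list)
    for group, score in rows:
        groups[group].append(score)
    return {group: mean(scores) for group, scores in groups.items()}


# Made-up (dialect, score) pairs: three segments from one dialect, one from another.
rows = [
    ("standard", 60.0),
    ("standard", 62.0),
    ("standard", 58.0),
    ("coastal", 20.0),
]

overall = mean(score for _, score in rows)
by_dialect = per_group_means(rows)
```

The overall mean here is 50.0, which looks moderate, while the breakdown shows 60.0 for one dialect and 20.0 for the other. Reporting the group sizes alongside each mean helps readers judge how much weight a small group's score can bear.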

Phase 4 — Release

linguistic-ethics serves as the release gate — final license compatibility check, attribution registry completeness, community sign-off, and model card authoring. Release modes: Open, Community-gated, or Restricted.

Workspace State

Every session writes structured state to workspace_state.md in the current working directory. This file is the shared memory between specialist skills — scope writes language identity, scripts writes normalization policy, corpus writes the data manifest, and so on. The orchestrator reads it on every invocation to resume seamlessly.
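
The shared-memory pattern can be sketched as a read, merge, and write cycle in which each specialist updates only its own section. The file layout below ("## section" headings with "- key: value" bullets) is a hypothetical format chosen for illustration; the real structure of workspace_state.md may differ.

```python
def parse_state(text: str) -> dict[str, dict[str, str]]:
    """Parse the assumed layout into {section: {key: value}}."""
    state: dict[str, dict[str, str]] = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            section = line[3:].strip()
            state.setdefault(section, {})
        elif line.startswith("- ") and section is not None and ":" in line:
            key, value = line[2:].split(":", 1)
            state[section][key.strip()] = value.strip()
    return state


def render_state(state: dict[str, dict[str, str]]) -> str:
    """Serialize the state back into the assumed layout."""
    lines: list[str] = []
    for section, fields in state.items():
        lines.append(f"## {section}")
        lines.extend(f"- {key}: {value}" for key, value in fields.items())
        lines.append("")
    return "\n".join(lines)


def write_section(text: str, section: str, fields: dict[str, str]) -> str:
    """Merge one specialist's fields into its section, leaving other sections intact."""
    state = parse_state(text)
    state.setdefault(section, {}).update(fields)
    return render_state(state)


# Two specialists writing in turn; section and key names are hypothetical.
text = write_section("", "scope", {"iso_code": "yor"})
text = write_section(text, "scripts", {"normalization": "NFC"})
```

Because every write starts from a fresh parse of the file contents, a later specialist sees what earlier ones recorded, which is what lets a new session resume from the file alone.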

Natural Language vs Slash Commands

Both trigger identical behavior:

  • "help me build an LLM for Yoruba" → orchestrator routes to scope → ethics → corpus → ...
  • /linguistic:lifecycle → same entry point

Slash commands are explicit shortcuts. Natural language works equally well for every operation.
