Quick Start

This walkthrough covers your first language analysis with the Linguistic Agent Skills suite. You'll scope a low-resource language, get its typological profile, and establish a script policy. Together these form the foundation for any LLM project.

Prerequisites

Complete Installation first. You need Claude Code running with the skills and commands linked.

Step 1 — Start a Pipeline Session

Open a Claude Code session in your project directory and type:

help me build an LLM for Yoruba

The linguistic-orchestrator activates automatically. It creates workspace_state.md in your current directory and routes to linguistic-scope.

Alternatively, use the explicit slash command:

/linguistic:lifecycle

Step 2 — Language Disambiguation

linguistic-scope runs disambiguation immediately. Yoruba resolves to a single ISO 639-3 code (yor), so the skill proceeds directly. If you had said "Chinese" or "Arabic", it would pause and present the candidate subtags as numbered options, because macrolanguage disambiguation is mandatory before any data work begins.

Expected output:

ISO 639-3: yor
Glottolog: yoru1245
Family: Niger-Congo > Atlantic-Congo > Volta-Congo > Benue-Congo > Yoruboid
Default script: Latin (with tonal diacritics)
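
The branching described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `disambiguate`, the two lookup tables, and the return shape are hypothetical and are not taken from linguistic-scope itself. The ISO 639-3 codes shown are real, but the tables are deliberately tiny.

```python
# Hypothetical sketch of macrolanguage disambiguation; not the skill's actual code.
# The tables below are illustrative and far from complete.
MACROLANGUAGES = {
    "chinese": ["cmn (Mandarin Chinese)", "yue (Yue Chinese)", "wuu (Wu Chinese)"],
    "arabic": ["arb (Standard Arabic)", "arz (Egyptian Arabic)", "ary (Moroccan Arabic)"],
}
UNAMBIGUOUS = {"yoruba": "yor"}


def disambiguate(name):
    """Resolve a language name, or return numbered options if it is a macrolanguage."""
    key = name.strip().lower()
    if key in UNAMBIGUOUS:
        return {"status": "resolved", "iso639_3": UNAMBIGUOUS[key]}
    if key in MACROLANGUAGES:
        options = [f"{i}. {opt}" for i, opt in enumerate(MACROLANGUAGES[key], start=1)]
        return {"status": "needs_choice", "options": options}
    return {"status": "unknown"}


print(disambiguate("Yoruba"))
print(disambiguate("Arabic"))
```

In this sketch, a `needs_choice` result is the point at which a real session would stop and wait for the user to pick a number.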

Step 3 — Resource Classification

The skill computes the Joshi 0–5 resource class from cached signals: Wikipedia presence, OPUS presence, FLORES-200 inclusion, NLLB inclusion, and HuggingFace dataset count.

Expected output for Yoruba:

Resource class (Joshi): 2 — "Hopefuls"
  Some labelled data, no full benchmark coverage.
  Recommended strategy: Vocab extension + LoRA.
  Transfer source candidates: Igbo (0.18), Hausa (0.34), Swahili (0.41)
  English distance: 0.62 — NOT recommended as primary transfer source.

This class changes every downstream decision — tokenizer strategy, eval suite, adapter rank, ethics depth. The orchestrator records it in workspace_state.md.
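
To make the transfer-source reasoning concrete, here is a minimal sketch that ranks candidates by distance, using the values from the expected output above. The function name `rank_transfer_sources` and the `max_distance` cutoff of 0.5 are assumptions made for illustration; the suite's real selection logic is not documented here.

```python
# Distances copied from the expected output above; lower means closer.
candidates = {"Igbo": 0.18, "Hausa": 0.34, "Swahili": 0.41, "English": 0.62}


def rank_transfer_sources(distances, max_distance=0.5):
    """Return (language, distance) pairs at or under the cutoff, closest first.

    max_distance is a hypothetical threshold chosen for this example.
    """
    eligible = [(lang, d) for lang, d in distances.items() if d <= max_distance]
    return sorted(eligible, key=lambda pair: pair[1])


ranked = rank_transfer_sources(candidates)
primary = ranked[0][0]
print(ranked)
print("Primary transfer source:", primary)
```

With these numbers, English falls outside the cutoff and Igbo comes out first, which matches the recommendation in the sample output.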

Step 4 — Typological Profile

The skill surfaces outlier features from the URIEL vector that require targeted handling:

Typological outliers for Yoruba (yor):
  • Tone: HIGH/LOW tone (á/à), nasal (ọ̀) — diacritic preservation MANDATORY
  • SVO word order: standard
  • Isolating morphology: low fertility expected (~1.8-2.2×)
  • No grammatical gender
  • Subject-verb agreement: limited
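
The fertility figure in the profile is commonly understood as the average number of subword tokens per whitespace-separated word. The sketch below assumes that definition and uses a toy tokenizer (`toy_tokenize`, a hypothetical stand-in that cuts words into fixed-size chunks) so it runs without any external library. In practice you would pass your real tokenizer's encode function instead.

```python
def fertility(text, tokenize):
    """Average number of tokens per whitespace-separated word."""
    words = text.split()
    if not words:
        return 0.0
    n_tokens = sum(len(tokenize(word)) for word in words)
    return n_tokens / len(words)


def toy_tokenize(word, size=3):
    # Stand-in for a real subword tokenizer: fixed-size character chunks.
    return [word[i:i + size] for i in range(0, len(word), size)]


sample = "abc defgh"  # "abc" -> 1 chunk, "defgh" -> 2 chunks
print(fertility(sample, toy_tokenize))
```

A measured fertility well above the expected range would suggest the tokenizer is fragmenting the language and that vocabulary extension deserves a closer look.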

Step 5 — Script Policy (linguistic-scripts)

The orchestrator routes to linguistic-scripts to set normalization policy:

Script Policy for Yoruba:
  Primary script: Latin (Unicode block U+0000-U+007F + combining diacritics)
  Normalization: NFC (default)
  Diacritics: PRESERVE — tone language (stripping = data corruption)
  Romanization: N/A (already Latin-script)
  Confusable risk: LOW
  ZWJ/ZWNJ: NORMALIZE

This policy is recorded in workspace_state.md and applies to every downstream step.
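
The NFC and PRESERVE settings can be illustrated with Python's standard `unicodedata` module. This is a sketch of the general technique, not the implementation used by linguistic-scripts. The helper names `normalize_nfc` and `strip_diacritics` are hypothetical, and the second one exists only to show what the policy forbids.

```python
import unicodedata


def normalize_nfc(text):
    """Compose base letters and combining marks where a precomposed form exists."""
    return unicodedata.normalize("NFC", text)


def strip_diacritics(text):
    # What the policy prohibits: decompose, then drop every combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# 'o' followed by COMBINING GRAVE ACCENT composes to the single code point U+00F2.
composed = normalize_nfc("o\u0300")
print(len("o\u0300"), len(composed))

# High-tone and low-tone vowels become indistinguishable once marks are stripped.
print(strip_diacritics("\u00e1") == strip_diacritics("\u00e0"))
```

NFC keeps the tone marks while making equivalent spellings compare equal. Stripping the marks merges strings that differ only in tone, which is why the policy treats it as data corruption.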

Step 6 — Ethics Seed (linguistic-ethics)

Before any data recommendation, linguistic-ethics runs an early gate:

Ethics seed for Yoruba:
  Vitality (EGIDS): 2 — Provincial (widely spoken in Nigeria)
  Ethics depth: Standard FPIC + license check
  Community engagement: Standard (no mandatory pre-engagement for this vitality level)
  Sacred-text flag: None identified

Step 7 — Review the Workspace State

At any point, check where you are:

/linguistic:status

Output:

[Phase: Scope | Language: Yoruba (yor) | Resource Class: 2 | Skills routed: scope, scripts, ethics | Open findings: 1]
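
If you want to consume that status line in a script, it can be split into fields with a few lines of Python. This sketch assumes the line always follows the bracketed, pipe-separated `key: value` layout shown above. The function name `parse_status` is hypothetical and not part of the suite.

```python
def parse_status(line):
    """Split a bracketed 'key: value | key: value' status line into a dict."""
    inner = line.strip().strip("[]")
    fields = {}
    for part in inner.split("|"):
        # Split on the first colon only, so values may contain colons.
        key, _, value = part.partition(":")
        fields[key.strip()] = value.strip()
    return fields


status_line = (
    "[Phase: Scope | Language: Yoruba (yor) | Resource Class: 2 | "
    "Skills routed: scope, scripts, ethics | Open findings: 1]"
)
status = parse_status(status_line)
print(status["Phase"], status["Resource Class"])
print(status["Skills routed"].split(", "))
```

All values come back as strings, so convert fields such as `Resource Class` with `int()` if you need numbers.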

For a full review:

/linguistic:review

Step 8 — Next Steps

With Scope complete, you're ready for the Acquire phase. The orchestrator will suggest:

/linguistic:propose

This generates a full plan.md covering all 5 phases — corpus sources (with ethics flags), tokenizer strategy, adapter plan, eval suite, and release gating — all tailored to Yoruba Class 2.

What Just Happened

In these 8 steps, the suite automatically:

  1. Resolved the language to a canonical ISO 639-3 + Glottolog identifier
  2. Classified it as Joshi Class 2 with typology-informed strategy
  3. Identified that English is a poor transfer source (URIEL distance 0.62) and surfaced Igbo (0.18) as the closest transfer candidate
  4. Set a script policy that protects tone diacritics from being stripped
  5. Ran an ethics seed that gates all future data recommendations
  6. Wrote structured state to workspace_state.md for cross-session continuity

All of this from a single natural-language prompt — "help me build an LLM for Yoruba".
