
Tutorial: Explore a Language Profile

Difficulty: Easy
Time: 20–30 minutes
Skills used: linguistic-scope, linguistic-scripts
Language: Twi (twi) — Akan dialect, Ghana, Joshi Class 2

This tutorial walks through the Scope phase for a new language target. By the end you'll have a complete language profile in workspace_state.md and understand what it means for your downstream decisions.

Prerequisites

  • Linguistic Agent Skills installed (see Installation)
  • A Claude Code session open in a new project directory

Step 1 — Trigger the Scope Phase

In your Claude Code session, type:

help me understand what I need to know about Twi for NLP

The linguistic-orchestrator activates. It creates workspace_state.md and routes to linguistic-scope.

You can also use the explicit command:

/linguistic:lifecycle

Then enter "Twi" when prompted for a language.

Step 2 — Language Disambiguation

linguistic-scope first checks whether "Twi" is ambiguous. It is: Twi belongs to the Akan dialect continuum, whose main varieties are Asante Twi, Akuapem Twi, and Fante. The skill surfaces this:

"Twi" may refer to:
  1. Asante Twi (ISO: twi) — most common; Asante region, Ghana
  2. Akuapem Twi (ISO: twi, dialect) — Akuapem region, Ghana
  3. Fante (ISO: fat) — coastal Ghana; related but distinct

Proceeding with Asante Twi (twi) as default. Confirm or specify a variant.

Confirm option 1 for this tutorial.

Expected output:

ISO 639-3: twi
Glottolog: akan1250 (Akan)
Family: Niger-Congo > Atlantic-Congo > Kwa > Akan
Default script: Latin (with tone diacritics)

Step 3 — Resource Classification

The skill computes the Joshi class:

Resource class (Joshi): 2 — "Hopefuls"
  Wikipedia: exists (~3,700 articles)
  OPUS: present (Bible, small parallel)
  FLORES-200: YES (included)
  NLLB: YES (included)
  HuggingFace datasets: 8 (growing)
  
  Interpretation: Some labeled data; no comprehensive benchmark coverage;
  active research community. Recommended strategy: vocab extension + LoRA.

Note: Twi is a tone language, and that has immediate implications for how data is handled (see Steps 4 and 5).

Step 4 — Typological Profile

Typological outliers for Twi (twi):
  • Tone: HIGH/LOW/FALLING tones — diacritic preservation MANDATORY
  • SVO word order: standard
  • Agglutinative morphology (moderate): fertility ~2.1× expected
  • Serial verb constructions: complex predicate structures
  • No grammatical gender
  • Pro-drop: YES (subject pronouns often omitted)

Transfer source candidates:
  1. Yoruba (yor) — URIEL=0.22 — same branch (Kwa) + tone + Latin script + Class 2
  2. Igbo (ibo) — URIEL=0.29 — Niger-Congo + tone + Latin script + Class 1
  3. Akan-Fante (fat) — URIEL=0.11 — direct dialect relative (if data available)
  English distance: 0.65 — NOT recommended as primary source

The URIEL distances point clearly away from English (0.65) as a primary source: Akan-Fante (0.11) is the closest candidate if data is available, with Yoruba (0.22) next.
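To make the selection logic concrete, here is a minimal sketch of ranking transfer candidates by distance. The distances are copied from the output above; the `Candidate` class, the `has_data` flag, and its values are hypothetical illustrations, not part of the skill's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    iso: str
    distance: float  # URIEL distance to Twi, as reported in the output above
    has_data: bool   # hypothetical flag: is usable training data on hand?


# has_data values are assumptions for illustration only.
candidates = [
    Candidate("Yoruba", "yor", 0.22, True),
    Candidate("Igbo", "ibo", 0.29, True),
    Candidate("Akan-Fante", "fat", 0.11, False),
    Candidate("English", "eng", 0.65, True),
]


def rank_sources(cands, require_data=True):
    """Return candidates sorted by ascending distance (closest first)."""
    pool = [c for c in cands if c.has_data or not require_data]
    return sorted(pool, key=lambda c: c.distance)


print([c.name for c in rank_sources(candidates)])
print([c.name for c in rank_sources(candidates, require_data=False)])
```

With the data requirement on, Yoruba ranks first. Dropping it puts Akan-Fante first, which mirrors the "if data available" caveat in the skill output.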

Step 5 — Script Policy

linguistic-scripts sets the policy:

Script Policy: Twi (twi)
- Primary script: Latin (with tone diacritics)
- Normalization: NFC
- Diacritics: PRESERVE (tone language — stripping = data corruption)
  Twi tone marks: à (low), a (mid), á (high)
  Example: "bà" (come) ≠ "ba" (child) ≠ "bá" (be healed) — all different words
- Romanization: N/A (already Latin-script)
- Confusable risk: LOW
- ZWJ/ZWNJ: NORMALIZE

This is written to workspace_state.md and will govern all subsequent data acquisition.
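The following sketch illustrates what the NFC and PRESERVE settings mean in practice, using only Python's standard `unicodedata` module. The helper names are hypothetical, and this is not the skill's own code; `strip_diacritics` is included only to show the failure mode the policy guards against.

```python
import unicodedata


def normalize_nfc(text: str) -> str:
    """Compose base letters and combining marks into single code points where possible."""
    return unicodedata.normalize("NFC", text)


def strip_diacritics(text: str) -> str:
    """Destructive: decompose, then drop combining marks. Shown as what NOT to do."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# The same word typed two ways: "a" + COMBINING GRAVE ACCENT vs precomposed "à".
decomposed_form = "ba\u0300"
precomposed_form = "b\u00e0"

# The three example words from the policy above: low, unmarked, and high tone.
words = ["b\u00e0", "ba", "b\u00e1"]
normalized = [normalize_nfc(w) for w in words]
stripped = [strip_diacritics(w) for w in words]

print(normalize_nfc(decomposed_form) == precomposed_form)  # NFC unifies the encodings
print(len(set(normalized)), len(set(stripped)))            # distinct forms before/after stripping
```

NFC makes visually identical strings compare equal without changing which tone is marked. Stripping collapses all three example words into the same string, which is why the policy treats it as data corruption.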

Step 6 — Ethics Seed

Ethics seed: Twi (twi)
- Vitality (EGIDS): 2 — Provincial (widely spoken in Ghana)
- Ethics depth: Standard FPIC + license check
- Community engagement: Standard (no mandatory pre-engagement)
- Sacred-text flag: None identified
- Action: Check Bible-NLP register % at corpus stage (tone language; archaic register risk)

Step 7 — Review the Profile

Check what was captured:

/linguistic:status

Expected output:

[Phase: Scope | Language: Twi (twi) | Resource Class: 2 | Last skill: linguistic-ethics | Open findings: 0]
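If you want to check the status line from a script, a small parser is enough. This is a sketch that assumes the line always has the bracketed `key: value | key: value` shape shown above; `parse_status` is a hypothetical helper, not a command the skills provide.

```python
def parse_status(line: str) -> dict[str, str]:
    """Split a bracketed 'key: value | key: value' status line into a dict."""
    body = line.strip().removeprefix("[").removesuffix("]")
    fields = {}
    for part in body.split("|"):
        key, _, value = part.partition(":")
        fields[key.strip()] = value.strip()
    return fields


status = parse_status(
    "[Phase: Scope | Language: Twi (twi) | Resource Class: 2 "
    "| Last skill: linguistic-ethics | Open findings: 0]"
)
print(status["Phase"], status["Open findings"])
```

This uses `str.removeprefix` and `str.removesuffix`, so it needs Python 3.9 or later.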

For the full profile:

/linguistic:review

Step 8 — Check workspace_state.md

Open workspace_state.md in your editor. You should see:

## Targets
- Language: Twi (twi) | Glottolog: akan1250
- Resource class (Joshi 0-5): 2 — "Hopefuls"
- Vitality (EGIDS): 2 — Provincial

## Script Policy
- Primary script: Latin (with tone diacritics)
- Normalization: NFC
- Diacritics: PRESERVE

## Typological Profile
- Outliers: tone (lexical), SVO, agglutinative-moderate, serial verbs, pro-drop
- Transfer source: Yoruba (URIEL=0.22) or Fante (URIEL=0.11)

## Ethics Status
- Seed: COMPLETE (2026-05-22)
- Depth: Standard FPIC

## Open Questions
- Q1: Confirm dialect (Asante vs Akuapem vs Fante) — defaulted to Asante Twi
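Because workspace_state.md governs later phases, it can be useful to verify it programmatically. The sketch below assumes the file keeps the layout shown above (`## ` section headings followed by `- ` bullet lines); `parse_workspace_state` and `diacritics_preserved` are hypothetical helpers, and the sample text is an excerpt of the listing.

```python
def parse_workspace_state(text: str) -> dict[str, list[str]]:
    """Group '- ' bullet lines under the most recent '## ' heading."""
    sections: dict[str, list[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif line.startswith("- ") and current is not None:
            sections[current].append(line[2:].strip())
    return sections


def diacritics_preserved(sections: dict[str, list[str]]) -> bool:
    """True if the Script Policy section says diacritics must be preserved."""
    return "Diacritics: PRESERVE" in sections.get("Script Policy", [])


SAMPLE = """\
## Script Policy
- Primary script: Latin (with tone diacritics)
- Normalization: NFC
- Diacritics: PRESERVE

## Ethics Status
- Seed: COMPLETE (2026-05-22)
- Depth: Standard FPIC
"""

sections = parse_workspace_state(SAMPLE)
print(sorted(sections))
print(diacritics_preserved(sections))
```

In a real project you would read the file with `pathlib.Path("workspace_state.md").read_text(encoding="utf-8")` instead of using the inline sample.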

What You Learned

In this tutorial you:

  1. Disambiguated "Twi" within the Akan dialect continuum
  2. Learned it is Joshi Class 2 — "Hopefuls" — with specific strategy implications
  3. Identified tone as a critical feature requiring diacritic preservation
  4. Got evidence-based transfer source recommendations (Yoruba or Fante, not English)
  5. Set a script policy that will protect tone marks throughout the pipeline
  6. Completed an ethics seed appropriate for EGIDS 2

Next Steps

With the scope complete, you're ready for:

  • Build a Corpus tutorial — Acquire phase for Twi
  • /linguistic:propose — generate a full 5-phase plan for Twi
  • /linguistic:decide transfer source — confirm Yoruba vs Fante as transfer source