Tutorial: Explore a Language Profile
Difficulty: Easy
Time: 20–30 minutes
Skills used: linguistic-scope, linguistic-scripts
Language: Twi (twi) — Akan dialect, Ghana, Joshi Class 2
This tutorial walks through the Scope phase for a new language target. By the end you'll have a complete language profile in workspace_state.md and understand what it means for your downstream decisions.
Prerequisites
- Linguistic Agent Skills installed (see Installation)
- A Claude Code session open in a new project directory
Step 1 — Trigger the Scope Phase
In your Claude Code session, type:
help me understand what I need to know about Twi for NLP
The linguistic-orchestrator activates. It creates workspace_state.md and routes to linguistic-scope.
You can also use the explicit command:
/linguistic:lifecycle
Then enter "Twi" when prompted for a language.
Step 2 — Language Disambiguation
linguistic-scope first checks whether "Twi" is ambiguous. Twi belongs to the Akan dialect continuum, whose main varieties are Asante Twi, Akuapem Twi, and Fante. The skill surfaces this:
"Twi" may refer to:
1. Asante Twi (ISO: twi) — most common; Asante region, Ghana
2. Akuapem Twi (ISO: twi, dialect) — Akuapem region, Ghana
3. Fante (ISO: fat) — coastal Ghana; related but distinct
Proceeding with Asante Twi (twi) as default. Confirm or specify a variant.
Confirm option 1 for this tutorial.
Expected output:
ISO 639-3: twi
Glottolog: akan1250 (Akan)
Family: Niger-Congo > Atlantic-Congo > Kwa > Akan
Default script: Latin (with tone diacritics)
Step 3 — Resource Classification
The skill computes the Joshi class:
Resource class (Joshi): 2 — "Hopefuls"
Wikipedia: exists (~3,700 articles)
OPUS: present (Bible, small parallel)
FLORES-200: YES (included)
NLLB: YES (included)
HuggingFace datasets: 8 (growing)
Interpretation: Some labeled data; no comprehensive benchmark coverage;
active research community. Recommended strategy: vocab extension + LoRA.
Note: Twi is a tone language, and that has immediate implications for how data is handled (see the script policy in Step 5).
Step 4 — Typological Profile
Typological outliers for Twi (twi):
• Tone: HIGH/LOW/FALLING tones — diacritic preservation MANDATORY
• SVO word order: standard
• Agglutinative morphology (moderate): fertility ~2.1× expected
• Serial verb constructions: complex predicate structures
• No grammatical gender
• Pro-drop: YES (subject pronouns often omitted)
Transfer source candidates:
1. Yoruba (yor) — URIEL=0.22 — same branch (Kwa) + tone + Latin script + Class 2
2. Igbo (ibo) — URIEL=0.29 — Niger-Congo + tone + Latin script + Class 1
3. Akan-Fante (fat) — URIEL=0.11 — direct dialect relative (if data available)
English distance: 0.65 — NOT recommended as primary source
The URIEL distances make a clear recommendation: use Yoruba or Akan-Fante as a transfer source, not English.
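The ranking logic behind that recommendation can be sketched by sorting the candidates by distance. This is a minimal illustration, not the skill's implementation: the `candidates` dictionary and the `rank_sources` helper are hypothetical, the distances are copied from the example output above, and the 0.5 cutoff is an assumption chosen only to exclude English.

```python
# Distances copied from the tutorial's example output; lower means closer to Twi.
candidates = {
    "Yoruba (yor)": 0.22,
    "Igbo (ibo)": 0.29,
    "Akan-Fante (fat)": 0.11,
    "English (eng)": 0.65,
}

def rank_sources(distances, max_distance=0.5):
    """Return candidate names sorted by distance, dropping any above max_distance.

    max_distance is an illustrative cutoff, not a value defined by the skill.
    """
    kept = [(name, d) for name, d in distances.items() if d <= max_distance]
    return [name for name, _ in sorted(kept, key=lambda pair: pair[1])]

ranked = rank_sources(candidates)
print(ranked)  # ['Akan-Fante (fat)', 'Yoruba (yor)', 'Igbo (ibo)']
```

Sorting by distance alone puts Fante first. The output above qualifies Fante with "if data available", so distance is not the only factor when choosing a source.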
Step 5 — Script Policy
linguistic-scripts sets the policy:
Script Policy: Twi (twi)
- Primary script: Latin (with tone diacritics)
- Normalization: NFC
- Diacritics: PRESERVE (tone language — stripping = data corruption)
Twi tone marks: à (low), a (mid), á (high)
Example: "bà" (come) ≠ "ba" (child) ≠ "bá" (be healed) — all different words
- Romanization: N/A (already Latin-script)
- Confusable risk: LOW
- ZWJ/ZWNJ: NORMALIZE
This is written to workspace_state.md and will govern all subsequent data acquisition.
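The NFC and PRESERVE settings can be illustrated with Python's standard `unicodedata` module. This is a minimal sketch, not the skill's own code: the helper names `normalize_twi` and `strip_diacritics` are hypothetical, and the example words are the ones shown in the policy output.

```python
import unicodedata

def normalize_twi(text):
    """Apply NFC so a base letter plus a combining tone mark becomes one code point."""
    return unicodedata.normalize("NFC", text)

def strip_diacritics(text):
    """What the policy forbids: decompose, then drop combining marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

# "a" + combining grave (U+0300) renders like precomposed "à" (U+00E0),
# but the two strings compare unequal until they are normalized.
decomposed_low = "ba\u0300"
precomposed_low = "b\u00e0"
assert decomposed_low != precomposed_low
assert normalize_twi(decomposed_low) == precomposed_low

# The three words from the policy output: low tone, unmarked, high tone.
words = ["b\u00e0", "ba", "b\u00e1"]
stripped = {strip_diacritics(w) for w in words}
print(len(set(words)), len(stripped))  # 3 1
```

NFC keeps the tone marks and only unifies how they are encoded, so deduplication and tokenization see one form per word. Stripping collapses the three distinct words into the single string "ba", which is the corruption the PRESERVE setting is meant to prevent.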
Step 6 — Ethics Seed
Ethics seed: Twi (twi)
- Vitality (EGIDS): 2 — Provincial (widely spoken in Ghana)
- Ethics depth: Standard FPIC + license check
- Community engagement: Standard (no mandatory pre-engagement)
- Sacred-text flag: None identified
- Action: Check Bible-NLP register % at corpus stage (tone language; archaic register risk)
Step 7 — Review the Profile
Check what was captured:
/linguistic:status
Expected output:
[Phase: Scope | Language: Twi (twi) | Resource Class: 2 | Last skill: linguistic-ethics | Open findings: 0]
For the full profile:
/linguistic:review
Step 8 — Check workspace_state.md
Open workspace_state.md in your editor. You should see:
## Targets
- Language: Twi (twi) | Glottolog: akan1250
- Resource class (Joshi 0-5): 2 — "Hopefuls"
- Vitality (EGIDS): 2 — Provincial
## Script Policy
- Primary script: Latin (with tone diacritics)
- Normalization: NFC
- Diacritics: PRESERVE
## Typological Profile
- Outliers: tone (lexical), SVO, agglutinative-moderate, serial verbs, pro-drop
- Transfer source: Yoruba (URIEL=0.22) or Fante (URIEL=0.11)
## Ethics Status
- Seed: COMPLETE (2026-05-22)
- Depth: Standard FPIC
## Open Questions
- Q1: Confirm dialect (Asante vs Akuapem vs Fante) — defaulted to Asante Twi
What You Learned
In this tutorial you:
- Disambiguated "Twi" within the Akan dialect continuum
- Learned it is Joshi Class 2 — "Hopefuls" — with specific strategy implications
- Identified tone as a critical feature requiring diacritic preservation
- Got evidence-based transfer source recommendations (Yoruba, not English)
- Set a script policy that will protect tone marks throughout the pipeline
- Completed an ethics seed appropriate for EGIDS 2
Next Steps
With the scope complete, you're ready for:
- Build a Corpus tutorial — Acquire phase for Twi
- /linguistic:propose — generate a full 5-phase plan for Twi
- /linguistic:decide transfer source — confirm Yoruba vs Fante as transfer source