Quick Start
This walkthrough guides you through your first language analysis with the Linguistic Agent Skills suite. You'll scope a low-resource language, get its typological profile, and establish a script policy: the foundation for any LLM project.
Prerequisites
Complete Installation first. You need Claude Code running with the skills and commands linked.
Step 1 — Start a Pipeline Session
Open a Claude Code session in your project directory and type:
help me build an LLM for Yoruba
The linguistic-orchestrator activates automatically. It creates workspace_state.md in your current directory and routes to linguistic-scope.
Alternatively, use the explicit slash command:
/linguistic:lifecycle
Step 2 — Language Disambiguation
linguistic-scope runs disambiguation first. Yoruba maps unambiguously to the ISO 639-3 individual-language code yor, so the skill proceeds directly. Had you said "Chinese" or "Arabic", which are ISO 639-3 macrolanguages, it would pause and present the member languages as numbered options. Macrolanguage disambiguation is mandatory before any data work begins.
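The check can be pictured as a lookup against ISO 639-3 macrolanguage membership. This is a hypothetical sketch, not the skill's implementation: `MACROLANGUAGES`, `NAME_TO_CODE`, and `disambiguate` are names invented here, and the member lists are a small hand-picked subset of the real ISO 639-3 tables.

```python
from typing import List, Optional, Tuple

# Hypothetical subset of ISO 639-3 macrolanguage memberships (illustration only).
MACROLANGUAGES = {
    "zho": ["cmn", "yue", "wuu", "nan", "hak"],  # Chinese
    "ara": ["arb", "arz", "ary", "acm"],         # Arabic
}

# Hypothetical name-to-code table; a real resolver would use the full ISO 639-3 index.
NAME_TO_CODE = {"yoruba": "yor", "chinese": "zho", "arabic": "ara"}


def disambiguate(name: str) -> Tuple[Optional[str], List[str]]:
    """Return (code, []) for an individual language, or (None, options)
    when the name resolves to a macrolanguage that needs a user choice."""
    code = NAME_TO_CODE[name.strip().lower()]
    members = MACROLANGUAGES.get(code)
    if members is None:
        return code, []  # unambiguous: proceed directly
    # Pause and offer the member languages as numbered options.
    options = [f"{i}. {member}" for i, member in enumerate(members, start=1)]
    return None, options


print(disambiguate("Yoruba"))     # ('yor', [])
print(disambiguate("Arabic")[1])  # ['1. arb', '2. arz', '3. ary', '4. acm']
```

The real skill also attaches a Glottolog identifier, family path, and default script to the resolved language, as the Yoruba output shows.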
Expected output:
ISO 639-3: yor
Glottolog: yoru1245
Family: Niger-Congo > Atlantic-Congo > Volta-Congo > Benue-Congo > Yoruboid
Default script: Latin (with tonal diacritics)
Step 3 — Resource Classification
The skill computes the Joshi 0–5 resource class from cached signals: Wikipedia presence, OPUS presence, FLORES-200 inclusion, NLLB inclusion, and Hugging Face dataset count.
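The page does not say how these signals are combined, so the following is only a sketch of the idea. The `ResourceSignals` fields mirror the five signals listed above, but `joshi_class` and every threshold in it are invented for illustration and will not necessarily reproduce the skill's classification.

```python
from dataclasses import dataclass

# Class labels from the Joshi et al. 0-5 taxonomy.
JOSHI_LABELS = {
    0: "The Left-Behinds", 1: "The Scraping-Bys", 2: "The Hopefuls",
    3: "The Rising Stars", 4: "The Underdogs", 5: "The Winners",
}


@dataclass
class ResourceSignals:
    wikipedia: bool
    opus: bool
    flores200: bool
    nllb: bool
    hf_datasets: int  # number of Hugging Face datasets tagged with the language


def joshi_class(s: ResourceSignals) -> int:
    """Hypothetical heuristic: all cut-offs below are illustrative only."""
    presence = sum([s.wikipedia, s.opus, s.flores200, s.nllb])
    if presence == 0:
        return 0 if s.hf_datasets == 0 else 1
    if presence < 4:
        return 1 if s.hf_datasets < 10 else 2
    # All four presence signals: grade by dataset count.
    for cls, cutoff in ((2, 100), (3, 1_000), (4, 10_000)):
        if s.hf_datasets < cutoff:
            return cls
    return 5
```

Keeping the rule a pure function of cached signals makes the class reproducible across sessions, which matters because later decisions key off it.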
Expected output for Yoruba:
Resource class (Joshi): 2 — "Hopefuls"
Some labelled data, no full benchmark coverage.
Recommended strategy: Vocab extension + LoRA.
Transfer source candidates: Igbo (0.18), Hausa (0.34), Swahili (0.41)
English distance: 0.62 — NOT recommended as primary transfer source.
This class changes every downstream decision: tokenizer strategy, eval suite, adapter rank, and ethics depth. The orchestrator records it in workspace_state.md.
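The transfer candidates are ordered by URIEL distance, where lower means typologically closer. One common way to compute such a distance is the cosine distance between two languages' feature vectors. This sketch assumes that metric, uses made-up four-feature vectors rather than real URIEL data, and applies a hypothetical `max_distance` cut-off of 0.5 to decide which candidates to keep.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; 0.0 means the vectors point the same way."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("a zero vector has no direction")
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (norm_u * norm_v)


def rank_transfer_sources(
    target: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    max_distance: float = 0.5,  # hypothetical cut-off
) -> List[Tuple[str, float]]:
    """Sort candidates by distance to the target and drop distant ones."""
    scored = sorted(
        ((code, cosine_distance(target, vec)) for code, vec in candidates.items()),
        key=lambda pair: pair[1],
    )
    return [(code, round(d, 2)) for code, d in scored if d <= max_distance]


# Toy binary feature vectors (NOT real URIEL values).
target = [1, 1, 0, 0]
candidates = {"lang_a": [1, 1, 0, 0], "lang_b": [1, 0, 0, 0], "lang_c": [0, 0, 1, 1]}
print(rank_transfer_sources(target, candidates))  # [('lang_a', 0.0), ('lang_b', 0.29)]
```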
Step 4 — Typological Profile
The skill surfaces outlier features from the URIEL vector that require targeted handling:
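The profile includes a tokenizer fertility estimate, that is, the average number of subword tokens produced per word. A minimal sketch for measuring it on your own sample, assuming whitespace-delimited words; `toy_tokenize` is a hypothetical stand-in for a real tokenizer's encode function.

```python
from typing import Callable, Iterable, List


def fertility(texts: Iterable[str], tokenize: Callable[[str], List[str]]) -> float:
    """Average tokens per whitespace-delimited word across a corpus sample."""
    n_words = n_tokens = 0
    for text in texts:
        n_words += len(text.split())
        n_tokens += len(tokenize(text))
    return n_tokens / n_words if n_words else 0.0


def toy_tokenize(text: str) -> List[str]:
    """Hypothetical tokenizer: cuts every word into 3-character pieces."""
    return [w[i:i + 3] for w in text.split() for i in range(0, len(w), 3)]


print(fertility(["abcdef gh"], toy_tokenize))  # 1.5
```

Normalize the sample to NFC before measuring, since decomposed tone marks are separate code points and can inflate token counts.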
Typological outliers for Yoruba (yor):
• Tone: HIGH/MID/LOW (á/a/à) plus underdotted letters (ẹ, ọ, ṣ) — diacritic preservation MANDATORY
• SVO word order: standard
• Isolating morphology: low fertility expected (~1.8-2.2×)
• No grammatical gender
• Subject-verb agreement: limited
Step 5 — Script Policy (linguistic-scripts)
The orchestrator routes to linguistic-scripts to set normalization policy:
Script Policy for Yoruba:
Primary script: Latin (Basic Latin + precomposed accented Latin letters + combining diacritics U+0300-U+036F)
Normalization: NFC (default)
Diacritics: PRESERVE — tone language (stripping = data corruption)
Romanization: N/A (already Latin-script)
Confusable risk: LOW
ZWJ/ZWNJ: NORMALIZE
This policy is recorded in workspace_state.md and applies to every downstream step.
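Python's standard `unicodedata` module makes it easy to see why stripping counts as corruption. The sketch applies NFC as the policy prescribes and contrasts it with a naive accent-stripping step. The four strings are illustrative spellings that differ only in tone marks, and the function names are hypothetical.

```python
import unicodedata


def apply_nfc(text: str) -> str:
    """Policy step: canonical composition (NFC)."""
    return unicodedata.normalize("NFC", text)


def strip_marks(text: str) -> str:
    """Naive accent stripping: decompose, then drop every combining mark."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Four spellings that differ only in tone marks.
words = ["igba", "igbá", "ìgbà", "ìgbá"]
print(len({apply_nfc(w) for w in words}))    # 4: NFC keeps them distinct
print(len({strip_marks(w) for w in words}))  # 1: stripping merges them

# NFC also makes mark order irrelevant: o + dot below + grave, in either order,
# becomes U+1ECD (o with dot below) followed by a combining grave accent.
print(apply_nfc("o\u0300\u0323") == apply_nfc("o\u0323\u0300") == "\u1ecd\u0300")  # True
```

The same stripping step also deletes the underdots in ẹ, ọ, and ṣ, silently turning them into different letters.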
Step 6 — Ethics Seed (linguistic-ethics)
Before any data recommendation, linguistic-ethics runs an early gate:
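The gate sets its depth from the language's EGIDS vitality level. The mapping here is a hypothetical sketch: only the EGIDS scale itself is standard. The tier names and cut-off points are invented, except that level 2 (Provincial) maps to the standard tier, matching the Yoruba seed.

```python
# EGIDS levels from most to least vital (the scale splits 6 and 8 into a/b).
EGIDS_ORDER = ["0", "1", "2", "3", "4", "5", "6a", "6b", "7", "8a", "8b", "9", "10"]


def ethics_depth(egids: str) -> str:
    """Hypothetical tiers: a less vital language gets deeper engagement."""
    rank = EGIDS_ORDER.index(egids)  # raises ValueError for unknown levels
    if rank <= EGIDS_ORDER.index("4"):
        return "standard"  # FPIC + license check
    if rank <= EGIDS_ORDER.index("6a"):
        return "enhanced"
    return "community pre-engagement"


print(ethics_depth("2"))  # standard
```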
Ethics seed for Yoruba:
Vitality (EGIDS): 2 — Provincial (widely spoken in Nigeria)
Ethics depth: Standard FPIC + license check
Community engagement: Standard (no mandatory pre-engagement for this vitality level)
Sacred-text flag: None identified
Step 7 — Review the Workspace State
At any point, check where you are:
/linguistic:status
Output:
[Phase: Scope | Language: Yoruba (yor) | Resource Class: 2 | Skills routed: scope, scripts, ethics | Open findings: 1]
For a full review:
/linguistic:review
Step 8 — Next Steps
With Scope complete, you're ready for the Acquire phase. The orchestrator will suggest:
/linguistic:propose
This generates a full plan.md covering all five phases, including corpus sources (with ethics flags), tokenizer strategy, adapter plan, eval suite, and release gating, all tailored to Yoruba as a Class 2 language.
What Just Happened
In these 8 steps, the suite automatically:
- Resolved the language to a canonical ISO 639-3 + Glottolog identifier
- Classified it as Joshi Class 2 and derived a typology-informed strategy
- Identified English as a poor transfer source (URIEL distance 0.62) and recommended Igbo as the primary transfer source
- Set a script policy that protects tone diacritics from being stripped
- Ran an ethics seed that gates all future data recommendations
- Wrote structured state to workspace_state.md for cross-session continuity
All of this came from a single natural-language prompt: "help me build an LLM for Yoruba".