linguistic-orchestrator
Entry point for any linguistic / NLP / LLM-for-low-resource-language task. Coordinates the 5-phase pipeline and routes to the right specialist skill. Use this skill whenever a target language is mentioned in conjunction with any ML/NLP operation.
Overview
The orchestrator is the conductor — it reads workspace_state.md, identifies the current pipeline phase, and routes to the appropriate specialist skill(s). It never duplicates specialist content; it always hands off. Every session begins here, and the orchestrator resumes seamlessly from wherever the last session left off.
Pipeline Position
Phase: All phases (entry point and coordinator)
Activates: All other linguistic skills through routing
Entry point skill: Yes — this is where every linguistic pipeline begins
When It Activates
- User mentions a target language (especially non-English / low-resource) with any LLM/NLP task
- User needs the multi-step linguistic pipeline
- Session needs phase tracking, multi-skill coordination, or workspace state
- User is unsure which linguistic-* specialist to use; this skill triages
Natural language triggers (identical to slash commands):
- "help me build an LLM for [language]"
- "my tokenizer produces garbage for [language]"
- "train a Cantonese model"
- "low-resource MT"
- "evaluate on FLORES / Belebele / AfroBench"
- "what data exists for [language]?"
- Any bare target language name + ML/NLP verb
When NOT to use: A single isolated operation where the specific skill handles it directly (e.g., "just compute fertility for this tokenizer" → linguistic-tokenize directly).
What It Does
On First Touch
- Check workspace: if no workspace_state.md exists, create one with target language(s), Glottolog/ISO code, resource class, and pipeline phase (start at Scope)
- Check ethics gate early: before recommending any data sources, route to linguistic-ethics for FPIC/CARE awareness
- Identify phase: map the user's request to Scope / Acquire / Analyze / Evaluate / Release
- Route to specialist(s): never duplicate specialist content; always hand off
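
The first-touch workspace check can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the function name `ensure_workspace` and the bullet layout written into the file are assumptions, since the real format of workspace_state.md is not specified here.

```python
from pathlib import Path


def ensure_workspace(workdir, languages, code, resource_class):
    """Create workspace_state.md on first touch; leave an existing file untouched.

    Returns (path, created). The field labels below are hypothetical.
    """
    state_path = Path(workdir) / "workspace_state.md"
    if state_path.exists():
        # A prior session already wrote state: resume from it, do not overwrite.
        return state_path, False
    lines = [
        "# Workspace State",
        f"- Target language(s): {', '.join(languages)}",
        f"- Glottolog/ISO code: {code}",
        f"- Resource class: {resource_class}",
        "- Pipeline phase: Scope",  # every new workspace starts at Scope
    ]
    state_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return state_path, True
```

Returning the `created` flag lets the caller decide whether to run the early ethics gate for a new project or resume an existing one.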
Phase Routing Table
| Phase | Specialists Routed |
|---|---|
| Scope | scope → scripts → ethics (seed) |
| Acquire | corpus + bitext + transfer + tokenize; ethics (per-dataset) |
| Analyze | morph, syntax, semantics, discourse, speech, annotate (as needed) |
| Evaluate | eval |
| Release | ethics (final gate) |
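
One way to picture the routing table is as a plain lookup from phase to specialist names. The sketch below is illustrative only: the `PHASE_ROUTES` dictionary and `route` function are hypothetical names, and the full skill identifiers are assumed to follow the `linguistic-*` pattern used elsewhere in this page.

```python
# Phase -> specialists, copied from the routing table above.
PHASE_ROUTES = {
    "Scope": ["scope", "scripts", "ethics"],
    "Acquire": ["corpus", "bitext", "transfer", "tokenize", "ethics"],
    "Analyze": ["morph", "syntax", "semantics", "discourse", "speech", "annotate"],
    "Evaluate": ["eval"],
    "Release": ["ethics"],
}


def route(phase):
    """Return the specialist skill names for a phase (assumed linguistic-* prefix)."""
    try:
        names = PHASE_ROUTES[phase]
    except KeyError:
        raise ValueError(f"Unknown phase: {phase!r}") from None
    return [f"linguistic-{name}" for name in names]
```

For example, `route("Evaluate")` yields `["linguistic-eval"]`, while an unrecognized phase raises `ValueError` rather than silently routing nowhere.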
Phase Indicator
Every substantive response includes:
[Phase: Scope | Language: Yoruba (yor) | Resource Class: 2 | Skills routed: scope, ethics]
Workspace State Management
The orchestrator reads and writes workspace_state.md in the current working directory. This file is the shared memory — scope writes language identity, scripts writes normalization policy, corpus writes the data manifest. The orchestrator snapshots state before every destructive update; use /linguistic:rollback to restore from snapshots in logs/.
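
The snapshot-before-update behavior might look roughly like this. It is a sketch under stated assumptions: the function name `snapshot_then_write` and the timestamped snapshot filename are hypothetical; the source says only that snapshots live in logs/.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def snapshot_then_write(workdir, new_text):
    """Copy the current workspace_state.md into logs/ before overwriting it.

    Returns the snapshot path, or None if there was no prior state to save.
    The snapshot naming scheme is an assumption, not the documented format.
    """
    workdir = Path(workdir)
    state = workdir / "workspace_state.md"
    snapshot = None
    if state.exists():
        logs = workdir / "logs"
        logs.mkdir(exist_ok=True)
        # Microsecond-resolution UTC timestamp keeps snapshots ordered and distinct.
        stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
        snapshot = logs / f"workspace_state.{stamp}.md"
        shutil.copy2(state, snapshot)
    state.write_text(new_text, encoding="utf-8")
    return snapshot
```

Because the copy happens before the write, a rollback only needs to copy the chosen snapshot back over workspace_state.md.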
Disambiguation Query
When a user query matches multiple specialists, the orchestrator decomposes it and routes to the two or more nearest skills, attaching an explicit "partial match" caveat. Queries in the linguistic domain that match no single skill are decomposed, not refused.
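
As a rough illustration of the disambiguation rule, the sketch below scores skills by keyword overlap and falls back to the top two when no single skill clearly wins. The keyword sets, the `disambiguate` function, and the overlap scoring are all assumptions made for the example; the source does not say how "nearest" is computed.

```python
import re

# Hypothetical keyword sets; the real matching logic is not documented here.
SKILL_KEYWORDS = {
    "linguistic-tokenize": {"tokenizer", "fertility", "vocabulary", "subword"},
    "linguistic-scripts": {"script", "unicode", "normalization", "diacritics"},
    "linguistic-eval": {"evaluate", "benchmark", "flores", "belebele"},
}


def disambiguate(query, min_skills=2):
    """Return (skills, caveat). A single clear match gets no caveat."""
    words = set(re.findall(r"\w+", query.lower()))
    scores = {skill: len(words & kws) for skill, kws in SKILL_KEYWORDS.items()}
    # Highest overlap first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda s: (-scores[s], s))
    hits = [s for s in ranked if scores[s] > 0]
    if len(hits) == 1:
        return hits, ""
    # Multiple matches, or none: route to at least min_skills, never refuse.
    chosen = hits if len(hits) >= min_skills else ranked[:min_skills]
    return chosen, "partial match: no single skill covers this request"
```

A query such as "my tokenizer breaks on unicode normalization" would be routed to both the scripts and tokenize specialists with the caveat attached.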
Pipeline Overview
Scope  →  Acquire  →  Analyze  →  Evaluate  →  Release
  |          |           |            |            |
scope      corpus      morph        eval        ethics
scripts    bitext      syntax                   (release gate)
tokenize   transfer    semantics
ethics     (ethics     discourse
(early     gate at     speech
gate)      each        annotate
           dataset)
Phases overlap and loop back. The orchestrator provides the skeleton; specialists own the content.
Inputs & Outputs
| Input | Description |
|---|---|
| Any linguistic/NLP request | Natural language or slash command |
| workspace_state.md | Prior session state (if exists) |
| Output | Description |
|---|---|
| Phase indicator | Current phase + language + resource class |
| Specialist routing | Which skill(s) activated and why |
| workspace_state.md | Updated with new phase outputs |
| Open questions | Decisions pending user input |
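
The phase indicator listed among the outputs can be produced by a small formatter. This is a minimal sketch; the function name `phase_indicator` and its parameter names are hypothetical, while the output layout follows the indicator format shown on this page.

```python
def phase_indicator(phase, language, iso_code, resource_class, skills):
    """Format the one-line status header included in substantive responses."""
    return (
        f"[Phase: {phase} | Language: {language} ({iso_code}) | "
        f"Resource Class: {resource_class} | Skills routed: {', '.join(skills)}]"
    )


print(phase_indicator("Scope", "Yoruba", "yor", 2, ["scope", "ethics"]))
```

Keeping the indicator in one function means every response reports phase, language, and routed skills in the same order.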
Example Usage
Natural language: "help me build an LLM for Khmer"
[Phase: Scope | Language: Khmer (khm) | Resource Class: 2 | Skills routed: scope]
Routing to linguistic-scope for:
1. ISO 639-3 + Glottolog resolution (khm / khmr1253)
2. Joshi resource classification
3. Typological profile (Austroasiatic; abugida script; analytic morphology)
4. Transfer source recommendations (Vietnamese distance 0.31)
5. Vitality assessment (EGIDS 1 — national language; standard FPIC)
Next: linguistic-scripts for Khmer abugida normalization policy
      linguistic-ethics seed (EGIDS 1; standard gate)
Related Skills
All linguistic skills — this is the coordinator for the entire suite.
- linguistic-scope: always the first specialist routed
- linguistic-ethics: always routed early and at Release
- linguistic-eval: always routed at the Evaluate phase
Skills Reference
All 18 linguistic agent skills organized by pipeline phase — from language scoping through evaluation and release.
linguistic-scope
Identify a target language precisely and set the strategic direction for any LLM/NLP project. Handles ISO 639-3 resolution, Joshi resource classification, typological profiling, and transfer-source selection.