linguistic-orchestrator

Entry point for any linguistic / NLP / LLM-for-low-resource-language task. Coordinates the 5-phase pipeline and routes to the right specialist skill. Use this skill whenever a target language is mentioned in conjunction with any ML/NLP operation.

Overview

The orchestrator is the conductor — it reads workspace_state.md, identifies the current pipeline phase, and routes to the appropriate specialist skill(s). It never duplicates specialist content; it always hands off. Every session begins here, and the orchestrator resumes seamlessly from wherever the last session left off.

Pipeline Position

Phase: All phases (entry point and coordinator)

Activates: All other linguistic skills through routing

Entry point skill: Yes — this is where every linguistic pipeline begins

When It Activates

User mentions a target language (especially non-English / low-resource) with any LLM/NLP task
User needs the multi-step linguistic pipeline
Session needs phase tracking, multi-skill coordination, or workspace state
User is unsure which linguistic-* specialist to use — this skill triages

Natural-language triggers (treated identically to slash commands):

"help me build an LLM for [language]"
"my tokenizer produces garbage for [language]"
"train a Cantonese model"
"low-resource MT"
"evaluate on FLORES / Belebele / AfroBench"
"what data exists for [language]?"
Any bare target language name + ML/NLP verb

When NOT to use: A single isolated operation where the specific skill handles it directly (e.g., "just compute fertility for this tokenizer" → linguistic-tokenize directly).

What It Does

On First Touch

1. Check workspace: if no workspace_state.md exists, create one with target language(s), Glottolog/ISO code, resource class, and pipeline phase (start at Scope).
2. Check ethics gate early: before recommending any data sources, route to linguistic-ethics for FPIC/CARE awareness.
3. Identify phase: map the user's request to Scope / Acquire / Analyze / Evaluate / Release.
4. Route to specialist(s): never duplicate specialist content; always hand off.
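
The workspace check in the first step can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the function name `ensure_workspace` and the exact layout written into workspace_state.md are assumptions; only the file name, the four recorded fields, and the starting phase come from the description above.

```python
from pathlib import Path

# Phase names as listed in this document; a new workspace starts at Scope.
PHASES = ("Scope", "Acquire", "Analyze", "Evaluate", "Release")


def ensure_workspace(directory, languages, code, resource_class):
    """Create workspace_state.md if it is missing.

    Returns (path, created). An existing file is never overwritten,
    so a later session resumes from the recorded state.
    """
    path = Path(directory) / "workspace_state.md"
    if path.exists():
        return path, False
    # Hypothetical file layout: one field per line.
    lines = [
        "Workspace State",
        f"Target language(s): {', '.join(languages)}",
        f"Glottolog/ISO code: {code}",
        f"Resource class: {resource_class}",
        f"Pipeline phase: {PHASES[0]}",
    ]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path, True
```

Calling `ensure_workspace(".", ["Yoruba"], "yor", 2)` on a fresh directory would write the file once; a second call leaves it untouched.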

Phase Routing Table

Phase	Specialists Routed
Scope	scope → scripts → ethics (seed)
Acquire	corpus + bitext + transfer + tokenize; ethics (per-dataset)
Analyze	morph, syntax, semantics, discourse, speech, annotate (as needed)
Evaluate	eval
Release	ethics (final gate)
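
As a rough illustration, the routing table could be held as a plain mapping from phase to specialist names. The `ROUTING` dictionary and `specialists_for` helper are hypothetical names, and the assumption that every specialist is named `linguistic-<short name>` is extrapolated from the skills this page mentions explicitly (linguistic-scope, linguistic-ethics, linguistic-eval, and so on).

```python
# Phase -> specialists, in the order given by the routing table.
ROUTING = {
    "Scope": ["scope", "scripts", "ethics"],            # ethics: seed
    "Acquire": ["corpus", "bitext", "transfer", "tokenize", "ethics"],  # ethics: per-dataset
    "Analyze": ["morph", "syntax", "semantics", "discourse", "speech", "annotate"],
    "Evaluate": ["eval"],
    "Release": ["ethics"],                               # ethics: final gate
}


def specialists_for(phase, needed=None):
    """Return the full skill names to route to for a phase.

    Analyze specialists are routed "as needed", so an optional set of
    short names narrows that phase; other phases ignore `needed`.
    """
    if phase not in ROUTING:
        raise ValueError(f"unknown phase: {phase!r}")
    skills = ROUTING[phase]
    if phase == "Analyze" and needed is not None:
        skills = [s for s in skills if s in needed]
    return [f"linguistic-{s}" for s in skills]
```

For example, `specialists_for("Analyze", needed={"morph", "speech"})` keeps only the two requested analysis skills while preserving table order.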

Phase Indicator

Every substantive response includes:

[Phase: Scope | Language: Yoruba (yor) | Resource Class: 2 | Skills routed: scope, ethics]
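
A small formatter that produces this line might look like the sketch below. The function name `phase_indicator` and its parameters are illustrative assumptions; the output format follows the example above.

```python
def phase_indicator(phase, language, code, resource_class, skills):
    """Format the one-line status header shown with each substantive response."""
    return (
        f"[Phase: {phase} | Language: {language} ({code}) | "
        f"Resource Class: {resource_class} | "
        f"Skills routed: {', '.join(skills)}]"
    )


print(phase_indicator("Scope", "Yoruba", "yor", 2, ["scope", "ethics"]))
```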

Workspace State Management

The orchestrator reads and writes workspace_state.md in the current working directory. This file is the shared memory — scope writes language identity, scripts writes normalization policy, corpus writes the data manifest. The orchestrator snapshots state before every destructive update; use /linguistic:rollback to restore from snapshots in logs/.
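
The snapshot-before-update behaviour can be approximated as below. This is a sketch under stated assumptions: the snapshot file naming (`workspace_state.<UTC timestamp>.md`) and both function names are hypothetical, and the `rollback` helper only illustrates the idea behind /linguistic:rollback rather than reproducing that command.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def snapshot_then_write(workdir, new_text):
    """Copy the current state into logs/ before overwriting it.

    Returns the snapshot path, or None if there was no prior state.
    """
    workdir = Path(workdir)
    state = workdir / "workspace_state.md"
    snapshot = None
    if state.exists():
        logs = workdir / "logs"
        logs.mkdir(exist_ok=True)
        # Hypothetical naming scheme; timestamps sort chronologically.
        stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
        snapshot = logs / f"workspace_state.{stamp}.md"
        shutil.copy2(state, snapshot)
    state.write_text(new_text, encoding="utf-8")
    return snapshot


def rollback(workdir):
    """Restore workspace_state.md from the most recent snapshot in logs/."""
    workdir = Path(workdir)
    snapshots = sorted((workdir / "logs").glob("workspace_state.*.md"))
    if not snapshots:
        raise FileNotFoundError("no snapshots found in logs/")
    shutil.copy2(snapshots[-1], workdir / "workspace_state.md")
    return snapshots[-1]
```

Copying rather than moving the file means a failed write still leaves the last good state recoverable from logs/.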

Disambiguation Query

When a user query matches multiple specialists, the orchestrator decomposes it and routes to the two or more nearest skills (≥2), with an explicit "partial match" caveat. Queries in the linguistic domain that match no single skill are likewise decomposed rather than refused.
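
One way to picture this rule is a simple keyword-overlap ranking. Everything in the sketch is an assumption made for illustration: the `SKILL_KEYWORDS` sets, the scoring by shared words, and the `disambiguate` name. The document does not specify how "nearest" is actually computed.

```python
import re

# Hypothetical keyword sets for a few specialists (short names).
SKILL_KEYWORDS = {
    "tokenize": {"tokenizer", "fertility", "vocabulary", "bpe"},
    "corpus": {"data", "corpus", "dataset", "crawl"},
    "eval": {"evaluate", "benchmark", "flores", "belebele"},
    "morph": {"morphology", "inflection", "segmentation"},
}


def disambiguate(query, top_n=2):
    """Route a query to one skill, or to >= top_n skills with a caveat."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    scores = {name: len(words & kws) for name, kws in SKILL_KEYWORDS.items()}
    # Highest overlap first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    matched = [name for name in ranked if scores[name] > 0]
    if len(matched) == 1:
        # Exactly one specialist matches: route directly, no caveat.
        return {"skills": matched, "caveat": None}
    # Several matches, or none: never refuse, route to at least top_n skills.
    skills = matched if len(matched) >= top_n else ranked[:top_n]
    return {"skills": skills, "caveat": "partial match"}
```

With these toy keyword sets, a query such as "evaluate my tokenizer on FLORES data" touches eval, corpus, and tokenize, so all three are returned with the caveat, while "compute fertility for this tokenizer" routes to tokenize alone.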

Pipeline Overview

Scope → Acquire → Analyze → Evaluate → Release
  |        |         |          |           |
scope    corpus    morph      eval       ethics
scripts  bitext    syntax              (release gate)
tokenize transfer  semantics
ethics   (ethics   discourse
(early   gate at   speech
gate)    each      annotate
         dataset)

Phases overlap and loop back. The orchestrator provides the skeleton; specialists own the content.

Inputs & Outputs

Input	Description
Any linguistic/NLP request	Natural language or slash command
`workspace_state.md`	Prior session state (if exists)

Output	Description
Phase indicator	Current phase + language + resource class
Specialist routing	Which skill(s) activated and why
`workspace_state.md`	Updated with new phase outputs
Open questions	Decisions pending user input

Example Usage

Natural language: "help me build an LLM for Khmer"

[Phase: Scope | Language: Khmer (khm) | Resource Class: 2 | Skills routed: scope]

Routing to linguistic-scope for:
  1. ISO 639-3 + Glottolog resolution (khm / khmr1253)
  2. Joshi resource classification
  3. Typological profile (Austroasiatic; abugida script; analytic morphology)
  4. Transfer source recommendations (Vietnamese distance 0.31)
  5. Vitality assessment (EGIDS 1 — national language; standard FPIC)

Next: linguistic-scripts for Khmer abugida normalization policy
      linguistic-ethics seed (EGIDS 1; standard gate)

Related Skills

All linguistic skills: this is the coordinator for the entire suite.

linguistic-scope — always the first specialist routed
linguistic-ethics — always routed early and at Release
linguistic-eval — always routed at Evaluate phase
