Skills Reference

The Linguistic Agent Skills suite contains 18 skills organized across a 5-phase pipeline (Phases 0 through 4). The `linguistic-orchestrator` coordinates routing, 14 specialist skills own phase-specific content, and 3 optional Mindset stubs cover Phase 4 decisions.

Pipeline Overview

Phase 0: Scope    → scope, scripts, tokenize, ethics
Phase 1: Acquire  → corpus, bitext, transfer
Phase 2: Analyze  → morph, syntax, annotate, semantics, discourse, speech
Phase 3: Evaluate → eval
Phase 4: Optional → codeswitch, historical, lexicon
         + orchestrator (coordinates all phases)
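
As a rough illustration, the overview can be written down as a plain mapping and checked against the counts quoted in the introduction. The names `PIPELINE`, `skill_name`, and `phase_of` are hypothetical and are not part of the suite. The `linguistic-` prefix follows the skill names used in the tables on this page.

```python
# Phase-to-skill mapping transcribed from the overview above.
# PIPELINE, skill_name and phase_of are illustrative names, not suite APIs.
PIPELINE = {
    "Phase 0: Scope": ["scope", "scripts", "tokenize", "ethics"],
    "Phase 1: Acquire": ["corpus", "bitext", "transfer"],
    "Phase 2: Analyze": ["morph", "syntax", "annotate", "semantics",
                         "discourse", "speech"],
    "Phase 3: Evaluate": ["eval"],
    "Phase 4: Optional": ["codeswitch", "historical", "lexicon"],
}
ENTRY_POINT = "orchestrator"  # coordinates all phases


def skill_name(short):
    """Expand a short name from the overview to the full skill name."""
    return f"linguistic-{short}"


def phase_of(short):
    """Return the phase label that lists the given short skill name."""
    for phase, skills in PIPELINE.items():
        if short in skills:
            return phase
    raise KeyError(short)


# Specialists are every skill outside the optional Phase 4 stubs.
specialists = [s for phase, skills in PIPELINE.items()
               if phase != "Phase 4: Optional" for s in skills]
stubs = PIPELINE["Phase 4: Optional"]
total = len(specialists) + len(stubs) + 1  # +1 for the orchestrator
```

With this mapping, `total` matches the 18 skills quoted above, split into 14 specialists, 3 stubs, and the orchestrator.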

Skills by Phase

Phase 0 — Scope

Identify the target language precisely and set strategic direction before touching any data.

Skill	Score	Purpose
`linguistic-scope`	A− (105)	ISO 639-3 + Glottolog resolution, Joshi classification, URIEL typological profiling, transfer-source selection
`linguistic-scripts`	A− (104)	Unicode normalization policy (NFC/NFKC), confusable folding, diacritic preservation for tone languages
`linguistic-tokenize`	A− (104)	Fertility audit, SentencePiece config, vocab-extension method (FOCUS/OFA/HyperOfa)
`linguistic-ethics`	A− (106)	CARE/FPIC, license audit, sacred-text gating, attribution registry — runs at Scope AND Release

Phase 1 — Acquire

Gather monolingual and parallel data ethically and reproducibly.

Skill	Score	Purpose
`linguistic-corpus`	A− (103)	Catalog (OLDI/CulturaX/MADLAD-400/Glot500), paragraph LID, MinHash dedup, contamination audit
`linguistic-bitext`	A− (102)	LASER3/SONAR mining, Vecalign alignment, margin threshold tuning, synthetic bitext
`linguistic-transfer`	A− (105)	LoRA rank by URIEL distance, MAD-X adapters, forgetting mitigation, tool selection
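
The MinHash deduplication named in the `linguistic-corpus` row can be sketched in pure Python as follows. This illustrates the general technique only, assuming character 5-gram shingles and 64 salted hash functions. It is not the skill's implementation, and all function names are hypothetical. A real corpus pipeline would normally add locality-sensitive hashing so that documents are not compared pairwise.

```python
import hashlib


def shingles(text, k=5):
    """Set of character k-grams after whitespace and case normalization."""
    text = " ".join(text.split()).lower()
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def minhash_signature(text, num_perm=64):
    """One minimum per salted hash function over the shingle set."""
    items = [s.encode("utf-8") for s in shingles(text)]
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")  # a different salt per "permutation"
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(item, digest_size=8, salt=salt).digest(), "big")
            for item in items))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching positions, which estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


doc = "the quick brown fox jumps over the lazy dog"
near_dup = doc + "!"
other = "pipelines need careful evaluation metrics"

sim_near = estimated_jaccard(minhash_signature(doc), minhash_signature(near_dup))
sim_other = estimated_jaccard(minhash_signature(doc), minhash_signature(other))
```

Pairs whose estimated similarity exceeds a chosen threshold would be treated as near-duplicates. The threshold itself is a tuning decision that this page does not specify.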

Phase 2 — Analyze

Run linguistic analysis layers needed for evaluation, augmentation, or training.

Skill	Score	Purpose
`linguistic-morph`	A− (102)	UniMorph, SIGMORPHON segmenters, FST/HFST, paradigm-completion augmentation
`linguistic-syntax`	A− (102)	UD treebanks, cross-lingual parser transfer, agreement-probe construction
`linguistic-annotate`	A− (103)	IAA metric selection (κ/α/γ), guideline authoring, adjudication, active learning
`linguistic-semantics`	A− (102)	WordNet/OMW coverage, FrameNet/PropBank SRL, MWE/PARSEME, semantic-equivalence eval
`linguistic-discourse`	A− (102)	RST/PDTB/GUM frameworks, coreference (zero-anaphora for pro-drop), coherence eval
`linguistic-speech`	A− (101)	ELAN/Praat/FLEx → Lhotse CutSet, G2P/IPA, MMS/Whisper ASR, VITS TTS
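
Of the agreement metrics listed for `linguistic-annotate`, Cohen's κ is the simplest to show. The sketch below computes it for two annotators labelling the same items. The function name and the toy labels are illustrative, and the skill's own metric-selection logic is not documented on this page.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independent labelling with each
    # annotator's own label distribution.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout; kappa is
        # undefined here, and returning 1.0 is a convention of this sketch.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


annotator_1 = ["N", "V", "N", "V"]
annotator_2 = ["N", "V", "V", "V"]
kappa = cohens_kappa(annotator_1, annotator_2)  # p_o = 0.75, p_e = 0.5
```

Here the annotators agree on three of four items, but half of that agreement is expected by chance, so κ comes out at 0.5 rather than 0.75.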

Phase 3 — Evaluate

Honestly measure performance with metrics fit for the target language.

Skill	Score	Purpose
`linguistic-eval`	A− (104)	chrF++/COMET/GEMBA-MQM, BLiMP-style probes, contamination-aware reporting, per-dialect breakdowns

Orchestrator

Skill	Score	Purpose
`linguistic-orchestrator`	A− (102)	Entry point; phase routing; workspace state management

Phase 4 — Optional (Mindset Stubs)

Activate when the specific scenario applies.

Skill	Score	Purpose
`linguistic-codeswitch`	B+ (97)	Code-switching awareness for Hinglish/Spanglish/Singlish/MSA+dialect communities
`linguistic-historical`	B+ (97)	Cognate sets, Swadesh lists, sound correspondences for Class 0–1 bootstrap
`linguistic-lexicon`	B+ (98)	Dictionary-building, sense splitting/lumping, MWE inventories for RAG/MT post-edit

Quality Scores

All scores come from skill-judge (8-dimension, 120-point rubric), snapshot 2026-04-23. Entry-point skills were required to reach A− (≥102/120), specialist skills A−, and Mindset stubs B+ (≥96/120).
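
The two numeric cutoffs stated here can be checked against the scores in the tables with a few lines of Python. This is a sketch under stated assumptions: `REQUIRED_MIN`, `SCORES`, and `below_threshold` are hypothetical names, and specialist skills are left out because the page gives their required grade but no numeric cutoff.

```python
# Minimum totals (out of 120) stated on this page. Specialist skills are
# omitted: the page names their required grade but gives no numeric cutoff.
REQUIRED_MIN = {"entry_point": 102, "mindset_stub": 96}

# (skill, tier, score) rows transcribed from the tables above.
SCORES = [
    ("linguistic-orchestrator", "entry_point", 102),
    ("linguistic-codeswitch", "mindset_stub", 97),
    ("linguistic-historical", "mindset_stub", 97),
    ("linguistic-lexicon", "mindset_stub", 98),
]


def below_threshold(scores, required=REQUIRED_MIN):
    """Return the names of skills whose score is under their tier minimum."""
    return [name for name, tier, score in scores if score < required[tier]]


failures = below_threshold(SCORES)  # empty when every listed skill passes
```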

Shared Utilities

The `_linguistic_shared/` directory contains `interaction_utils.py` and `findings_presenter.py`, utilities used across all skills. See Shared Utilities for details.
