linguistic-syntax
Universal Dependencies (UD) treebank usage, cross-lingual parser transfer, and agreement-probe construction for evaluating whether a low-resource LLM has actually learned grammar — not just lexical surface.
Overview
Parser F1 on UD treebanks measures parsing ability. Agreement probes measure whether an LLM has internalized grammatical knowledge. For evaluating low-resource LLMs, agreement probes are the right tool — they work without a parser, they target specific phenomena, and they detect failures that surface fluency misses entirely. linguistic-syntax provides both: treebank selection for parser work and probe construction for LLM eval.
Pipeline Position
Phase: Analyze (Phase 2)
Before this skill: linguistic-scope (URIEL distance for parser transfer source), linguistic-morph (morph-aware tokenization for UD annotation)
After this skill: linguistic-eval (agreement probes become eval suite items), linguistic-annotate (if creating new UD gold data)
When It Activates
- Selecting a UD treebank for the target language
- Cross-lingual parser transfer (no labeled treebank for target)
- Building grammatical-knowledge probes (agreement, case, word order)
- Evaluating whether an LLM has learned target-language syntax
- Annotating new UD data
When NOT to use: The target has rich UD coverage and standard parser fine-tune works — use the parser directly. For pure annotation methodology → linguistic-annotate.
What It Does
Treebank Availability
100+ UD treebanks exist but vary 100× in size. Key distinction:
- Training-size treebanks (10K+ sentences): can train a parser
- PUD-style 1K test treebanks: eval-only — never fine-tune a parser on these (leaked eval, a common published-results error)
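The size distinction above can be checked mechanically before any training run. The following is a minimal sketch, assuming the treebank is available as CoNLL-U text (comment lines start with `#`, sentences are separated by blank lines). The function names and the hand-written sample are illustrative and not part of any UD tooling.

```python
TRAINING_SIZE_MIN = 10_000  # "training-size" threshold taken from the list above


def count_conllu_sentences(text: str) -> int:
    """Count sentences: blocks of token lines separated by blank lines."""
    count = 0
    in_sentence = False
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            in_sentence = False          # a blank line closes the current sentence
        elif not stripped.startswith("#"):
            if not in_sentence:          # first token line of a new sentence
                count += 1
                in_sentence = True
    return count


def is_training_size(n_sentences: int) -> bool:
    return n_sentences >= TRAINING_SIZE_MIN


# Hand-written two-sentence sample, for illustration only.
SAMPLE = "\n".join([
    "# sent_id = 1",
    "# text = la luna brilla",
    "1\tla\tel\tDET\t_\t_\t2\tdet\t_\t_",
    "2\tluna\tluna\tNOUN\t_\t_\t3\tnsubj\t_\t_",
    "3\tbrilla\tbrillar\tVERB\t_\t_\t0\troot\t_\t_",
    "",
    "# sent_id = 2",
    "# text = el libro",
    "1\tel\tel\tDET\t_\t_\t2\tdet\t_\t_",
    "2\tlibro\tlibro\tNOUN\t_\t_\t0\troot\t_\t_",
    "",
])

n = count_conllu_sentences(SAMPLE)
print(n, is_training_size(n))  # 2 False
```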
Approach by Availability
| Situation | Approach |
|---|---|
| Native UD treebank, training-size (10K+) | Fine-tune parser directly |
| Only PUD test treebank | Cross-lingual transfer; PUD as eval-only |
| No UD treebank | Cross-lingual transfer (zero-shot); flag eval limitation |
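The table can be read as a small decision function. This sketch assumes the 10K threshold from the table; `choose_approach` and its parameters are hypothetical names. The table does not say how to handle a treebank with some training data but fewer than 10K sentences, so the sketch treats that case as "not training-size", which is an assumption.

```python
TRAINING_SIZE_MIN = 10_000  # threshold from the table above


def choose_approach(train_sentences: int, has_test_only_treebank: bool) -> str:
    """Map treebank availability to the approach named in the table."""
    if train_sentences >= TRAINING_SIZE_MIN:
        return "Fine-tune parser directly"
    if has_test_only_treebank:
        # The test treebank is used for evaluation only, never for fine-tuning.
        return "Cross-lingual transfer; PUD as eval-only"
    return "Cross-lingual transfer (zero-shot); flag eval limitation"


print(choose_approach(0, True))
# Cross-lingual transfer; PUD as eval-only
```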
Parser Tool Selection
| Tool | Strengths | Pick When |
|---|---|---|
| Trankit (2021) | Best low-resource UD quality; XLM-R based | Quality priority + low-resource |
| stanza (Stanford 2020) | 70+ languages; fast | Speed; broad coverage |
| UDify (2019) | Single multilingual model | Legacy/quick baseline |
| spaCy | Fast; production-ready | English/major-language production |
Trankit generally outperforms stanza on low-resource languages. UDify is older, so prefer Trankit for new low-resource projects.
Cross-Lingual Transfer Source
Pick by URIEL typological distance (from linguistic-scope), NOT treebank size. Closest typological neighbor with adequate data > distant neighbor with massive data.
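The selection rule can be sketched as a filter followed by a minimum. "Adequate data" is not quantified here, so the `min_sentences` default below is an assumption, and the ISO codes, distances, and sizes are made-up placeholders, not real URIEL values.

```python
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    iso: str
    uriel_distance: float  # typological distance to the target (from linguistic-scope)
    train_sentences: int   # size of the candidate's UD training treebank


def pick_transfer_source(
    candidates: List[Candidate], min_sentences: int = 10_000
) -> Optional[Candidate]:
    """Return the typologically closest candidate that has adequate data."""
    usable = [c for c in candidates if c.train_sentences >= min_sentences]
    if not usable:
        return None
    # Distance decides; treebank size only acts as a filter.
    return min(usable, key=lambda c: c.uriel_distance)


# Placeholder candidates (hypothetical codes and numbers).
candidates = [
    Candidate("aaa", 0.20, 12_000),   # close, adequate data
    Candidate("bbb", 0.60, 500_000),  # distant, massive data
    Candidate("ccc", 0.10, 800),      # closest, but too little data
]

print(pick_transfer_source(candidates).iso)  # aaa
```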
Agreement-Probe Construction
For each grammatical phenomenon, build minimal-pair probes and compute the model's log-likelihood ratio between the grammatical and ungrammatical sentence of each pair:
| Phenomenon | Example | Target Language |
|---|---|---|
| Subject-verb agreement | "la luna brilla" / "la luna brillan" | Spanish |
| Gender agreement | "el libro" / "la libro" | Spanish |
| Case marking | instrumental vs nominative | Russian |
| Tone-marked agreement | preserve diacritics in probe | Yoruba |
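For tone-marked languages, a preprocessing step that silently drops diacritics invalidates the probe. One possible guard, sketched with the standard-library `unicodedata` module, compares each probe sentence before and after processing. The helper names are hypothetical, and the check only covers combining marks (Unicode category `Mn`).

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Remove combining marks after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def diacritics_lost(original: str, processed: str) -> bool:
    """True when the two strings differ only in their combining marks."""
    same_after_nfc = (
        unicodedata.normalize("NFC", original)
        == unicodedata.normalize("NFC", processed)
    )
    return not same_after_nfc and strip_marks(original) == strip_marks(processed)


# "\u1ecdk\u1ecd\u0301" renders as a Yoruba word with dot-below and a high-tone mark.
word = "\u1ecdk\u1ecd\u0301"

print(diacritics_lost(word, "oko"))                                # True
print(diacritics_lost(word, unicodedata.normalize("NFD", word)))   # False
```

The second call shows that a change of normalization form alone is not reported as a loss.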
Target: 100–500 minimal pairs per phenomenon. A log-likelihood ratio > 0 (grammatical minus ungrammatical, in log space) means the model prefers the grammatical form.
Never report parser F1 as a proxy for LLM grammatical knowledge. LLMs don't expose dependency parses — use agreement probes.
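The probe scoring can be sketched independently of any particular model. In the example below, `log_prob` is a hypothetical callable that returns a sentence's total log-likelihood under the LLM being evaluated; the toy scorer and its numbers are placeholders so the example runs without a model.

```python
from typing import Callable, List, Tuple


def score_pairs(
    pairs: List[Tuple[str, str]], log_prob: Callable[[str], float]
) -> Tuple[float, List[float]]:
    """Score (grammatical, ungrammatical) pairs by log-likelihood ratio.

    Returns the share of pairs where the ratio is > 0, plus the raw ratios.
    """
    if not pairs:
        raise ValueError("need at least one minimal pair")
    ratios = [log_prob(good) - log_prob(bad) for good, bad in pairs]
    accuracy = sum(r > 0 for r in ratios) / len(ratios)
    return accuracy, ratios


# Minimal pairs from the table above: (grammatical, ungrammatical).
pairs = [
    ("la luna brilla", "la luna brillan"),
    ("el libro", "la libro"),
]

# Placeholder log-likelihoods standing in for a real model (made-up numbers).
TOY_SCORES = {
    "la luna brilla": -8.0,
    "la luna brillan": -11.0,
    "el libro": -4.0,
    "la libro": -9.0,
}

accuracy, ratios = score_pairs(pairs, TOY_SCORES.__getitem__)
print(accuracy, ratios)  # 1.0 [3.0, 5.0]
```

With a real model, `log_prob` would typically sum token log-probabilities over the sentence. Because the two members of a pair can differ in token count, per-token normalization may be worth considering; that choice is not specified here.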
Inputs & Outputs
| Input | Description |
|---|---|
| Target language ISO code | For treebank lookup |
| URIEL distance from scope | For transfer source selection |

| Output | Description |
|---|---|
| UD treebank availability | Training-size / PUD-only / none |
| Recommended training treebank | Name + size + register |
| Cross-lingual transfer source | ISO + rationale |
| Parser tool | Trankit / stanza / UDify |
| Agreement-probe spec | Phenomena + pair count |
| workspace_state.md entry | Syntax plan |
Example Usage
Language: Yoruba (yor) — no training-size UD treebank available
Syntax Analysis: Yoruba
- UD treebank: YTB (Yoruba TreeBank) — PUD-style, eval-only
- Approach: cross-lingual transfer (zero-shot)
- Transfer source: Igbo (URIEL=0.18) — closest with usable treebank
- Parser tool: Trankit (XLM-R base; best low-resource)
- Agreement-probe phenomena:
• Subject-verb agreement (limited in Yoruba; SVO)
• Tone preservation in minimal pairs (MANDATORY — lexical tone)
- Probe size: 150 pairs per phenomenon
- Eval limitation: no native training treebank; cross-lingual transfer only
Related Skills
- linguistic-scope: URIEL distance for parser transfer source
- linguistic-morph: morph-aware tokenization used in UD annotation
- linguistic-annotate: annotation methodology for creating new UD gold
- linguistic-eval: agreement probes become eval suite items