linguistic-syntax

Universal Dependencies (UD) treebank usage, cross-lingual parser transfer, and agreement-probe construction for evaluating whether a low-resource LLM has actually learned grammar rather than just surface lexical patterns.

Overview

Parser F1 on UD treebanks measures parsing ability. Agreement probes measure whether an LLM has internalized grammatical knowledge. For evaluating low-resource LLMs, agreement probes are the right tool: they work without a parser, they target specific phenomena, and they detect failures that fluency-based checks miss entirely. linguistic-syntax provides both: treebank selection for parser work and probe construction for LLM eval.

Pipeline Position

Phase: Analyze (Phase 2)

Before this skill: linguistic-scope (URIEL distance for parser transfer source), linguistic-morph (morph-aware tokenization for UD annotation)

After this skill: linguistic-eval (agreement probes become eval suite items), linguistic-annotate (if creating new UD gold data)

When It Activates

Selecting a UD treebank for the target language
Cross-lingual parser transfer (no labeled treebank for target)
Building grammatical-knowledge probes (agreement, case, word order)
Evaluating whether an LLM has learned target-language syntax
Annotating new UD data

When NOT to use: if the target has rich UD coverage and a standard parser fine-tune works, use the parser directly. For pure annotation methodology, use linguistic-annotate.

What It Does

Treebank Availability

More than 100 UD treebanks exist, but they differ in size by roughly two orders of magnitude. Key distinction:

Training-size treebanks (10K+ sentences): can train a parser
PUD-style 1K test treebanks: eval-only. Never fine-tune a parser on these: doing so leaks the evaluation set into training, a common error in published results

Approach by Availability

Situation	Approach
Native UD treebank, training-size (10K+)	Fine-tune parser directly
Only PUD test treebank	Cross-lingual transfer; PUD as eval-only
No UD treebank	Cross-lingual transfer (zero-shot); flag eval limitation
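
The decision table above can be sketched in a few lines. This is a minimal illustration, not part of the skill itself: the function names `count_conllu_sentences` and `choose_approach` are hypothetical, and the only facts taken from the text are the 10K-sentence threshold and the three approaches. The sketch relies on the CoNLL-U convention that sentences are separated by blank lines and comment lines start with `#`. How to treat a treebank with some training data below 10K sentences is not specified above, so the sketch assumes it does not count as training-size.

```python
TRAINING_MIN_SENTENCES = 10_000  # "training-size" threshold from the table above


def count_conllu_sentences(conllu_text: str) -> int:
    """Count sentences in CoNLL-U text (blank-line separated, '#' = comment)."""
    count = 0
    in_sentence = False
    for line in conllu_text.splitlines():
        if line.strip() == "":
            in_sentence = False  # a blank line closes the current sentence
        elif not line.startswith("#") and not in_sentence:
            count += 1  # first token line of a new sentence
            in_sentence = True
    return count


def choose_approach(train_sentences: int, test_sentences: int) -> str:
    """Map treebank availability to one of the three approaches in the table."""
    if train_sentences >= TRAINING_MIN_SENTENCES:
        return "fine-tune parser directly"
    # Assumption: anything below the threshold is treated as not training-size.
    if test_sentences > 0:
        return "cross-lingual transfer; test treebank as eval-only"
    return "cross-lingual transfer (zero-shot); flag eval limitation"
```

For example, `choose_approach(0, 1000)` returns the eval-only transfer approach, matching the "Only PUD test treebank" row.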

Parser Tool Selection

Tool	Strengths	Pick When
Trankit (2021)	Best low-resource UD quality; XLM-R based	Quality priority + low-resource
stanza (Stanford 2020)	70+ languages; fast	Speed; broad coverage
UDify (2019)	Single multilingual model	Legacy/quick baseline
spaCy	Fast; production-ready	English/major-language production

Trankit is generally better than stanza for low-resource languages. UDify is older; prefer Trankit for new low-resource projects.

Cross-Lingual Transfer Source

Pick by URIEL typological distance (from linguistic-scope), NOT treebank size. A close typological neighbor with adequate data is preferable to a distant neighbor with massive data.
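
A minimal sketch of this selection rule follows. The `Candidate` type, the `pick_transfer_source` function, and the `min_sentences` default are all hypothetical; "adequate data" is assumed here to mean the 10K-sentence training-size threshold, which the text does not state explicitly. The language codes and distances in the usage example are placeholders, not real URIEL values.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    iso: str                # language code of the candidate source
    uriel_distance: float   # smaller = typologically closer to the target
    train_sentences: int    # size of the candidate's training treebank


def pick_transfer_source(
    candidates: List[Candidate], min_sentences: int = 10_000
) -> Optional[Candidate]:
    """Return the closest candidate that has adequate data, or None."""
    # Filter on adequacy first, then rank by distance only:
    # treebank size never breaks ties or outranks typological closeness.
    adequate = [c for c in candidates if c.train_sentences >= min_sentences]
    if not adequate:
        return None
    return min(adequate, key=lambda c: c.uriel_distance)


# Placeholder data for illustration only.
pool = [
    Candidate("aaa", 0.20, 12_000),   # close, adequate data
    Candidate("bbb", 0.60, 900_000),  # distant, massive data
    Candidate("ccc", 0.10, 800),      # closest, but too little data
]
best = pick_transfer_source(pool)
```

Here `best.iso` is `"aaa"`: the closest neighbor is excluded for lack of data, and the massive but distant treebank loses to the nearer one.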

Agreement-Probe Construction

For each grammatical phenomenon, build minimal-pair probes and compute the model's log-likelihood ratio between the grammatical and the ungrammatical variant:

Phenomenon	Example	Target Language
Subject-verb agreement	"la luna brilla" / "la luna brillan"	Spanish
Gender agreement	"el libro" / "la libro"	Spanish
Case marking	instrumental vs nominative	Russian
Tone-marked agreement	preserve diacritics in probe	Yoruba

Target: 100–500 minimal pairs per phenomenon. A log-likelihood ratio above 0, i.e. log P(grammatical) − log P(ungrammatical) > 0, counts as a correct preference.
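
Scoring a probe set can be sketched as follows. A pair counts as correct when log P(grammatical) − log P(ungrammatical) is positive, and the probe accuracy is the fraction of correct pairs. The function `probe_accuracy` and the `log_prob` callable are hypothetical names; in practice `log_prob` would wrap the LLM under test and return the summed token log-probabilities of a sentence. The scores in `toy_scores` are made-up numbers used only to make the example runnable.

```python
from typing import Callable, Iterable, Tuple


def probe_accuracy(
    pairs: Iterable[Tuple[str, str]], log_prob: Callable[[str], float]
) -> float:
    """Fraction of (grammatical, ungrammatical) pairs the model prefers correctly."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("probe set is empty")
    correct = 0
    for grammatical, ungrammatical in pairs:
        # Log-likelihood ratio: positive means the grammatical variant is preferred.
        ratio = log_prob(grammatical) - log_prob(ungrammatical)
        if ratio > 0:
            correct += 1
    return correct / len(pairs)


# Stand-in scorer with invented log-probabilities (not from any real model).
toy_scores = {
    "la luna brilla": -4.0,
    "la luna brillan": -9.0,
    "el libro": -2.0,
    "la libro": -1.5,
}
pairs = [("la luna brilla", "la luna brillan"), ("el libro", "la libro")]
accuracy = probe_accuracy(pairs, toy_scores.__getitem__)
```

With these invented scores the first pair is judged correctly and the second is not, so `accuracy` is 0.5. Keeping the scorer as a plain callable means the same function works with any model backend.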

Never report parser F1 as a proxy for LLM grammatical knowledge. LLMs don't expose dependency parses — use agreement probes.

Inputs & Outputs

Input	Description
Target language ISO code	For treebank lookup
URIEL distance from scope	For transfer source selection

Output	Description
UD treebank availability	Training-size / PUD-only / none
Recommended training treebank	Name + size + register
Cross-lingual transfer source	ISO + rationale
Parser tool	Trankit / stanza / UDify
Agreement-probe spec	Phenomena + pair count
`workspace_state.md` entry	Syntax plan

Example Usage

Language: Yoruba (yor) — no training-size UD treebank available

Syntax Analysis: Yoruba
- UD treebank: YTB (Yoruba TreeBank) — PUD-style, eval-only
- Approach: cross-lingual transfer (zero-shot)
- Transfer source: Igbo (URIEL=0.18) — closest with usable treebank
- Parser tool: Trankit (XLM-R base; best low-resource)
- Agreement-probe phenomena:
    • Subject-verb agreement (limited in Yoruba; SVO)
    • Tone preservation in minimal pairs (MANDATORY — lexical tone)
- Probe size: 150 pairs per phenomenon
- Eval limitation: no native training treebank; cross-lingual transfer only
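
Because tone preservation is mandatory for Yoruba probes, it can help to check that no preprocessing step has stripped the diacritics. The sketch below is one possible check, not part of the skill: the names `tone_marks` and `tones_preserved` are hypothetical, and it assumes tone is written with the combining grave (U+0300) and acute (U+0301) accents. Text is decomposed with NFD first so that precomposed and decomposed spellings compare equal.

```python
import unicodedata
from typing import List

# Assumption: tone is marked with combining grave and acute accents.
TONE_MARKS = {"\u0300", "\u0301"}


def tone_marks(text: str) -> List[str]:
    """Return the sequence of combining tone marks in text, in order."""
    decomposed = unicodedata.normalize("NFD", text)
    return [ch for ch in decomposed if ch in TONE_MARKS]


def tones_preserved(original: str, processed: str) -> bool:
    """True if processing left the sequence of tone marks unchanged."""
    return tone_marks(original) == tone_marks(processed)


word = "\u00ecgb\u00e0"  # "ìgbà", written with precomposed characters
same_word_nfd = unicodedata.normalize("NFD", word)  # same word, decomposed
stripped = "igba"  # diacritics lost, e.g. by an ASCII-folding step
```

Here `tones_preserved(word, same_word_nfd)` is true, since only the encoding differs, while `tones_preserved(word, stripped)` is false and the pair should be rejected from the probe set.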

Related Skills

linguistic-scope: URIEL distance for parser transfer source
linguistic-morph: morph-aware tokenization used in UD annotation
linguistic-annotate: annotation methodology for creating new UD gold
linguistic-eval: agreement probes become eval suite items
