linguistic-syntax

Universal Dependencies (UD) treebank usage, cross-lingual parser transfer, and agreement-probe construction for evaluating whether a low-resource LLM has actually learned grammar — not just lexical surface.

Overview

Parser F1 on UD treebanks measures parsing ability. Agreement probes measure whether an LLM has internalized grammatical knowledge. For evaluating low-resource LLMs, agreement probes are the right tool — they work without a parser, they target specific phenomena, and they detect failures that surface fluency misses entirely. linguistic-syntax provides both: treebank selection for parser work and probe construction for LLM eval.

Pipeline Position

Phase: Analyze (Phase 2)

Before this skill: linguistic-scope (URIEL distance for parser transfer source), linguistic-morph (morph-aware tokenization for UD annotation)

After this skill: linguistic-eval (agreement probes become eval suite items), linguistic-annotate (if creating new UD gold data)

When It Activates

  • Selecting a UD treebank for the target language
  • Cross-lingual parser transfer (no labeled treebank for target)
  • Building grammatical-knowledge probes (agreement, case, word order)
  • Evaluating whether an LLM has learned target-language syntax
  • Annotating new UD data

When NOT to use: If the target has rich UD coverage and a standard parser fine-tune works, use the parser directly. For pure annotation methodology, use linguistic-annotate.

What It Does

Treebank Availability

100+ UD treebanks exist but vary 100× in size. Key distinction:

  • Training-size treebanks (10K+ sentences): can train a parser
  • PUD-style 1K test treebanks: eval-only. Never fine-tune a parser on these; doing so leaks the eval set into training, a common error in published results.
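One way to guard against the leak described above is to compare sentence texts between the training data and the eval-only treebank before any fine-tuning. The sketch below is a minimal illustration, assuming both files are CoNLL-U and carry the standard `# text = ...` sentence comment; the function names `sentence_texts` and `leaked_sentences` are hypothetical helpers, not part of any UD tool.

```python
from __future__ import annotations


def sentence_texts(conllu: str) -> set[str]:
    """Collect the '# text = ...' comment of every sentence in a CoNLL-U string."""
    prefix = "# text = "
    return {
        line[len(prefix):].strip()
        for line in conllu.splitlines()
        if line.startswith(prefix)
    }


def leaked_sentences(train_conllu: str, eval_conllu: str) -> set[str]:
    """Return sentence texts that occur in both the training and the eval data."""
    return sentence_texts(train_conllu) & sentence_texts(eval_conllu)


# Tiny made-up CoNLL-U fragments (token lines omitted for brevity).
train_data = "# sent_id = t1\n# text = A b.\n\n# sent_id = t2\n# text = C d.\n"
eval_data = "# sent_id = e1\n# text = C d.\n\n# sent_id = e2\n# text = E f.\n"

overlap = leaked_sentences(train_data, eval_data)
if overlap:
    print(f"{len(overlap)} eval sentence(s) also appear in training data")
```

An exact-match check like this only catches verbatim overlap; near-duplicates would need a fuzzier comparison.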

Approach by Availability

| Situation | Approach |
| --- | --- |
| Native UD treebank, training-size (10K+) | Fine-tune parser directly |
| Only PUD test treebank | Cross-lingual transfer; PUD as eval-only |
| No UD treebank | Cross-lingual transfer (zero-shot); flag eval limitation |
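The decision table above can be sketched as a small function. The 10K threshold comes from the text; the function name `choose_approach` and its parameters are hypothetical. The text does not say how to handle training splits smaller than 10K sentences, so this sketch assumes they are treated like the eval-only case.

```python
TRAIN_MIN_SENTENCES = 10_000  # "training-size" threshold stated in the text


def choose_approach(train_sentences: int, eval_sentences: int) -> str:
    """Map treebank availability to the approach listed in the table.

    Assumption: a training split below the threshold is not used for
    fine-tuning (the source text does not cover this middle case).
    """
    if train_sentences >= TRAIN_MIN_SENTENCES:
        return "fine-tune parser directly"
    if eval_sentences > 0:
        return "cross-lingual transfer; treebank as eval-only"
    return "cross-lingual transfer (zero-shot); flag eval limitation"


print(choose_approach(train_sentences=0, eval_sentences=1_000))
```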

Parser Tool Selection

| Tool | Strengths | Pick When |
| --- | --- | --- |
| Trankit (2021) | Best low-resource UD quality; XLM-R based | Quality priority + low-resource |
| stanza (Stanford 2020) | 70+ languages; fast | Speed; broad coverage |
| UDify (2019) | Single multilingual model | Legacy/quick baseline |
| spaCy | Fast; production-ready | English/major-language production |

Trankit generally outperforms stanza on low-resource languages. UDify is older; prefer Trankit for new low-resource projects.

Cross-Lingual Transfer Source

Pick by URIEL typological distance (from linguistic-scope), NOT treebank size. A close typological neighbor with adequate data beats a distant neighbor with massive data.
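The selection rule can be sketched as "filter by adequate data, then minimize distance". Everything below is illustrative: the `Candidate` class, the placeholder ISO codes, the distances, and the sentence counts are made up, and reusing 10K sentences as the bar for "adequate data" is an assumption, since the text does not define it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    iso: str               # language code of the candidate source
    uriel_distance: float  # typological distance to the target (from linguistic-scope)
    train_sentences: int   # size of the candidate's training treebank


def pick_transfer_source(
    candidates: list[Candidate], min_sentences: int = 10_000
) -> Candidate | None:
    """Return the typologically closest candidate that has adequate data."""
    adequate = [c for c in candidates if c.train_sentences >= min_sentences]
    if not adequate:
        return None
    # Distance decides; treebank size only acts as a filter.
    return min(adequate, key=lambda c: c.uriel_distance)


# Placeholder candidates with invented values.
pool = [
    Candidate("aaa", uriel_distance=0.15, train_sentences=800),      # closest, too small
    Candidate("bbb", uriel_distance=0.22, train_sentences=14_000),   # close, adequate
    Candidate("ccc", uriel_distance=0.60, train_sentences=400_000),  # distant, massive
]
best = pick_transfer_source(pool)
print(best.iso if best else "no adequate source")
```

Note that the massive but distant candidate loses to the smaller, closer one, which is the point of the rule.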

Agreement-Probe Construction

For each grammatical phenomenon, build minimal-pair probes and compute model log-likelihood ratio:

| Phenomenon | Example | Target Language |
| --- | --- | --- |
| Subject-verb agreement | "la luna brilla" / "la luna brillan" | Spanish |
| Gender agreement | "el libro" / "la libro" | Spanish |
| Case marking | instrumental vs nominative | Russian |
| Tone-marked agreement | preserve diacritics in probe | Yoruba |

Target: 100–500 minimal pairs per phenomenon. A log-likelihood ratio > 0, i.e. log P(grammatical) - log P(ungrammatical) > 0, means the model prefers the grammatical variant.
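The scoring step can be sketched as follows. This is a minimal illustration, not a prescribed implementation: `score` stands for any function that returns the model's total log-probability of a sentence (in practice, summed token log-probs from the LLM under evaluation), and `toy_score` with its numbers is a made-up stand-in so the example runs without a model.

```python
from __future__ import annotations

from typing import Callable, Iterable

Scorer = Callable[[str], float]  # sentence -> total log-probability


def log_likelihood_ratio(score: Scorer, grammatical: str, ungrammatical: str) -> float:
    """log P(grammatical) - log P(ungrammatical); positive means a correct preference."""
    return score(grammatical) - score(ungrammatical)


def probe_accuracy(score: Scorer, pairs: Iterable[tuple[str, str]]) -> float:
    """Fraction of (grammatical, ungrammatical) pairs with a positive ratio."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one minimal pair is required")
    correct = sum(1 for good, bad in pairs if log_likelihood_ratio(score, good, bad) > 0)
    return correct / len(pairs)


# Made-up log-probabilities standing in for a real model.
TOY_LOGPROBS = {
    "la luna brilla": -12.0,
    "la luna brillan": -15.5,
    "el libro": -6.0,
    "la libro": -9.0,
}


def toy_score(sentence: str) -> float:
    return TOY_LOGPROBS[sentence]


spanish_pairs = [("la luna brilla", "la luna brillan"), ("el libro", "la libro")]
print(probe_accuracy(toy_score, spanish_pairs))
```

Because the two members of a minimal pair differ in only one word, summed log-probabilities are usually comparable; if tokenization makes the lengths differ substantially, a per-token normalization may be worth considering.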

Never report parser F1 as a proxy for LLM grammatical knowledge. LLMs don't expose dependency parses — use agreement probes.

Inputs & Outputs

| Input | Description |
| --- | --- |
| Target language ISO code | For treebank lookup |
| URIEL distance from scope | For transfer source selection |

| Output | Description |
| --- | --- |
| UD treebank availability | Training-size / PUD-only / none |
| Recommended training treebank | Name + size + register |
| Cross-lingual transfer source | ISO + rationale |
| Parser tool | Trankit / stanza / UDify |
| Agreement-probe spec | Phenomena + pair count |
| workspace_state.md entry | Syntax plan |
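As a rough illustration of how these outputs might be carried between skills, the sketch below bundles them into one record and renders a plain-text entry. The `SyntaxPlan` class, its field names, and the rendered layout are hypothetical; the actual format of the workspace_state.md entry is not specified here. The sample values are copied from the Yoruba example under Example Usage.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class SyntaxPlan:
    language: str
    iso: str
    availability: str                 # "training-size" / "PUD-only" / "none"
    training_treebank: Optional[str]  # name + size + register, if any
    transfer_source: Optional[str]    # ISO or name + rationale, if transfer is used
    parser_tool: str                  # Trankit / stanza / UDify
    probe_phenomena: list[str]
    pairs_per_phenomenon: int

    def to_entry(self) -> str:
        """Render the plan as a plain-text block (layout is an assumption)."""
        lines = [
            f"Syntax Analysis: {self.language} ({self.iso})",
            f"- UD treebank availability: {self.availability}",
            f"- Recommended training treebank: {self.training_treebank or 'none'}",
            f"- Cross-lingual transfer source: {self.transfer_source or 'not needed'}",
            f"- Parser tool: {self.parser_tool}",
            "- Agreement-probe phenomena: " + ", ".join(self.probe_phenomena),
            f"- Probe size: {self.pairs_per_phenomenon} pairs per phenomenon",
        ]
        return "\n".join(lines)


plan = SyntaxPlan(
    language="Yoruba",
    iso="yor",
    availability="PUD-only",
    training_treebank=None,
    transfer_source="Igbo (URIEL=0.18)",
    parser_tool="Trankit",
    probe_phenomena=["subject-verb agreement", "tone preservation"],
    pairs_per_phenomenon=150,
)
print(plan.to_entry())
```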

Example Usage

Language: Yoruba (yor) — no training-size UD treebank available

Syntax Analysis: Yoruba
- UD treebank: YTB (Yoruba TreeBank) — PUD-style, eval-only
- Approach: cross-lingual transfer (zero-shot)
- Transfer source: Igbo (URIEL=0.18) — closest with usable treebank
- Parser tool: Trankit (XLM-R base; best low-resource)
- Agreement-probe phenomena:
    • Subject-verb agreement (limited in Yoruba; SVO)
    • Tone preservation in minimal pairs (MANDATORY — lexical tone)
- Probe size: 150 pairs per phenomenon
- Eval limitation: no native training treebank; cross-lingual transfer only
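The tone-preservation requirement in the example above can be checked mechanically: if any preprocessing step drops or alters combining marks, the probe no longer tests what it claims to. The sketch below uses only the standard `unicodedata` module; the helper names are hypothetical, `strip_marks` simulates a lossy cleaning step, and the sample string is purely illustrative.

```python
import unicodedata


def combining_marks(text: str) -> list:
    """List the combining marks (tone marks, underdots) in canonical order."""
    decomposed = unicodedata.normalize("NFD", text)
    return [ch for ch in decomposed if unicodedata.combining(ch)]


def preserves_diacritics(original: str, processed: str) -> bool:
    """True if processing kept exactly the same sequence of combining marks."""
    return combining_marks(original) == combining_marks(processed)


def strip_marks(text: str) -> str:
    """Simulate a lossy preprocessing step that removes all diacritics."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Illustrative Yoruba-style string: o with dot below, k, o with dot below + grave.
sample = "\u1ecdk\u1ecd\u0300"

# Re-normalizing to NFC is safe; stripping marks is not.
print(preserves_diacritics(sample, unicodedata.normalize("NFC", sample)))
print(preserves_diacritics(sample, strip_marks(sample)))
```

Comparing marks after NFD decomposition makes the check independent of whether the probe files are stored precomposed or decomposed.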