linguistic-discourse
Discourse-level analysis for the target language: framework selection (RST/PDTB/GUM/SDRT), coreference including zero-anaphora in pro-drop languages, discourse markers, and coherence-aware evaluation for long-context LLMs.
Overview
Discourse is the layer most LLM evals don't touch, and it is where modern LLMs most often fail quietly. A model can fluently produce a 2,000-word answer that contains a hallucinated citation, a referent that cannot be resolved, or topic drift, and perplexity-based metrics will detect none of these. linguistic-discourse provides the analytical lens and tooling to catch these failures before they reach production.
Pipeline Position
Phase: Analyze (Phase 2)
Before this skill: linguistic-syntax (coreference builds on syntactic structure), linguistic-semantics (sense-level meaning precedes discourse-level coherence)
After this skill: linguistic-eval (discourse-aware eval metrics), linguistic-annotate (discourse annotation projects)
When It Activates
- Long-context LLM eval where coherence matters (summarization, multi-paragraph QA, RAG)
- Coreference annotation or eval, especially for pro-drop languages
- Choosing between discourse-annotation frameworks
- Diagnosing model failures: hallucinated references, dangling pronouns, topic drift, broken citations
When NOT to use: Purely sentence-level eval → linguistic-eval. Syntactic structure → linguistic-syntax. Sense-level meaning → linguistic-semantics.
Framework Selection
| Framework | Models | Best For |
|---|---|---|
| RST | Hierarchical nucleus/satellite tree | Summarization; discourse-aware compression |
| PDTB | Local discourse relations, explicit/implicit connectives | Discourse-marker prediction; QA connective analysis |
| GUM | RST + UD + coref + entities + discourse markers | Multi-layer cross-eval; single-source ground truth |
| SDRT | Formal logical structure | Research; rare in production |
For most LLM eval projects: PDTB for connective-level prediction, RST for summarization coherence, GUM when multi-layer alignment is needed.
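The rule of thumb above can be written down as a small lookup. This is only a sketch: the task labels (`connective_prediction`, `summarization`, `multi_layer`) and the function name `recommend_framework` are hypothetical, introduced here for illustration and not part of the skill itself.

```python
# Hypothetical task labels mapped to the frameworks recommended above.
# SDRT is left out because the table describes it as rare in production.
FRAMEWORK_BY_TASK = {
    "connective_prediction": "PDTB",
    "summarization": "RST",
    "multi_layer": "GUM",
}


def recommend_framework(task_type: str) -> str:
    """Return a framework name for a task label, following the table above."""
    try:
        return FRAMEWORK_BY_TASK[task_type]
    except KeyError:
        known = ", ".join(sorted(FRAMEWORK_BY_TASK))
        raise ValueError(
            f"unknown task type {task_type!r}; expected one of: {known}"
        ) from None
```

A project that needs more than one lens (for example connective checks plus summarization coherence) would call this once per task rather than expect a single answer.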
What It Does
Four Analytical Lenses
1. Local connectives (PDTB): Does the model handle "because" / "although" / "however" correctly? Tool: PDTB-trained classifier; extract connectives + arguments; check relation matches.
2. Hierarchical structure (RST): Does the summary preserve the central nucleus? Tool: RST parser (per-language coverage varies); compare nuclei across source and summary.
3. Coreference + anaphora: Do all pronouns resolve to a valid antecedent? Tool: coref resolver. For pro-drop languages: zero-anaphora extension required.
4. Topic continuity: Does generation stay on topic across paragraphs? Tool: topic-segment detection (TextTiling, BERT-based); compute topic-coherence across segments.
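As an illustration of the fourth lens, the sketch below scores lexical cohesion between adjacent paragraphs and flags sharp drops. It is a simplified stand-in for TextTiling, not the full algorithm (no pseudo-sentence blocks, no depth scores), and the function names and the `threshold` default are assumptions chosen for the example.

```python
import math
import re
from collections import Counter


def term_counts(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for real tokenization."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def adjacent_cohesion(paragraphs: list[str]) -> list[float]:
    """Similarity between each paragraph and the one that follows it."""
    vectors = [term_counts(p) for p in paragraphs]
    return [cosine(x, y) for x, y in zip(vectors, vectors[1:])]


def drift_points(paragraphs: list[str], threshold: float = 0.1) -> list[int]:
    """Indices i where paragraph i+1 shares little vocabulary with paragraph i."""
    scores = adjacent_cohesion(paragraphs)
    return [i for i, score in enumerate(scores) if score < threshold]
```

The regex tokenizer assumes whitespace-delimited text. For scripts written without spaces, such as Chinese, `term_counts` would need a word segmenter or character-level units instead.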
Zero Anaphora in Pro-Drop Languages
In Mandarin, Japanese, Spanish, and Italian, roughly 20–40% of the pronouns in coreference chains are dropped (omitted from the surface form). Coref models trained on English miss these silently, so a pro-drop language needs a zero-anaphora extension, not just a standard coref resolver.
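One way to check how much zero anaphora matters for a given corpus is to measure it on annotated chains. The sketch below assumes a hypothetical `Mention` schema with an `is_zero` flag set by the annotator; it counts per mention, which is one of several reasonable ways to define a drop rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """One mention in a coreference chain (hypothetical schema)."""
    chain_id: int
    surface: str   # empty string for a dropped (zero) pronoun
    is_zero: bool  # True when the argument has no surface realization


def zero_rate(mentions: list[Mention]) -> float:
    """Share of mentions that are zero (dropped) rather than overt."""
    if not mentions:
        return 0.0
    return sum(m.is_zero for m in mentions) / len(mentions)


def chains_with_zeros(mentions: list[Mention]) -> set[int]:
    """Ids of chains that contain at least one zero mention.

    A resolver that only sees surface tokens cannot fully recover these.
    """
    return {m.chain_id for m in mentions if m.is_zero}
```

If `zero_rate` is far from negligible on a sample of the target data, that is a signal that a surface-only resolver will leave chains incomplete.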
Coreference is Genre-Specific
Coref models trained on OntoNotes-style data (predominantly news and other edited text) tend to break on dialogue. Use ConvCoref or genre-matched data for conversational text.
RAG Citation Faithfulness
Citation faithfulness is a discourse-coherence problem, not just a citation-overlap problem. A citation of the form "X from source Y" is valid only if (a) claim X actually appears in source Y and (b) claim X resolves coreferentially to what the user asked about. Naive citation-overlap metrics miss the coreference half.
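The two conditions can be kept as separate checks so that a failure is attributable to one or the other. This is a minimal sketch under stated assumptions: the `Citation` fields are hypothetical, `claim_entity` is assumed to come from an upstream coreference resolver, and the substring match in `claim_in_source` is a placeholder for a real support or entailment check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """A cited claim plus the context needed to verify it (hypothetical schema)."""
    claim: str         # the statement the model attributes to a source
    claim_entity: str  # entity the claim is about, after coreference resolution
    source_text: str   # text of the cited source


def claim_in_source(c: Citation) -> bool:
    """Condition (a), approximated by a case-insensitive substring match."""
    return c.claim.casefold() in c.source_text.casefold()


def entity_matches_question(c: Citation, question_entity: str) -> bool:
    """Condition (b): the claim is about the entity the user asked about."""
    return c.claim_entity.casefold() == question_entity.casefold()


def is_faithful(c: Citation, question_entity: str) -> bool:
    """A citation passes only when both conditions hold."""
    return claim_in_source(c) and entity_matches_question(c, question_entity)
```

For example, if the user asks about Company A and the cited passage says revenue rose for Company B, `claim_in_source` passes while `entity_matches_question` fails, which is exactly the case an overlap-only metric would let through.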
Inputs & Outputs
| Input | Description |
|---|---|
| Target language + task type | For framework selection |
| Text samples | For coreference/discourse analysis |

| Output | Description |
|---|---|
| Framework recommendation | RST / PDTB / GUM / SDRT + rationale |
| Coreference approach | Resolver + zero-anaphora flag for pro-drop |
| Discourse-marker lexicon | Per-language classifier recommendation |
| Coherence eval metrics | Probe types + per-phenomenon |
| workspace_state.md entry | Discourse plan |
Example Usage
Language: Mandarin (cmn), task: long-context QA evaluation
Discourse Analysis: Mandarin (cmn)
- Framework: PDTB (connective-level) + RST (summarization coherence)
- Coreference: zero-anaphora extension MANDATORY (Mandarin pro-drop rate ~35% of pronoun chains)
- Coref model: genre-matched (not OntoNotes); use Mandarin-specific resolver
- Discourse-marker lexicon: construct from PDTB-aligned Chinese corpus
- Topic continuity: TextTiling adapted for Chinese (no spaces; character-level)
- RAG faithfulness: PDTB connective check + zero-anaphora coreference check
- Eval: coherence probe set (150 discourse-relation minimal pairs)
Related Skills
- linguistic-syntax: syntactic structure underlies coreference analysis
- linguistic-semantics: sense-level meaning precedes discourse
- linguistic-eval: discourse-aware eval metrics
- linguistic-annotate: discourse annotation project methodology