linguistic-discourse

Discourse-level analysis for the target language: framework selection (RST/PDTB/GUM/SDRT), coreference including zero-anaphora in pro-drop languages, discourse markers, and coherence-aware evaluation for long-context LLMs.

Overview

Discourse is the layer most LLM evals don't touch — and where modern LLMs most often quietly fail. A model can fluently produce a 2,000-word answer with a hallucinated citation, an unreachable referent, or topic drift that goes undetected by perplexity metrics. linguistic-discourse provides the analytical lens and tooling to catch these failures before they reach production.

Pipeline Position

Phase: Analyze (Phase 2)

Before this skill: linguistic-syntax (coreference builds on syntactic structure), linguistic-semantics (sense-level meaning precedes discourse-level coherence)

After this skill: linguistic-eval (discourse-aware eval metrics), linguistic-annotate (discourse annotation projects)

When It Activates

Long-context LLM eval where coherence matters (summarization, multi-paragraph QA, RAG)
Coreference annotation or eval, especially for pro-drop languages
Choosing between discourse-annotation frameworks
Diagnosing model failures: hallucinated references, dangling pronouns, topic drift, broken citation

When NOT to use: Purely sentence-level eval → linguistic-eval. Syntactic structure → linguistic-syntax. Sense-level meaning → linguistic-semantics.

Framework Selection

Framework	Models	Best For
RST	Hierarchical nucleus/satellite tree	Summarization; discourse-aware compression
PDTB	Local discourse relations, explicit/implicit connectives	Discourse-marker prediction; QA connective analysis
GUM	RST + UD + coref + entities + discourse markers	Multi-layer cross-eval; single-source ground truth
SDRT	Formal logical structure	Research; rare in production

For most LLM eval projects: PDTB for connective-level prediction, RST for summarization coherence, GUM when multi-layer alignment is needed.

What It Does

Four Analytical Lenses

1. Local connectives (PDTB): Does the model handle "because" / "although" / "however" correctly? Tool: PDTB-trained classifier; extract connectives + arguments; check relation matches.

2. Hierarchical structure (RST): Does the summary preserve the central nucleus? Tool: RST parser (per-language coverage varies); compare nuclei across source and summary.

3. Coreference + anaphora: Do all pronouns resolve to a valid antecedent? Tool: coref resolver. For pro-drop languages: zero-anaphora extension required.

4. Topic continuity: Does generation stay on topic across paragraphs? Tool: topic-segment detection (TextTiling, BERT-based); compute topic-coherence across segments.

Zero Anaphora in Pro-Drop Languages

~20–40% of pronoun chains in Mandarin, Japanese, Spanish, Italian are dropped (omitted in surface form). English-trained coref models miss these silently. Pro-drop requires zero-anaphora extension — not just a standard coref resolver.

Coreference is Genre-Specific

OntoNotes-trained coref models (news/Wikipedia) break on dialogue. Use ConvCoref or genre-matched data for conversational text.

RAG Citation Faithfulness

Citation faithfulness is a discourse-coherence problem — not just citation-overlap. A model citing "X from source Y" is only valid if (a) claim X actually appears in source Y AND (b) claim X coreferentially resolves to what the user asked. Naive citation-overlap metrics miss the coreference half.

Inputs & Outputs

Input	Description
Target language + task type	For framework selection
Text samples	For coreference/discourse analysis

Output	Description
Framework recommendation	RST / PDTB / GUM / SDRT + rationale
Coreference approach	Resolver + zero-anaphora flag for pro-drop
Discourse-marker lexicon	Per-language classifier recommendation
Coherence eval metrics	Probe types + per-phenomenon
`workspace_state.md` entry	Discourse plan

Example Usage

Language: Mandarin (cmn), task: long-context QA evaluation

Discourse Analysis: Mandarin (cmn)
- Framework: PDTB (connective-level) + RST (summarization coherence)
- Coreference: zero-anaphora extension MANDATORY
    (Mandarin pro-drop rate ~35% of pronoun chains)
- Coref model: genre-matched (not OntoNotes); use Mandarin-specific resolver
- Discourse-marker lexicon: construct from PDTB-aligned Chinese corpus
- Topic continuity: TextTiling adapted for Chinese (no spaces; character-level)
- RAG faithfulness: PDTB connective check + zero-anaphora coreference check
- Eval: coherence probe set (150 discourse-relation minimal pairs)

linguistic-syntax — syntactic structure underlies coreference analysis
linguistic-semantics — sense-level meaning precedes discourse
linguistic-eval — discourse-aware eval metrics
linguistic-annotate — discourse annotation project methodology

Was this page helpful?

On this page