
linguistic-discourse

Discourse-level analysis for the target language: framework selection (RST/PDTB/GUM/SDRT), coreference including zero-anaphora in pro-drop languages, discourse markers, and coherence-aware evaluation for long-context LLMs.

Overview

Discourse is the layer most LLM evals don't touch, and the one where modern LLMs most often fail quietly. A model can fluently produce a 2,000-word answer that contains a hallucinated citation, an unreachable referent, or topic drift, and perplexity metrics will flag none of them. linguistic-discourse provides the analytical lens and tooling to catch these failures before they reach production.

Pipeline Position

Phase: Analyze (Phase 2)

Before this skill: linguistic-syntax (coreference builds on syntactic structure), linguistic-semantics (sense-level meaning precedes discourse-level coherence)

After this skill: linguistic-eval (discourse-aware eval metrics), linguistic-annotate (discourse annotation projects)

When It Activates

  • Long-context LLM eval where coherence matters (summarization, multi-paragraph QA, RAG)
  • Coreference annotation or eval, especially for pro-drop languages
  • Choosing between discourse-annotation frameworks
  • Diagnosing model failures: hallucinated references, dangling pronouns, topic drift, broken citations

When NOT to use: Purely sentence-level eval → linguistic-eval. Syntactic structure → linguistic-syntax. Sense-level meaning → linguistic-semantics.

Framework Selection

| Framework | Models | Best For |
|---|---|---|
| RST | Hierarchical nucleus/satellite tree | Summarization; discourse-aware compression |
| PDTB | Local discourse relations, explicit/implicit connectives | Discourse-marker prediction; QA connective analysis |
| GUM | RST + UD + coref + entities + discourse markers | Multi-layer cross-eval; single-source ground truth |
| SDRT | Formal logical structure | Research; rare in production |

For most LLM eval projects: PDTB for connective-level prediction, RST for summarization coherence, GUM when multi-layer alignment is needed.
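The selection guidance above can be sketched as a small lookup. This is only an illustration: the task labels (`connective_prediction`, `summarization`, and so on) and the `recommend_framework` helper are hypothetical names, not part of the skill's actual interface.

```python
from dataclasses import dataclass

# Hypothetical task labels; the mapping mirrors the framework table above.
_RECOMMENDATIONS = {
    "connective_prediction": ("PDTB", "local relations with explicit/implicit connectives"),
    "summarization": ("RST", "hierarchical nucleus/satellite structure"),
    "multi_layer_eval": ("GUM", "RST + UD + coref + entities in a single source"),
    "formal_research": ("SDRT", "formal logical structure; rare in production"),
}


@dataclass(frozen=True)
class FrameworkChoice:
    framework: str
    rationale: str


def recommend_framework(task_type: str) -> FrameworkChoice:
    """Return the framework suggested for a task label, or raise ValueError."""
    try:
        framework, rationale = _RECOMMENDATIONS[task_type]
    except KeyError:
        raise ValueError(f"unknown task type: {task_type!r}") from None
    return FrameworkChoice(framework, rationale)
```

Calling `recommend_framework("summarization")` returns a choice whose `framework` is `"RST"`. A real project would likely combine frameworks (for example PDTB plus RST), which a flat lookup like this does not capture.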

What It Does

Four Analytical Lenses

1. Local connectives (PDTB): Does the model handle "because" / "although" / "however" correctly? Tool: PDTB-trained classifier; extract connectives + arguments; check relation matches.

2. Hierarchical structure (RST): Does the summary preserve the central nucleus? Tool: RST parser (per-language coverage varies); compare nuclei across source and summary.

3. Coreference + anaphora: Do all pronouns resolve to a valid antecedent? Tool: coref resolver. For pro-drop languages: zero-anaphora extension required.

4. Topic continuity: Does generation stay on topic across paragraphs? Tool: topic-segment detection (TextTiling, BERT-based); compute topic-coherence across segments.
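As a rough illustration of the topic-continuity lens, the sketch below scores lexical overlap between adjacent paragraphs and flags sharp drops. It is a simplified stand-in inspired by TextTiling, not the full algorithm, and it assumes whitespace-delimited text; the function names and the `threshold` default are illustrative assumptions.

```python
import math
import re
from collections import Counter


def _bag_of_words(text: str) -> Counter:
    # Lowercased word counts; assumes a language with word boundaries.
    return Counter(re.findall(r"\w+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def adjacent_similarity(paragraphs: list[str]) -> list[float]:
    """Cosine similarity between each paragraph and the one that follows it."""
    bags = [_bag_of_words(p) for p in paragraphs]
    return [_cosine(bags[i], bags[i + 1]) for i in range(len(bags) - 1)]


def drift_points(paragraphs: list[str], threshold: float = 0.1) -> list[int]:
    """Indices of paragraphs whose similarity to the previous one is below threshold."""
    sims = adjacent_similarity(paragraphs)
    return [i + 1 for i, s in enumerate(sims) if s < threshold]
```

Pure lexical overlap will over-flag legitimate topic shifts and under-flag drift that reuses vocabulary, so an embedding-based similarity is a natural replacement for `_cosine` in practice.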

Zero Anaphora in Pro-Drop Languages

In Mandarin, Japanese, Spanish, and Italian, roughly 20–40% of the pronouns in coreference chains are dropped (omitted from the surface form). English-trained coref models miss these silently, because there is no overt token to resolve. Pro-drop languages therefore require a zero-anaphora extension, not just a standard coref resolver.
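One way to make dropped pronouns visible to an evaluation is to represent them as explicit mentions with an empty surface form. The sketch below assumes such an annotation scheme; the `Mention` type and `zero_pronoun_rate` helper are hypothetical and are not taken from any particular coreference toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    chain_id: int      # coreference chain this mention belongs to
    token_index: int   # position of the overt token, or of the gap for a zero
    surface: str       # "" for a dropped (zero) pronoun
    is_pronoun: bool

    @property
    def is_zero(self) -> bool:
        return self.is_pronoun and self.surface == ""


def zero_pronoun_rate(mentions: list[Mention]) -> float:
    """Share of pronominal mentions that have no surface form."""
    pronouns = [m for m in mentions if m.is_pronoun]
    if not pronouns:
        return 0.0
    return sum(m.is_zero for m in pronouns) / len(pronouns)


# Spanish: "María llegó tarde. Ø Perdió el tren. Ella estaba cansada."
mentions = [
    Mention(chain_id=0, token_index=0, surface="María", is_pronoun=False),
    Mention(chain_id=0, token_index=3, surface="", is_pronoun=True),   # dropped subject
    Mention(chain_id=0, token_index=6, surface="Ella", is_pronoun=True),
]
rate = zero_pronoun_rate(mentions)
```

A resolver that only sees overt tokens would link "María" and "Ella" but never register the subject of "Perdió", which is exactly the gap the zero-anaphora extension is meant to close.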

Coreference is Genre-Specific

OntoNotes-trained coref models (trained largely on newswire and broadcast text) degrade on dialogue. Use ConvCoref or genre-matched data for conversational text.

RAG Citation Faithfulness

Citation faithfulness is a discourse-coherence problem, not just a citation-overlap problem. A citation of the form "X from source Y" is valid only if (a) claim X actually appears in source Y AND (b) the entities in claim X corefer with the entities the user asked about. Naive citation-overlap metrics check only (a) and miss the coreference half.
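The two-part condition can be sketched as follows. This is a minimal illustration under strong assumptions: token overlap stands in for a real entailment or support check, and the entity IDs are assumed to come from an upstream coreference or entity-linking step. All names here (`claim_supported`, `citation_is_faithful`, the entity ID strings) are hypothetical.

```python
import re


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def claim_supported(claim: str, source: str, min_overlap: float = 0.8) -> bool:
    """Part (a): crude overlap proxy for 'the claim appears in the source'."""
    claim_tokens = _tokens(claim)
    if not claim_tokens:
        return False
    overlap = len(claim_tokens & _tokens(source)) / len(claim_tokens)
    return overlap >= min_overlap


def citation_is_faithful(
    claim: str,
    source: str,
    claim_entity_id: str,
    question_entity_id: str,
) -> bool:
    """Parts (a) and (b): supported by the source AND about the asked-for entity."""
    return claim_supported(claim, source) and claim_entity_id == question_entity_id


source = "Apple Records was founded in 1968 by the Beatles."
claim = "Apple was founded in 1968"

# The user asked about Apple Inc., but the source sentence is about Apple Records.
overlap_only = claim_supported(claim, source)
faithful = citation_is_faithful(claim, source, "apple_records", "apple_inc")
```

Here the overlap check passes while the combined check fails, which is the failure mode that overlap-only metrics cannot see.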

Inputs & Outputs

| Input | Description |
|---|---|
| Target language + task type | For framework selection |
| Text samples | For coreference/discourse analysis |

| Output | Description |
|---|---|
| Framework recommendation | RST / PDTB / GUM / SDRT + rationale |
| Coreference approach | Resolver + zero-anaphora flag for pro-drop |
| Discourse-marker lexicon | Per-language classifier recommendation |
| Coherence eval metrics | Probe types + per-phenomenon |
| workspace_state.md entry | Discourse plan |

Example Usage

Language: Mandarin (cmn), task: long-context QA evaluation

Discourse Analysis: Mandarin (cmn)
- Framework: PDTB (connective-level) + RST (summarization coherence)
- Coreference: zero-anaphora extension MANDATORY
    (Mandarin pro-drop rate ~35% of pronoun chains)
- Coref model: genre-matched (not OntoNotes); use Mandarin-specific resolver
- Discourse-marker lexicon: construct from PDTB-aligned Chinese corpus
- Topic continuity: TextTiling adapted for Chinese (no spaces; character-level)
- RAG faithfulness: PDTB connective check + zero-anaphora coreference check
- Eval: coherence probe set (150 discourse-relation minimal pairs)
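A probe set of discourse-relation minimal pairs, like the one in the plan above, is typically scored by checking whether the model prefers the coherent member of each pair. The sketch below assumes a `score` callable that returns a higher value for text the model finds more acceptable (for instance a sequence log-probability); that callable and the helper name are assumptions, not part of the skill.

```python
from typing import Callable, Iterable, Tuple


def minimal_pair_accuracy(
    pairs: Iterable[Tuple[str, str]],
    score: Callable[[str], float],
) -> float:
    """Fraction of (coherent, incoherent) pairs where the coherent text scores higher."""
    pair_list = list(pairs)
    if not pair_list:
        raise ValueError("probe set is empty")
    correct = sum(score(good) > score(bad) for good, bad in pair_list)
    return correct / len(pair_list)


# Toy stand-in scorer for illustration only: it simply prefers text containing " so ".
def toy_score(text: str) -> float:
    return 1.0 if " so " in text else 0.0


probe_pairs = [
    ("It rained, so the match was cancelled.", "It rained, but the match was cancelled."),
    ("She was tired, but she kept working.", "She was tired, so she kept working."),
]
accuracy = minimal_pair_accuracy(probe_pairs, toy_score)
```

Ties count as failures here because the comparison is strict, which is the more conservative choice for a coherence probe.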