MAGIC Agent Skills

Overview

linguistic-ethics enforces the social and legal obligations that surround language data, obligations that a technically valid license does not automatically satisfy. It applies the CARE principles (Collective benefit, Authority to control, Responsibility, Ethics) alongside FAIR (Findable, Accessible, Interoperable, Reusable), and manages Free, Prior and Informed Consent (FPIC) for Indigenous and endangered-language data.

A good engineer can build a tokenizer, mine bitext, and fine-tune a model. None of that protects against training on a dataset whose community didn't consent to model use, releasing a model that generates sacred Indigenous content without permission, or stripping attribution lineage during a dataset merge. These are the high-cost mistakes — they damage communities, harm professional reputation, and carry increasing regulatory consequences (EU AI Act, Indigenous data sovereignty laws).

This skill is routed by the orchestrator twice: early in Scope as an awareness seed, and again at Release as the final gate. It is also invoked per-dataset during Acquire — every dataset that enters the mix crosses an ethics boundary, even open-licensed ones.

Pipeline Position

This skill operates primarily in Phase 1 (Acquire), with an early seed at Scope and a final gate at Release.

Preceding skills: linguistic-scope (provides the vitality/EGIDS status that sets the required ethics depth)

Following skills: linguistic-corpus, linguistic-bitext (only after per-dataset clearance); the release decision follows the final gate

When It Activates

  • Any new dataset being considered for training or eval — before download
  • Endangered or Indigenous language data of any kind
  • Religious or sacred text use (Bible-NLP, Quranic, Vedic, Indigenous oral histories)
  • License audit before release (open / community-gated / restricted decision)
  • Attribution and provenance tracking design
  • Drafting a model card's Ethics, Limitations, and Intended Use sections
  • Routing decisions involving community-controlled archives (DELAMAN, ELAR, AILLA, PARADISEC)

When NOT to use: the dataset is your own English-only synthetic data with no community attribution issues, and the operation is a pure technical refactor with no data implications. Even then, ask once: skipping the ethics check is the most common failure mode.

What It Does

CARE vs FAIR — a dataset can be fully FAIR (standardized, downloadable, openly licensed) and still violate CARE:

| CARE Principle | Meaning |
| --- | --- |
| Collective benefit | Does this serve the source community? |
| Authority to control | Community decides terms |
| Responsibility | For harms downstream |
| Ethics | Through engagement, not just consent |

FPIC requires all four components: Free (without coercion), Prior (before the data is used), Informed (the community understood what models trained on this data could do), Consent (affirmative; can be withdrawn). FPIC is a process, not a document.
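
The four components can be tracked as separate fields, so that a signed form alone never counts as consent. The following is a minimal sketch, not the skill's actual data model: the `FPICRecord` class and its field names are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class FPICRecord:
    """Hypothetical record of one community's FPIC status for one dataset."""
    free: bool        # given without coercion
    prior: bool       # obtained before the data was used
    informed: bool    # community understood what trained models could do
    consent: bool     # affirmative consent was given
    withdrawn: bool = False  # consent can be withdrawn later

    def is_valid(self) -> bool:
        # All four components must hold, and consent must still be in force.
        return (self.free and self.prior and self.informed
                and self.consent and not self.withdrawn)

    def missing(self) -> list[str]:
        # Names of the components that are not yet satisfied.
        names = ["free", "prior", "informed", "consent"]
        return [n for n in names if not getattr(self, n)]


record = FPICRecord(free=True, prior=True, informed=False, consent=True)
print(record.is_valid())  # False
print(record.missing())   # ['informed']
```

Keeping `withdrawn` as its own flag reflects the point that consent is ongoing: a record that was valid at acquisition can become invalid before release.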

License compatibility for dataset mixes:

  • Any CC-BY-NC in the mix → entire model is non-commercial-use only
  • Any CC-BY-SA in the mix → model output must propagate ShareAlike
  • Any ND (NoDerivatives) license in the mix → derivative use is blocked; if ND data has already been mixed in, its terms have already been violated
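
The rules in this list can be applied mechanically to a list of license identifiers. The sketch below is illustrative only: the function name `mix_constraints`, the result keys, and the assumption that licenses arrive as Creative Commons style strings such as `CC-BY-NC-4.0` are all hypothetical, and the output is a flag for review, not legal advice.

```python
import re


def mix_constraints(licenses: list[str]) -> dict[str, bool]:
    """Derive mix-level constraints from the license of every dataset in a mix.

    Assumes Creative Commons style identifiers, e.g. "CC-BY-4.0" or
    "CC-BY-NC-SA 4.0" (an assumption of this sketch).
    """
    # Split each identifier into its elements: {"CC", "BY", "NC", "4.0"}, ...
    elements = [set(re.split(r"[-_\s]+", lic.upper())) for lic in licenses]
    return {
        # Any NC dataset makes the whole model non-commercial only.
        "non_commercial_only": any("NC" in e for e in elements),
        # Any SA dataset means ShareAlike must be propagated.
        "share_alike": any("SA" in e for e in elements),
        # Any ND dataset blocks derivative use outright.
        "blocked_by_nd": any("ND" in e for e in elements),
    }


print(mix_constraints(["CC-BY-4.0", "CC-BY-NC 4.0"]))
# {'non_commercial_only': True, 'share_alike': False, 'blocked_by_nd': False}
```

A single restrictive dataset is enough to set a flag for the whole mix, which is why the check uses `any` rather than a majority or a weighted share.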

Sacred-text decision framework (not a hardcoded blocklist):

| Example | What's restricted | Why |
| --- | --- | --- |
| Quranic text | Generation/transformation | Religious community standards |
| Indigenous oral histories | Public release; transformation | Custodian permission required |
| Sami yoik recordings | Non-Sami contexts; commercial | Cultural ownership; Sami Council |
| Aboriginal Australian songlines | Recording, distribution, model use | ICIP protocols |
| Bible-NLP / liturgical text | Commercial training; canonical distortion | Community use norms |
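
Because this is a decision framework and not a blocklist, a lookup over these examples should produce a prompt for human review, never an automatic allow or deny. A minimal sketch under that assumption follows; the category keys and the `review_note` function are hypothetical, and the guidance strings are copied from the table above.

```python
# Rows from the table above, keyed by hypothetical category names.
SACRED_TEXT_GUIDANCE = {
    "quranic_text": ("Generation/transformation",
                     "Religious community standards"),
    "indigenous_oral_histories": ("Public release; transformation",
                                  "Custodian permission required"),
    "sami_yoik_recordings": ("Non-Sami contexts; commercial",
                             "Cultural ownership; Sami Council"),
    "aboriginal_songlines": ("Recording, distribution, model use",
                             "ICIP protocols"),
    "bible_nlp_liturgical": ("Commercial training; canonical distortion",
                             "Community use norms"),
}


def review_note(category: str) -> str:
    """Return a note for a human reviewer; this never allows or blocks by itself."""
    if category not in SACRED_TEXT_GUIDANCE:
        # Unknown material is escalated, not silently allowed.
        return f"{category}: no guidance on file; consult the source community."
    restricted, why = SACRED_TEXT_GUIDANCE[category]
    return f"{category}: review '{restricted}' ({why})."


print(review_note("aboriginal_songlines"))
print(review_note("unlisted_ritual_text"))
```

The unknown-category branch escalates instead of passing, so material that is missing from the examples is not treated as unrestricted.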

Release modes:

| Mode | Requirements |
| --- | --- |
| Open | All-open licenses + attribution complete + no community restrictions + standard model card |
| Community-gated | Community sign-off; access criteria + revocation path; model card cites partner |
| Restricted | Use-policy + access controls; legal review |
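
One way to read the requirements is as a cascade: recommend the least restrictive mode whose conditions are all met. The sketch below assumes that reading; the `ReleaseFacts` fields, the `recommend_mode` function, and the fallback to restricted are hypothetical choices and are not stated by the source.

```python
from dataclasses import dataclass


@dataclass
class ReleaseFacts:
    """Hypothetical inputs; field names mirror the requirements listed above."""
    all_open_licenses: bool
    attribution_complete: bool
    community_restrictions: bool
    standard_model_card: bool
    community_signoff: bool = False
    access_criteria_and_revocation: bool = False
    model_card_cites_partner: bool = False


def recommend_mode(f: ReleaseFacts) -> str:
    # Open requires every condition in the "Open" row.
    if (f.all_open_licenses and f.attribution_complete
            and not f.community_restrictions and f.standard_model_card):
        return "open"
    # Community-gated requires sign-off, access criteria with a revocation
    # path, and a model card that cites the community partner.
    if (f.community_signoff and f.access_criteria_and_revocation
            and f.model_card_cites_partner):
        return "community-gated"
    # Assumed fallback: everything else is restricted, pending a use policy,
    # access controls, and legal review.
    return "restricted"


facts = ReleaseFacts(all_open_licenses=True, attribution_complete=True,
                     community_restrictions=True, standard_model_card=True)
print(recommend_mode(facts))  # restricted
```

In the usage example, open licenses and complete attribution are not enough: an unresolved community restriction without sign-off drops the recommendation to restricted.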

Example Usage

Dataset: Bible-NLP Yoruba (CC-BY 4.0)

## Ethics Assessment: Bible-NLP Yoruba Corpus

**Source(s):** Bible-NLP project
**License(s):** CC-BY 4.0
**License compatibility:** OK for open; flag in commercial mix
**CARE check:** NEEDS-WORK — liturgical register >60%;
    community norms prefer non-commercial generative use
**FPIC required?** NO (CC-BY + EGIDS 2 Provincial)
**Sacred-text concerns:** Bible-NLP — flag in model card; limits commercial generation
**Attribution registry status:** COMPLETE
**Recommended release mode:** OPEN (with model card noting register + use norms)
**Outstanding actions:** Limit Bible % in mix to ≤30%; add register-drift warning to model card
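
The outstanding action to keep Bible text at or below 30% of the mix can be checked with simple arithmetic. This sketch assumes the share is measured in tokens, which the assessment does not specify; the function name `source_share` and the token counts are hypothetical.

```python
def source_share(token_counts: dict[str, int], source: str) -> float:
    """Fraction of the mix, by token count, contributed by one source."""
    total = sum(token_counts.values())
    if total == 0:
        raise ValueError("empty mix")
    return token_counts.get(source, 0) / total


# Hypothetical token counts, for illustration only.
mix = {"bible_nlp_yoruba": 400_000, "news": 500_000, "web": 100_000}
share = source_share(mix, "bible_nlp_yoruba")
print(f"{share:.0%}")   # 40%
print(share <= 0.30)    # False: over the 30% cap
```

With these counts the Bible-NLP share is 400,000 / 1,000,000 = 40%, so the mix would need more non-liturgical data or less Bible text before it meets the cap.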