MAGIC Agent Skills

Supported Tools

The Linguistic Agent Skills suite integrates with a wide range of computational linguistics and ML tooling. Each skill references the tools relevant to its domain.

Language Identification

| Tool | Coverage | Used By |
| --- | --- | --- |
| GlotLID (2024) | 1,600+ languages, paragraph-level | linguistic-corpus |
| FastText LID (176 languages) | High-resource speed | linguistic-corpus |
| CLD3 | Broad coverage | linguistic-corpus (fallback) |
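
The table lists CLD3 as a fallback, which suggests a chain of detectors tried in order. A minimal sketch of such a chain follows. It assumes each detector is a callable returning a `(language_code, confidence)` pair; the `identify_language` helper, the `min_confidence` threshold of 0.5, and both stub detectors are hypothetical and are not part of any of the listed tools. Real detectors would wrap GlotLID, fastText, or CLD3 behind the same interface.

```python
from typing import Callable, Optional, Sequence, Tuple

# Assumed interface: a detector maps text to (language_code, confidence in [0, 1]).
Detector = Callable[[str], Tuple[str, float]]


def identify_language(
    text: str,
    detectors: Sequence[Detector],
    min_confidence: float = 0.5,  # hypothetical threshold, tune per corpus
) -> Optional[Tuple[str, float]]:
    """Try detectors in order; return the first confident prediction.

    If no detector reaches min_confidence, return the highest-scoring
    prediction seen. Returns None when no detectors are given.
    """
    best: Optional[Tuple[str, float]] = None
    for detect in detectors:
        label, score = detect(text)
        if score >= min_confidence:
            return label, score
        if best is None or score > best[1]:
            best = (label, score)
    return best


# Hypothetical stand-ins for real models, used only to show the control flow.
def primary_stub(text: str) -> Tuple[str, float]:
    # Pretend the primary model is unsure about very short inputs.
    return ("und", 0.2) if len(text) < 20 else ("eng", 0.9)


def fallback_stub(text: str) -> Tuple[str, float]:
    return ("eng", 0.6)


result = identify_language("short text", [primary_stub, fallback_stub])
```

Here the primary stub scores below the threshold on the short input, so the result comes from the fallback stub. Keeping detectors behind a single callable type makes it straightforward to reorder them or swap one model for another.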

Unicode & Script Tools

| Tool | Purpose | Used By |
| --- | --- | --- |
| Python unicodedata | NFC/NFKC normalization | linguistic-scripts |
| Unicode TR39 confusable data | Mixed-script deduplication | linguistic-scripts |
| detect_confusables.py | Fold/detect confusables + joiners | linguistic-scripts |
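
As a brief illustration of the NFC/NFKC normalization row, the sketch below uses only the standard-library `unicodedata` module. The `normalize_text` wrapper is a hypothetical helper for this example and is not taken from the linguistic-scripts skill.

```python
import unicodedata


def normalize_text(text: str, form: str = "NFC") -> str:
    """Return text in the requested Unicode normalization form."""
    return unicodedata.normalize(form, text)


decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
ligature = "\ufb01"     # LATIN SMALL LIGATURE FI

# NFC composes canonically equivalent sequences into a single code point.
nfc = normalize_text(decomposed, "NFC")

# NFKC additionally replaces compatibility characters such as ligatures.
nfkc = normalize_text(ligature, "NFKC")

print(len(decomposed), len(nfc))  # 2 1
print(nfkc)                       # fi
```

NFC leaves the ligature unchanged, while NFKC folds it into plain letters. That makes NFKC the more aggressive choice, and it can discard distinctions some scripts rely on, so the right form depends on the downstream task.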

Tokenization & Vocab Extension

| Tool | Purpose | Used By |
| --- | --- | --- |
| SentencePiece 0.1.96+ | Unigram/BPE tokenizer training | linguistic-tokenize |
| FOCUS | Vocab extension (close script pairs) | linguistic-tokenize |
| OFA | Vocab extension (with parallel data) | linguistic-tokenize |
| HyperOfa | Vocab extension (minimal data) | linguistic-tokenize |

Data Sources & Corpora

| Resource | Languages | Used By |
| --- | --- | --- |
| CulturaX | 167 languages | linguistic-corpus |
| MADLAD-400 | 400+ languages | linguistic-corpus |
| Glot500 | 500+ languages | linguistic-corpus |
| OLDI (Open Language Data Initiative) | 100+ languages | linguistic-corpus |
| Wikipedia dumps | 300+ languages | linguistic-corpus |
| OPUS | Parallel data, many languages | linguistic-bitext |
| FLORES-200 | 200 languages, eval | linguistic-eval |
| NTREX-128 | 128 languages, eval | linguistic-eval |
| Belebele | 122 languages, reading comp | linguistic-eval |

Bitext Mining & Alignment

| Tool | Purpose | Used By |
| --- | --- | --- |
| LASER3 | Sentence embeddings for alignment | linguistic-bitext |
| SONAR (Meta 2024) | Embeddings (stronger for Bantu/Indigenous) | linguistic-bitext |
| Vecalign | Sentence alignment (preferred) | linguistic-bitext |
| hunalign | Sentence alignment (legacy) | linguistic-bitext |

Transfer Learning & Fine-Tuning

| Tool | Strengths | Used By |
| --- | --- | --- |
| Unsloth | 2× faster QLoRA, single-GPU | linguistic-transfer |
| LLaMA-Factory | Multi-GPU + complex sampling | linguistic-transfer |
| Axolotl | YAML-config middle ground | linguistic-transfer |
| HuggingFace PEFT | Flexible LoRA/QLoRA/DoRA | linguistic-transfer |
| MAD-X / BAD-X adapters | Language + task adapter stacks | linguistic-transfer |

Morphological Analysis

| Tool | Purpose | Used By |
| --- | --- | --- |
| UniMorph | Gold paradigms, 100+ languages | linguistic-morph |
| SIGMORPHON 2022/2023 | Unsupervised segmenters | linguistic-morph |
| HFST | Rule-based FST analyzer | linguistic-morph |
| foma | Alternative FST toolkit | linguistic-morph |
| Morfessor | Statistical segmenter | linguistic-morph |

Syntax & Parsing

| Tool | Strengths | Used By |
| --- | --- | --- |
| Trankit (2021) | Best low-resource UD quality (XLM-R) | linguistic-syntax |
| stanza (Stanford 2020) | 70+ languages, fast | linguistic-syntax |
| UDify (2019) | Single multilingual model | linguistic-syntax |
| UD treebank corpus | 100+ languages | linguistic-syntax |

Semantics & Lexical Resources

| Tool | Purpose | Used By |
| --- | --- | --- |
| Open Multilingual WordNet (OMW) | Synsets for 100+ languages | linguistic-semantics |
| PARSEME | MWE annotation datasets | linguistic-semantics |
| COMET-22 | Learned MT quality metric | linguistic-semantics, linguistic-eval |
| LaBSE / SONAR | Cross-lingual embeddings | linguistic-semantics, linguistic-eval |

Speech & Audio

| Tool | Coverage | Used By |
| --- | --- | --- |
| MMS (Meta Massively Multilingual Speech) | 1,107 languages | linguistic-speech |
| Whisper (OpenAI) | ~99 languages | linguistic-speech |
| Lhotse | Audio pipeline (CutSet standard) | linguistic-speech |
| ELAN | Field annotation format | linguistic-speech |
| Praat (TextGrid) | Phonetic annotation | linguistic-speech |
| FLEx FieldWorks | Lexicographic field data | linguistic-speech |
| WikiPron | G2P / IPA crowd-sourced data | linguistic-speech |
| VITS / Tacotron2 | Low-resource TTS | linguistic-speech |

Annotation Tools

| Tool | Purpose | Used By |
| --- | --- | --- |
| Label Studio | General annotation UI | linguistic-annotate |
| Prodigy | Active-learning annotation | linguistic-annotate |
| INCEpTION | NLP annotation with IAA | linguistic-annotate |
| brat | Lightweight web annotation | linguistic-annotate |

Evaluation & Metrics

| Tool | Metric | Used By |
| --- | --- | --- |
| sacrebleu | chrF++, spBLEU, BLEU | linguistic-eval |
| COMET / xCOMET | Learned MT quality | linguistic-eval |
| GEMBA-MQM | LLM-judge MQM rubric | linguistic-eval |
| AfroBench | African language benchmarks | linguistic-eval |
| IndicXTREME | Indic language benchmarks | linguistic-eval |
| SEACrowd | Southeast Asian benchmarks | linguistic-eval |

Typological Databases

| Database | Coverage | Used By |
| --- | --- | --- |
| WALS (World Atlas of Language Structures) | 2,662 languages | linguistic-scope |
| Grambank | 2,467 languages | linguistic-scope |
| URIEL / lang2vec | Typological distance vectors | linguistic-scope, linguistic-transfer |
| Glottolog | Language catalog + genealogy | linguistic-scope |