naxi-qwen3-14b-v5
A fine-tuned large language model for Naxi ↔ Chinese ↔ English translation
Naxi (纳西语) is a Sino-Tibetan language spoken by approximately 300,000 people in Lijiang, Yunnan Province, China. This model is the first publicly available large language model fine-tuned specifically on the Naxi language.
Quick Facts
| Property | Value |
|---|---|
| Base model | Qwen/Qwen3-14B |
| Fine-tuning method | LoRA (r=256, α=256, 7 modules) |
| Training precision | bf16 throughout |
| Training corpus | ~62,000 weighted examples |
| NTQS score | 82.3 / 100 |
| Best task | Phonological recognition (char_f1 = 0.962) |
| Languages | Naxi, Mandarin Chinese, English |
| License | CC BY-NC 4.0 |
| HuggingFace | Apeters247/naxi-qwen3-14b-v5 |
1. Background: The Naxi Language
Naxi (also written Nakhi, Nahi, Na, 纳西) is a Sino-Tibetan language of the Naxi–Bai branch, spoken primarily in Lijiang Prefecture, Yunnan Province, People's Republic of China, with smaller communities in Sichuan and Tibetan borderlands. The prestige variety — Lijiang Naxi or the Common dialect (COM in ISO 639-3: nxq) — is the focus of this model.
Typological profile
Naxi is a tonal, verb-final, agglutinative language with a six-tone system encoded in its Latin romanization through tone-final consonants (see §5). Unlike neighboring Tibetan or Chinese, Naxi encodes evidentiality and aspect morphologically, making it typologically unusual among languages of the Tibetan plateau and Yunnan corridor. It coexists with Mandarin, creating extensive contact-induced change, particularly in loanword phonology.
The Naxi people are also known for the Dongba script (东巴文), one of the last pictographic writing systems still in active religious use, employed by Dongba priests (tomba) for ritual texts. However, this model is trained exclusively on Latin romanization (Naxi Pinyin), the modern standardized orthography used in contemporary education and linguistics.
Endangerment status
UNESCO classifies Naxi as "vulnerable" (2010 assessment), with intergenerational transmission weakening in urban Lijiang due to Mandarin dominance in education and commerce. Younger speakers often code-switch extensively or acquire only passive competence. Digital infrastructure for Naxi is near-zero: no keyboard input method exists in mainstream operating systems, no online machine translation service supports it, and pre-existing NLP tooling is absent. This model represents an effort to begin closing that gap.
2. Academic Literature Review
The academic study of Naxi phonology, grammar, and lexicography spans roughly a century, with the most rigorous computational work emerging only in the past two decades.
2.1 Phonology and Romanization Standards
Michaud, Alexis (2008). Tones. Cambridge: Cambridge University Press. Chapter on Naxi tone system. The definitive modern reference for Naxi phonology. Michaud establishes the six-tone inventory (High ˥, Low ˩, Low-Rising ˩˥, Mid ˧, Mid-Rising ˧˥, Checked ˥ʔ) and their Chao numeral representations. His IPA transcriptions differ from naive Pinyin-based assumptions in important ways: the initial j- represents [c] (not [tɕ] as in Mandarin), and e represents [ɤ] (not [ə]). This model's training pipeline uses Michaud's standard throughout.
Michaud, Alexis et al. (2020+). Pangloss Collection. CNRS/LACITO Endangered Languages Archive. The Pangloss Collection contains Michaud's fieldwork recordings with native Naxi speakers — transcribed utterances with IPA annotations and interlinear glosses — constituting the largest open-access scholarly corpus of spoken Naxi. A subset of 1,952 items (converted from IPA to Naxi Pinyin) forms part of this model's training data. Available at: https://pangloss.cnrs.fr/
He Jiren 和即仁 & Jiang Zhuyi 姜竹仪 (1985). 《纳西语简志》 [Brief Description of the Naxi Language]. Beijing: Nationalities Press. The canonical Chinese-language grammatical description of Naxi, documenting morphosyntax, tone sandhi, and dialectal variation between the Lijiang (COM) and Yongning-adjacent varieties. Essential reference for training prompt templates.
He Jiren 和即仁 (2012). Naxi Language Reference Grammar. The most comprehensive modern treatment of Naxi morphosyntax, covering aspect marking, evidentiality, and nominal case. Informed the design of the task prompts.
Pinson, Thomas (2012). Naxi-English Dictionary. Summer Institute of Linguistics. Approximately 10,000 headwords with etymological notes on Chinese loanwords, IPA transcriptions, and example sentences. A core lexicographic reference; expansion of coverage into this model is ongoing.
2.2 Ethnographic and Cultural Sources
Rock, Joseph F. (1937). "The Origin of the Tso-la Books, or Books of Divination of the Na-khi or Chiang." Bulletin de l'École française d'Extrême-Orient (BEFEO), Vol. 37. Rock's BEFEO writings document Naxi ceremonial culture, place names, and ethnobotany in the Lijiang-Tibet corridor. Though written before modern Naxi romanization was standardized, his phonemic transcriptions provide early attestation of lexical items.
Rock, Joseph F. (1947). The Ancient Na-khi Kingdom of Southwest China. 2 vols. Cambridge: Harvard University Press. Rock's magnum opus, containing extensive Naxi vocabulary and cultural description.
Roosevelt, Theodore III (1941). A Preliminary Study of the Nashi People. Senior thesis, Harvard University. An early English-language ethnographic overview of the Naxi, notable for systematic observations on language use and social organization. Contributed 150 parallel cultural text segments to the training corpus.
Yang Fuquan 杨福泉 (2006+). Multiple monographs on Naxi culture, religion, and social history, including studies on Naxi-Tibetan exchange and the role of the Dongba tradition in contemporary identity. Contributed 177 parallel text segments to the training corpus.
2.3 Low-Resource NLP and Fine-Tuning
Hu, Edward J. et al. (2021). "LoRA: Low-Rank Adaptation of Large Language Models." arXiv:2106.09685. The foundational method employed for fine-tuning: decomposing weight updates into low-rank matrices to dramatically reduce trainable parameters while preserving base model knowledge. This model uses rank r=256 — unusually high — motivated by the need to inject a complete new vocabulary (Naxi Pinyin) that has zero overlap with the base model's training distribution.
Dettmers, Tim et al. (2023). "QLoRA: Efficient Finetuning of Quantized LLMs." arXiv:2305.14314. Evaluated in v1 of this series (see §4). QLoRA's 4-bit quantization proved incompatible with bf16 serving, causing a 30-point NTQS regression. Not used in v5.
Zhao, Wayne Xin et al. (2023). "A Survey of Large Language Models." arXiv:2303.18223. Provides context for the trade-offs between model scale and fine-tuning depth for low-resource languages.
Adelani, David Ifeoluwa et al. (2022). "A Few Thousand Translations Go a Long Way! Leveraging Pre-Trained Models for African Language Machine Translation." Proceedings of NAACL 2022. Demonstrates that LoRA fine-tuning on 10,000–60,000 parallel sentences can yield substantial translation quality improvements for extremely low-resource languages, validating the overall approach taken here.
Costa-jussà, Marta R. et al. (2022). "No Language Left Behind: Scaling Human-Centered Machine Translation." arXiv:2207.04672 (Meta AI). NLLB demonstrates that multilingual pre-training provides useful cross-lingual transfer even for languages with minimal training data. The Qwen3 base model, pre-trained on hundreds of languages, provides analogous transfer — particularly from the Sinitic branch — to Naxi.
Han, Xu et al. (2021). "Pre-Trained Models: Past, Present and Future." AI Open 2:225–250. Motivates using a large multilingual pre-trained model as base rather than training from scratch, which would require orders of magnitude more data.
2.4 Chinese Linguistic Context and Loanword Phenomena
Naxi has absorbed extensive Mandarin loanwords across centuries of contact (Han immigration, Mu dynasty administration, 20th-century nationalization). The model includes a loanword detector trained to flag Chinese borrowings (借词 / 外来词) and score them appropriately in evaluation. Key patterns documented in the literature include:
- Phonological integration of Mandarin syllable-final consonants into Naxi's tone system
- Calques (structural borrowings) for abstract and administrative vocabulary
- Code-switching at the sentence level in contemporary urban speech
3. Training Corpus
The training data comprises 62,021 weighted examples (37,418 unique) assembled from multiple scholarly and community sources, spanning dictionary entries, parallel text, folk literature, and academic ethnographic writing. An 80/20 source-grouped train/eval split ensures zero leakage between sets.
3.1 Sources
| Category | Description | Items |
|---|---|---|
| Lexicographic database | Community Naxi–Chinese–English dictionary; 3,866 headwords with definitions, IPA transcriptions, POS tags, and example sentences | 3,866 entries + 1,307 examples |
| LACITO/Pangloss fieldwork | Michaud et al. scholarly recordings; IPA converted to Naxi Pinyin via custom greedy-matching algorithm (98.9% conversion rate) | 1,952 items |
| Parallel literary corpus | Digitized 20th-century trilingual parallel texts (Naxi / Chinese / English) covering narrative, poetry, and didactic prose | ~7,758 segments |
| Folk wisdom & humor | Contemporary Naxi proverbs, witticisms, and folk sayings collected 2024–2025; bilingual Naxi–Chinese | 186 items |
| Cultural monographs | Yang Fuquan, Rock 1937 BEFEO, Ayidan folk tales, Roosevelt 1941 thesis | ~463 items |
| Place names | Naxi toponyms from the Yamada collection, 445 entries (273 with IPA), covering Lijiang prefecture and surrounding areas | 445 items |
3.2 Corpus Construction
Training examples are formatted using the ChatML template (<|im_start|>system / user / assistant<|im_end|>) with task-specific system prompts. Five task types were defined:
- `sentence`: Naxi → Chinese or English sentence translation
- `reverse_zh`: Chinese → Naxi (generation task; 3× upsampled)
- `reverse_en`: English → Naxi (generation task; 3× upsampled)
- `dictionary`: Chinese concept → Naxi headword + gloss
- `phonological`: IPA/phoneme identification and Naxi Pinyin rendering
Generation tasks (reverse_zh + reverse_en) were upsampled 3× based on the finding in v2 that generation capacity is the most commercially useful and hardest skill to acquire — a "generation-heavy" (78% of weighted corpus) approach that proved decisive in achieving 71.7+ NTQS.
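As a concrete illustration, a single supervised example rendered in the ChatML template looks like the following. This is a minimal sketch: the exact task-specific system prompts are not reproduced here, so the wording below is illustrative.

```python
def to_chatml(system: str, user: str, assistant: str) -> str:
    """Render one supervised example in the ChatML template
    (<|im_start|>role ... <|im_end|>) used by Qwen-family models."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n{assistant}<|im_end|>"
    )

# Hypothetical reverse_zh example (system prompt wording is illustrative)
example = to_chatml(
    "You are a Naxi language expert. Translate the following Chinese text into Naxi.",
    "我爱丽江",
    "Nge Liqjiangq gol pvl seiq.",
)
assert example.count("<|im_start|>") == example.count("<|im_end|>") == 3
```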
3.3 Data Quality Measures
- IPA leakage: Zero IPA symbols in Naxi-language outputs (converted via `ipa_to_pinyin()` with greedy initial matching and tone-boundary splitting)
- Tone alignment: Chao numerals (⁵⁵, ²¹, etc.) converted to Naxi Pinyin tone letters
- Polysemy disambiguation: 411 polysemous dictionary entries given part-of-speech tags to reduce hallucination
- Invalid rate: 0.0% invalid examples at corpus build time
- Leakage: ZERO train/eval contamination via source-grouped 80/20 split (previously 309 leaked items, now 0)
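The leakage guarantee comes from splitting by source rather than by individual example. A minimal sketch of such a split (the hash-bucketing scheme here is an illustrative choice, not the project's actual grouping code):

```python
import hashlib

def source_grouped_split(examples, eval_frac=0.2):
    """Assign every example from the same source to the same split,
    so near-duplicate items can never straddle train and eval."""
    def bucket(source: str) -> float:
        # Deterministic hash of the source name into [0, 1)
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        return int(digest[:8], 16) / 0x100000000

    train, evals = [], []
    for ex in examples:
        (evals if bucket(ex["source"]) < eval_frac else train).append(ex)
    return train, evals

examples = [{"source": f"doc-{i % 25}", "text": f"item {i}"} for i in range(500)]
train, evals = source_grouped_split(examples)
# No source contributes to both sides of the split
assert not ({e["source"] for e in train} & {e["source"] for e in evals})
```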
4. Model Training
4.1 Architecture
Base: Qwen/Qwen3-14B (14.8B parameters, 64-layer transformer, RoPE, GQA)
Method: LoRA (Low-Rank Adaptation)
Rank: r=256, α=256
Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj (all 7)
Trainable: ~2.1% of total parameters
Precision: bf16 throughout (training + serving — never mixed)
4.2 Training Configuration
Optimizer: adamw_torch_fused (paged_adamw_32bit unavailable — requires cuMemAllocManaged)
Learning rate: 3e-4 (cosine schedule with warmup)
Epochs: 3
Batch size: 2 (per GPU)
Grad accum: 16 steps (effective batch = 32)
Max seq len: 2048 tokens (packing enabled)
Packed seqs: 15,180 packed from 61,652 examples at seq_len=2048
Hardware: Thunder Compute H100 (80GB VRAM)
VRAM usage: ~55–59% (46–50GB / 85GB) — stable throughout
Steps: 1,425
Total time: ~22 hours
4.3 Sequence Packing
Sequence packing (concatenating short examples end-to-end up to the context window) increased effective token throughput by approximately 4× compared to padded batching. This is critical for a corpus where most examples are 50–300 tokens — without packing, >80% of compute would be wasted on padding.
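A first-fit sketch of the packing idea (illustrative only; production pipelines typically also insert attention-mask boundaries between packed examples, which is omitted here):

```python
def pack_sequences(lengths, max_len=2048):
    """Greedily concatenate example lengths into bins of at most max_len
    tokens; short examples share a sequence instead of being padded."""
    bins = []
    for n in sorted(lengths, reverse=True):
        for b in bins:
            if sum(b) + n <= max_len:
                b.append(n)
                break
        else:
            bins.append([n])  # no existing bin fits; open a new one
    return bins

# Most corpus examples are 50-300 tokens; packing shrinks the step count
lengths = [50 + (i * 37) % 251 for i in range(2000)]
bins = pack_sequences(lengths)
assert all(sum(b) <= 2048 for b in bins)
assert sum(len(b) for b in bins) == len(lengths)
print(f"{len(lengths)} examples packed into {len(bins)} sequences")
```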
5. The Naxi Tone System and Pinyin Orthography
Understanding Naxi orthography is essential for interpreting model outputs. The romanization system uses a tone-final consonant convention:
| Final letter | Tone | Chao | IPA | Example |
|---|---|---|---|---|
| `l` | High | 55 | ˥ | jel [cjɤ˥] "to boil/cook" |
| `q` | Low | 21 | ˩ | laq [lɑ˩] "hand" |
| `f` | Low-rising | 13 | ˩˥ | baf [pɑ˩˥] "eight" (loanword) |
| `g` | Mid-rising | 35 | ˧˥ | juqg [tɕy˧˥] (rare) |
| `b` | Checked | 55ʔ | ˥ʔ | dab [tɑ˥ʔ] (rare) |
| (none) | Mid | 33 | ˧ | ha [hɑ˧] "food" |
Key phonological notes (Michaud 2008):
- `j` → [c], not [tɕ] (a palatal stop, not an affricate)
- `e` → [ɤ], not [ə] (back unrounded mid vowel)
- `z` → voiceless [ts]; `zz` → voiced [dz]
- `ng` → [ŋ] (velar nasal, can be syllable-initial)
The model is trained to produce valid Naxi Pinyin with correct tone markers. Tone completeness — the proportion of syllables bearing an explicit tone marker — is a core evaluation metric, rising from 42.3% in v2 to 69.2% in v5.
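Given the table above, reading a syllable's tone reduces to inspecting its final letter. A simplified sketch follows; it deliberately ignores the rime-final ambiguity of `-ng`, which a real parser such as `naxi_phonology.py` must resolve.

```python
TONE_FINALS = {
    "l": "High (55)",
    "q": "Low (21)",
    "f": "Low-rising (13)",
    "g": "Mid-rising (35)",
    "b": "Checked (55ʔ)",
}

def syllable_tone(syllable: str) -> str:
    """Map a Naxi Pinyin syllable to its tone via the tone-final
    convention; no final marker means the unmarked Mid tone."""
    return TONE_FINALS.get(syllable[-1], "Mid (33, unmarked)")

def tone_completeness(syllables) -> float:
    """Share of syllables carrying an explicit tone marker (an NTQS metric)."""
    marked = sum(1 for s in syllables if s[-1] in TONE_FINALS)
    return marked / len(syllables)

assert syllable_tone("laq") == "Low (21)"
assert syllable_tone("ha") == "Mid (33, unmarked)"
assert tone_completeness(["laq", "ha", "jel", "baf"]) == 0.75
```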
6. Evaluation: NTQS (Naxi Translation Quality Score)
Standard MT metrics (BLEU, chrF) are inadequate for Naxi because:
- No off-the-shelf tokenizer handles Naxi Pinyin correctly
- Tone markers (l/q/f/g/b) appear as trailing consonants and require phonology-aware parsing
- No reference translations exist in standard evaluation datasets
We developed NTQS (Naxi Translation Quality Score, 0–100) with five components:
| Component | Max | Measures |
|---|---|---|
| Tone Accuracy | 25 | % syllables with valid tone markers; marker correctness vs. reference |
| Semantic Fidelity | 30 | Token/character Jaccard overlap with reference (CJK bigrams or word tokens) |
| Loanword Handling | 15 | Correct flagging of Chinese borrowings when present |
| Dialect Appropriateness | 15 | Grammar score (0.6 × syllable_validity + 0.4 × tone_completeness) × 100 |
| Completeness | 15 | Output length ratio relative to source (neither truncated nor hallucinated) |
All evaluation is fully local — no LLM-as-judge, no external API calls. The phonology engine (naxi_phonology.py) validates syllable structure against the complete Naxi phoneme inventory documented in config/naxi_pinyin.yaml.
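The components combine additively, and the Dialect Appropriateness row can be reconstructed from the reported v5 aggregates. A sketch (scaling the 0–100 grammar score down to its 15-point component maximum is an assumption, though it is consistent with the results table; the loanword component value is inferred by subtraction):

```python
def grammar_score(syllable_validity: float, tone_completeness: float) -> float:
    """Grammar score on a 0-100 scale, per the NTQS component table."""
    return (0.6 * syllable_validity + 0.4 * tone_completeness) * 100

def ntqs_total(tone, semantic, loanword, dialect, completeness):
    """Sum of the five components (maxima 25 + 30 + 15 + 15 + 15 = 100)."""
    return tone + semantic + loanword + dialect + completeness

# v5 aggregates: syllable_validity 0.876, tone_completeness 0.692
assert round(grammar_score(0.876, 0.692), 1) == 80.2   # reported: 80.21
# Dialect component = grammar/100 * 15 ≈ 12.0, matching the results table
assert round(grammar_score(0.876, 0.692) / 100 * 15, 1) == 12.0
# Loanword component (13.3) inferred by subtraction from the 82.3 total
assert abs(ntqs_total(19.2, 23.2, 13.3, 12.0, 14.6) - 82.3) < 1e-6
```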
7. Results
7.1 Overall Performance
| Model | NTQS | Tone/25 | Semantic/30 | Dialect/15 | Complete/15 | char_f1 |
|---|---|---|---|---|---|---|
| naxi-qwen3-14b-v5 | 82.3 | 19.2 | 23.2 | 12.0 | 14.6 | 0.775 |
| naxi-qwen2.5-3b-v2 | 71.7 | 13.9 | 23.9 | 9.0 | 9.9 | 0.807 |
| naxi-qwen35-9b-v4 (Q4_K_M) | 37.98 | — | — | — | — | 0.342 |
| Zero-shot frontier LLMs (avg) | ~41.3 | ~20.0 | ~0.4 | ~8.0 | ~7.5 | ~0.1 |
Zero-shot frontier models (Qwen-2.5-72B, Claude-3.5-Sonnet, Gemini-2.0-Flash, DeepSeek-v3) achieve ~41–52 NTQS on the Naxi→zh direction only (they can comprehend but cannot generate Naxi). On generation tasks (→Naxi), zero-shot performance collapses to near-zero.
7.2 Per-Task Breakdown (v5)
| Task | char_f1 | exact_match | tone_completeness | Notes |
|---|---|---|---|---|
| phonological | 0.962 | 0.640 | — | Strongest task; IPA↔Pinyin conversion |
| reverse_en (EN→Naxi) | 0.811 | — | 0.687 | Strong generation from English |
| reverse_zh (ZH→Naxi) | 0.811 | — | 0.696 | Strong generation from Chinese |
| sentence (Naxi→ZH/EN) | 0.691 | — | — | Understanding task |
| dictionary | 0.609 | 0.000 | — | Weakest; no exact dictionary matches |
7.3 Key Improvements over v2
| Metric | v2 (3B) | v5 (14B) | Delta |
|---|---|---|---|
| NTQS total | 71.74 | 82.3 | +10.6 |
| tone_completeness | 0.423 | 0.692 | +63.6% |
| grammar_score | 69.09 | 80.21 | +16.1% |
| syllable_validity | 0.869 | 0.876 | +0.8% |
Tone completeness — the most critical weakness in v2 — improved dramatically. This is attributed to: (1) the 14B model's larger representational capacity, (2) sequence packing exposing more tone-marked syllables per training step, and (3) all 7 LoRA modules being adapted (vs. 4 in v2).
8. Learnings from Previous Fine-Tuning Iterations
v1 — Qwen2.5-14B QLoRA 4-bit (FAILED, NTQS 42.1)
Configuration: LoRA r=48, α=96, 4-bit QLoRA quantization, two-stage LAFT curriculum.
Failure mode: Precision mismatch. The model was trained in 4-bit quantized format but served in bf16. LoRA adapters trained under quantization learn compensation terms specific to the quantization error. When served in full precision, those compensation terms become noise. The result was NTQS 42.1 — statistically indistinguishable from zero-shot frontier performance.
Lesson: Never mix precision between training and serving. If training in 4-bit QLoRA, all inference must also use 4-bit quantization. If serving in bf16, train in bf16.
v2 — Qwen2.5-3B LoRA bf16 (SUCCESS, NTQS 71.7) ✓
Configuration: LoRA r=256, α=512, bf16 throughout, 4 attention modules only (q/k/v/o), single-stage training, generation-heavy corpus (78%).
Outcome: +30.4 NTQS over baseline. First successful model. Key validated findings:
- r=256 is necessary: High LoRA rank provides the capacity to inject a completely foreign orthographic system (Naxi Pinyin) with zero overlap to the base model's training vocabulary.
- Generation-heavy data (reverse_zh + reverse_en upsampled 3×) is the decisive factor for useful performance.
- bf16 precision match (train = serve) is non-negotiable.
- Single-stage training outperforms two-stage LAFT curriculum at this data scale.
Remaining weaknesses: tone_completeness only 42% (syllables frequently generated without tone markers); dictionary task weak (char_f1=0.56); 13% invalid syllables (hallucinations).
v3 — Qwen3.5-9B LoRA bf16 (ABANDONED)
Superseded before evaluation. Architecture investigation revealed Qwen3.5-9B is a vision-language model (VLM) with Mamba-Attention hybrid architecture (Qwen3_5ForConditionalGeneration), not a text-only transformer. Text-only LoRA fine-tuning on a VLM backbone is suboptimal; the model expects multimodal inputs that the Naxi corpus cannot provide.
Lesson: Verify model architecture before committing compute. AutoModelForCausalLM will load VLMs without error but with degraded text performance.
v4 — Qwen3.5-9B LoRA bf16 (REGRESSED, NTQS 37.98)
Configuration: LoRA r=64, α=128, RSLora, 7 modules. Evaluated via Q4_K_M GGUF (Ollama) and bf16 direct inference.
Failure modes (multiple compounding):
- LoRA rank r=64 is insufficient. Cutting rank from 256 to 64 reduces adapter capacity by 4×. The model never learned Naxi Pinyin orthography adequately — generating IPA notation instead (leaking from the base model's exposure to phonetics literature).
- VLM architecture issue (same as v3): Qwen3.5-9B's hybrid attention/Mamba structure is not optimized for pure text generation.
- Q4_K_M quantization further degrades tone marker accuracy — single-character tone markers (l/q/f/g/b) are semantically loaded but look like common English/Chinese characters; quantization conflates them.
- Unsloth tokenizer patches: Unsloth modifies the tokenizer class at load time. Direct Python inference (without Unsloth) produces different tokenization, causing format violations.
Lesson: Never drop LoRA rank below r=128 for Naxi. The orthographic injection problem is severe enough that high rank is load-bearing, not optional.
v5 — Qwen3-14B LoRA bf16 (CURRENT CHAMPION, NTQS 82.3) ✓
Addressed all prior failure modes:
- Correct architecture: Qwen3-14B is a text-only transformer (not VLM)
- High rank maintained: r=256 (same as v2 success)
- All 7 modules: Adds gate/up/down MLP projections not present in v2 — captures syntactic patterns more deeply
- bf16 throughout: Training and serving in identical precision
- Sequence packing: Dramatically more token-efficient training
- Correct optimizer: `adamw_torch_fused` instead of `paged_adamw_32bit` (which requires CUDA managed memory unavailable on Thunder Compute prototyping instances)
9. Limitations and Known Issues
Dictionary task: char_f1=0.609, exact_match=0.000. The model paraphrases rather than recalling specific lexical items. A retrieval-augmented generation (RAG) grounding layer using the Pinson dictionary is planned to address this.
Tone completeness at 69%: Approximately 31% of generated syllables lack explicit tone markers. For mid-tone (unmarked) syllables this is often correct, but the model sometimes omits markers on toned syllables. Tone-focused data augmentation (synthetic examples emphasizing tone marking) is planned.
Syllable validity at 87.6%: About 12.4% of generated syllables are not valid Naxi Pinyin (hallucinated combinations). Post-processing filters using `naxi_phonology.py` can catch these at inference time.

Dialect coverage: Training data is predominantly Common (COM) dialect (Lijiang variety). Lashi (LQ) and Yongning (YON) dialects are minimally represented. The Yongning dialect (spoken by ~5,000 people) has additional vowel contrasts and is treated as a separate language in some classifications.
Dongba script: Not supported. The model generates exclusively in Latin romanization (Naxi Pinyin). Dongba characters require a separate rendering pipeline.
Context length: Maximum 2,048 tokens. Long documents require chunking.
Contemporary colloquial Naxi: Training data skews toward formal literary register. Informal spoken Naxi (with high code-switching density) may be handled less well.
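The syllable-validity filter mentioned above can be sketched as a simple post-processing pass. The mini-inventory below is hypothetical, for illustration only; the real validator is `naxi_phonology.py`, which checks against the full inventory in `config/naxi_pinyin.yaml`.

```python
# Hypothetical mini-inventory for illustration; the real validator is
# naxi_phonology.py, checking against config/naxi_pinyin.yaml.
VALID_SYLLABLES = {"nge", "laq", "ha", "jel", "baf", "seiq", "pvl", "gol"}

def flag_invalid_syllables(output: str):
    """Return generated syllables that fail the validity check, so a
    post-processing layer can drop or replace them at inference time."""
    return [s for s in output.split() if s.lower() not in VALID_SYLLABLES]

assert flag_invalid_syllables("nge laq xyzq") == ["xyzq"]
```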
10. Usage
With Transformers (bf16, recommended)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

BASE = "Qwen/Qwen3-14B"
ADAPTER = "Apeters247/naxi-qwen3-14b-v5"  # or a local path

tokenizer = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    BASE,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
# LoRA weights live in the adapter/ subfolder of the repo
model = PeftModel.from_pretrained(model, ADAPTER, subfolder="adapter")
model.eval()

def translate(text: str, direction: str = "zh_to_naxi") -> str:
    directions = {
        "zh_to_naxi": ("Chinese", "Naxi"),
        "en_to_naxi": ("English", "Naxi"),
        "naxi_to_zh": ("Naxi", "Chinese"),
        "naxi_to_en": ("Naxi", "English"),
    }
    src, tgt = directions[direction]
    system = f"You are a Naxi language expert. Translate the following {src} text into {tgt}."
    messages = [
        {"role": "system", "content": system},
        {"role": "user", "content": text},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    with torch.no_grad():
        out = model.generate(input_ids, max_new_tokens=512, temperature=0.1, do_sample=True)
    return tokenizer.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True)

# Examples
print(translate("我爱丽江", "zh_to_naxi"))
# → "Nge Liqjiangq gol pvl seiq." (approximate)
print(translate("ngeq pvl seiq", "naxi_to_en"))
# → "I like it." / "I love it."
```
With GGUF / llama.cpp (Q4_K_M)
```bash
# Download the GGUF from HuggingFace
huggingface-cli download Apeters247/naxi-qwen3-14b-v5 \
  naxi-qwen3-14b-v5-Q4_K_M.gguf --local-dir ./

# Run with llama.cpp
./llama-cli -m naxi-qwen3-14b-v5-Q4_K_M.gguf \
  --chat-template chatml \
  -sys "You are a Naxi language expert. Translate the following Chinese text into Naxi." \
  -p "我爱丽江" \
  -n 200
```
Warning: Q4_K_M quantization reduces tone accuracy. For production use, prefer bf16 inference if VRAM permits (requires ~30GB).
11. Project Infrastructure
This model is part of the Naxi Language Explorer project, an open-source effort to build translation and language documentation infrastructure for Naxi.
- API: FastAPI endpoint at naxiai.com (rate-limited public access)
- Leaderboard: Live NTQS benchmark at naxiai.com (compares fine-tuned models vs. zero-shot frontier LLMs)
- Corpus: Maintained in PostgreSQL with SQLAlchemy ORM; 15 tables including lexical database, parallel corpus, and evaluation records
- Evaluation: Fully local phonology-aware scoring via `ntqs_scorer.py` and `naxi_phonology.py`
- GPU training: Thunder Compute H100 instances via SSH (`tnr-0`)
12. Planned Improvements
- RAG dictionary grounding: At inference time, retrieve relevant entries from the Pinson dictionary and inject into the prompt context. Expected: +5–15 NTQS on dictionary task.
- Tone-focused augmentation: Generate 5–10K synthetic examples emphasizing tone marking in varied phonological environments. Target: tone_completeness > 85%.
- Pinson 2012 OCR ingestion: ~10,000 additional dictionary entries pending OCR processing. This is the highest-value remaining linguistic resource.
- v6 planning: After RAG and tone augmentation, retrain on expanded corpus. Candidate base models: Qwen3-32B or a specialist Sino-Tibetan LM if available.
- Syllable validity filter: Post-processing layer using `is_valid_naxi_syllable()` to catch and replace hallucinated phoneme combinations.
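The planned RAG dictionary grounding can be sketched as prompt injection at inference time. The character-overlap retriever below is a stand-in for whatever retriever is actually adopted, and the toy dictionary is hypothetical:

```python
def build_grounded_prompt(query_zh: str, dictionary: dict, k: int = 3) -> str:
    """Retrieve dictionary entries sharing characters with the query and
    prepend them as grounding context before the translation instruction."""
    def overlap(zh: str) -> int:
        return len(set(query_zh) & set(zh))

    hits = sorted((e for e in dictionary.items() if overlap(e[0]) > 0),
                  key=lambda e: -overlap(e[0]))[:k]
    context = "\n".join(f"{zh} = {naxi}" for zh, naxi in hits)
    return ("You are a Naxi language expert.\n"
            f"Relevant dictionary entries:\n{context}\n"
            f"Translate the following Chinese text into Naxi: {query_zh}")

# Toy dictionary for illustration (the real source would be Pinson 2012)
toy_dictionary = {"手": "laq", "饭": "ha", "八": "baf"}
prompt = build_grounded_prompt("我吃饭", toy_dictionary)
assert "饭 = ha" in prompt        # relevant entry retrieved
assert "八 = baf" not in prompt   # irrelevant entry excluded
```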
13. Citation
If you use this model in research, please cite:
```bibtex
@misc{naxi-qwen3-14b-v5-2026,
  title        = {naxi-qwen3-14b-v5: A Fine-Tuned LLM for Naxi--Chinese--English Translation},
  author       = {Peters, Andrew},
  year         = {2026},
  month        = {March},
  howpublished = {HuggingFace Model Hub},
  url          = {https://huggingface.co/Apeters247/naxi-qwen3-14b-v5},
  note         = {NTQS 82.3/100; Qwen3-14B + LoRA r=256, bf16; 62,021 training examples}
}
```
Please also cite the key scholarly sources that made this possible:
```bibtex
@book{michaud2008tones,
  author    = {Michaud, Alexis},
  title     = {Tones},
  publisher = {Cambridge University Press},
  year      = {2008}
}

@misc{pangloss2020,
  author       = {Michaud, Alexis and others},
  title        = {Pangloss Collection},
  howpublished = {CNRS/LACITO Endangered Languages Archive},
  url          = {https://pangloss.cnrs.fr/},
  year         = {2020}
}

@inproceedings{hu2021lora,
  title     = {LoRA: Low-Rank Adaptation of Large Language Models},
  author    = {Hu, Edward J. and Shen, Yelong and Wallis, Phillip and Allen-Zhu, Zeyuan and Li, Yuanzhi and Wang, Shean and Chen, Weizhu},
  booktitle = {International Conference on Learning Representations},
  year      = {2022}
}
```
14. Acknowledgments
This work builds on the scholarly foundations laid by Alexis Michaud (CNRS/LACITO) and Thomas Pinson (SIL), whose fieldwork recordings and phonological analysis make computational work on Naxi possible. The Naxi community of Lijiang, Yunnan — particularly speakers who participated in the Pangloss fieldwork sessions — are the ultimate source of this data, and its preservation and accessibility is the motivation for this project.
Critical config facts for any future training run:
- Always use `torch_dtype=torch.bfloat16` — never `load_in_4bit=True`
- LoRA rank must be r≥256 for Naxi (r=64 failed catastrophically in v4)
- Use `adamw_torch_fused` on Thunder Compute (not `paged_adamw_32bit`)
- Max seq_len=2048 for 14B model with packing + batch=2 on H100 80GB
- DB cannot be reached from host OS — run DB scripts inside the `naxi-api` Docker container or via `docker exec`
- API leaderboard prefix is `/api/evaluate/scores` (not `/evaluate/scores`)
HuggingFace repo: Apeters247/naxi-qwen3-14b-v5 (private)
- `adapter/` — LoRA weights (PeftModel format)
- `naxi-qwen3-14b-v5-Q4_K_M.gguf` — 8.4GB quantized
- `naxi-qwen3-14b-v5-Q8_0.gguf` — 15GB quantized
Last updated: 2026-03-08
Model status: Production (current champion, NTQS 82.3)
Previous champion: naxi-qwen2.5-3b-v2 (NTQS 71.74)
Evaluation results (self-reported):
- NTQS (Naxi Translation Quality Score): 82.3
- chrF: 56.08
- Character F1: 0.775