NPC Voice Model — soup-0.6 (Best)
Best-performing model in the NPC voice series: a weight-averaged merge of v5-SFT (60%) and v5-DPO (40%), both fine-tuned from Qwen3-0.6B.
The model takes a plain factual sentence and rewrites it in a character's voice, conditioned on 5 persona parameters.
Why soup-0.6?
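SFT alone produces some verbatim copies of the input fact and some quote-wrapping failures. DPO alone over-corrects and forgets rare relation types. Weight averaging (a model soup) keeps DPO's structural fixes and SFT's coverage: soup-0.6 has the highest pass rate (61.5%) and the lowest hallucination-failure rate (28.5%) in the table below, although v4 still scores higher on ES.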
| Model | Pass rate | Hallucination fail | Fact preservation | EN | ES |
|---|---|---|---|---|---|
| v4 | 59.0% | 32.5% | 1.31 | 66% | 50% |
| v5-SFT | 60.0% | 29.6% | 1.39 | 68% | 44% |
| v5-DPO | 57.0% | 36.0% | 1.26 | 69% | 40% |
| soup-0.6 | 61.5% | 28.5% | 1.42 | 75% | 44% |
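The two SFT failure modes named above (verbatim copies and quote-wrapping) can be checked mechanically. The following is a minimal sketch of such checks, not the evaluation code behind the table: the function names and the normalization rule are assumptions made for illustration.

```python
import re

# Opening quote -> expected closing quote. Illustrative set, not exhaustive.
QUOTE_PAIRS = {'"': '"', "'": "'", "\u201c": "\u201d"}


def _normalize(text: str) -> str:
    """Lowercase and collapse punctuation/whitespace so trivial edits do not hide a copy."""
    return re.sub(r"[^\w]+", " ", text.lower()).strip()


def is_verbatim_copy(fact: str, output: str) -> bool:
    """True when the output is the input fact with at most case/punctuation changes."""
    return _normalize(fact) == _normalize(output)


def is_quote_wrapped(output: str) -> bool:
    """True when the whole output is enclosed in one matching pair of quotes.

    Heuristic: a line that merely starts and ends with an apostrophe is also flagged.
    """
    s = output.strip()
    return len(s) >= 2 and QUOTE_PAIRS.get(s[0]) == s[-1]


fact = "Iron swords cost 15 gold."
print(is_verbatim_copy(fact, "iron swords cost 15 gold"))     # True
print(is_verbatim_copy(fact, "Fifteen gold. Don't haggle."))  # False
print(is_quote_wrapped('"Fifteen gold. Don\'t haggle."'))     # True
print(is_quote_wrapped("Fifteen gold. Don't haggle."))        # False
```

Checks like these only catch structural failures; they say nothing about hallucination or fact preservation, which need a comparison against the fact's content.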
Task
INPUT: TONE:grumpy STYLE:blunt HUMOR:none RELATION:stranger ROLE:blacksmith
FACT: Iron swords cost 15 gold.
OUTPUT: Fifteen gold. Don't haggle.
Parameters
| Param | Values |
|---|---|
| TONE | grumpy, cheerful, neutral, fearful, proud, bitter, nervous, wise, cunning, melancholic, playful |
| STYLE | short, verbose, blunt, rambling, poetic |
| HUMOR | none, dry, sarcastic, warm, dark |
| RELATION | stranger, friend, enemy, ally, rival, mentor, debtor, heretic, worshipper, + 20 more |
| ROLE | blacksmith, innkeeper, guard, merchant, peasant, scholar, noble, priest |
Only RELATION changes at runtime based on game state. The other 4 are fixed per NPC at config time.
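Because four of the parameters are fixed per NPC, a game can store them once and supply only RELATION on each call. The sketch below assumes the raw prompt layout used in the inference example (a parameter header, a FACT: line, then OUT:). The `Persona` class and `build_prompt` helper are hypothetical names, not part of the released code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    """Per-NPC settings that stay fixed after config time (hypothetical container)."""
    tone: str
    style: str
    humor: str
    role: str


def build_prompt(persona: Persona, relation: str, fact: str) -> str:
    """Combine the fixed persona with the runtime RELATION into the raw prompt."""
    header = (
        f"TONE:{persona.tone} STYLE:{persona.style} HUMOR:{persona.humor} "
        f"RELATION:{relation} ROLE:{persona.role}"
    )
    return f"{header}\nFACT: {fact}\nOUT:"


# Configured once per NPC.
blacksmith = Persona(tone="grumpy", style="blunt", humor="none", role="blacksmith")

# Only the relation varies as game state changes.
prompt = build_prompt(blacksmith, relation="stranger", fact="Iron swords cost 15 gold.")
print(prompt)
```

Keeping the persona immutable makes it harder to change a fixed parameter by accident at runtime.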
Inference (Python)
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
"walter-bd/npc-voice-soup06",
max_seq_length=256,
)
FastLanguageModel.for_inference(model)
prompt = "TONE:grumpy STYLE:blunt HUMOR:none RELATION:stranger ROLE:blacksmith\nFACT: Iron swords cost 15 gold.\nOUT:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=80, temperature=0.7, do_sample=True)
print(tokenizer.decode(out[0], skip_special_tokens=True).split("OUT:")[-1].strip())
GGUF / Ollama
Note: running ollama run hf.co/walter-bd/npc-voice-soup06:Q5_K_M directly will trigger Qwen3 thinking mode. Use the custom Modelfile below to suppress it and use the correct raw prompt format.
# Download GGUF and Modelfile
wget https://huggingface.co/walter-bd/npc-voice-soup06/resolve/main/gguf/npc-voice-soup06.Q5_K_M.gguf
wget https://huggingface.co/walter-bd/npc-voice-soup06/resolve/main/Modelfile
# Create local model with thinking suppressed
ollama create npc-voice-soup06 -f Modelfile
# Run
ollama run npc-voice-soup06 "TONE:grumpy STYLE:blunt HUMOR:none RELATION:stranger ROLE:blacksmith
FACT: Iron swords cost 15 gold.
OUT:"
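Once the local model exists, it can also be called from a game process through Ollama's HTTP API rather than the CLI. The following is a minimal sketch under stated assumptions: the Ollama server is running on its default port 11434, the model was created under the name npc-voice-soup06 as above, and the sampling options mirror the Python inference example. `build_request` and `generate` are hypothetical helper names.

```python
import json
import urllib.request

# Default local endpoint of the Ollama server (assumed unchanged).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "npc-voice-soup06") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a token stream
        "options": {"temperature": 0.7, "num_predict": 80},
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt), timeout=30) as resp:
        return json.loads(resp.read())["response"].strip()
```

With the server running, `generate("TONE:grumpy STYLE:blunt HUMOR:none RELATION:stranger ROLE:blacksmith\nFACT: Iron swords cost 15 gold.\nOUT:")` returns the NPC line as a string.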
How it was built
- v5-SFT: LoRA fine-tune on ~35k bilingual rows with targeted weak-slot coverage
- v5-DPO: Direct Preference Optimization using 1,195 real model failure pairs as rejected examples
- soup-0.6: weight average, 0.6 x v5-SFT + 0.4 x v5-DPO
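The weight average can be sketched as a per-parameter linear interpolation of two checkpoints. This is an illustration rather than the project's merge script: it assumes both checkpoints are full-weight state dicts with identical parameter names (LoRA adapters already merged into the base model), and `soup_state_dicts` is a hypothetical helper. Two tiny linear layers stand in for v5-SFT and v5-DPO.

```python
import torch


def soup_state_dicts(sd_a, sd_b, weight_a=0.6):
    """Return weight_a * sd_a + (1 - weight_a) * sd_b, parameter by parameter."""
    if sd_a.keys() != sd_b.keys():
        raise ValueError("state dicts must have identical parameter names")
    merged = {}
    for name, tensor_a in sd_a.items():
        tensor_b = sd_b[name]
        if tensor_a.is_floating_point():
            merged[name] = weight_a * tensor_a + (1.0 - weight_a) * tensor_b
        else:
            # Integer buffers cannot be averaged meaningfully; keep the first copy.
            merged[name] = tensor_a.clone()
    return merged


# Toy stand-ins for the two fine-tuned checkpoints.
sft = torch.nn.Linear(2, 1)
dpo = torch.nn.Linear(2, 1)
with torch.no_grad():
    sft.weight.fill_(1.0)
    sft.bias.fill_(0.0)
    dpo.weight.fill_(2.0)
    dpo.bias.fill_(1.0)

soup = torch.nn.Linear(2, 1)
soup.load_state_dict(soup_state_dicts(sft.state_dict(), dpo.state_dict(), weight_a=0.6))
print(soup.weight)  # each entry is 0.6 * 1.0 + 0.4 * 2.0 = 1.4
```

Averaging only makes sense when both checkpoints descend from the same base model, as v5-SFT and v5-DPO do here.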
Dataset: walter-bd/npc-voice-dataset
Code: github.com/walter-bd/small-persona-llm