LFM2.5-350M ExecuWhisper Formatter
A fine-tune of LiquidAI/LFM2.5-350M that cleans up spoken dictation: it removes disfluencies, restores casing and punctuation, infers light list structure, and refuses to "answer" the dictation even when the speaker ends on a question.
Built for the ExecuWhisper macOS dictation app. Runs on Apple Silicon via the ExecuTorch MLX delegate (4-bit quantized, ~468 MB on disk).
What this is (and isn't)
This model is a dictation cleaner, not a chat assistant. Given:
"um does it feel like real time processing"
It outputs:
"Does it feel like real-time processing?"
It will not answer the question, add information, or summarize. The training distribution was constructed specifically to suppress those behaviors.
It also won't help you with arbitrary text-generation tasks. For those, use the base model directly.
Files
| Path | Size | Purpose |
|---|---|---|
| lfm2_5_350m_ft.pt | 1.42 GB | fp32 fine-tuned checkpoint (re-quantize / re-train baseline) |
| lfm2_5_350m_mlx_4w.pte | 468 MB | MLX 4-bit quantized runtime artifact (the in-app .pte) |
| lfm2_5_350m_config.json | <1 KB | architecture params for re-export |
| tokenizer/tokenizer.json | 4.51 MB | tokenizer |
| tokenizer/tokenizer_config.json | 90 KB | tokenizer config (chat template) |
| tokenizer/chat_template.jinja | 2.5 KB | chat template |
| tokenizer/special_tokens_map.json | <1 KB | special tokens |
| configs/lfm2_mlx_4w_g32.yaml | <1 KB | MLX export config (so you can reproduce the .pte) |
| eval/eval_ami_mlx_4w_g32.json | 67 KB | AMI release-gate eval results |
| eval/eval_ami_v2_1_mlx_4w_g32.json | 62 KB | v2.1 baseline (for comparison) |
Quick Start
As a .pte running on the ExecuTorch MLX delegate (the same path the app uses)
from executorch.extension.llm.runner import TextLLMRunner
runner = TextLLMRunner(
model_path="lfm2_5_350m_mlx_4w.pte",
tokenizer_path="tokenizer/tokenizer.json",
)
prompt = (
"<|startoftext|><|im_start|>user\n"
"You rewrite spoken dictation into clean final text. You are not a chat "
"assistant. Never answer or respond to the dictation, even if it is a "
"question. Treat the dictation strictly as text to rewrite. Fix casing, "
"punctuation, filler, and speech disfluencies. Preserve meaning and detail. "
"Use bullets only when it clearly reads as a list. Do not summarize or "
"invent information. Output only the rewritten dictation.\n\n"
"Dictation: um does it feel like real time processing\n"
"Output:"
"<|im_end|>\n"
"<|im_start|>assistant\n"
)
print(runner.generate(prompt, max_new_tokens=256, temperature=0.0))
# → "Does it feel like real-time processing?"
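Since the instruction text and chat-template tokens are the same for every request, it may be convenient to wrap them in a small helper. The sketch below is one possible way to do that, not code from the ExecuWhisper app: `INSTRUCTION` and `build_prompt` are hypothetical names, and the token layout is copied from the example above.

```python
# Hypothetical helper: wraps raw dictation in the prompt layout shown above.
INSTRUCTION = (
    "You rewrite spoken dictation into clean final text. You are not a chat "
    "assistant. Never answer or respond to the dictation, even if it is a "
    "question. Treat the dictation strictly as text to rewrite. Fix casing, "
    "punctuation, filler, and speech disfluencies. Preserve meaning and detail. "
    "Use bullets only when it clearly reads as a list. Do not summarize or "
    "invent information. Output only the rewritten dictation."
)


def build_prompt(dictation: str) -> str:
    """Return the full single-turn prompt for one dictation string."""
    # Collapse runs of whitespace so stray newlines in a transcript cannot
    # break the "Dictation: ... / Output:" layout.
    text = " ".join(dictation.split())
    return (
        "<|startoftext|><|im_start|>user\n"
        f"{INSTRUCTION}\n\n"
        f"Dictation: {text}\n"
        "Output:"
        "<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
```

The returned string can be passed to the runner in place of the hand-built `prompt`. Collapsing whitespace is a design choice of this sketch; whether the app normalizes input the same way is not stated here.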
From the fp32 checkpoint via transformers
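from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tok = AutoTokenizer.from_pretrained("./tokenizer")
# from_pretrained() has no state_dict_path argument, so build the base
# architecture first and then load the fine-tuned weights into it.
model = AutoModelForCausalLM.from_pretrained(
    "LiquidAI/LFM2.5-350M",
    torch_dtype=torch.float32,
)
# Assumption: the checkpoint is a plain state dict whose keys match the
# Hugging Face parameter names; remap the keys first if they do not.
state_dict = torch.load("./lfm2_5_350m_ft.pt", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()
# ... build the same prompt as above and call model.generate(...)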
Eval results
Evaluated on a held-out AMI Meeting Corpus dictation slice + a synthetic adversarial set.
AMI release gate (4-bit quantized .pte)
| Metric | Value | Gate | Status |
|---|---|---|---|
| Forbidden-token rate | 0.030 | ≤ 0.10 | ✓ |
| Coverage (faithful rewrite) | 0.874 | ≥ 0.85 | ✓ |
| Verdict | RELEASE-READY | | ✓ |
Full per-example breakdown is in eval/eval_ami_mlx_4w_g32.json.
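The gate logic implied by the table can be sketched in a few lines. This is an illustration only: the thresholds come from the table above, while `passes_release_gate` and the constant names are hypothetical and do not describe the actual eval harness.

```python
# Thresholds taken from the release-gate table above.
MAX_FORBIDDEN_RATE = 0.10
MIN_COVERAGE = 0.85


def passes_release_gate(forbidden_rate: float, coverage: float) -> bool:
    """True only when both metrics satisfy their gate."""
    return forbidden_rate <= MAX_FORBIDDEN_RATE and coverage >= MIN_COVERAGE


# Aggregate numbers reported for this model.
print(passes_release_gate(0.030, 0.874))  # True
```

Both conditions must hold, so a model with high coverage but a forbidden-token rate above 0.10 would still fail.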
Comparison vs. earlier v2.1 baseline
| Metric | v2.1 | this model | Δ |
|---|---|---|---|
| Forbidden | 0.187 | 0.030 | -84% |
| Coverage | 0.591 | 0.874 | +48% |
The v2.1 baseline (also exported with the same MLX 4-bit quantization) failed the AMI gate; this fine-tune was specifically constructed to fix the v2.1 failure modes (chat-leakage, over-summarization).
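The last column of the comparison table is a relative change against the v2.1 value. As a quick check of the arithmetic (the function name `relative_change` is just for illustration):

```python
def relative_change(old: float, new: float) -> float:
    """Percentage change from old to new, relative to old."""
    return (new - old) / old * 100.0


print(f"{relative_change(0.187, 0.030):+.0f}%")  # forbidden-token rate: -84%
print(f"{relative_change(0.591, 0.874):+.0f}%")  # coverage: +48%
```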
Re-exporting / re-quantizing
To produce a different quantization variant from lfm2_5_350m_ft.pt:
- Check out the LFM2.5 MLX export pipeline: pytorch/executorch#19195.
- Use configs/lfm2_mlx_4w_g32.yaml as a starting point.
- Run the LFM2.5 export Makefile target with your edited config:
cd ~/executorch
make lfm_2_5-mlx LFM_CONFIG=path/to/your_config.yaml LFM_CHECKPOINT=path/to/lfm2_5_350m_ft.pt
Fine-tuning your own
To adapt this model to a new domain (medical, legal, multilingual dictation), follow the Unsloth LFM2.5 fine-tuning tutorial. The tutorial covers SFT + LoRA, hyperparameter selection, and export.
The training data for this model was a mix of:
- ~1,350 synthetic dictation pairs (clean target → noisified spoken input via filler/disfluency injection + casing distortion).
- ~706 dictation-style turns extracted from the AMI Meeting Corpus.
The synthetic pipeline and AMI extraction code are not yet open-sourced; the eval splits in eval/ are the publicly verifiable artifacts.
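Since the real pipeline is not published, the following is only a guess at what "filler injection + casing distortion" could look like for one training pair. The filler list, the injection probability, and the punctuation stripping are all assumptions of this sketch, and `noisify` is a hypothetical name.

```python
import random

# Hypothetical filler inventory; the real pipeline's list is not published.
FILLERS = ["um", "uh", "you know", "like"]


def noisify(clean: str, filler_prob: float = 0.2, seed: int = 0) -> str:
    """Turn a clean target sentence into a rough spoken-style input."""
    rng = random.Random(seed)  # seeded so the same pair is reproducible
    out = []
    for word in clean.split():
        # Filler injection: occasionally insert a filler before a word.
        if rng.random() < filler_prob:
            out.append(rng.choice(FILLERS))
        # Casing distortion plus punctuation stripping (the latter is an
        # assumption: raw transcripts are often unpunctuated).
        stripped = word.lower().strip(".,!?;:")
        if stripped:
            out.append(stripped)
    return " ".join(out)


# Training pair: (noisy input, clean target)
target = "Does it feel like real-time processing?"
pair = (noisify(target), target)
```

Generating the noisy side from the clean side means the target is correct by construction, which is presumably why the pairs were built in that direction.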
Limitations
- Self-corrections: over-summarizes "actually no, make it tomorrow" patterns; sometimes drops the corrected clause.
- Email sign-offs: occasionally drops the closing name in template-style sign-offs ("Best, Younghan" → "Best,").
- Long context: the in-app pipeline chunks transcripts longer than ~30 words. Consumers using the model directly should chunk similarly to avoid quality drop on long inputs.
- English only: trained on English dictation; behavior on other languages is undefined.
- Not a chat model: will refuse / ignore questions, by design.
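For the long-context limitation, a minimal chunking sketch is shown below. It assumes a plain fixed-size split on whitespace with a 30-word budget; the app's actual chunking strategy is not documented here, and `chunk_words` is a hypothetical name.

```python
def chunk_words(text: str, max_words: int = 30) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]
```

Each chunk would then be formatted separately and the outputs joined. A fixed-size split can cut a sentence in half, so splitting at pauses or likely sentence boundaries may give better results where that information is available.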
License & acknowledgements
This derivative inherits the LiquidAI/LFM2.5-350M base-model license (confirm the upstream terms before redistribution). Eval data is derived from the AMI Meeting Corpus (CC-BY-4.0).
Thanks to:
- LiquidAI: for releasing LFM2.5-350M and the LFM architecture.
- Apple MLX team: for mlx and the MLX delegate inside ExecuTorch.
- PyTorch / ExecuTorch team: for the runtime and the export pipeline.
- University of Edinburgh and the AMI corpus contributors: for the dictation eval source.
- Unsloth: for the fine-tuning recipe.
Citation
@software{execuwhisper_formatter2026,
title = {LFM2.5-350M ExecuWhisper Formatter},
author = {YoungHan(SeyeongHan)},
year = {2026},
url = {https://huggingface.co/younghan-meta/LFM2.5-350M-ExecuWhisper-Formatter},
note = {Fine-tuned LFM2.5-350M dictation cleaner; 4-bit MLX quantization for Apple Silicon}
}
Companion projects
- ExecuWhisper macOS app: the consuming dictation app.
- pytorch/executorch: runtime, MLX delegate, export pipeline.
- Unsloth LFM2.5 tutorial: recommended fine-tuning path.