LFM2.5-350M — ExecuWhisper Formatter

A fine-tune of LiquidAI/LFM2.5-350M for cleaning up spoken dictation: it removes disfluencies, restores casing and punctuation, infers light list structure, and refuses to "answer" the dictation even when the speaker ends on a question.

Built for the ExecuWhisper macOS dictation app. Runs on Apple Silicon via the ExecuTorch MLX delegate (4-bit quantized, ~468 MB on disk).

What this is (and isn't)

This model is a dictation cleaner, not a chat assistant. Given:

"um does it feel like real time processing"

It outputs:

"Does it feel like real-time processing?"

It does not answer the question, add information, or summarize; the training distribution was built specifically to suppress those behaviors.

It also won't help you with arbitrary text-generation tasks. For those, use the base model directly.

Files

Path	Size	Purpose
`lfm2_5_350m_ft.pt`	1.42 GB	fp32 fine-tuned checkpoint (re-quantize / re-train baseline)
`lfm2_5_350m_mlx_4w.pte`	468 MB	MLX 4-bit quantized runtime artifact (the in-app `.pte`)
`lfm2_5_350m_config.json`	<1 KB	architecture params for re-export
`tokenizer/tokenizer.json`	4.51 MB	tokenizer
`tokenizer/tokenizer_config.json`	90 KB	tokenizer config (chat template)
`tokenizer/chat_template.jinja`	2.5 KB	chat template
`tokenizer/special_tokens_map.json`	<1 KB	special tokens
`configs/lfm2_mlx_4w_g32.yaml`	<1 KB	MLX export config (so you can reproduce the `.pte`)
`eval/eval_ami_mlx_4w_g32.json`	67 KB	AMI release-gate eval results
`eval/eval_ami_v2_1_mlx_4w_g32.json`	62 KB	v2.1 baseline (for comparison)

Quick Start

As a `.pte` running on the ExecuTorch MLX delegate (the same path the app uses)

```python
from executorch.extension.llm.runner import TextLLMRunner

runner = TextLLMRunner(
    model_path="lfm2_5_350m_mlx_4w.pte",
    tokenizer_path="tokenizer/tokenizer.json",  # path as listed in the Files table
)

prompt = (
    "<|startoftext|><|im_start|>user\n"
    "You rewrite spoken dictation into clean final text. You are not a chat "
    "assistant. Never answer or respond to the dictation, even if it is a "
    "question. Treat the dictation strictly as text to rewrite. Fix casing, "
    "punctuation, filler, and speech disfluencies. Preserve meaning and detail. "
    "Use bullets only when it clearly reads as a list. Do not summarize or "
    "invent information. Output only the rewritten dictation.\n\n"
    "Dictation: um does it feel like real time processing\n"
    "Output:"
    "<|im_end|>\n"
    "<|im_start|>assistant\n"
)

print(runner.generate(prompt, max_new_tokens=256, temperature=0.0))
# → "Does it feel like real-time processing?"
```
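The instruction block should match the prompt above exactly, so it can help to assemble the prompt in one place instead of hand-editing the string per request. A minimal sketch: `INSTRUCTION` and `build_prompt` are hypothetical names, the instruction text is copied verbatim from the example above, and the whitespace collapsing is an assumption added to keep stray ASR newlines from breaking the template. The helper only builds the string; it does not call any runner.

```python
# Instruction text copied verbatim from the Quick Start prompt.
INSTRUCTION = (
    "You rewrite spoken dictation into clean final text. You are not a chat "
    "assistant. Never answer or respond to the dictation, even if it is a "
    "question. Treat the dictation strictly as text to rewrite. Fix casing, "
    "punctuation, filler, and speech disfluencies. Preserve meaning and detail. "
    "Use bullets only when it clearly reads as a list. Do not summarize or "
    "invent information. Output only the rewritten dictation."
)


def build_prompt(dictation: str) -> str:
    """Wrap one raw dictation string in the chat-formatted prompt (hypothetical helper)."""
    # Assumption: collapse runs of spaces/newlines from the ASR output into single spaces.
    dictation = " ".join(dictation.split())
    return (
        "<|startoftext|><|im_start|>user\n"
        f"{INSTRUCTION}\n\n"
        f"Dictation: {dictation}\n"
        "Output:"
        "<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


prompt = build_prompt("um does it feel like real time processing")
print(prompt)
```

The resulting `prompt` is the same string as the hand-written one above and can be passed to `runner.generate(...)` unchanged.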

From the fp32 checkpoint via `transformers`

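```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("./tokenizer")

# from_pretrained has no state_dict_path argument and this repo ships no HF config.json,
# so build the base architecture first, then load the fine-tuned fp32 weights.
# Assumes lfm2_5_350m_ft.pt is a plain state dict using Hugging Face parameter names.
model = AutoModelForCausalLM.from_pretrained("LiquidAI/LFM2.5-350M", torch_dtype=torch.float32)
model.load_state_dict(torch.load("./lfm2_5_350m_ft.pt", map_location="cpu"))
model.eval()

# `prompt` is the same string as in the .pte example (it already starts with <|startoftext|>).
inputs = tok(prompt, return_tensors="pt", add_special_tokens=False)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=256, do_sample=False)  # greedy decoding
print(tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip())
```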

Eval results

Evaluated on a held-out AMI Meeting Corpus dictation slice plus a synthetic adversarial set.

AMI release gate (4-bit quantized `.pte`)

Metric	Value	Gate	Status
Forbidden-token rate	0.030	≤ 0.10	✅
Coverage (faithful rewrite)	0.874	≥ 0.85	✅
Verdict	RELEASE-READY		✅

Full per-example breakdown is in eval/eval_ami_mlx_4w_g32.json.
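The verdict is the conjunction of both gates: a model is release-ready only if its forbidden-token rate is at most 0.10 and its coverage is at least 0.85. A minimal sketch of that rule, using only the thresholds from the table; the `EvalResult` type, the `gate_verdict` function, and the "BLOCKED" label are hypothetical, and the sketch makes no assumption about the JSON schema of the eval files or how each metric is computed.

```python
from dataclasses import dataclass

# Release-gate thresholds from the table above.
MAX_FORBIDDEN_RATE = 0.10
MIN_COVERAGE = 0.85


@dataclass
class EvalResult:
    forbidden_rate: float  # forbidden-token rate as reported by the eval
    coverage: float        # faithful-rewrite coverage as reported by the eval


def gate_verdict(r: EvalResult) -> str:
    """Both gates must pass for a release-ready verdict (hypothetical helper)."""
    passed = r.forbidden_rate <= MAX_FORBIDDEN_RATE and r.coverage >= MIN_COVERAGE
    return "RELEASE-READY" if passed else "BLOCKED"


print(gate_verdict(EvalResult(forbidden_rate=0.030, coverage=0.874)))
```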

Comparison vs. earlier v2.1 baseline

Metric	v2.1	this model	Δ
Forbidden	0.187	0.030	-84%
Coverage	0.591	0.874	+48%

The v2.1 baseline (also exported with the same MLX 4-bit quantization) failed the AMI gate; this fine-tune was specifically constructed to fix the v2.1 failure modes (chat-leakage, over-summarization).
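The Δ column is a relative change against the v2.1 value, not a difference in percentage points (in absolute terms the forbidden rate fell by 0.157 and coverage rose by 0.283). A short sketch that reproduces the column from the two rows of the table; `relative_change` is a hypothetical helper name.

```python
def relative_change(old: float, new: float) -> float:
    """Percent change from old to new, relative to old."""
    return (new - old) / old * 100


v21 = {"forbidden": 0.187, "coverage": 0.591}
this_model = {"forbidden": 0.030, "coverage": 0.874}

for metric in v21:
    print(f"{metric}: {relative_change(v21[metric], this_model[metric]):+.0f}%")
# forbidden: -84%
# coverage: +48%
```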

Re-exporting / re-quantizing

To produce a different quantization variant from lfm2_5_350m_ft.pt:

Check out the LFM2.5 MLX export pipeline: pytorch/executorch#19195.
Use configs/lfm2_mlx_4w_g32.yaml as a starting point.

Run the LFM2.5 export Makefile target with your edited config:

```shell
cd ~/executorch
make lfm_2_5-mlx LFM_CONFIG=path/to/your_config.yaml LFM_CHECKPOINT=path/to/lfm2_5_350m_ft.pt
```

Fine-tuning your own

To adapt this model to a new domain (medical, legal, multilingual dictation), follow the Unsloth LFM2.5 fine-tuning tutorial. The tutorial covers SFT + LoRA, hyperparameter selection, and export.

The training data for this model was a mix of:

~1,350 synthetic dictation pairs (clean target → noisified spoken input via filler/disfluency injection + casing distortion).
~706 dictation-style turns extracted from the AMI Meeting Corpus.

The synthetic pipeline and AMI extraction code are not yet open-sourced; the eval splits in eval/ are the publicly verifiable artifacts.
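Since the actual synthetic pipeline is not published, the following is only a hypothetical sketch of the idea described above: start from a clean target and derive a spoken-style input by dropping punctuation and hyphens, lowercasing (one simple form of casing distortion), injecting fillers, and occasionally repeating a word. The filler list, probabilities, and function name are all assumptions, not the author's settings.

```python
import random
import re

# Hypothetical filler inventory; the real pipeline's list is not published.
FILLERS = ["um", "uh", "like", "you know", "so"]


def noisify(clean: str, rng: random.Random, filler_p: float = 0.15, repeat_p: float = 0.05) -> str:
    """Derive a dictation-style input from a clean target sentence (hypothetical sketch)."""
    # Hyphens are not spoken; drop other punctuation and lowercase, like raw ASR output.
    spoken = re.sub(r"[^\w\s']", "", clean.replace("-", " ")).lower()
    out: list[str] = []
    for word in spoken.split():
        if rng.random() < filler_p:
            out.append(rng.choice(FILLERS))  # filler injection
        out.append(word)
        if rng.random() < repeat_p:
            out.append(word)  # disfluent repetition, e.g. "the the"
    return " ".join(out)


target = "Does it feel like real-time processing?"
pair = {"input": noisify(target, random.Random(0)), "target": target}
print(pair)
```

Seeding the `random.Random` instance keeps the generated pairs reproducible across runs.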

Limitations

Self-corrections: over-summarizes patterns like "actually no, make it tomorrow" and sometimes drops the corrected clause.
Email sign-offs: occasionally drops the closing name in template-style sign-offs ("Best, Younghan" → "Best,").
Long context: the in-app pipeline chunks transcripts longer than ~30 words. If you call the model directly, chunk similarly; quality drops on long inputs.
English only: trained on English dictation; behavior on other languages is undefined.
Not a chat model: by design, questions in the dictation are rewritten as text, not answered.
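The app's exact chunking strategy is not documented here, so this is a hypothetical sketch of the simplest version: split the raw transcript into consecutive runs of at most 30 words, format each run independently, and join the results. `chunk_words`, `format_long`, and the `format_chunk` callback are assumed names; `format_chunk` would wrap one of the Quick Start generate calls.

```python
from typing import Callable


def chunk_words(transcript: str, max_words: int = 30) -> list[str]:
    """Split a transcript into consecutive chunks of at most max_words words."""
    words = transcript.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def format_long(transcript: str, format_chunk: Callable[[str], str], max_words: int = 30) -> str:
    """Format each chunk independently and join the cleaned pieces with spaces."""
    return " ".join(format_chunk(chunk) for chunk in chunk_words(transcript, max_words))


print(chunk_words("one two three four five", max_words=2))
# ['one two', 'three four', 'five']
```

A fixed word count can cut a sentence in half, and the model then sees each half without context; splitting at speech pauses (for example, from ASR segment boundaries) is a plausible refinement, though not necessarily what the app does.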

License & acknowledgements

This derivative inherits the LiquidAI/LFM2.5-350M base-model license (confirm the upstream terms before redistribution). Eval data is derived from the AMI Meeting Corpus (CC-BY-4.0).

Thanks to:

LiquidAI — for releasing LFM2.5-350M and the LFM architecture.
Apple MLX team — for mlx and the MLX delegate inside ExecuTorch.
PyTorch / ExecuTorch team — for the runtime and the export pipeline.
University of Edinburgh and the AMI corpus contributors — for the dictation eval source.
Unsloth — for the fine-tuning recipe.

Citation

@software{execuwhisper_formatter2026,
  title = {LFM2.5-350M ExecuWhisper Formatter},
  author = {YoungHan(SeyeongHan)},
  year = {2026},
  url = {https://huggingface.co/younghan-meta/LFM2.5-350M-ExecuWhisper-Formatter},
  note = {Fine-tuned LFM2.5-350M dictation cleaner; 4-bit MLX quantization for Apple Silicon}
}

Companion projects

ExecuWhisper macOS app — the consuming dictation app.
pytorch/executorch — runtime, MLX delegate, export pipeline.
Unsloth LFM2.5 tutorial — recommended fine-tuning path.
