LFM2.5-350M - ExecuWhisper Formatter

A fine-tune of LiquidAI/LFM2.5-350M that cleans up spoken dictation: it removes disfluencies, restores casing and punctuation, infers light list structure, and refuses to "answer" the dictation even when the dictation ends in a question.

Built for the ExecuWhisper macOS dictation app. Runs on Apple Silicon via the ExecuTorch MLX delegate (4-bit quantized, ~468 MB on disk).

What this is (and isn't)

This model is a dictation cleaner, not a chat assistant. Given:

"um does it feel like real time processing"

It outputs:

"Does it feel like real-time processing?"

It will not answer the question, add information, or summarize. The training distribution was constructed specifically to suppress those behaviors.

It also won't help you with arbitrary text-generation tasks. For those, use the base model directly.

Files

| Path | Size | Purpose |
|---|---|---|
| lfm2_5_350m_ft.pt | 1.42 GB | fp32 fine-tuned checkpoint (re-quantize / re-train baseline) |
| lfm2_5_350m_mlx_4w.pte | 468 MB | MLX 4-bit quantized runtime artifact (the in-app .pte) |
| lfm2_5_350m_config.json | <1 KB | architecture params for re-export |
| tokenizer/tokenizer.json | 4.51 MB | tokenizer |
| tokenizer/tokenizer_config.json | 90 KB | tokenizer config (chat template) |
| tokenizer/chat_template.jinja | 2.5 KB | chat template |
| tokenizer/special_tokens_map.json | <1 KB | special tokens |
| configs/lfm2_mlx_4w_g32.yaml | <1 KB | MLX export config (so you can reproduce the .pte) |
| eval/eval_ami_mlx_4w_g32.json | 67 KB | AMI release-gate eval results |
| eval/eval_ami_v2_1_mlx_4w_g32.json | 62 KB | v2.1 baseline (for comparison) |

Quick Start

As a .pte running on the ExecuTorch MLX delegate (the same path the app uses)

from executorch.extension.llm.runner import TextLLMRunner

runner = TextLLMRunner(
    model_path="lfm2_5_350m_mlx_4w.pte",
    tokenizer_path="tokenizer/tokenizer.json",  # path as listed under Files
)

prompt = (
    "<|startoftext|><|im_start|>user\n"
    "You rewrite spoken dictation into clean final text. You are not a chat "
    "assistant. Never answer or respond to the dictation, even if it is a "
    "question. Treat the dictation strictly as text to rewrite. Fix casing, "
    "punctuation, filler, and speech disfluencies. Preserve meaning and detail. "
    "Use bullets only when it clearly reads as a list. Do not summarize or "
    "invent information. Output only the rewritten dictation.\n\n"
    "Dictation: um does it feel like real time processing\n"
    "Output:"
    "<|im_end|>\n"
    "<|im_start|>assistant\n"
)

print(runner.generate(prompt, max_new_tokens=256, temperature=0.0))
# → "Does it feel like real-time processing?"
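
The prompt above is a fixed instruction block wrapped around one line of dictation, so it can be produced by a small helper. The sketch below is illustrative only: `build_prompt` and `SYSTEM_INSTRUCTIONS` are hypothetical names, not part of the app or of ExecuTorch, and collapsing whitespace in the dictation is an added assumption.

```python
# Instruction text copied from the Quick Start prompt.
SYSTEM_INSTRUCTIONS = (
    "You rewrite spoken dictation into clean final text. You are not a chat "
    "assistant. Never answer or respond to the dictation, even if it is a "
    "question. Treat the dictation strictly as text to rewrite. Fix casing, "
    "punctuation, filler, and speech disfluencies. Preserve meaning and detail. "
    "Use bullets only when it clearly reads as a list. Do not summarize or "
    "invent information. Output only the rewritten dictation."
)


def build_prompt(dictation: str) -> str:
    """Wrap raw dictation in the single-turn prompt layout used in Quick Start."""
    # Assumption: collapse runs of whitespace so the dictation stays on one line.
    text = " ".join(dictation.split())
    return (
        "<|startoftext|><|im_start|>user\n"
        f"{SYSTEM_INSTRUCTIONS}\n\n"
        f"Dictation: {text}\n"
        "Output:"
        "<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


prompt = build_prompt("um does it feel like real time processing")
```

The resulting `prompt` can be passed to the runner in place of the hand-written string.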

From the fp32 checkpoint via transformers

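from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
import torch

tok = AutoTokenizer.from_pretrained("./tokenizer")

# from_pretrained() has no state_dict_path argument, so build the model from a
# config and load the weights explicitly.
# Assumptions: the base-model config on the Hub matches this fine-tune, and the
# checkpoint is a plain state dict whose keys match the Hugging Face parameter
# names. If the keys differ, remap them before calling load_state_dict().
config = AutoConfig.from_pretrained("LiquidAI/LFM2.5-350M")
model = AutoModelForCausalLM.from_config(config).float()  # keep weights in fp32
state_dict = torch.load("./lfm2_5_350m_ft.pt", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()

# ... build the same prompt as above and call model.generate(...)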

Eval results

Evaluated on a held-out dictation slice of the AMI Meeting Corpus plus a synthetic adversarial set.

AMI release gate (4-bit quantized .pte)

| Metric | Value | Gate | Status |
|---|---|---|---|
| Forbidden-token rate | 0.030 | ≤ 0.10 | ✅ |
| Coverage (faithful rewrite) | 0.874 | ≥ 0.85 | ✅ |
| Verdict | RELEASE-READY | | ✅ |

Full per-example breakdown is in eval/eval_ami_mlx_4w_g32.json.
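
The gate reads as two threshold checks that must both hold. A minimal sketch of that decision follows, assuming the verdict is a simple AND of the two conditions. The function and constant names are hypothetical, and the code does not parse the JSON in eval/, whose schema is not described here.

```python
# Thresholds taken from the gate column above.
MAX_FORBIDDEN_RATE = 0.10
MIN_COVERAGE = 0.85


def passes_release_gate(forbidden_rate: float, coverage: float) -> bool:
    """Return True when both gate conditions are satisfied."""
    return forbidden_rate <= MAX_FORBIDDEN_RATE and coverage >= MIN_COVERAGE


print(passes_release_gate(forbidden_rate=0.030, coverage=0.874))  # True
```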

Comparison vs. earlier v2.1 baseline

| Metric | v2.1 | This model | Δ |
|---|---|---|---|
| Forbidden | 0.187 | 0.030 | -84% |
| Coverage | 0.591 | 0.874 | +48% |

The v2.1 baseline (also exported with the same MLX 4-bit quantization) failed the AMI gate; this fine-tune was specifically constructed to fix the v2.1 failure modes (chat-leakage, over-summarization).
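
The percentages in the comparison are consistent with relative change against the v2.1 value, not a difference in absolute points. The short check below reproduces them; `relative_change_pct` is a helper name introduced here for illustration.

```python
def relative_change_pct(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


print(f"Forbidden: {relative_change_pct(0.187, 0.030):+.0f}%")  # Forbidden: -84%
print(f"Coverage:  {relative_change_pct(0.591, 0.874):+.0f}%")  # Coverage:  +48%
```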

Re-exporting / re-quantizing

To produce a different quantization variant from lfm2_5_350m_ft.pt:

  1. Check out the LFM2.5 MLX export pipeline: pytorch/executorch#19195.
  2. Use configs/lfm2_mlx_4w_g32.yaml as a starting point.
  3. Run the LFM2.5 export Makefile target with your edited config:
    cd ~/executorch
    make lfm_2_5-mlx LFM_CONFIG=path/to/your_config.yaml LFM_CHECKPOINT=path/to/lfm2_5_350m_ft.pt
    

Fine-tuning your own

To adapt this model to a new domain (medical, legal, multilingual dictation), follow the Unsloth LFM2.5 fine-tuning tutorial. The tutorial covers SFT + LoRA, hyperparameter selection, and export.

The training data for this model was a mix of:

  • ~1,350 synthetic dictation pairs (clean target → noisified spoken input via filler/disfluency injection + casing distortion).
  • ~706 dictation-style turns extracted from the AMI Meeting Corpus.

The synthetic pipeline and AMI extraction code are not yet open-sourced; the eval splits in eval/ are the publicly verifiable artifacts.
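
Because the actual pipeline is unpublished, the following is only a rough sketch of what "filler injection plus casing distortion" could look like, not the code used to build this model's data. The filler list, the injection probability, and the name `noisify` are all assumptions.

```python
import random
import string

# Hypothetical filler inventory; the real pipeline's list is not published.
FILLERS = ("um", "uh", "er", "hmm")


def noisify(clean: str, filler_prob: float = 0.2, seed: int = 0) -> str:
    """Turn a clean target sentence into a rough spoken-style input."""
    rng = random.Random(seed)  # seeded so pairs are reproducible
    # Casing distortion: drop all capitalization.
    text = clean.lower().replace("-", " ")
    # Strip punctuation but keep apostrophes so contractions survive.
    drop = string.punctuation.replace("'", "")
    text = text.translate(str.maketrans("", "", drop))
    noisy = []
    for word in text.split():
        # Filler injection: occasionally place a filler before a word.
        if rng.random() < filler_prob:
            noisy.append(rng.choice(FILLERS))
        noisy.append(word)
    return " ".join(noisy)


print(noisify("Does it feel like real-time processing?"))
```

Each training pair would then be (noisify(target), target), with the clean sentence as the label.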

Limitations

  • Self-corrections: over-summarizes "actually no, make it tomorrow" patterns; sometimes drops the corrected clause.
  • Email sign-offs: occasionally drops the closing name in template-style sign-offs ("Best, Younghan" → "Best,").
  • Long context: the in-app pipeline chunks transcripts longer than ~30 words. Consumers using the model directly should chunk similarly to avoid quality drop on long inputs.
  • English only: trained on English dictation; behavior on other languages is undefined.
  • Not a chat model: will refuse / ignore questions, by design.
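
For the long-context limitation, a caller can split a transcript before formatting each piece. The sketch below uses a plain fixed-size word split at the ~30-word figure mentioned above. The app's actual chunking strategy is not documented here, so treat `chunk_words` as a hypothetical stand-in; splitting at sentence or pause boundaries may work better.

```python
def chunk_words(transcript: str, max_words: int = 30) -> list[str]:
    """Split a transcript into consecutive chunks of at most max_words words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = transcript.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


chunks = chunk_words("word " * 65)
print([len(c.split()) for c in chunks])  # [30, 30, 5]
```

Each chunk is then formatted independently and the outputs are joined in order.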

License & acknowledgements

This derivative inherits the LiquidAI/LFM2.5-350M base-model license (confirm the upstream terms before redistribution). Eval data is derived from the AMI Meeting Corpus (CC-BY-4.0).

Thanks to:

  • LiquidAI: for releasing LFM2.5-350M and the LFM architecture.
  • Apple MLX team: for mlx and the MLX delegate inside ExecuTorch.
  • PyTorch / ExecuTorch team: for the runtime and the export pipeline.
  • University of Edinburgh and the AMI corpus contributors: for the dictation eval source.
  • Unsloth: for the fine-tuning recipe.

Citation

@software{execuwhisper_formatter2026,
  title = {LFM2.5-350M ExecuWhisper Formatter},
  author = {YoungHan(SeyeongHan)},
  year = {2026},
  url = {https://huggingface.co/younghan-meta/LFM2.5-350M-ExecuWhisper-Formatter},
  note = {Fine-tuned LFM2.5-350M dictation cleaner; 4-bit MLX quantization for Apple Silicon}
}

Companion projects
