lilfugu

A Japanese ASR model fine-tuned for software development.

Based on Qwen3-ASR-1.7B. Designed to produce clean, usable transcriptions for developers: not just programming-term recognition, but also proper Arabic numerals (e.g. 3000, not 三千), consistent punctuation, and overall higher-quality Japanese output.

What's improved over the base model

  • Programming terms in English: useEffect, Docker, Vercel, Prisma, Tailwind CSS, etc., rather than katakana
  • Arabic numerals: 3000番ポート, 200ms, 8GB, rather than kanji numerals
  • Punctuation and formatting: cleaner, more consistent output
  • General Japanese quality: improvements not fully captured by existing benchmarks (JSUT, etc.) due to their normalization
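As a hypothetical illustration of what these bullets mean in practice (the strings below are invented for illustration, not actual model outputs), the same utterance about monitoring port 3000 might come out as:

```python
# Hypothetical transcripts of the same spoken sentence.
# These strings are illustrative only, not actual model outputs.
base_output = "ユーズエフェクトで三千番ポートを監視します。"  # katakana term, kanji numeral
lilfugu_output = "useEffectで3000番ポートを監視します。"  # English identifier, Arabic numerals

print(base_output)
print(lilfugu_output)
```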

Benchmarks

ADLIB

Model                     CER      Term Accuracy (Exact)   Composite
lilfugu                   26.3%    51.6%                   0.6272
Qwen3-ASR-1.7B (base)     41.1%    24.6%                   0.4203
Whisper large-v3-turbo    41.9%    20.2%                   0.3935
kotoba-whisper-v2.0       61.1%    7.0%                    0.2256
SenseVoice Small          56.8%    0.0%                    0.2090

Composite = 0.4 × (1 − CER) + 0.6 × Term Accuracy, where the Term Accuracy used in the composite includes both exact and flexible matches (and is therefore higher than the exact-match column above).
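The composite score is a straightforward weighted sum; a minimal sketch (the helper name is ours, and inputs are fractions in [0, 1]):

```python
def composite(cer: float, term_accuracy: float) -> float:
    """Composite = 0.4 * (1 - CER) + 0.6 * Term Accuracy.

    term_accuracy should be the combined (exact + flexible) figure,
    not the exact-match-only column from the table.
    """
    return 0.4 * (1.0 - cer) + 0.6 * term_accuracy

# Sanity check: 50% CER and 50% term accuracy gives 0.5.
print(round(composite(0.5, 0.5), 4))  # 0.5
```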

Benchmark: ADLIB, a language-aware ASR benchmark for Japanese

JSUT

Model                     CER
Qwen3-ASR-1.7B (base)     10.7%
lilfugu                   10.8%
Whisper large-v3-turbo    12.0%
kotoba-whisper-v2.0       15.7%
SenseVoice Small          16.2%

Dataset: JSUT

Note: Existing Japanese ASR benchmarks are not designed to properly evaluate Japanese language quality: they normalize numbers, punctuation, and whitespace before scoring. These scores should be taken as a rough reference only.
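To illustrate why this normalization hides quality differences, here is a rough sketch of CER scoring with a benchmark-style normalization step (the normalization rules below are an assumption for illustration; real benchmark pipelines differ in detail):

```python
import re
import unicodedata


def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance over reference length."""
    return edit_distance(ref, hyp) / len(ref)


def normalize(text: str) -> str:
    """Benchmark-style normalization (assumed): strip punctuation and whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"[\s、。,.!?!?]", "", text)


ref = "3000番ポートで起動します。"
hyp = "3000番ポートで起動します"  # identical except the final period

print(cer(ref, hyp))                        # punctuation difference is penalized
print(cer(normalize(ref), normalize(hyp)))  # 0.0 after normalization
```

Under such normalization, a model that omits every period and one that punctuates correctly score identically, which is why the JSUT numbers above understate the gap.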

Variants

Repository             Size     Format
lilfugu (this)         4.1 GB   MLX bfloat16
lilfugu-8bit           2.8 GB   MLX 8-bit quantized
lilfugu-transformers   4.1 GB   safetensors fp16 (CUDA/Linux)
lilfugu-lora           ~49 MB   LoRA adapter

See also: lilfugu-experimental, which has higher term accuracy but may over-convert in some cases.

Usage

MLX (Apple Silicon)

pip install -U mlx-audio

from mlx_audio.stt import load

model = load("holotherapper/lilfugu")
result = model.generate("audio.wav", language="Japanese")
print(result.text)

For the 8bit version:

model = load("holotherapper/lilfugu-8bit")

CUDA / Linux

from qwen_asr import Qwen3ASR

model = Qwen3ASR.from_pretrained("holotherapper/lilfugu-transformers")
result = model.transcribe("audio.wav")

LoRA adapter (custom scale tuning)

from mlx_tune.stt import FastSTTModel
from mlx_lm.tuner.lora import LoRALinear

model, _ = FastSTTModel.from_pretrained("mlx-community/Qwen3-ASR-1.7B-bf16")
model.load_adapter("holotherapper/lilfugu-lora")

# Adjust scale (0.0-1.0). Higher = stronger term conversion.
for _, module in model.model.named_modules():
    if isinstance(module, LoRALinear):
        module.scale = 1.0

text = model.transcribe("audio.wav", language="ja")

License

Apache 2.0 (following Qwen3-ASR-1.7B)
