Papers
arxiv:2507.01021

Scalable Offline ASR for Command-Style Dictation in Courtrooms

Published on Jun 7, 2025
Authors:
,
,
,

Abstract

An open-source framework for command-style dictation that uses Voice Activity Detection and parallel Whisper model transcription to reduce latency and improve compute utilization in real-time applications.

AI-generated summary

We propose an open-source framework for Command-style dictation that addresses the gap between resource-intensive Online systems and high-latency Batch processing. Our approach uses Voice Activity Detection (VAD) to segment audio and transcribes these segments in parallel using Whisper models, enabling efficient multiplexing across audios. Unlike proprietary systems like SuperWhisper, this framework is also compatible with most ASR architectures, including widely used CTC-based models. Our multiplexing technique maximizes compute utilization in real-world settings, as demonstrated by its deployment in around 15% of India's courtrooms. Evaluations on live data show consistent latency reduction as user concurrency increases, compared to sequential batch processing. The live demonstration will showcase our open-sourced implementation and allow attendees to interact with it in real-time.

Community

Sign up or log in to comment

Get this paper in your agent:

hf papers read 2507.01021
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2507.01021 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2507.01021 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2507.01021 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.