How to Post-Process Your Realtime Speech-to-Text Transcripts Locally with LM Studio

Community Article Published April 6, 2026


Prerequisites

For this tutorial, we will be using Handy, an open-source dictation tool.

1. Install LM Studio

Download and install LM Studio from the official website.

2. Download a Post-Processing Model of Your Choice

For this tutorial, we will be downloading a post-processing model from the Hugging Face Model Hub via the LM Studio UI.

Launch LM Studio and click on the Model Search icon in the left sidebar. Search for bingbangboom/Qwen3006B-transcriber-beta and choose the appropriate quantization variant for your system. Click the Download button to download the model to your local machine. For the purpose of this tutorial, we will be using the Q8_0 model variant.


3. Configure the Post-Processing Model

After the model has finished downloading, click on the My Models icon in the left sidebar and then click on the gear icon next to the model you just downloaded to edit its default config. A side panel will open on the right with the model's configuration options. This default config will be used automatically every time you load the post-processing model.

In this panel, click on Load and set the following options:

Context Length: 2048
Evaluation Batch Size: 1024
Max Concurrency: 1

Note: The above settings are recommended for most models. Adjust as needed for your model and system. Hover over any setting in LM Studio to see a tooltip with more information about that setting.

Next, click on Inference in the panel, next to the Load tab. According to the model card for this specific model, the System Prompt should be left empty (this may not be the case for other models). The other settings are as follows:

Temperature: 0.1
Top K Sampling: 10
Top P Sampling: 0.95
Min P Sampling: 0.05
Repeat Penalty: 1.0
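Once the server from Step 4 is running, you can also sanity-check these settings per request: LM Studio's OpenAI-compatible endpoint accepts the standard `temperature` and `top_p` fields, which override the server-side config for that request. A minimal sketch (the `build_completion_request` and `post_process` names are ours, not part of any API; the remaining samplers are left to the server-side config):

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"  # LM Studio's default server address


def build_completion_request(model: str, transcript: str) -> dict:
    """Build a chat-completion payload mirroring the Inference tab settings above."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": transcript}],
        # Per-request sampling overrides (same values as the Inference tab);
        # Top K, Min P, and Repeat Penalty stay as configured in LM Studio.
        "temperature": 0.1,
        "top_p": 0.95,
    }


def post_process(model: str, transcript: str) -> str:
    """Send a raw transcript to the served model and return the cleaned text."""
    payload = build_completion_request(model, transcript)
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```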

Additionally, the model page recommends disabling thinking for this Qwen3-based model to cut down latency. To do this, scroll down to Prompt Template, click on Template (Jinja), and then add the following at the top of the template:

{%- set enable_thinking = false %}
[Screenshots: Load tab and Inference tab]

4. Serve the Post-Processing Model

Note: It is recommended to enable Developer mode in LM Studio to reveal more controls and log more details about the model. To do this, click the Settings gear in the left sidebar, click on Developer, and toggle the switch to enable it.

To serve the post-processing model, click on the Developer icon in the left sidebar and then click on Load Model. Select the model you just configured, verify the settings (context length, evaluation batch size, and max concurrency), and click the Load Model button. Once loaded, click on the Show Sidebar icon in the top right corner to reveal the settings panel for the loaded model and verify that all loading and inference settings are the same as those configured in Step 3. Note that any changes made here will override the default config.

Finally, toggle the server switch to start serving requests. You should see the model status change to Running — the model is now ready to serve requests.
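You can confirm the server is reachable from outside LM Studio by listing the models it serves. A small sketch against the standard `/v1/models` endpoint (the helper names here are ours; the default address comes from Step 5 below):

```python
import json
import urllib.request


def list_model_ids(models_response: dict) -> list[str]:
    """Pull model identifiers out of a /v1/models response body."""
    return [m["id"] for m in models_response.get("data", [])]


def served_models(base_url: str = "http://localhost:1234/v1") -> list[str]:
    """Ask the running LM Studio server which models it is currently serving."""
    with urllib.request.urlopen(f"{base_url}/models") as resp:
        return list_model_ids(json.load(resp))
```

With the server Running and the model loaded, `served_models()` should include the transcriber model's identifier.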


5. Configure Handy to Use the Post-Processing Model

To configure Handy to use the post-processing model, first go to the Advanced tab in the left sidebar and turn on Experimental Features. Then scroll down to the newly revealed experimental features and turn on Post Processing. A new Post Process tab will appear in the left sidebar. Click on it.

In the Post Process settings, you can choose a Post Processing Hotkey that triggers the post-processing model on the transcript generated by Handy. Simply click on the currently configured hotkey and type the desired key combination to change it.

Note: If LM Studio is not running or the model is not being served correctly, pressing the hotkey will fall back to the default non-processed speech-to-text behavior.

Then configure the rest of the settings as recommended below:

Provider: Custom
Base URL: http://localhost:1234/v1 (default)
Model: qwen3006b-transcriber-beta (select from dropdown)

In the Prompt section, click on Create New Prompt to customize the prompt that will be sent to the post-processing model. Enter a Prompt Label to easily identify this prompt in the future. For this tutorial, we are setting it to qwen3006b-transcriber-beta to keep track of which prompt is being used for this model.

Then set the Prompt Instructions to:

${output}
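What happens with these instructions is a simple substitution: Handy replaces the `${output}` placeholder with the raw transcript and sends the result as the user message. A minimal sketch of that substitution (the `render_prompt` function name is ours, not Handy's):

```python
def render_prompt(instructions: str, raw_transcript: str) -> str:
    """Replace the ${output} placeholder with the raw transcript, as Handy does."""
    return instructions.replace("${output}", raw_transcript)


# With the instructions set to just "${output}", the model receives the
# transcript verbatim:
print(render_prompt("${output}", "um so this is a raw transcript"))
# → um so this is a raw transcript
```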


Congratulations! You have successfully configured a post-processing model for Handy.

Note: The contents of Prompt Instructions are what the post-processing model receives as user input; you can see them appear as a User request in the Developer Logs in LM Studio. The ${output} variable is a placeholder that will be replaced with the raw transcript text generated by Handy. Since this particular post-processing model was fine-tuned directly on raw transcript text, it performs well when given the transcript as-is. Other models may need more elaborate instructions prepended to the transcript; for example, the default Improve Transcription prompt prepends its specified instructions to the transcript text before sending it to the configured generic post-processing model. Refer to the model documentation for the correct prompt instructions.

6. Using the Post-Processing Feature

Press the configured post-processing hotkey to process your speech using the post-processing model. The processed transcript will appear in the transcript window. You can also review the post-processing results in the LM Studio Developer Logs.

Note: Some errors in the post-processed results are to be expected, as they may stem from the limitations of either the primary speech recognition model or the post-processing model itself. Verify the generated transcripts carefully. If you run into issues, review the logs for more details and report the problem to the model author.
