---
title: WhisperMath
emoji: 🧮
colorFrom: green
colorTo: blue
sdk: docker
app_port: 7860
pinned: false
---
# WhisperMath Web Demo
WhisperMath is an interactive demo for converting spoken mathematical phrases into rendered math notation.
```text
browser microphone
-> faster-whisper transcript
-> ByT5 math decoder
-> rendered KaTeX output
```
The demo has two useful modes:
- **Record audio**: speak a math expression in the browser.
- **Edit transcript**: correct Whisper's transcript and click **Decode Transcript** to test only the ByT5 decoder.
This separation is important because spoken-math errors can come from two different places:
- Whisper may hear the audio incorrectly.
- The ByT5 decoder may convert a correct transcript incorrectly.
## Live Demo
Hugging Face Space:
```text
https://huggingface.co/spaces/vibhuiitj/whispermath-webdemo
```
Direct app URL:
```text
https://vibhuiitj-whispermath-webdemo.hf.space
```
The public Space runs on the free CPU tier with this configuration:
```text
Whisper: small.en
Decoder: vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724
Device: CPU
```
## Files
```text
webdemo/
  app.py               FastAPI backend
  static/index.html    Browser UI with recorder and KaTeX rendering
  requirements.txt     Python dependencies
  Dockerfile           Hugging Face Spaces Docker image
  .dockerignore        Files ignored by Docker build
  README.md            Space metadata and this guide
## How It Works
1. The browser records audio using `MediaRecorder`.
2. The frontend uploads the recording to `POST /api/transcribe`.
3. The backend saves the audio to a temporary file.
4. `faster-whisper` transcribes the audio into English text.
5. The transcript is passed to the ByT5 checkpoint.
6. The ByT5 output is returned as raw math/LaTeX-like text.
7. The frontend renders the output using KaTeX and also shows the raw model output for debugging.
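The backend half of these steps can be sketched in Python roughly as follows. This is a minimal sketch, not the actual `app.py`: the names `transcribe_audio`, `decode_math`, and `run_pipeline` are hypothetical, the `.webm` suffix is a guess at what the browser uploads, and it assumes the decoder checkpoint loads through `AutoModelForSeq2SeqLM`. The heavy imports are deferred so the control flow can be read and exercised without downloading models.

```python
import os
import tempfile
from typing import Callable

WHISPER_MODEL = "small.en"
DECODER_MODEL = "vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724"


def transcribe_audio(path: str) -> str:
    """Step 4: transcribe a saved audio file with faster-whisper."""
    from faster_whisper import WhisperModel  # deferred: heavy dependency

    # A real server would load the model once at startup, not per request.
    model = WhisperModel(WHISPER_MODEL, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(path, language="en")
    # `segments` is a lazy generator; joining it runs the transcription.
    return " ".join(segment.text.strip() for segment in segments)


def decode_math(transcript: str, num_beams: int = 4, max_new_tokens: int = 256) -> str:
    """Steps 5-6: convert a transcript to math text with the ByT5 checkpoint."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer  # deferred

    tokenizer = AutoTokenizer.from_pretrained(DECODER_MODEL)
    model = AutoModelForSeq2SeqLM.from_pretrained(DECODER_MODEL)
    inputs = tokenizer(transcript, return_tensors="pt")
    output_ids = model.generate(
        **inputs, num_beams=num_beams, max_new_tokens=max_new_tokens
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


def run_pipeline(
    audio_bytes: bytes,
    suffix: str = ".webm",  # assumption: container produced by MediaRecorder
    transcribe: Callable[[str], str] = transcribe_audio,
    decode: Callable[[str], str] = decode_math,
) -> dict:
    """Steps 3-6: temp file -> transcript -> math text."""
    # Step 3: persist the upload so the transcriber can read it from disk.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(audio_bytes)
        path = tmp.name
    try:
        transcript = transcribe(path)
    finally:
        os.remove(path)  # always clean up the temporary recording
    return {"transcript": transcript, "math_text": decode(transcript)}
```

Passing `transcribe` and `decode` as parameters keeps the two stages separable, which mirrors the demo's split between recording audio and decoding an edited transcript.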
## Local Setup
Change into this folder (the path below is the author's checkout; adjust it for your machine):
```bash
cd /Users/vaibhav/Desktop/beyond/whispermath/webdemo
```
You can use the existing Phase 3 virtualenv:
```bash
/Users/vaibhav/Desktop/beyond/whispermath/phase-3-decoder/.venv/bin/python -m pip install -r requirements.txt
```
Or create a fresh virtualenv:
```bash
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
```
## Run Locally
CPU-friendly default:
```bash
uvicorn app:app --host 127.0.0.1 --port 8766
```
Then open:
```text
http://127.0.0.1:8766
```
If using the Phase 3 virtualenv directly:
```bash
/Users/vaibhav/Desktop/beyond/whispermath/phase-3-decoder/.venv/bin/python \
-m uvicorn app:app --host 127.0.0.1 --port 8766
```
## Model Configuration
The app is controlled with environment variables.
```bash
export WHISPERMATH_WHISPER_MODEL=small.en
export WHISPERMATH_WHISPER_DEVICE=cpu
export WHISPERMATH_WHISPER_COMPUTE_TYPE=int8
export WHISPERMATH_DECODER_MODEL=vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724
export WHISPERMATH_DECODER_DEVICE=auto
```
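One way a backend might read these variables is sketched below. The `Settings` dataclass and the `load_settings` and `resolve_device` helpers are hypothetical names, the defaults simply mirror the values shown above, and the rule for `auto` (CUDA when PyTorch reports it, otherwise CPU) is an assumption about what `auto` means in this app.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    whisper_model: str
    whisper_device: str
    whisper_compute_type: str
    decoder_model: str
    decoder_device: str


def resolve_device(requested: str) -> str:
    """Turn 'auto' into a concrete device; pass explicit values through."""
    if requested != "auto":
        return requested
    try:
        import torch  # deferred so the module loads without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read WHISPERMATH_* variables, falling back to the documented values."""
    return Settings(
        whisper_model=env.get("WHISPERMATH_WHISPER_MODEL", "small.en"),
        whisper_device=env.get("WHISPERMATH_WHISPER_DEVICE", "cpu"),
        whisper_compute_type=env.get("WHISPERMATH_WHISPER_COMPUTE_TYPE", "int8"),
        decoder_model=env.get(
            "WHISPERMATH_DECODER_MODEL",
            "vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724",
        ),
        decoder_device=resolve_device(env.get("WHISPERMATH_DECODER_DEVICE", "auto")),
    )
```

Taking the environment as a mapping argument makes it easy to check a configuration without touching the real process environment, for example `load_settings({"WHISPERMATH_DECODER_DEVICE": "cpu"})`.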
Whisper model options:
```text
tiny.en      Fastest, weakest transcription
base.en      Better quality, still fairly light
small.en     Good CPU default for the public Space
medium.en    Better transcription, slower and heavier
```
For local testing with medium Whisper:
```bash
WHISPERMATH_WHISPER_MODEL=medium.en \
/Users/vaibhav/Desktop/beyond/whispermath/phase-3-decoder/.venv/bin/python \
-m uvicorn app:app --host 127.0.0.1 --port 8766
```
For the free Hugging Face CPU Space, keep:
```bash
WHISPERMATH_WHISPER_MODEL=small.en
WHISPERMATH_DECODER_DEVICE=cpu
```
## API Endpoints
### Health
```bash
curl http://127.0.0.1:8766/api/health
```
Example:
```json
{
  "status": "ok",
  "whisper_model": "small.en",
  "decoder_model": "vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724",
  "decoder_device": "cpu"
}
```
### Decode Text Only
Use this when you want to test the ByT5 decoder without audio:
```bash
curl -X POST http://127.0.0.1:8766/api/decode \
-H "Content-Type: application/json" \
-d '{"text":"x squared minus y squared equals four","num_beams":1,"max_new_tokens":128}'
```
Example response:
```json
{
  "transcript": "x squared minus y squared equals four",
  "math_text": "x^2-y^2=4",
  "decoder_model": "vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724"
}
```
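The same request can be made from Python using only the standard library. This is a minimal sketch: `build_decode_request` and `decode_text` are hypothetical helper names, and the base URL assumes the local server from the Run Locally section.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8766"  # assumption: local uvicorn server


def build_decode_request(
    text: str,
    num_beams: int = 1,
    max_new_tokens: int = 128,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build the POST /api/decode request with the same JSON body as the curl call."""
    payload = {"text": text, "num_beams": num_beams, "max_new_tokens": max_new_tokens}
    return urllib.request.Request(
        f"{base_url}/api/decode",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_text(text: str, **kwargs) -> dict:
    """Send the request and return the parsed JSON response."""
    request = build_decode_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.load(response)
```

With the server running, `decode_text("x squared minus y squared equals four")["math_text"]` returns the decoder output for that transcript.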
### Transcribe Audio
```bash
curl -X POST http://127.0.0.1:8766/api/transcribe \
-F audio=@/path/to/audio.wav \
-F num_beams=4 \
-F max_new_tokens=256
```
Example response:
```json
{
  "transcript": "integral from zero to pi of sine x dx.",
  "math_text": "\\int_0^\\pi \\sin x dx.",
  "segments": [
    {
      "start": 0.0,
      "end": 2.72,
      "text": "integral from zero to pi of sine x dx."
    }
  ],
  "whisper_model": "small.en",
  "decoder_model": "vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724"
}
```
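For scripted testing, the multipart upload can also be built by hand with the Python standard library. The sketch below is an illustration, not part of the app: `encode_multipart` and `transcribe_file` are hypothetical helpers, and the field names (`audio`, `num_beams`, `max_new_tokens`) are taken from the curl command above.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Optional, Tuple


def encode_multipart(
    fields: dict,
    file_field: str,
    filename: str,
    file_bytes: bytes,
    boundary: Optional[str] = None,
) -> Tuple[bytes, str]:
    """Return (body, content_type) for a multipart/form-data request."""
    boundary = boundary or uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    # Plain form fields come first, one part each.
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    # The file part carries a filename and its own content type.
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f"Content-Type: {file_type}\r\n\r\n".encode("utf-8")
    )
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe_file(
    path: str,
    num_beams: int = 4,
    max_new_tokens: int = 256,
    base_url: str = "http://127.0.0.1:8766",  # assumption: local server
) -> dict:
    """Upload an audio file to POST /api/transcribe and return the parsed JSON."""
    audio = Path(path)
    body, content_type = encode_multipart(
        {"num_beams": num_beams, "max_new_tokens": max_new_tokens},
        "audio",
        audio.name,
        audio.read_bytes(),
    )
    request = urllib.request.Request(
        f"{base_url}/api/transcribe",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.load(response)
```

Comparing `transcript` and `math_text` in the returned dictionary shows which stage introduced an error.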
## Deploy To Hugging Face Spaces
This folder is ready for a Docker Space.
### 1. Login
Do not commit or paste tokens into files.
```bash
huggingface-cli login
```
Or use the Python API with your local authenticated session.
### 2. Create The Space
```python
from huggingface_hub import HfApi

api = HfApi()
api.create_repo(
    repo_id="vibhuiitj/whispermath-webdemo",
    repo_type="space",
    space_sdk="docker",
    private=False,
    exist_ok=True,
)
```
### 3. Upload The Folder
```python
from huggingface_hub import HfApi

api = HfApi()
api.upload_folder(
    repo_id="vibhuiitj/whispermath-webdemo",
    repo_type="space",
    folder_path="/Users/vaibhav/Desktop/beyond/whispermath/webdemo",
    path_in_repo=".",
    ignore_patterns=[
        "__pycache__/*",
        "*.pyc",
        ".DS_Store",
        ".venv/*",
        "audio/*",
        "outputs/*",
    ],
    commit_message="Deploy WhisperMath web demo",
)
```
### 4. Check Runtime
```python
from huggingface_hub import HfApi
runtime = HfApi().get_space_runtime("vibhuiitj/whispermath-webdemo")
print(runtime)
```
Expected final state:
```text
stage='RUNNING'
hardware='cpu-basic'
requested_hardware='cpu-basic'
```
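Because the build takes a while, it can help to poll until the Space reaches `RUNNING`. The helper below is a hypothetical sketch: `wait_until_running` is not part of `huggingface_hub`, the timeout and interval are arbitrary, and the failure stage names are the ones I understand Spaces to report, so treat that set as an assumption.

```python
import time
from typing import Callable

# Assumption: stage names that indicate the Space will not start on its own.
FAILED_STAGES = {"BUILD_ERROR", "RUNTIME_ERROR", "CONFIG_ERROR", "NO_APP_FILE"}


def wait_until_running(
    get_stage: Callable[[], str],
    timeout_s: float = 900.0,
    interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_stage() until it returns 'RUNNING', a failure stage, or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        stage = get_stage()
        if stage == "RUNNING":
            return stage
        if stage in FAILED_STAGES:
            raise RuntimeError(f"Space entered failure stage {stage}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Space still in stage {stage} after {timeout_s} s")
        sleep(interval_s)
```

To use it against the real Space, pass a getter such as `lambda: HfApi().get_space_runtime("vibhuiitj/whispermath-webdemo").stage`.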
### 5. Test The Deployed Space
Health:
```bash
curl https://vibhuiitj-whispermath-webdemo.hf.space/api/health
```
Text decode:
```bash
curl -L -X POST https://vibhuiitj-whispermath-webdemo.hf.space/api/decode \
-H "Content-Type: application/json" \
-d '{"text":"x squared minus y squared equals four","num_beams":1,"max_new_tokens":128}'
```
Expected:
```json
{
  "transcript": "x squared minus y squared equals four",
  "math_text": "x^2-y^2=4",
  "decoder_model": "vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724"
}
```
## Docker Notes
The Docker image:
- Uses `python:3.11-slim`
- Installs `libgomp1`, needed by some CPU inference dependencies
- Runs on port `7860`, the Hugging Face Spaces default
- Sets `WHISPERMATH_WHISPER_MODEL=small.en`
- Sets `WHISPERMATH_DECODER_DEVICE=cpu`
- Sets `HF_HUB_DISABLE_XET=1`
`HF_HUB_DISABLE_XET=1` is included because local testing showed large model downloads could get stuck with incomplete Xet-backed cache files.
## Troubleshooting
### Space Takes A Long Time To Start
The first start downloads:
- `Systran/faster-whisper-small.en`
- `vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724`
On free CPU, startup can take a few minutes.
### Audio Transcription Is Wrong
Try these in order:
1. Speak shorter phrases.
2. Use clearer operator words, for example `over` instead of `by`.
3. Check the editable transcript box.
4. Correct the transcript manually.
5. Click **Decode Transcript**.
6. Try `base.en`, `small.en`, or `medium.en` locally.
### ByT5 Output Is Wrong But Transcript Is Correct
That points to the decoder rather than Whisper, and usually means it needs more targeted training data. Common weak phrases include:
```text
by / divided by / over
whole square
derivative of ...
limit as ...
fraction with grouped numerator and denominator
```
Use the editable transcript box to collect failure cases.
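Failure cases are easier to reuse as training data if they are stored in a consistent format. The sketch below appends one JSON object per line to a file; the `log_failure_case` helper, the field names, and the JSONL layout are all hypothetical and not something the app does for you.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_failure_case(path: str, transcript: str, model_output: str, expected: str) -> dict:
    """Append one decoder failure case to a JSON Lines file and return the record."""
    record = {
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "transcript": transcript,      # the corrected transcript that was decoded
        "model_output": model_output,  # what the ByT5 decoder returned
        "expected": expected,          # the math text you wanted instead
    }
    with Path(path).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record
```

Each call adds a single line, so the file can be appended to across sessions and later read line by line.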
### KaTeX Rendering Fails
The app still shows the raw ByT5 output under **Raw ByT5 Output**. If the raw output is malformed LaTeX-like text, KaTeX may render an error-colored expression or show fallback text.
### Medium Whisper Hangs During Download
If a local download leaves an incomplete cache file, run:
```bash
HF_HUB_DISABLE_XET=1 python - <<'PY'
from huggingface_hub import snapshot_download
snapshot_download("Systran/faster-whisper-medium.en", max_workers=1)
PY
```
Then restart:
```bash
HF_HUB_DISABLE_XET=1 WHISPERMATH_WHISPER_MODEL=medium.en \
python -m uvicorn app:app --host 127.0.0.1 --port 8766
```
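To check whether partial downloads are still sitting in the cache, a small script can list them. This is a sketch under two assumptions: that `huggingface_hub` marks in-progress blobs with an `.incomplete` suffix, and that the cache lives at `HF_HUB_CACHE`, or `HF_HOME/hub`, or `~/.cache/huggingface/hub` (a simplified version of the library's lookup). The helper names are hypothetical, and the script only lists files; it does not delete anything.

```python
import os
from pathlib import Path
from typing import List, Optional


def default_hub_cache() -> Path:
    """Best-effort guess at the Hugging Face hub cache directory."""
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"])
    hf_home = os.environ.get("HF_HOME", str(Path.home() / ".cache" / "huggingface"))
    return Path(hf_home) / "hub"


def find_incomplete_downloads(cache_dir: Optional[str] = None) -> List[Path]:
    """Return every '*.incomplete' file under the cache directory, sorted by path."""
    root = Path(cache_dir) if cache_dir else default_hub_cache()
    if not root.is_dir():
        return []
    return sorted(root.rglob("*.incomplete"))
```

If the list is non-empty for the model you are downloading, rerunning the download command above is the first thing to try.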
## Security
Never commit Hugging Face tokens or API keys into this folder.
If a token is accidentally pasted into a chat or ends up in terminal history, revoke it and generate a new one in your Hugging Face account settings.