---
title: WhisperMath
emoji: 🧮
colorFrom: green
colorTo: blue
sdk: docker
app_port: 7860
pinned: false
---

# WhisperMath Web Demo

WhisperMath is an interactive demo for converting spoken mathematical phrases into rendered math notation.

```text
browser microphone
  -> faster-whisper transcript
  -> ByT5 math decoder
  -> rendered KaTeX output
```

The demo has two useful modes:

- **Record audio**: speak a math expression in the browser.
- **Edit transcript**: correct Whisper's transcript and click **Decode Transcript** to test only the ByT5 decoder.

This separation is important because spoken-math errors can come from two different places:

- Whisper may hear the audio incorrectly.
- The ByT5 decoder may convert a correct transcript incorrectly.

## Live Demo

Hugging Face Space:

```text
https://huggingface.co/spaces/vibhuiitj/whispermath-webdemo
```

Direct app URL:

```text
https://vibhuiitj-whispermath-webdemo.hf.space
```

The public Space runs on the free CPU tier with this configuration:

```text
Whisper: small.en
Decoder: vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724
Device: CPU
```

## Files

```text
webdemo/
  app.py              FastAPI backend
  static/index.html   Browser UI with recorder and KaTeX rendering
  requirements.txt    Python dependencies
  Dockerfile          Hugging Face Spaces Docker image
  .dockerignore       Files ignored by Docker build
  README.md           Space metadata and this guide
```

## How It Works

1. The browser records audio using `MediaRecorder`.
2. The frontend uploads the recording to `POST /api/transcribe`.
3. The backend saves the audio to a temporary file.
4. `faster-whisper` transcribes the audio into English text.
5. The transcript is passed to the ByT5 checkpoint.
6. The ByT5 output is returned as raw math/LaTeX-like text.
7. The frontend renders the output using KaTeX and also shows the raw model output for debugging.
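
Steps 3 to 6 can be sketched in Python as follows. This is a minimal sketch, not the code in `app.py`: the function names are hypothetical, the models are loaded on every call for brevity (a server would normally load them once at startup), and it assumes the checkpoint loads through `AutoModelForSeq2SeqLM` and takes the raw transcript with no task prefix.

```python
from pathlib import Path


def join_segments(texts):
    """Join Whisper segment texts into one whitespace-normalized transcript."""
    return " ".join(" ".join(texts).split())


def transcribe(audio_path, model_name="small.en"):
    """Run faster-whisper on a saved audio file (steps 3 and 4)."""
    # Imported lazily so this module loads even without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_name, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(str(Path(audio_path)), language="en")
    segments = list(segments)  # the generator does the work when it is consumed
    return join_segments(s.text for s in segments), segments


def decode_math(transcript, checkpoint, num_beams=1, max_new_tokens=128):
    """Convert a transcript to math text with a seq2seq checkpoint (steps 5 and 6)."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
    inputs = tokenizer(transcript, return_tensors="pt")
    output_ids = model.generate(
        **inputs, num_beams=num_beams, max_new_tokens=max_new_tokens
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Keeping `transcribe` and `decode_math` separate mirrors the two demo modes: the second function can be called on a hand-corrected transcript without touching audio.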

## Local Setup

From this folder (replace the path with the location of your own checkout):

```bash
cd /Users/vaibhav/Desktop/beyond/whispermath/webdemo
```

You can use the existing Phase 3 virtualenv:

```bash
/Users/vaibhav/Desktop/beyond/whispermath/phase-3-decoder/.venv/bin/python -m pip install -r requirements.txt
```

Or create a fresh virtualenv:

```bash
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
```

## Run Locally

CPU-friendly default:

```bash
uvicorn app:app --host 127.0.0.1 --port 8766
```

Then open:

```text
http://127.0.0.1:8766
```

If using the Phase 3 virtualenv directly:

```bash
/Users/vaibhav/Desktop/beyond/whispermath/phase-3-decoder/.venv/bin/python \
  -m uvicorn app:app --host 127.0.0.1 --port 8766
```

## Model Configuration

The app is configured through environment variables:

```bash
export WHISPERMATH_WHISPER_MODEL=small.en
export WHISPERMATH_WHISPER_DEVICE=cpu
export WHISPERMATH_WHISPER_COMPUTE_TYPE=int8
export WHISPERMATH_DECODER_MODEL=vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724
export WHISPERMATH_DECODER_DEVICE=auto
```
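
One way a backend could read these variables is sketched below. The `Settings` class, `load_settings`, and `resolve_device` are hypothetical names, the fallback defaults simply repeat the exported values above, and treating `auto` as "CUDA if available, otherwise CPU" is an assumption about what the app does.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    whisper_model: str
    whisper_device: str
    whisper_compute_type: str
    decoder_model: str
    decoder_device: str


def resolve_device(requested, cuda_available):
    """Map 'auto' to a concrete device; pass explicit values through unchanged."""
    if requested == "auto":
        return "cuda" if cuda_available else "cpu"
    return requested


def load_settings(env=None, cuda_available=False):
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return Settings(
        whisper_model=env.get("WHISPERMATH_WHISPER_MODEL", "small.en"),
        whisper_device=env.get("WHISPERMATH_WHISPER_DEVICE", "cpu"),
        whisper_compute_type=env.get("WHISPERMATH_WHISPER_COMPUTE_TYPE", "int8"),
        decoder_model=env.get(
            "WHISPERMATH_DECODER_MODEL",
            "vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724",
        ),
        decoder_device=resolve_device(
            env.get("WHISPERMATH_DECODER_DEVICE", "auto"), cuda_available
        ),
    )
```

Passing the environment in as a mapping keeps the function easy to test; in a real process `cuda_available` would come from something like `torch.cuda.is_available()`.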

Whisper model options:

```text
tiny.en    Fastest, weakest transcription
base.en    Better quality, still fairly light
small.en   Good CPU default for the public Space
medium.en  Better transcription, slower and heavier
```

For local testing with medium Whisper:

```bash
WHISPERMATH_WHISPER_MODEL=medium.en \
/Users/vaibhav/Desktop/beyond/whispermath/phase-3-decoder/.venv/bin/python \
  -m uvicorn app:app --host 127.0.0.1 --port 8766
```

For the free Hugging Face CPU Space, keep:

```bash
WHISPERMATH_WHISPER_MODEL=small.en
WHISPERMATH_DECODER_DEVICE=cpu
```

## API Endpoints

### Health

```bash
curl http://127.0.0.1:8766/api/health
```

Example response:

```json
{
  "status": "ok",
  "whisper_model": "small.en",
  "decoder_model": "vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724",
  "decoder_device": "cpu"
}
```

### Decode Text Only

Use this when you want to test the ByT5 decoder without audio:

```bash
curl -X POST http://127.0.0.1:8766/api/decode \
  -H "Content-Type: application/json" \
  -d '{"text":"x squared minus y squared equals four","num_beams":1,"max_new_tokens":128}'
```

Example response:

```json
{
  "transcript": "x squared minus y squared equals four",
  "math_text": "x^2-y^2=4",
  "decoder_model": "vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724"
}
```
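
The same request can be made from Python with only the standard library. This is a small client sketch; `build_decode_request` and `decode_text` are hypothetical helper names, and the JSON fields are the ones shown in the `curl` example.

```python
import json
import urllib.request


def build_decode_request(base_url, text, num_beams=1, max_new_tokens=128):
    """Build the POST /api/decode request without sending it."""
    payload = {"text": text, "num_beams": num_beams, "max_new_tokens": max_new_tokens}
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/decode",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_text(base_url, text, **kwargs):
    """Send the request and return the parsed JSON response."""
    request = build_decode_request(base_url, text, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires the local server from "Run Locally" to be up.
    result = decode_text("http://127.0.0.1:8766", "x squared minus y squared equals four")
    print(result["math_text"])
```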

### Transcribe Audio

```bash
curl -X POST http://127.0.0.1:8766/api/transcribe \
  -F audio=@/path/to/audio.wav \
  -F num_beams=4 \
  -F max_new_tokens=256
```

Example response:

```json
{
  "transcript": "integral from zero to pi of sine x dx.",
  "math_text": "\\int_0^\\pi \\sin x dx.",
  "segments": [
    {
      "start": 0.0,
      "end": 2.72,
      "text": "integral from zero to pi of sine x dx."
    }
  ],
  "whisper_model": "small.en",
  "decoder_model": "vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724"
}
```
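
For scripted testing, the upload can also be done from Python. The sketch below builds the `multipart/form-data` body by hand so it needs no third-party packages; `build_multipart` and `transcribe_file` are hypothetical helpers, and the form field names (`audio`, `num_beams`, `max_new_tokens`) are taken from the `curl` example.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_multipart(fields, file_field, filename, file_bytes, boundary=None):
    """Return (body, content_type) for a form with text fields and one file."""
    boundary = boundary or uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode("utf-8")
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: {file_type}\r\n\r\n'.encode("utf-8")
    )
    parts.append(file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe_file(base_url, audio_path, num_beams=4, max_new_tokens=256):
    """Upload an audio file to POST /api/transcribe and return the parsed JSON."""
    path = Path(audio_path)
    body, content_type = build_multipart(
        {"num_beams": num_beams, "max_new_tokens": max_new_tokens},
        "audio",
        path.name,
        path.read_bytes(),
    )
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/transcribe",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    # Transcription on CPU can be slow, so allow a generous timeout.
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.loads(response.read().decode("utf-8"))
```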

## Deploy To Hugging Face Spaces

This folder is ready for a Docker Space.

### 1. Login

Do not commit or paste tokens into files.

```bash
huggingface-cli login
```

Or use the Python API with your local authenticated session.

### 2. Create The Space

```python
from huggingface_hub import HfApi

api = HfApi()
api.create_repo(
    repo_id="vibhuiitj/whispermath-webdemo",
    repo_type="space",
    space_sdk="docker",
    private=False,
    exist_ok=True,
)
```

### 3. Upload The Folder

```python
from huggingface_hub import HfApi

api = HfApi()
api.upload_folder(
    repo_id="vibhuiitj/whispermath-webdemo",
    repo_type="space",
    folder_path="/Users/vaibhav/Desktop/beyond/whispermath/webdemo",
    path_in_repo=".",
    ignore_patterns=[
        "__pycache__/*",
        "*.pyc",
        ".DS_Store",
        ".venv/*",
        "audio/*",
        "outputs/*",
    ],
    commit_message="Deploy WhisperMath web demo",
)
```

### 4. Check Runtime

```python
from huggingface_hub import HfApi

runtime = HfApi().get_space_runtime("vibhuiitj/whispermath-webdemo")
print(runtime)
```

Expected final state:

```text
stage='RUNNING'
hardware='cpu-basic'
requested_hardware='cpu-basic'
```
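
Because a build on free hardware can take a while, it may be convenient to poll until the Space reaches `RUNNING`. The loop below is a sketch: `wait_for_stage` and `space_stage` are hypothetical helpers, and the timeout and interval values are arbitrary choices rather than recommendations from Hugging Face.

```python
import time

# Stage names that indicate the Space will not come up without a fix.
FAILED_STAGES = {"BUILD_ERROR", "RUNTIME_ERROR", "CONFIG_ERROR"}


def wait_for_stage(fetch_stage, target="RUNNING", timeout=900.0, interval=10.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Call fetch_stage() until it returns target, a failed stage, or time runs out."""
    deadline = clock() + timeout
    while True:
        stage = fetch_stage()
        if stage == target:
            return stage
        if stage in FAILED_STAGES:
            raise RuntimeError(f"Space entered failure stage {stage}")
        if clock() >= deadline:
            raise TimeoutError(f"Space still in stage {stage} after {timeout} seconds")
        sleep(interval)


def space_stage(repo_id):
    """Fetch the current stage of a Space as a plain string."""
    from huggingface_hub import HfApi

    stage = HfApi().get_space_runtime(repo_id).stage
    return getattr(stage, "value", stage)  # handles both enum and plain str


if __name__ == "__main__":
    repo = "vibhuiitj/whispermath-webdemo"
    print(wait_for_stage(lambda: space_stage(repo)))
```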

### 5. Test The Deployed Space

Health:

```bash
curl https://vibhuiitj-whispermath-webdemo.hf.space/api/health
```

Text decode:

```bash
curl -L -X POST https://vibhuiitj-whispermath-webdemo.hf.space/api/decode \
  -H "Content-Type: application/json" \
  -d '{"text":"x squared minus y squared equals four","num_beams":1,"max_new_tokens":128}'
```

Expected response:

```json
{
  "transcript": "x squared minus y squared equals four",
  "math_text": "x^2-y^2=4",
  "decoder_model": "vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724"
}
```

## Docker Notes

The Docker image:

- Uses `python:3.11-slim`
- Installs `libgomp1`, needed by some CPU inference dependencies
- Runs on port `7860`, the Hugging Face Spaces default
- Sets `WHISPERMATH_WHISPER_MODEL=small.en`
- Sets `WHISPERMATH_DECODER_DEVICE=cpu`
- Sets `HF_HUB_DISABLE_XET=1`

`HF_HUB_DISABLE_XET=1` is included because local testing showed large model downloads could get stuck with incomplete Xet-backed cache files.

## Troubleshooting

### Space Takes A Long Time To Start

The first start downloads:

- `Systran/faster-whisper-small.en`
- `vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724`

On free CPU, startup can take a few minutes.

### Audio Transcription Is Wrong

Try these in order:

1. Speak shorter phrases.
2. Use clearer operator words, for example `over` instead of `by`.
3. Check the editable transcript box.
4. Correct the transcript manually.
5. Click **Decode Transcript**.
6. Try `base.en`, `small.en`, or `medium.en` locally.

### ByT5 Output Is Wrong But Transcript Is Correct

That usually means the decoder needs more targeted training data for that phrasing. Common weak phrases include:

```text
by / divided by / over
whole square
derivative of ...
limit as ...
fraction with grouped numerator and denominator
```

Use the editable transcript box to collect failure cases.
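
A simple way to keep those cases is to append them to a JSON Lines file that can later feed training data. This is only a sketch: `log_failure`, the file name, and the record fields are hypothetical and not part of the demo.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_failure(path, transcript, model_output, expected, source="decoder"):
    """Append one failure case as a JSON line and return the record."""
    record = {
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "source": source,              # "decoder" or "whisper"
        "transcript": transcript,      # text that was sent to the decoder
        "model_output": model_output,  # what the decoder returned
        "expected": expected,          # what it should have returned
    }
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


if __name__ == "__main__":
    # Made-up example values, not real model output.
    log_failure("outputs/failures.jsonl", "a by b", "ab", "\\frac{a}{b}")
```

Recording the transcript alongside the expected output keeps each entry usable as a transcript/target training pair.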

### KaTeX Rendering Fails

The app still shows the raw ByT5 output under **Raw ByT5 Output**. If the raw output is malformed LaTeX-like text, KaTeX may render an error-colored expression or show fallback text.

### Medium Whisper Hangs During Download

If a local download leaves an incomplete cache file, run:

```bash
HF_HUB_DISABLE_XET=1 python - <<'PY'
from huggingface_hub import snapshot_download
snapshot_download("Systran/faster-whisper-medium.en", max_workers=1)
PY
```

Then restart:

```bash
HF_HUB_DISABLE_XET=1 WHISPERMATH_WHISPER_MODEL=medium.en \
python -m uvicorn app:app --host 127.0.0.1 --port 8766
```

## Security

Never commit Hugging Face tokens or API keys into this folder.

If a token is accidentally pasted into a chat or left in terminal history, revoke it in your Hugging Face settings and create a new one.