---
title: WhisperMath
emoji: 🧮
colorFrom: green
colorTo: blue
sdk: docker
app_port: 7860
pinned: false
---

# WhisperMath Web Demo

WhisperMath is an interactive demo for converting spoken mathematical phrases into rendered math notation.

```text
browser microphone
  -> faster-whisper transcript
  -> ByT5 math decoder
  -> rendered KaTeX output
```

The demo has two useful modes:

- **Record audio**: speak a math expression in the browser.
- **Edit transcript**: correct Whisper's transcript and click **Decode Transcript** to test only the ByT5 decoder.

This separation is important because spoken-math errors can come from two different places:

- Whisper may hear the audio incorrectly.
- The ByT5 decoder may convert a correct transcript incorrectly.
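
When triaging a bad result, the two failure sources can be told apart mechanically if you know what was actually said and what output you expected. The sketch below is only an illustration: `locate_error` and its normalization rules are hypothetical and not part of the demo's code.

```python
def _norm_words(text: str) -> str:
    """Lowercase, drop a trailing period, and collapse whitespace."""
    return " ".join(text.lower().strip().rstrip(".").split())


def _norm_math(text: str) -> str:
    """Ignore whitespace differences in math output."""
    return "".join(text.split())


def locate_error(spoken: str, transcript: str,
                 expected_math: str, decoded_math: str) -> str:
    """Return "ok", "whisper", or "decoder" for one test phrase.

    Assumption: if the transcript differs from what was spoken, the
    error is attributed to Whisper first, because the decoder never
    saw the intended input.
    """
    if _norm_math(decoded_math) == _norm_math(expected_math):
        return "ok"
    if _norm_words(transcript) != _norm_words(spoken):
        return "whisper"
    return "decoder"


if __name__ == "__main__":
    print(locate_error(
        spoken="x squared minus y squared equals four",
        transcript="x squared minus y squared equals four",
        expected_math="x^2-y^2=4",
        decoded_math="x^2+y^2=4",
    ))
```

A `"whisper"` result suggests fixing the transcript in the editable box and decoding again, while a `"decoder"` result suggests the phrase is a candidate for decoder training data.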

## Live Demo

Hugging Face Space:

```text
https://huggingface.co/spaces/vibhuiitj/whispermath-webdemo
```

Direct app URL:

```text
https://vibhuiitj-whispermath-webdemo.hf.space
```

The public Space runs on the free CPU tier with this configuration:

```text
Whisper: small.en
Decoder: vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724
Device: CPU
```

## Files

```text
webdemo/
  app.py              FastAPI backend
  static/index.html   Browser UI with recorder and KaTeX rendering
  requirements.txt    Python dependencies
  Dockerfile          Hugging Face Spaces Docker image
  .dockerignore       Files ignored by Docker build
  README.md           Space metadata and this guide
```

## How It Works

1. The browser records audio using `MediaRecorder`.
2. The frontend uploads the recording to `POST /api/transcribe`.
3. The backend saves the audio to a temporary file.
4. `faster-whisper` transcribes the audio into English text.
5. The transcript is passed to the ByT5 checkpoint.
6. The ByT5 output is returned as raw math/LaTeX-like text.
7. The frontend renders the output using KaTeX and also shows the raw model output for debugging.
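
Steps 3 to 6 can be sketched as a single function. This is a minimal illustration, not the code in `app.py`: `run_pipeline` is a hypothetical name, and the transcriber and decoder are passed in as plain callables so the sketch runs without loading `faster-whisper` or the ByT5 checkpoint.

```python
import os
import tempfile
from typing import Callable, Dict


def run_pipeline(audio_bytes: bytes,
                 transcribe: Callable[[str], str],
                 decode: Callable[[str], str],
                 suffix: str = ".webm") -> Dict[str, str]:
    """Save uploaded audio, transcribe it, then decode the transcript.

    `transcribe` takes a file path and returns text; `decode` takes
    text and returns math text. Both are stand-ins for the real models.
    The `.webm` suffix is an assumption about the browser recording.
    """
    # Step 3: write the upload to a temporary file on disk.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(audio_bytes)
        path = tmp.name
    try:
        # Step 4: audio file -> English transcript.
        transcript = transcribe(path).strip()
        # Steps 5 and 6: transcript -> raw math/LaTeX-like text.
        math_text = decode(transcript)
    finally:
        # Always clean up the temporary file, even on failure.
        os.remove(path)
    return {"transcript": transcript, "math_text": math_text}


if __name__ == "__main__":
    result = run_pipeline(
        b"fake audio",
        transcribe=lambda path: "x squared minus y squared equals four",
        decode=lambda text: "x^2-y^2=4",
    )
    print(result)
```

Keeping the two stages as separate callables mirrors the demo's two modes: the decode step can be exercised alone with a hand-corrected transcript.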

## Local Setup

From this folder:

```bash
cd /Users/vaibhav/Desktop/beyond/whispermath/webdemo
```

You can use the existing Phase 3 virtualenv:

```bash
/Users/vaibhav/Desktop/beyond/whispermath/phase-3-decoder/.venv/bin/python -m pip install -r requirements.txt
```

Or create a fresh virtualenv:

```bash
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
```

## Run Locally

CPU-friendly default:

```bash
uvicorn app:app --host 127.0.0.1 --port 8766
```

Then open:

```text
http://127.0.0.1:8766
```

If using the Phase 3 virtualenv directly:

```bash
/Users/vaibhav/Desktop/beyond/whispermath/phase-3-decoder/.venv/bin/python \
  -m uvicorn app:app --host 127.0.0.1 --port 8766
```

## Model Configuration

The app is controlled with environment variables.

```bash
export WHISPERMATH_WHISPER_MODEL=small.en
export WHISPERMATH_WHISPER_DEVICE=cpu
export WHISPERMATH_WHISPER_COMPUTE_TYPE=int8
export WHISPERMATH_DECODER_MODEL=vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724
export WHISPERMATH_DECODER_DEVICE=auto
```
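
For orientation, here is one way a backend could read these variables. It is a sketch under assumptions: `DemoConfig` and `load_config` are hypothetical names, and the fallback values simply mirror the `export` lines above rather than the actual defaults in `app.py`.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DemoConfig:
    whisper_model: str
    whisper_device: str
    whisper_compute_type: str
    decoder_model: str
    decoder_device: str


def load_config(env: Mapping[str, str] = os.environ) -> DemoConfig:
    """Build a config from environment variables.

    The fallbacks are assumed defaults copied from the example exports.
    """
    return DemoConfig(
        whisper_model=env.get("WHISPERMATH_WHISPER_MODEL", "small.en"),
        whisper_device=env.get("WHISPERMATH_WHISPER_DEVICE", "cpu"),
        whisper_compute_type=env.get("WHISPERMATH_WHISPER_COMPUTE_TYPE", "int8"),
        decoder_model=env.get(
            "WHISPERMATH_DECODER_MODEL",
            "vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724",
        ),
        decoder_device=env.get("WHISPERMATH_DECODER_DEVICE", "auto"),
    )


if __name__ == "__main__":
    print(load_config())
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to test without touching the real environment.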

Whisper model options:

```text
tiny.en    Fastest, weakest transcription
base.en    Better quality, still fairly light
small.en   Good CPU default for the public Space
medium.en  Better transcription, slower and heavier
```

For local testing with medium Whisper:

```bash
WHISPERMATH_WHISPER_MODEL=medium.en \
/Users/vaibhav/Desktop/beyond/whispermath/phase-3-decoder/.venv/bin/python \
  -m uvicorn app:app --host 127.0.0.1 --port 8766
```

For the free Hugging Face CPU Space, keep:

```bash
WHISPERMATH_WHISPER_MODEL=small.en
WHISPERMATH_DECODER_DEVICE=cpu
```

## API Endpoints

### Health

```bash
curl http://127.0.0.1:8766/api/health
```

Example:

```json
{
  "status": "ok",
  "whisper_model": "small.en",
  "decoder_model": "vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724",
  "decoder_device": "cpu"
}
```

### Decode Text Only

Use this when you want to test the ByT5 decoder without audio:

```bash
curl -X POST http://127.0.0.1:8766/api/decode \
  -H "Content-Type: application/json" \
  -d '{"text":"x squared minus y squared equals four","num_beams":1,"max_new_tokens":128}'
```

Example response:

```json
{
  "transcript": "x squared minus y squared equals four",
  "math_text": "x^2-y^2=4",
  "decoder_model": "vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724"
}
```
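
The same call can be made from Python with only the standard library. The helper names `build_decode_request` and `decode_text` are hypothetical; the URL path and JSON fields are taken from the `curl` example above.

```python
import json
import urllib.request


def build_decode_request(base_url: str, text: str,
                         num_beams: int = 1,
                         max_new_tokens: int = 128) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/decode."""
    payload = {
        "text": text,
        "num_beams": num_beams,
        "max_new_tokens": max_new_tokens,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/decode",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_text(base_url: str, text: str, **options) -> dict:
    """Send the request and return the parsed JSON response."""
    request = build_decode_request(base_url, text, **options)
    # The 60-second timeout is an arbitrary choice for slow CPU inference.
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires the local server from "Run Locally" to be running.
    print(decode_text("http://127.0.0.1:8766",
                      "x squared minus y squared equals four"))
```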

### Transcribe Audio

```bash
curl -X POST http://127.0.0.1:8766/api/transcribe \
  -F audio=@/path/to/audio.wav \
  -F num_beams=4 \
  -F max_new_tokens=256
```

Example response:

```json
{
  "transcript": "integral from zero to pi of sine x dx.",
  "math_text": "\\int_0^\\pi \\sin x dx.",
  "segments": [
    {
      "start": 0.0,
      "end": 2.72,
      "text": "integral from zero to pi of sine x dx."
    }
  ],
  "whisper_model": "small.en",
  "decoder_model": "vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724"
}
```
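
A client usually needs only a few fields from this response. The sketch below parses a trimmed copy of the example and pulls out the math text, the number of segments, and the end time of the last segment; `summarize` is a hypothetical helper, and treating the largest `end` value as the audio length is an assumption.

```python
import json

# Trimmed copy of the example response above. The raw string keeps the
# JSON backslash escapes exactly as they appear on the wire.
EXAMPLE = r'''
{
  "transcript": "integral from zero to pi of sine x dx.",
  "math_text": "\\int_0^\\pi \\sin x dx.",
  "segments": [
    {"start": 0.0, "end": 2.72, "text": "integral from zero to pi of sine x dx."}
  ]
}
'''


def summarize(response: dict) -> dict:
    """Extract the fields a simple client would display."""
    segments = response.get("segments", [])
    # Use the latest segment end as a rough audio duration in seconds.
    audio_seconds = max((seg["end"] for seg in segments), default=0.0)
    return {
        "math_text": response["math_text"],
        "segment_count": len(segments),
        "audio_seconds": audio_seconds,
    }


if __name__ == "__main__":
    print(summarize(json.loads(EXAMPLE)))
```

Note that `\\int` in the JSON becomes a single backslash after parsing, which is the form KaTeX expects.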

## Deploy To Hugging Face Spaces

This folder is ready for a Docker Space.

### 1. Login

Do not commit or paste tokens into files.

```bash
huggingface-cli login
```

Or use the Python API with your local authenticated session.

### 2. Create The Space

```python
from huggingface_hub import HfApi

api = HfApi()
api.create_repo(
    repo_id="vibhuiitj/whispermath-webdemo",
    repo_type="space",
    space_sdk="docker",
    private=False,
    exist_ok=True,
)
```

### 3. Upload The Folder

```python
from huggingface_hub import HfApi

api = HfApi()
api.upload_folder(
    repo_id="vibhuiitj/whispermath-webdemo",
    repo_type="space",
    folder_path="/Users/vaibhav/Desktop/beyond/whispermath/webdemo",
    path_in_repo=".",
    ignore_patterns=[
        "__pycache__/*",
        "*.pyc",
        ".DS_Store",
        ".venv/*",
        "audio/*",
        "outputs/*",
    ],
    commit_message="Deploy WhisperMath web demo",
)
```

### 4. Check Runtime

```python
from huggingface_hub import HfApi

runtime = HfApi().get_space_runtime("vibhuiitj/whispermath-webdemo")
print(runtime)
```

Expected final state:

```text
stage='RUNNING'
hardware='cpu-basic'
requested_hardware='cpu-basic'
```
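
Because a fresh Space spends some time building, it can help to poll until it reaches `RUNNING`. The sketch below is one possible approach, not part of this repository: `wait_until_running` is a hypothetical helper, the list of failure stages is an assumption, and the stage lookup is injected as a callable so the loop can be exercised without network access.

```python
import time
from typing import Callable

# Assumed set of stages that will not recover without a new commit.
FAILED_STAGES = ("BUILD_ERROR", "RUNTIME_ERROR", "CONFIG_ERROR")


def wait_until_running(fetch_stage: Callable[[], str],
                       timeout_s: float = 900.0,
                       interval_s: float = 10.0,
                       sleep: Callable[[float], None] = time.sleep,
                       clock: Callable[[], float] = time.monotonic) -> str:
    """Poll `fetch_stage` until it reports RUNNING.

    Raises RuntimeError on a failure stage and TimeoutError if the
    Space is still not running after `timeout_s` seconds.
    """
    deadline = clock() + timeout_s
    while True:
        stage = fetch_stage()
        if stage == "RUNNING":
            return stage
        if stage in FAILED_STAGES:
            raise RuntimeError(f"Space failed with stage {stage}")
        if clock() >= deadline:
            raise TimeoutError(f"Space still in stage {stage}")
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated stage sequence, so this runs without network access.
    stages = iter(["BUILDING", "BUILDING", "RUNNING"])
    print(wait_until_running(lambda: next(stages), interval_s=0.0))
```

With `huggingface_hub` installed and an authenticated session, `fetch_stage` could be `lambda: HfApi().get_space_runtime("vibhuiitj/whispermath-webdemo").stage`.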

### 5. Test The Deployed Space

Health:

```bash
curl https://vibhuiitj-whispermath-webdemo.hf.space/api/health
```

Text decode:

```bash
curl -L -X POST https://vibhuiitj-whispermath-webdemo.hf.space/api/decode \
  -H "Content-Type: application/json" \
  -d '{"text":"x squared minus y squared equals four","num_beams":1,"max_new_tokens":128}'
```

Expected:

```json
{
  "transcript": "x squared minus y squared equals four",
  "math_text": "x^2-y^2=4",
  "decoder_model": "vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724"
}
```

## Docker Notes

The Docker image:

- Uses `python:3.11-slim`
- Installs `libgomp1`, needed by some CPU inference dependencies
- Runs on port `7860`, the Hugging Face Spaces default
- Sets `WHISPERMATH_WHISPER_MODEL=small.en`
- Sets `WHISPERMATH_DECODER_DEVICE=cpu`
- Sets `HF_HUB_DISABLE_XET=1`

`HF_HUB_DISABLE_XET=1` is included because local testing showed large model downloads could get stuck with incomplete Xet-backed cache files.

## Troubleshooting

### Space Takes A Long Time To Start

The first start downloads:

- `Systran/faster-whisper-small.en`
- `vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724`

On free CPU, startup can take a few minutes.

### Audio Transcription Is Wrong

Try these in order:

1. Speak shorter phrases.
2. Use clearer operator words, for example `over` instead of `by`.
3. Check the editable transcript box.
4. Correct the transcript manually.
5. Click **Decode Transcript**.
6. Try a different Whisper model locally: `base.en`, `small.en`, or `medium.en`.

### ByT5 Output Is Wrong But Transcript Is Correct

That points to the decoder rather than Whisper, and usually means the decoder needs more targeted training data. Common weak phrases include:

```text
by / divided by / over
whole square
derivative of ...
limit as ...
fraction with grouped numerator and denominator
```

Use the editable transcript box to collect failure cases.
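
Failure cases are easier to reuse if they are stored in a consistent format. The sketch below appends each case to a JSON Lines file; the file layout, field names, and the helpers `log_failure` and `load_failures` are all hypothetical and not something the demo does on its own.

```python
import json
from pathlib import Path
from typing import Dict, List


def log_failure(path: str, transcript: str,
                model_output: str, expected: str) -> Dict[str, str]:
    """Append one decoder failure case as a single JSON line."""
    record = {
        "transcript": transcript,
        "model_output": model_output,
        "expected": expected,
    }
    with Path(path).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def load_failures(path: str) -> List[Dict[str, str]]:
    """Read all logged cases back, skipping blank lines."""
    with Path(path).open("r", encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


if __name__ == "__main__":
    log_failure("failures.jsonl", "a by b", "ab", "\\frac{a}{b}")
    print(load_failures("failures.jsonl"))
```

One record per line keeps the file append-only and easy to convert into transcript/target pairs later.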

### KaTeX Rendering Fails

The app still shows the raw ByT5 output under **Raw ByT5 Output**. If the raw output is malformed LaTeX-like text, KaTeX may render an error-colored expression or show fallback text.
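
A cheap way to spot one common kind of malformed output before blaming KaTeX is to check that curly braces are balanced. This is only a rough heuristic with a hypothetical helper name, `braces_balanced`; it does not validate LaTeX and is not something the app is known to run.

```python
def braces_balanced(latex: str) -> bool:
    """Return True if unescaped { and } are properly nested."""
    depth = 0
    i = 0
    while i < len(latex):
        ch = latex[i]
        if ch == "\\":
            # Skip the backslash and the character it escapes,
            # so \{ and \} do not count as grouping braces.
            i += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                # A closing brace appeared before its opener.
                return False
        i += 1
    return depth == 0


if __name__ == "__main__":
    print(braces_balanced(r"\frac{a}{b}"))
    print(braces_balanced(r"\frac{a}{b"))
```

Output that fails this check is worth saving as a decoder failure case, since the transcript alone cannot explain it.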

### Medium Whisper Hangs During Download

If a local download leaves an incomplete cache file, run:

```bash
HF_HUB_DISABLE_XET=1 python - <<'PY'
from huggingface_hub import snapshot_download
snapshot_download("Systran/faster-whisper-medium.en", max_workers=1)
PY
```

Then restart:

```bash
HF_HUB_DISABLE_XET=1 WHISPERMATH_WHISPER_MODEL=medium.en \
python -m uvicorn app:app --host 127.0.0.1 --port 8766
```

## Security

Never commit Hugging Face tokens or API keys into this folder.

If a token is pasted into a chat or ends up in terminal history by mistake, revoke it in your Hugging Face settings and create a new one.